Coverage Is Not Detection
How MITRE ATT&CK became a metric and why that breaks detection engineering
MITRE ATT&CK is the most useful framework in cybersecurity. As a detection engineer, I use it every day. But the way the industry has turned it into a measurement device, the green-cell heatmap that lands in board decks and vendor pitches, has very little to do with whether anyone is actually detecting attacks. This post is about why.
The Trust Problem
To understand why coverage maps won, you have to understand the non-technical problems they solved.
Detection engineering is complicated and nuanced. The people who build detections know whether their detections are solid. The people paying for them don’t. A CISO buying MSSP services, a board reviewing posture, a regulator auditing compliance: none of these people can open a SIEM, read a rule, understand the threat, and evaluate whether it works. Before ATT&CK, the claim “we have strong detection capabilities” was an expert judgment with no shared frame for comparison or measurement.
ATT&CK solved this by providing a shared knowledge base. Techniques have IDs. Coverage could be mapped to something concrete. And most importantly, coverage could be quantified, counted, and compared. A heatmap gave anyone needing to show value something to put in a board deck. A percentage gave executives something to compare. The claim shifted from “trust me, we cover that” to “we cover 85% of ATT&CK, and the other vendors cover less.” The second claim isn’t more accurate than the first. It’s more defensible, and it looks like a measurement.
Organizations adopt quantitative methods not because numbers are more precise than expert judgment, but because numbers are harder to challenge. The historian Theodore Porter's standard example is the U.S. Army Corps of Engineers, who had deep expertise in evaluating infrastructure projects but had cost-benefit analysis imposed on them by Congress anyway. Discretionary judgment is politically vulnerable. Quantification is not. Porter calls this mechanical objectivity: quantification adopted not to improve decisions, but to make them unchallengeable.
ATT&CK coverage percentages are mechanical objectivity applied to detection engineering. They exist because the relationship between detection providers and their clients is structurally low-trust, not because anyone is dishonest, but because the buyer can’t evaluate the seller’s expertise. The percentage gives both parties something to point at. In a low-trust market, legibility is more valuable than accuracy.
The MITRE team designed ATT&CK as a knowledge base for understanding adversary behavior, not as a compliance checklist. They have said this repeatedly. From their Design and Philosophy paper: “it’s important to note that in general, coverage of every ATT&CK technique is unrealistic.” But frameworks don’t get to choose how they’re used. A matrix invites filling in cells. Filled cells invite counting. Counting invites percentages. Percentages become metrics. Metrics get optimized for, regardless of whether the optimization improves security.
The detection engineer writing a rule to turn an obscure technique green probably knows the rule is pointless. But the gap on the map is visible, and the executive or salesperson asking “why is this cell red?” is a conversation nobody wants to have. The honest answer (“this technique isn’t relevant to our threat model and any rule we write would be shallow”) opens a line of questioning the coverage map was designed to avoid. Filling the cell is simpler than justifying why it shouldn’t matter. The metric creates its own demand. This is Goodhart’s Law in action: “when a measure becomes a target, it ceases to be a good measure.” So, what does coverage-driven detection engineering actually produce?
What “Covered” Doesn’t Mean
The coverage map answers one question: does a detection exist for this technique? That question is almost useless. The variance within “yes” is enormous, the question ignores whether the technique matters at all, and it says nothing about whether the ingested log sources even make detection possible in the first place.
The Depth Problem
Take T1059, Command and Scripting Interpreter. Almost every environment has a rule for it. A mature environment has five or more. The cell shows green.
Here is what a typical rule does: it fires when powershell.exe executes with a suspicious indicator, maybe flags suspicious cmdlets, maybe flags encoded commands, maybe looks for execution policy set to bypass. It is a solid detection. It will catch the kind of attack where an adversary runs a raw PowerShell script with no evasion.
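Reduced to a predicate, a rule of that shape looks something like this. This is a minimal Python sketch, not a real rule: the field names, function name, and token list are hypothetical stand-ins for whatever the EDR’s process-creation event and your SIEM’s query language actually expose.

```python
# Minimal sketch of a typical T1059.001-style rule as a predicate.
# Field names (process_name, command_line) and the token list are
# hypothetical; real rules live in Sigma or a SIEM query language.
SUSPICIOUS_TOKENS = (
    "invoke-mimikatz",      # flagged cmdlet
    "downloadstring",       # download cradle
    "-encodedcommand",      # encoded command flags
    "-enc ",
    "bypass",               # execution policy set to bypass
)

def typical_powershell_rule(process_name: str, command_line: str) -> bool:
    """Fire when powershell.exe executes with a suspicious indicator."""
    cmd = command_line.lower()
    return process_name.lower() == "powershell.exe" and any(
        token in cmd for token in SUSPICIOUS_TOKENS
    )

# Catches the raw, no-evasion case: an adversary running an
# encoded command straight through powershell.exe.
caught = typical_powershell_rule(
    "powershell.exe", "powershell -enc SQBFAFgA..."
)
```

Note what every branch of this predicate has in common: it keys on the executable name and on literal strings in the command line. That shared assumption is exactly what the evasions below attack.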
Here is what those rules, and coverage-driven detection engineering more broadly, don’t catch:
A renamed powershell.exe binary copied to a benign filename like update.exe, where every command runs identically but rules keyed on the executable name see nothing.
Heavily obfuscated scripts that use string concatenation and character substitution so suspicious keywords never appear in the actual command text, defeating rules keyed on flagged strings or encoded command flags.
Custom .NET binaries that load the PowerShell engine as a library and execute commands inside their own process. This is the pattern behind Cobalt Strike’s powerpick and Sliver’s execute-assembly, where powershell.exe never runs at all.
Each of these is a specific, concrete thing real adversaries do. Not theoretical evasion that hasn’t been seen in the wild. The coverage map cannot distinguish a rule that catches a script kiddie running Invoke-Mimikatz from a rule that detects obfuscated staging via reflectively loaded .NET assemblies through WMI. Both turn the cell green. The metric answers an existence question, not a quality one.
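The first two evasions can be made concrete against a predicate of that shape. The rule, process names, and command lines below are illustrative, not real tradecraft; the point is only that both events sail past a keyword-and-filename match.

```python
# Same shape as the typical rule: executable name plus a flagged string.
def keyword_rule(process_name: str, command_line: str) -> bool:
    cmd = command_line.lower()
    return (process_name.lower() == "powershell.exe"
            and "invoke-mimikatz" in cmd)

# Evasion 1: renamed binary. Every command runs identically, but a rule
# keyed on the executable name sees a benign filename.
renamed_seen = keyword_rule("update.exe", "update.exe Invoke-Mimikatz")

# Evasion 2: string concatenation. The flagged keyword never appears
# contiguously in the logged command text; PowerShell assembles it at
# runtime, so a substring match has nothing to grab.
obfuscated_seen = keyword_rule(
    "powershell.exe", "powershell -c & ('Invoke-Mimi' + 'katz')"
)
```

The third evasion is worse still: an in-process PowerShell engine never generates a `powershell.exe` process-creation event at all, so there is no event for a rule of this shape to evaluate.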
The Relevance Problem
The depth problem would be bad enough on its own. But coverage maps have a second failure mode: they treat every technique as equally important.
Consider T1011, Exfiltration Over Other Network Medium. The technique is real. ATT&CK has actively maintained it since 2017 with a formal detection strategy. The sole documented procedure example, across the parent technique and its only subtechnique, is Flame, the 2012 espionage tool with a Bluetooth module called BeetleJuice. One piece of malware. Fourteen years ago.
For the overwhelming majority of organizations, the technique is irrelevant. The mid-market healthcare company, the regional bank, the SaaS startup: their threat actors are ransomware operators running Cobalt Strike through phishing, moving laterally with stolen credentials, encrypting file shares. Nobody in that threat model is exfiltrating over Bluetooth. The technique exists on the matrix because ATT&CK catalogs adversary behavior comprehensively, not because it represents meaningful risk to every organization.
But the gap is visible. The cell is red. The coverage percentage drops because of it. So someone writes a rule. The rule may even be technically sound. The problem is that every hour spent writing, tuning, and triaging alerts from that detection is an hour not spent on the techniques the organization’s actual threat actors use. In MSSPs, where time and headcount are the binding constraint, this tradeoff is the whole game.
The end result: an organization with 200 shallow rules spread across the matrix looks “better” on the heatmap than an organization with 40 deep, adversarially validated rules concentrated where the threats actually are. The first appears more secure. The second is actually more secure. The metric cannot tell the difference.
The Independence Problem
The third failure mode is rarely discussed. Suppose your environment shows five rules covering T1059.001. The heatmap implies defense in depth, five independent chances to catch the same technique. Now look at the rules. They all key on process creation events from the same EDR agent. They all depend on the same telemetry stream. The five rules are not five rules. They are one rule wearing five hats.
This is filter-survivor bias. The output of a hidden filter is read as authentic signal. What looks like multiple independent observations is one selection mechanism producing many apparent ones. It is five rules with a single point of failure. When the logging breaks, gets disabled, or is tampered with by the adversary as part of their tradecraft, all five “independent” detections collapse simultaneously. The coverage map shows no change. The cells are still green. The defense is gone.
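The collapse is easy to demonstrate by tagging each rule with the telemetry stream it depends on and grouping. The rule names and stream name below are invented; the structure is the point.

```python
from collections import defaultdict

# Hypothetical T1059.001 rules, each tagged with the telemetry stream
# it actually reads. Names are illustrative.
rules = {
    "ps_encoded_command":    "edr_process_creation",
    "ps_download_cradle":    "edr_process_creation",
    "ps_exec_policy_bypass": "edr_process_creation",
    "ps_suspicious_parent":  "edr_process_creation",
    "ps_renamed_binary":     "edr_process_creation",
}

by_stream = defaultdict(list)
for rule, stream in rules.items():
    by_stream[stream].append(rule)

# Apparent depth: five rules. Effective depth: one telemetry stream.
apparent_depth = len(rules)        # what the heatmap implies
effective_depth = len(by_stream)   # what actually has to fail

# If that one stream breaks, is disabled, or is tampered with,
# every rule depending on it goes dark at once:
failed = "edr_process_creation"
surviving = [r for r, s in rules.items() if s != failed]
```

Five apparent detections, one effective one, zero survivors when the shared dependency fails. The coverage map renders all of this as five green contributions to the same cell.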
What Would Actually Work
ATT&CK as a taxonomy for adversary behavior is valuable, and there is no serious alternative as a shared language. The problem is using a taxonomy as an evidence structure. A taxonomy tells you what adversaries can do. An evidence structure tells you whether you can stop them. A real evidence structure would have to do three things the coverage map cannot:
Weight techniques by relevance to the specific environment. Not treat every cell as equally worth filling. This means integrating other signals: what log sources are being ingested and what can actually be detected from them, CTI on the threat actors that target this organization, and evasion data on what defeats existing rules.
Evaluate detections adversarially. Ask what defeats a rule, not just whether the rule exists. The depth problem is a depth problem because nobody is asking the second question.
Model dependencies between detections. Treat correlated signals as one signal, not five. Five rules keyed on the same telemetry stream are a single point of failure dressed up as defense in depth.
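Put together, the three requirements sketch a scoring model rather than a cell count. The Python below is purely illustrative: the weights, validation scores, and stream names are invented placeholders, not a proposed standard, and a real model would draw them from the threat model, adversarial testing, and telemetry inventory respectively.

```python
# Illustrative evidence-structure scoring. Technique IDs keep their
# ATT&CK meaning; every number and stream name is a placeholder.
techniques = {
    "T1059.001": {"relevance": 0.9},   # core of the actual threat model
    "T1011":     {"relevance": 0.05},  # real technique, marginal relevance
}

detections = [
    # 'validated' is an adversarial-testing score, not mere existence.
    {"technique": "T1059.001", "validated": 0.7, "telemetry": "edr_proc"},
    {"technique": "T1059.001", "validated": 0.6, "telemetry": "edr_proc"},
    {"technique": "T1011",     "validated": 0.3, "telemetry": "netflow"},
]

def combined_strength(stream_scores):
    # Independent telemetry streams combine like independent chances to
    # catch the technique; correlated rules were already collapsed.
    miss = 1.0
    for s in stream_scores:
        miss *= 1.0 - s
    return 1.0 - miss

def evidence_score(techniques, detections):
    score = 0.0
    for tid, t in techniques.items():
        # Dependency modeling: rules sharing a telemetry stream count
        # once, at the strength of their best-validated member.
        best_per_stream = {}
        for d in detections:
            if d["technique"] == tid:
                key = d["telemetry"]
                best_per_stream[key] = max(best_per_stream.get(key, 0.0),
                                           d["validated"])
        strength = combined_strength(best_per_stream.values())
        # Relevance weighting: a green cell on an irrelevant technique
        # contributes almost nothing.
        score += t["relevance"] * strength
    return score
```

Under this model, the two correlated T1059.001 rules count as one detection at strength 0.7, and the T1011 rule contributes almost nothing regardless of how green its cell looks. That is the opposite of what a cell count rewards.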
None of this is conceptually new. The reason it has not replaced coverage maps is that doing it manually requires a dedicated threat modeler, an adversarial validation process, and a telemetry dependency analysis for every client, every quarter. At MSSP scale, across hundreds or thousands of environments, that has been economically impossible.
I no longer think it is. But that is a different post.
