Seeing Through the Vendor Spin: Interpreting the MITRE ATT&CK Evaluation Results

October 4, 2023

The 2023 MITRE ATT&CK Enterprise Evaluation results were just released and that means one thing – all participating vendors are scrambling to show themselves in the best light possible. This, unfortunately, leads many underperforming vendors to spin their results so that they appear to have done well when the actual results suggest otherwise. The vendor community has gotten so good at spinning that it’s difficult for those not steeped in the MITRE ATT&CK Evaluation process to see through the distortions.

Below we’ll discuss the major MITRE ATT&CK Evaluation categories, how they’re measured, and what to look for when evaluating vendor performance. Then you’ll be able to go to participants’ websites and understand exactly what they are (and aren’t) saying.

MITRE ATT&CK Basics

MITRE developed the MITRE ATT&CK Framework, which organizes attacker behavior into 14 broad tactics that attackers use to accomplish their objectives. Real-life attacks could include any number of the 14 tactics. Each tactic contains multiple techniques that describe the actual activity carried out by the adversary to accomplish the objective of the tactic.

Following is MITRE’s representation of the 14 tactics across the top, with the major techniques employed for each tactic below. Note that some techniques have sub-techniques that represent the steps that may be used to carry out the technique.

The MITRE ATT&CK Enterprise Evaluation

Each year MITRE Engenuity, the tech foundation of MITRE, emulates an attack by simulating the tactics and techniques that have been used by a known threat actor. The simulated attack sequence is made up of multiple Steps, each of which generally represents a Tactic in the MITRE ATT&CK Framework. This year, the attack sequence was made up of 19 Steps, with some Tactics used multiple times. For example, this year 6 of the 19 Steps were some form of the tactic “Lateral Movement.”

To help illustrate this, here are the 10 Steps used in Day 1 testing of a multilayer campaign targeting both Windows and Linux. Each of these Steps represents a Tactic.

Initial Compromise
Initial Access
Discovery & Privilege Escalation
Persistence
Lateral Move to Domain Controller
Credential Access
Discovery
Credential Access
Lateral Move to Linux
Watering Hole

Steps vs. Sub-steps

The MITRE ATT&CK Evaluation is broken into several Steps, each of which generally emulates a Tactic in the MITRE ATT&CK Framework.

For each Step, MITRE uses multiple Sub-steps, which generally emulate a Technique in the MITRE ATT&CK Framework.

In the 2023 Evaluation, for example, MITRE used 19 Steps that were broken into 143 Sub-steps. Testing for the MITRE ATT&CK Evaluation is done over 4 days. We will cover the tests in more detail below.

Day 1 – Evaluate the ability to detect and classify threats for scenario one.
Day 2 – Evaluate the ability to detect and classify threats for scenario two.
Day 3 – Evaluate anything missed on Days 1 & 2 and test with Configuration Changes.
Day 4 – Evaluate Protections.

Detection vs. Visibility

Let’s break down these two terms that cause significant confusion when discussing MITRE results. It’s impossible to correctly analyze the MITRE results without understanding these terms.

Detection

Interestingly, when discussing MITRE ATT&CK Evaluation results, people generally use the term Detection when measuring the number of Steps detected. A Step is considered Detected if one or more Sub-steps within the Step were detected. For example, if a Step is made up of 9 Sub-steps and 1 of the 9 is Detected while the other 8 are missed, that Step is still considered successfully Detected. As long as a vendor Detects one Sub-step within the Step, they count it as a successful Detection of the Step.

An extreme example would be a vendor that Detects only 1 of the several Sub-steps in each of the 19 Steps and claims to have achieved 100% Detection. With 143 Sub-steps used this year, that vendor would have Detected just 19 Sub-steps and missed 124 (87% of the threats) while still boasting 100% Detection. Overall, the Detection measurement represents the low bar of the MITRE ATT&CK Evaluation.

Visibility

Visibility generally refers to the total number of Sub-steps detected across all the Steps. The calculation tends to be a bit more straightforward as it’s simply the number of Sub-steps detected out of the total number of Sub-steps. Sometimes vendors will mix the terms Detection and Visibility, so you have to dig in a bit to establish which definition they’re actually referring to. A vendor that exhibited poor Visibility will almost always focus on Detection, a much lower hurdle to pass.

Many vendors that have at least one detection for every Step but have lower detection levels across Sub-steps will claim 100% Detection, when their Visibility performance (percentage of Sub-steps detected) may be quite low. One vendor, for example, with 67% Visibility in this year’s Evaluation blogged that they achieved 100% Detection. They didn’t mention that they missed one third of all Sub-step threats.

We can now see how 100% Visibility is far and away a more impressive result than 100% Detection. 100% Visibility signifies 100% Detection, but 100% Detection gives no indication of Visibility performance.
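
The gap between the two measures can be sketched in a few lines of Python. This is only an illustration: the data layout (one list of booleans per Step), the function names, and the way the 143 Sub-steps are split across the 19 Steps are assumptions, not MITRE's result format. The example reproduces the extreme case described earlier, where exactly one Sub-step is detected in every Step.

```python
def detection_rate(steps):
    """Share of Steps in which at least one Sub-step was detected."""
    return sum(any(sub_steps) for sub_steps in steps) / len(steps)


def visibility(steps):
    """Share of all Sub-steps that were detected."""
    total = sum(len(sub_steps) for sub_steps in steps)
    detected = sum(sum(sub_steps) for sub_steps in steps)
    return detected / total


# Hypothetical split of 143 Sub-steps over 19 Steps (10 Steps of 8, 9 Steps of 7).
sizes = [8] * 10 + [7] * 9

# Extreme case: only the first Sub-step of each Step is detected.
steps = [[True] + [False] * (size - 1) for size in sizes]

print(f"Detection:  {detection_rate(steps):.0%}")   # 100%
print(f"Visibility: {visibility(steps):.0%}")       # 13%
```

Under these assumptions the same result set yields 100% Detection and 13% Visibility, which is why it helps to check which of the two numbers a vendor is quoting.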

Analytic Coverage

The ability to detect Sub-steps without delays or configuration changes is crucial for endpoint protection platforms. The quality of each detection is also an important measure of the platform’s ability to give security analysts useful context when investigating an alert, and it helps distinguish real threats from false positive alerts. The more useful information that can be provided with an alert, the better.

MITRE assigns one of the following detection categories to each Sub-step detection.

Technique – gives information on how the action was performed or helps answer the question “what was done, why it was done and how it was done”
Tactic – gives information on the potential intent of the activity or helps answer the question “what was done and why it was done”
General – gives information specifying that a malicious/abnormal event occurred (“what was done”), with no context on “why” or “how”
Telemetry – gives information related to the events that occurred, without specifying that malicious activity took place or giving context
None – no data was presented with regards to the actions performed. Missed detection

Detections classified as General, Tactic, or Technique are grouped under the definition of “Analytic Coverage,” which is a measure of the EDR tool’s ability to provide actionable threat detections. The more Sub-steps detected with Analytic Coverage, the better.
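
As a rough sketch of how these categories roll up into scores, the Python below counts Analytic Coverage and Visibility from a list of per-Sub-step categories. The input list is made-up sample data and the function name is hypothetical. The code also assumes that a Telemetry detection counts toward Visibility (anything other than None) but not toward Analytic Coverage.

```python
from collections import Counter

# Categories grouped under "Analytic Coverage" in the text above.
ANALYTIC = {"Technique", "Tactic", "General"}
# Assumption: Telemetry counts as "detected" for Visibility purposes.
VISIBLE = ANALYTIC | {"Telemetry"}


def coverage(categories):
    """Return (analytic_coverage, visibility) as fractions of all Sub-steps."""
    counts = Counter(categories)
    total = len(categories)
    analytic = sum(counts[c] for c in ANALYTIC)
    visible = sum(counts[c] for c in VISIBLE)
    return analytic / total, visible / total


# Hypothetical results for 10 Sub-steps.
results = (
    ["Technique"] * 5 + ["Tactic"] + ["General"] + ["Telemetry"] * 2 + ["None"]
)

analytic, visible = coverage(results)
print(f"Analytic Coverage: {analytic:.0%}")  # 70%
print(f"Visibility:        {visible:.0%}")   # 90%
```

In this sample, two Telemetry-only detections raise Visibility above Analytic Coverage without giving an analyst any indication that the activity was malicious.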

Huge Caveat – Configuration Changes and Delayed Detections

Here are two additional confusing elements of the MITRE ATT&CK Evaluation that must be understood to fairly evaluate a vendor’s performance.

Configuration Changes

After the scenarios are fully tested, MITRE allows vendors to make changes to their systems and retest the entire attack sequence. That is, after the vendor knows they missed detections, they can extensively alter their system settings and attempt to detect the known missed threats. While this approach makes no sense in real-world scenarios, it does concede that the testing team may have made configuration mistakes and allows the team to rectify the errors. Every security practitioner would like to buy the time machine that could take them back for a redo at the moment a dangerous threat was missed.

It’s imperative to know whether a vendor’s results occurred with or without configuration changes. This year, one vendor’s Analytic Coverage result went from 78% to 98% after configuration changes, while another went from 96% to 100%. Similarly, one vendor’s Visibility result went from 85% to 100% after configuration changes. All these vendors laud their higher score but never mention configuration changes.

Delayed Detections

During testing, detections can occur in real time (or near-real time), or they can be delayed minutes or hours after the malicious activity occurs. A delayed detection modifier is noted if the detection is not observed in a timely manner. This generally occurs when the endpoint agent itself cannot detect a threat based on the data presented. After the backend system collects additional data, it can determine that a malicious activity occurred.

Unlike real-time alerts, which result from the platform making decisions in real time, delayed alerts can take minutes to hours. In the real world, real-time alerting is critical. For example, if a vendor detects ransomware only after a delay rather than in real time, the infected machine can be encrypted before the vendor presents a detection. In that case, a delayed alert is effectively no alert, and no alert means no protection. This is why it’s always better to have automatic, real-time alerts for malicious activity.
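
To see how much the two modifiers can move a score, the sketch below computes Visibility twice: once counting every detection, and once counting only detections made without a configuration change and without a delay. The `SubStepResult` record, its field names, and the sample data are all hypothetical. MITRE publishes modifiers alongside each detection, but not in this shape.

```python
from dataclasses import dataclass


@dataclass
class SubStepResult:
    category: str                # "Technique", "Tactic", "General", "Telemetry" or "None"
    config_change: bool = False  # detected only after a configuration change
    delayed: bool = False        # detection carried the delayed modifier


def visibility(results, strict=False):
    """Fraction of Sub-steps detected; strict=True ignores modified detections."""
    def counts(result):
        if result.category == "None":
            return False
        if strict and (result.config_change or result.delayed):
            return False
        return True

    return sum(counts(r) for r in results) / len(results)


# Hypothetical results for 10 Sub-steps.
results = (
    [SubStepResult("Technique")] * 6
    + [SubStepResult("Technique", config_change=True)] * 2
    + [SubStepResult("General", delayed=True)]
    + [SubStepResult("None")]
)

print(f"Headline Visibility: {visibility(results):.0%}")               # 90%
print(f"Without modifiers:   {visibility(results, strict=True):.0%}")  # 60%
```

The headline figure in this sample is 90%, while the figure an analyst would have experienced on the first run, in real time, is 60%.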

Protections

MITRE also offers vendors the opportunity to participate in Protection scenarios that are composed of a subset of the attack sequences used during the detection assessment. The Protection evaluation is a short test that occurs on the last testing day. This year, Protection testing was broken out into 13 attack sequences, each consisting of multiple attack steps for a total of 129 steps. One of the sequences consisted of only 2 steps while others consisted of over a dozen steps. Results include whether any of the steps were prevented in each sequence and how early in the sequence the prevention occurred.

For the Protection testing, preventing an early step in each sequence means that subsequent steps are not run as the attack sequence is considered to be prevented. This approach does not test how many of the 129 steps can be blocked individually, but how many of the 13 sequences were blocked anywhere in the sequence. This means a vendor could miss the first 14 steps in the sequence, then block the 15th and final step to have the sequence be considered prevented. Or, a vendor could block the first step and perhaps would have missed subsequent steps in the sequence, but that is not tested. 100% Protection means one of the steps in each attack sequence was blocked.
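
The scoring rule described above can be sketched as follows. The function names and the three sample sequences are assumptions for illustration. Each sequence is modeled as a list of booleans recording whether each executed step was blocked, and a sequence stops at its first block because later steps are never run.

```python
def first_block(outcomes):
    """Return the 1-based position of the first blocked step, or None."""
    for position, blocked in enumerate(outcomes, start=1):
        if blocked:
            return position
    return None


def protection_rate(sequences):
    """Fraction of sequences blocked at any step, early or late."""
    blocked = sum(first_block(seq) is not None for seq in sequences)
    return blocked / len(sequences)


# Hypothetical outcomes for three attack sequences.
sequences = [
    [True],                 # blocked at step 1; later steps never ran
    [False] * 14 + [True],  # 14 steps missed, blocked at step 15
    [False, False],         # two-step sequence, never blocked
]

for seq in sequences:
    print(first_block(seq))  # 1, then 15, then None

print(f"Protection: {protection_rate(sequences):.0%}")  # 67%
```

The first two sequences count equally toward the Protection score, which is why the position of the block is worth checking alongside the percentage.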

Unlike the Detection portion of the Evaluation, Protection results do not consider any contextual information. If a threat is blocked, the vendor may or may not include information on the threat itself, but this is not included in the test results. Protection results are important, but given the very short time spent on this segment and its testing limitations, they are not nearly as important as Visibility and Analytic Coverage.

Final Thoughts

As you peruse the creative vendor charts, many of which are concocted in some confusing format solely meant to put the vendor’s results in a good light, keep the considerations above in mind.

By far, the most important MITRE ATT&CK Evaluation measures are:

Visibility (Sub-steps detected) before Configuration Changes and without Delays
Analytic Coverage (Technique, Tactic or General detections) before Configuration Changes and without Delays

The MITRE ATT&CK Evaluation is a valuable resource that can be used to inform your decision when selecting a security vendor. A top-performing result in the MITRE ATT&CK Evaluation indicates a vendor whose solution will likely perform well in detecting real-world threats. It’s important to understand how to interpret the MITRE ATT&CK Evaluation results so you can correctly assess each vendor’s capabilities – emphasizing measurements that matter and discounting those that don’t.

Follow this link to view the full analysis of the MITRE Engenuity ATT&CK® Evaluations: Enterprise – 2023 Turla Edition.