February 14, 2023

Measure What Matters: Why MTTR Is an Incomplete Cybersecurity Metric and What You Can Do About It

“The line between disorder and order lies in logistics …” Sun Tzu

 

“The line between disorder and order lies in data driven analytics …” modern cybersecurity

Among Alexander the Great’s most important weapons were analytics and logistics. He gathered intelligence on the enemy’s weapons, supply sources, food stores, strategy, tactics, and performance in previous battles to understand their strengths and weaknesses. He used this information to decide on the number of soldiers, the formation, positioning, and supplies needed to win. A prime example is the Battle of Gaugamela in 331 BC, where Alexander faced Darius III of Persia. By some ancient accounts, Darius had a million soldiers, 15 war elephants and 200 scythe-bladed chariots against Alexander’s army of just 40,000 soldiers. Despite Darius’ might, Alexander’s data-driven strategy allowed him to overcome the odds. For instance, as the Persian chariots charged, Alexander’s army opened lanes for them to pass through, making it easier for his archers to destroy them.

Metrics and modern cybersecurity are inherently linked. CISOs use metrics to determine priorities, inform decisions, support investments, track progress and maintain accountability. If the wrong metrics are chosen (like the numerical strength that Darius relied on at Gaugamela), they can give a false sense of progress and weaken the organization’s overall cybersecurity posture.

In this blog, I delve into the importance of using metrics and examine a popular one, Mean Time To Remediate (MTTR), which is sometimes the only operational metric used to measure the efficacy of vulnerability management. I then take a deep dive into a metric Balbix introduced, known as Mean Open Vulnerability Age (more about it later).

The curious case of MTTR as the be-all vulnerability management metric

Security teams must identify and address threats as soon as feasible in order to safeguard their organization against cyberattacks and data breaches. The difference between a minor attack and a disastrous data breach often lies in how quickly vulnerabilities are remediated. The traditional metric used to measure this is Mean Time To Remediate (MTTR). MTTR is considered to be one of the key cybersecurity metrics, and security teams are constantly working to drive this number down.

And while we agree that measuring MTTR is certainly important, our work with our customers has uncovered a truth that isn’t widely known in the industry.

Relying on MTTR alone is a flawed approach

Yes, you heard it right! But before we get to the root of this assertion, let’s cover the fundamentals.

MTTR is the average time taken by security teams to establish a response and remediate a detected vulnerability or a threat. In simple terms, MTTR measures the period between the time when the vulnerability was found in an asset and the time when it was remediated. This data is then aggregated to calculate the mean time it takes to fix all discovered vulnerabilities in a given environment.
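
As a minimal sketch of that calculation, assuming each vulnerability is recorded as a pair of found and remediated dates (the records and the function name `mttr_days` are hypothetical, not taken from any Balbix product):

```python
from datetime import date
from statistics import mean

# Hypothetical records: (date found, date remediated); None means still open.
vulns = [
    (date(2023, 1, 1), date(2023, 1, 6)),   # fixed in 5 days
    (date(2023, 1, 3), date(2023, 1, 13)),  # fixed in 10 days
    (date(2022, 2, 1), None),               # still open, so MTTR ignores it
]

def mttr_days(records):
    """Mean days from found to remediated, over remediated records only."""
    durations = [(fixed - found).days for found, fixed in records if fixed is not None]
    return mean(durations) if durations else None

print(mttr_days(vulns))  # 7.5
```

Note that the vulnerability that has been open for a year has no effect on the result, because MTTR only sees what has already been fixed.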

Now, why do we say that relying on MTTR alone is a flawed approach?

To answer this, let’s think of a situation where a security team is dealing with a huge backlog of vulnerabilities. Hypothetically, no vulnerabilities have been remediated during the last year. The organization wants to improve the status quo and hires John as a vulnerability manager. Consider the following alternative scenarios for how John might approach his role:

Scenario 1: Remediate the oldest vulnerabilities first

Being new to the organization, John wants to earn some quick wins and reduce overall cybersecurity risk. He looks at the vulnerabilities list and decides to take immediate action to remediate the oldest vulnerabilities, which have been open for one year. This action will drive the overall MTTR up. Why? Because every year-old vulnerability that John fixes enters the average with a remediation time of about 365 days. John’s intentions were spot-on, but with the MTTR going up, he now faces a lot of uncomfortable questions from his leadership about the organization’s cybersecurity status.

Scenario 2: Remediate only the newest vulnerabilities

Alternatively, John decides that he should start by remediating only the new vulnerabilities, the ones that were found in the last 30 days. He postpones remediating the oldest ones. In just a few days, his team works hard to remediate the new vulnerabilities. This action will drive the MTTR down. John and his team may receive appreciation from leadership but deep down he knows that there is a huge vulnerability backlog. This situation will drive cyber risk up as the attackers can exploit any of the open vulnerabilities. The longer a vulnerability is open, the more likely it is to be exploited.

Scenario 3: Remediate only the recent critical vulnerabilities

John may decide to remediate only recent critical vulnerabilities and choose not to remediate any others. This strategy is typically followed in war-time situations where the attention turns to urgently identifying and deploying patches or quick mitigations to fix critical vulnerabilities before attackers can cause serious damage to the organization. This approach will drive the MTTR down because only a short time elapses between the found date and the remediated date. It will help contain the war-time risk of in-the-wild vulnerabilities, but the overall cyber risk will remain high because, as in Scenario 2, there is still a huge backlog of open vulnerabilities to be remediated.
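
To make the arithmetic behind Scenarios 1 and 2 concrete, here is a small sketch with made-up numbers: a backlog of 100 year-old vulnerabilities plus 20 recent findings, and a team that fixes 20 items either way. The figures and the helper `summarize` are illustrative assumptions only.

```python
from statistics import mean

# Hypothetical backlog, expressed as days each vulnerability has been open.
old_backlog = [365] * 100   # year-old vulnerabilities
new_findings = [10] * 20    # found within the last 30 days

def summarize(remediated, still_open):
    """Return (MTTR in days, number of vulnerabilities left open)."""
    return mean(remediated), len(still_open)

# Scenario 1: fix 20 of the oldest vulnerabilities first.
oldest_first = summarize(old_backlog[:20], old_backlog[20:] + new_findings)
# Scenario 2: fix only the 20 newest vulnerabilities.
newest_only = summarize(new_findings, old_backlog)

print(oldest_first)  # (365, 100)
print(newest_only)   # (10, 100)
```

In this toy case both approaches involve the same amount of work and leave the same number of vulnerabilities open, yet MTTR reports 365 days for one and 10 days for the other, and it is the "better" number that leaves the year-old backlog untouched.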

To augment these scenarios, let’s consider a real-life example from one of our Fortune 500 customers:

1. During a given month, the MTTR ranged between 5 and 24 days. This relatively low MTTR gave the security team the impression that it had its cyber risk well under control.

 

Balbix dashboard showing MTTR from one of Balbix’s customers

 

2. But the benefit of a low MTTR wasn’t reflected in their Balbix risk dashboard. It showed a rather grim reality: breach risk in excess of $250M, which was way beyond the customer’s risk threshold. What was really happening here?

 

Balbix’s risk dashboard showing cyber risk quantified in dollars

 

3. A quick analysis of their Balbix dashboard revealed the hidden truths.

  • The security team was doing phenomenally well with remediating high-risk vulnerabilities like zero-day and critical vulnerabilities (P1, as shown in the image below).
  • However, at the same time, the security team was leaving large volumes of low-risk vulnerabilities unaddressed.
  • Said another way, their war-time vulnerability management strategy was right on target. But they were falling behind in peace-time vulnerability management, which requires the security teams to focus on burning down high volumes of important vulnerabilities and closing security gaps quickly and efficiently.

 

Customer’s vulnerability remediation dashboard showing a large volume of unaddressed low-risk (P2, P3, P4) vulnerabilities

4. The result is that their approach drove MTTR to world-class levels, but a long list of unaddressed vulnerabilities contributed to cyber risk rising to an alarming level.

These scenarios bring to the fore the lack of correlation between MTTR and cyber risk. Organizations relying on MTTR alone as a way of tracking their cyber risk may end up incentivizing undesirable behavior: teams focus on driving the measured metric (MTTR) down while permitting actual cyber risk to spike. It also conveys a false sense of the true effectiveness of an organization’s vulnerability management program.

If using MTTR alone is flawed, then what is the recommended approach?

To address the above-stated flaws, Balbix recently introduced a new metric in our platform called MOVA (Mean Open Vulnerability Age).

What is MOVA?

MOVA measures the mean (average) duration that a vulnerability is open (i.e. has not been remediated) from the time it was first detected. MOVA is usually measured in a unit of time, say days.
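
One reasonable reading of this definition can be sketched as follows. The function name `mova_days`, the sample dates, and the choice to measure age up to a reporting date `as_of` are assumptions for illustration; the exact calculation in the Balbix platform is not described here.

```python
from datetime import date
from statistics import mean

def mova_days(found_dates, as_of):
    """Mean age in days, on `as_of`, of vulnerabilities that are still open."""
    ages = [(as_of - found).days for found in found_dates]
    return mean(ages) if ages else None

# Hypothetical detection dates of vulnerabilities that remain open.
open_found = [date(2022, 2, 14), date(2023, 1, 15), date(2023, 2, 4)]

print(mova_days(open_found, as_of=date(2023, 2, 14)))  # 135
```

Unlike MTTR, this average is taken over what is still open, so a single year-old vulnerability (365 days) pulls the result well above the two recent ones (30 and 10 days).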

To understand the importance of MOVA, let’s take a step back and draw inspiration from the world of retail. Retailers rely heavily on metrics to measure the health of their business. One such metric is stock turn: the number of times stock is sold or used in a given time period. As a retailer, if your stock turn is too low, you’re not selling your inventory fast enough. If your stock turn is too high, you’re probably not ordering enough.

Like stock turn, security teams need to track the rate of incoming vulnerabilities vs the rate at which those vulnerabilities are getting resolved. This is what the MOVA metric helps you track.

If MOVA is low, it means that you are burning your incoming vulnerability backlog fast.

If MOVA is high, it means that the vulnerabilities are not being addressed in a timely manner.

Now let’s take another look at the scenarios described earlier after including MOVA in the mix:

  1. If you remediate your oldest vulnerabilities first:
    • MTTR will immediately trend up and then start to go down over a period of time. MOVA will show a downward trend.
  2. If you remediate only the newest vulnerabilities:
    • MTTR will trend down but MOVA will be high and will continue to increase.
  3. If you are remediating only the recent critical vulnerabilities:
    • MTTR will trend down but MOVA will trend up with time.
  4. Finally, if you are remediating vulnerabilities efficiently:
    • MTTR and MOVA will both trend down with time and stay low, which means that you are remediating vulnerabilities quickly and you also do not have a large backlog.
The Balbix platform shows MTTR alongside MOVA to provide a comprehensive picture of vulnerability management performance
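
The first two trends in the list can be reproduced with a toy day-by-day simulation. Everything here is an assumption made for illustration: a starting backlog of 100 vulnerabilities that are 365 days old, two new findings per day, a team that fixes two per day, and the function name `simulate`.

```python
from statistics import mean

def simulate(strategy, days=60, backlog=100, backlog_age=365, per_day=2):
    """Toy model: `per_day` vulnerabilities arrive and `per_day` get fixed daily.

    Returns (MTTR, MOVA) in days after `days` days.
    """
    open_ages = [backlog_age] * backlog  # age in days of each open vulnerability
    fixed_ages = []                      # age at which each one was remediated
    for _ in range(days):
        # New findings arrive, then every open vulnerability ages by one day.
        open_ages = [age + 1 for age in open_ages + [0] * per_day]
        # Put the vulnerabilities this strategy targets at the front.
        open_ages.sort(reverse=(strategy == "oldest_first"))
        fixed_ages += open_ages[:per_day]
        open_ages = open_ages[per_day:]
    return mean(fixed_ages), mean(open_ages)

for strategy in ("oldest_first", "newest_only"):
    mttr, mova = simulate(strategy)
    print(f"{strategy}: MTTR={mttr:.1f} days, MOVA={mova:.1f} days")
# oldest_first: MTTR=333.9 days, MOVA=25.5 days
# newest_only: MTTR=1.0 days, MOVA=425.0 days
```

Under these assumptions, the strategy with the worse-looking MTTR ends up with a young backlog, while the strategy with a near-perfect MTTR leaves a backlog that keeps aging. Reading the two metrics together is what exposes the difference.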

We recommend that security teams track the MOVA metric alongside the MTTR metric for the following reasons:

  1. It facilitates effective communication of cyber risk.
  2. Combined with MTTR, it provides a true picture of vulnerability management effectiveness.
  3. MTTR and MOVA should both be computed and analyzed by asset priority, vulnerability severity, software/hardware vendor, and product, among other dimensions. This ensures that efficiency is tracked and can be optimized according to business priorities and needs.

Putting it all together

Learning from these examples, you can think of the vulnerability management process as a sort of queuing system where there is a constant inflow and outflow of vulnerabilities. The key question that drives the process is:

‘What is the arrival rate and what is the departure rate of vulnerabilities?’

  • If the departure rate is much faster than the arrival rate: You are able to keep up with the incoming vulnerabilities. This is a great place to be in.
  • If the departure rate is close to the arrival rate: You are barely able to keep up. Any surge in the arrival rate of new vulnerabilities is going to throw the team into chaos. This is not sustainable, and not a great place to be in.
  • If the departure rate is much slower than the arrival rate: There will be a continuous increase in risk. The behavior of the vulnerability teams will ultimately be limited to following a war-time strategy where they are fixing the critical vulnerabilities but are unable to do anything about overall risk. This situation is best avoided.
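
The three cases can be illustrated with a very simple queue model. The weekly rates, the starting backlog of 500, and the function name `backlog_after` are hypothetical values chosen only to show the direction of travel.

```python
def backlog_after(weeks, arrivals_per_week, fixes_per_week, starting_backlog=0):
    """Open vulnerabilities left after `weeks` weeks at constant weekly rates."""
    backlog = starting_backlog
    for _ in range(weeks):
        # The backlog grows by arrivals, shrinks by fixes, and cannot go below zero.
        backlog = max(0, backlog + arrivals_per_week - fixes_per_week)
    return backlog

# 100 new vulnerabilities per week for 12 weeks, starting from a backlog of 500.
print(backlog_after(12, 100, 150, starting_backlog=500))  # 0   (departures faster)
print(backlog_after(12, 100, 100, starting_backlog=500))  # 500 (barely keeping up)
print(backlog_after(12, 100, 60, starting_backlog=500))   # 980 (departures slower)
```

In the middle case the backlog never shrinks, so any surge in arrivals pushes it upward with no spare capacity to recover.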

By leveraging MTTR and MOVA metrics along with data-rich role-based dashboards, Balbix helps security teams obtain the right information, at the right time. The additional visibility can allow you to considerably improve your vulnerability management program.

Start your journey to a more efficient vulnerability management program by scheduling a 30-minute demo with Balbix.