On Risk

By DarkHorse on Dec. 23, 2024

Risk comes from not knowing what you're doing.

– Warren Buffett

How do you think about determining and prioritizing Risk (with a capital ‘R’) when it comes to vulnerabilities?

Said differently: how do you go about ranking any given security finding above (or below) another?

How one makes that determination is highly consequential - a mis-prioritized vulnerability can be just as disastrous as not knowing about the issue in the first place. If you know about a risk, but it’s mis-characterized as low priority, then your limited remediation resources will likely go toward fixing other, seemingly higher-priority issues. If the issue is later exploited and it comes to light that it was actually high risk, that mis-prioritization can lead to some less-than-pleasant outcomes.

So, while it is first essential that we know about the risks (we did a long writeup on the seven different ways that vulnerabilities are commonly identified here), it is similarly essential to make sure those risks are properly prioritized - so that those with the highest Risk to the business are taken care of first.

The three main schools of thought in the security platform space around prioritization right now appear to be:

  • Scoring risk on a flat plane by technical severity (used by Bugcrowd, scanning tools, many consultative pentest providers).
  • Using CVSS, which as of version 4.0, has 8 (or 11, depending on how you count them) scoring inputs (used by HackerOne, Intigriti, others).
  • The OWASP Risk Rating Methodology of “likelihood + impact = Risk” (used by Cobalt, others).

Let’s quickly review each of the above in a little more detail, and give some thought to their respective trade-offs.

A flat plane, based on technical severity (VRT, etc).

While this may seem like a simple and efficient solution up front, upon further examination, evaluating vulnerabilities just by their technical severity is lacking in one extremely crucial way: context.

For instance, what if you have two critical issues (e.g. P1s), but one is extremely easy to exploit, and the other isn’t? With a flat prioritization model, you end up with two vulnerabilities that have the exact same score, despite the fact that one very clearly poses a larger risk to the organization. In a flat rating system, that nuance isn’t represented, and creates the risk that the lower Risk issue gets fixed before the one that presents a higher Risk.

Additionally, in a flat plane model a P1 located on a business-critical asset is rated the same as a P1 on an asset that has little to no importance to the business. Again, in a model like this, there’s no clear way to represent the importance of a finding relative to where it’s located.
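The tie a severity-only model produces can be seen in a toy sketch (the finding and asset names here are hypothetical, purely for illustration):

```python
# Toy sketch of a flat, severity-only scoring model (all names hypothetical).
# Both findings collapse to the same priority, even though one sits on a
# business-critical asset and is far easier to exploit.

findings = [
    {"id": "XSS-1", "severity": "P1", "asset": "billing-api", "easy_to_exploit": True},
    {"id": "XSS-2", "severity": "P1", "asset": "legacy-wiki", "easy_to_exploit": False},
]

# Flat model: priority is derived from the severity label and nothing else.
flat_rank = {"P1": 1, "P2": 2, "P3": 3, "P4": 4, "P5": 5}
scores = {f["id"]: flat_rank[f["severity"]] for f in findings}

print(scores)  # both findings score 1 -- the model cannot tell them apart
```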

While any prioritization is better than none, it’s clear that a flat model has a number of shortcomings that aren’t easily addressed, except by arbitrarily modifying the priority to fit where one feels it should be (e.g. saying “eh, we’ll rank this P1 as a P2, since it’s not as bad as this other P1”) - which can create other issues downstream due to its subjectivity. Arbitrary adjustment sounds workable in principle, but unless there’s a clear rubric for doing so, one person’s downgrade may not seem like a downgrade to another, and so on - leading to inconsistency, which can then lead to mis-prioritization - which is precisely what we need to avoid.

So, it’s clear that in order to have an informed Risk rating we need more than a flat scoring plane. We need something that can account for both the exploitability of the issue, as well as business context. What about CVSS? It’s got lots of inputs that should help with the nuance…

Using CVSS to prioritize Risk.

CVSS expands on the flat plane model by giving a vulnerability a higher score based on the likelihood of exploitation / attack complexity (e.g. does it require privileges, user interaction, etc.). This additional piece of context goes a long way toward helping determine which issues pose a greater risk to the business. For instance, all else being equal, a finding that requires a user to click on a link is less likely to be exploited than one that requires no user interaction. CVSS allows us to input these values and get a more nuanced output for prioritization.

This is a fantastic step forward in terms of prioritizing Risk more accurately than using a flat plane. Is this perhaps our solution for prioritizing Risk? It’s (1) got more context, and (2) useful variables that help determine the technical severity of the issue.

However, as with the flat plane model, CVSS score is lacking in business context. So, while CVSS can be useful for scoring the severity of an issue (including the likelihood of exploitation), it doesn’t get to the heart of prioritizing Risk for the business - which is our ultimate goal. Yes, it is extremely useful to know the technical severity with characterization around the likelihood of exploitation - and knowing both of those things gets us pretty close to prioritizing Risk effectively, but we still need to understand business context. Suppose we have two separate findings with the exact same CVSS score on two different business systems, which do we fix first? Without business context, they both are rated and ranked the same.

Beyond missing business context, another nuance of CVSS - and something I’ve witnessed firsthand - is that there is often a fair amount of subjectivity in its ratings. Paradoxically, the very inputs that give CVSS its nuance can also lead to disagreements when scoring.

For instance, say we have a stored cross-site scripting issue that breaks the main page and makes an application unusable for all users - effectively blocking everyone from accessing the content. When scoring for CVSS, do we say the issue affects availability at the high or low level? It certainly affects availability, so it's at least "low", but it doesn’t take the system completely offline, so maybe it's not "high". Except that it does render the service unusable, so it seems to meet the requirement for "high". But again, the server itself is unaffected - it technically isn't offline. I’ve seen this very issue argued both ways - and someone usually ends up upset, no matter which way the final ruling goes.

Anyone that has used CVSS has likely been part of a conversation where there’s been some level of disagreement on where something should fall in relation to one of the variables. It may feel like this is only in edge cases, but I’ve argued for and against on both sides of the table enough times to know that issues like this are (1) anything but uncommon; and (2) never as simple as we’d like them to be. That said, any model is likely to have disagreements over where things should be ranked, no matter how robust.

After looking into CVSS, it does seem like we’re getting closer to a more complete solution. We've now got a more nuanced approach to rating the findings themselves... Now, what if there was an option that took business impact into account?

The OWASP Risk Rating Methodology.

At first glance, the OWASP Risk Rating Methodology seems to be heading in the right direction - their simplified model of “Risk = likelihood + impact” has more nuance than a flat rating plane, and depending on what’s included in “impact”, this model might just cover business context.

Beyond the simplified initial definition, the OWASP model has a number of sub-items that go into each piece of the equation. For instance, when rating likelihood, one has to take into account:

  • Threat agent factors: skill level, motive, opportunity, and size.
  • Vulnerability factors: ease of discovery, ease of exploit, awareness, and intrusion detection.

Each factor is rated 0-9, and the likelihood is the average of all eight values.

Then there are an additional 4 factors for business impact and another 4 for technical impact, which combine to represent "impact" as a whole. In total, there are 16 factors that are used to create a final Risk score (0-9) - which is then translated into an informational, low, medium, high, or critical rating. You can try out a calculator for this approach here.
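The arithmetic above can be sketched in a few lines of Python. The factor values below are purely illustrative; the 0-9 bands and the 3x3 likelihood-by-impact matrix follow our reading of the published OWASP methodology:

```python
# Minimal sketch of the OWASP Risk Rating arithmetic (factor values illustrative).

def level(score: float) -> str:
    """Map a 0-9 score to the OWASP LOW / MEDIUM / HIGH bands."""
    if score < 3:
        return "LOW"
    if score < 6:
        return "MEDIUM"
    return "HIGH"

# Likelihood: average of 8 factors (4 threat agent + 4 vulnerability), each 0-9.
threat_agent = {"skill_level": 5, "motive": 4, "opportunity": 7, "size": 6}
vulnerability = {"ease_of_discovery": 7, "ease_of_exploit": 5,
                 "awareness": 6, "intrusion_detection": 8}
likelihood = sum({**threat_agent, **vulnerability}.values()) / 8

# Impact: average of the four technical (or four business) factors, each 0-9.
technical_impact = {"loss_of_confidentiality": 7, "loss_of_integrity": 5,
                    "loss_of_availability": 5, "loss_of_accountability": 7}
impact = sum(technical_impact.values()) / 4

# Overall severity: the OWASP (likelihood, impact) matrix.
matrix = {
    ("LOW", "LOW"): "Note",      ("LOW", "MEDIUM"): "Low",       ("LOW", "HIGH"): "Medium",
    ("MEDIUM", "LOW"): "Low",    ("MEDIUM", "MEDIUM"): "Medium", ("MEDIUM", "HIGH"): "High",
    ("HIGH", "LOW"): "Medium",   ("HIGH", "MEDIUM"): "High",     ("HIGH", "HIGH"): "Critical",
}
severity = matrix[(level(likelihood), level(impact))]
print(likelihood, impact, severity)  # 6.0 6.0 Critical
```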

With all these different variables to calculate, this is beginning to feel a lot like CVSS. Except the difference between the two models is that the OWASP model adds extra factors into determining likelihood, as well as incorporating the ever-elusive business impact. Of particular note, relative to CVSS, it's our view that the 0-9 scale for each item helps with avoiding subjectivity (vs. just having high vs. low) and also comes with some very helpful waypoints for scoring - such as when measuring the privacy impact of a finding, a 3 means a single individual is affected, 5 is hundreds, 7 is thousands, and 9 is millions. This gives a much more objective basis to operate from when scoring findings (though again, subjectivity can absolutely come into play).

This model feels like it’s making a lot of sense, and seems to bring a more complete picture as it relates to prioritizing Risk. However, if you've got a feeling that based on the way things have gone so far, we’re probably going to find some issues with this model too, well, you’d be right.

To start, there has been some debate around this model’s threat actor skill component (part of determining the likelihood of exploitation), which is fair - because at some level, ease of exploitation and ease of discovery are measuring much the same thing as a threat actor’s skill level. They’re not identical, but fundamentally, does it matter whether the threat actors are unskilled if the vulnerability is easy to find and exploit? Additionally, it’s rarely (if ever) possible to know the skill level or the motive of the threat actors. In our view, it’s best not to assume anything specific about adversaries, except that they are capable. There could be highly motivated and highly skilled threat actors coming for you right now, and you wouldn’t know it. For that reason, we can’t assume things like their motive, skill level, or potential reward.

In our view the likelihood section can be distilled to: (1) who has access; and (2) how easy is it to find and exploit the vulnerability. Of course there are exceptions, but as a rule, it's generally true that an easy-to-find-and-exploit vulnerability on the public internet is far more likely to be exploited than one on a private intranet with 20 people who can get to it. These two factors account for 95% of what we need to know around how likely something is to be exploited. Again, we’re not saying the other information isn’t necessarily useful, it’s just not useful to the same degree. Which takes us to the next point.

We’re also not convinced that all the items within this model should be weighted equally. Take, for instance, “loss of accountability” (i.e. being able to know who was responsible for the attack). This variable is weighted exactly the same as the loss of data - which is to say that if you could track who carried out the attack, this model would seem to imply that knowledge alone could cut the impact of a complete data loss incident in half or more. Yes, all things being equal, a vulnerability with no trail of accountability is more serious than one with a full trail, but it doesn’t seem reasonable that a full trail of accountability should offset a maximal loss of confidentiality to such a degree.

In the same vein, it doesn’t quite make sense that one could have a complete loss of confidentiality (9), but no loss of integrity, accountability, or availability (0s), and the technical score would be a 2.25 (low). This seems like an incomplete picture for something that would otherwise be devastating in impact. Again, all things being equal, a vulnerability that did all 4 should be ranked higher than one that does one, but that doesn’t mean this way of calculating risk is correct.

As a final point on scoring in this methodology, it’s entirely possible for one to have a highly exploitable issue that has near-zero impact, and the finding would still somehow receive a “medium” rating. By way of example, in this paradigm something with a low technical severity (say, self-XSS) but easy exploitability could get the exact same final ranking as a less exploitable, but extremely impactful, vulnerability.
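Both of these scoring oddities can be checked with a few lines of arithmetic; the likelihood value below is an assumed illustration of a trivially exploitable finding:

```python
# OWASP technical impact is the average of four 0-9 factors, so a total loss
# of confidentiality with no other losses averages out to the LOW band:
confidentiality, integrity, availability, accountability = 9, 0, 0, 0
technical_impact = (confidentiality + integrity + availability + accountability) / 4
print(technical_impact)  # 2.25 -- lands in LOW (< 3)

# And a highly exploitable, near-zero-impact finding still rates Medium overall,
# because the OWASP matrix maps (HIGH likelihood, LOW impact) -> Medium.
likelihood = 8.0  # assumed: trivially discoverable and exploitable
impact_band = "LOW" if technical_impact < 3 else "MEDIUM/HIGH"
likelihood_band = "HIGH" if likelihood >= 6 else "LOW/MEDIUM"
overall = "Medium" if (likelihood_band, impact_band) == ("HIGH", "LOW") else "other"
print(overall)  # Medium
```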

So, maybe this model isn’t exactly what we’ve been looking for. But it does help advance the conversation by quite a bit by incorporating business impact and context.

To recap as it relates to prioritizing Risk:

  • A flat plane for technical severity doesn’t have enough nuance or take into account the crucial aspect of business context.
  • CVSS also doesn’t include business context and has too much room for subjectivity. CVSS is still good for rating the severity of a vulnerability in a vacuum, but business context is essential for making effective risk prioritization decisions.
  • The OWASP risk rating methodology has some flaws around scoring, but does a good job of (1) having a simplified version (Risk = likelihood + impact); and (2) having a less subjective way of measuring business impact.

So, if complexity doesn’t inherently bring precision, what if we went the other way and made things as simple as they can be, while retaining contextual awareness?

What if simple was:

Risk = technical severity + context
Context = likelihood of exploitation + importance of system

Is this less complex than the other models? Absolutely.

Does it still allow for contextual awareness? Yes. And paradoxically, by being contextually aware, this simpler system can be more accurate and actionable than more complex models.

By way of example, say you have three, easy-to-find-and-exploit stored cross site scripting vulnerabilities on three distinct assets that have varying levels of importance to the business (one asset is critical, another is low, and the final is in the middle). Which do you fix first?

With CVSS or a flat plane model, you’d end up with the same score for all three vulnerabilities, so that’s not much help in determining which to fix first. You could add some extra tagging to further prioritize things, but that adds subjectivity as well as complexity to the conversation.

With the OWASP model, you can add inputs around contextual awareness, such as how many users would be affected, as well as the business impact of exploitation. This helps move things in the right direction - however, as highlighted above, the scoring issues still exist and can impair usability (e.g. modifying the impact values does relatively little when the likelihood is cranked all the way up). Additionally, as discussed previously, there are questions around unknowable items such as motive, rewards, etc.

But if we take a step back from the more complex models and go higher level, we can score the technical severity of the issue as the same across all three, but then by also scoring context as likelihood + importance, we get three distinct results that prioritize the findings relative to their importance to the business. Despite having the same technical severity and ease of exploitation, the vulnerability on the most important business asset is prioritized above the others, as it should be.

This simplified model works remarkably well and consistently so. There are surely places for more complex models, but the DarkHorse view always defaults to simplicity, and this is the simplest we could get while maintaining essential business context. Some core ideas around this model are:

  • Since we start with technical severity, this model is never less actionable than using a flat scoring plane.
  • Since this model includes business prioritization, it is also never less actionable than using a system that includes technical severity and exploitability alone (i.e. CVSS).
  • And since it doesn’t include extraneous variables, it takes less time and has less subjectivity than a complex matrix.

The process for arriving at this simplified Risk value is:

  • Take the technical severity (0-5; where 0 is extremely critical and 5 is informational)
  • Apply the lens of context to it (1-3). In this case, context is simply the likelihood of exploitation + importance of the asset.
  • The outcome is our Risk score between 0 (extremely critical) and 6 (below informational). See the tables on https://darkhorse.sh/definitions for an easy-to-reference model.
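As a rough sketch of how these three values might combine: the authoritative mapping is the table at https://darkhorse.sh/definitions, and this clamped-sum rule is purely our assumption for illustration, treating context (1-3, where 1 is the riskiest) as a -1/0/+1 shift around a neutral 2:

```python
# Hypothetical sketch of the simplified model -- the real mapping is defined in
# the tables at https://darkhorse.sh/definitions. Assumes context (1-3) shifts
# technical severity (0-5) by -1/0/+1, clamped to the 0-6 Risk range, where
# lower numbers mean higher Risk.

def risk(technical_severity: int, context: int) -> int:
    assert 0 <= technical_severity <= 5 and 1 <= context <= 3
    return min(6, max(0, technical_severity + (context - 2)))

# Three identical stored-XSS findings (same technical severity, say 1) on
# assets of high, medium, and low business importance:
print(risk(1, 1))  # 0 -- critical asset, highest Risk
print(risk(1, 2))  # 1
print(risk(1, 3))  # 2 -- unimportant asset, lowest Risk of the three
```

Under this reading, identical findings separate cleanly by context, which is exactly the behavior described in the three-asset example above.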

With three simple values, we can arrive at a Risk score that is easier to get to, and highly actionable, which allows for quicker and more effective prioritization, which is what ultimately matters.

Is it perfect? Sadly, no. We can poke holes in this methodology too - for instance, what about when there are hundreds of similarly scored findings - e.g. in the above example, if all three issues have the exact same final Risk score, now which one do you fix first? Etc. One could go near-infinite in terms of how many layers there could be in a model like this, and we debated including more for a while, but ultimately felt that three was enough to make things actionable in 90%+ of situations.

Going too complex would just add more noise - so if we can cover the vast majority of use cases, that’s good enough to be actionable, which is the whole point of prioritizing risks - to make sure we take care of the first things first. No model can ever account for everything, so good enough is good enough.

We spent a lot of time on this, and in our view it is the simplest way to get to a complete methodology for prioritizing Risk. It’s possible there’s a better solution out there, but for now this is the best we’ve got. It's worth calling out that this model wouldn’t exist without the other models to learn and grow from, and our next model will grow from the learnings of applying this one.

And so this is how we at DarkHorse presently view Risk.

Accurate and actionable vulnerability ratings are essential for organizations to be able to quickly characterize and assess the risk a given finding poses for an organization, and this approach allows for that much-needed nuance, while also keeping it extremely simple and easy to understand.

How do you think about Risk? We’d love to hear your thoughts - both on how you look at it, as well as on our approach.

To learn more about DarkHorse, go to https://darkhorse.sh. Thanks!