
How to write good risk scenarios and statements

Writing risk scenarios isn’t just paperwork—it’s the foundation of a great risk assessment. This guide breaks down how to build narratives that matter and turn them into crystal-clear risk statements that decision-makers can actually use.


Risk management is both art and science. There is no better example of risk as an art form than risk scenario building and statement writing. Scenario building is the process of identifying the critical factors that contribute to an adverse event and crafting a narrative that succinctly describes the circumstances and consequences if it were to happen. The narrative is then further distilled into a single sentence, called a risk statement, that communicates the essential elements from the scenario. 

Think of this whole process as a set-up for a risk assessment as it defines the elements needed for the next steps: risk measurements, analysis, response, and communication. Scenario building is a crucial step in the risk management process because it clearly communicates to decision-makers how, where, and why adverse events can occur.

Fig. 1: Risk identification, risk scenarios, and risk statements


Risk scenarios and statements are written after risks are identified, as shown in Figure 1.


What is a risk scenario?

The concept of risk scenario building is present in one form or another in all major risk frameworks, including NIST Risk Management Framework (RMF), ISACA’s Risk IT, and COSO ERM. The above frameworks have one thing in common: the purpose of risk scenarios is to help decision-makers understand how adverse events can affect organizational strategy and objectives. The secondary function of risk scenario building, according to the above frameworks, is to set up the next stage of the risk assessment process: risk analysis. Scenarios set up risk analysis by clearly defining and decomposing the factors contributing to the frequency and the magnitude of adverse events.

See Figure 1 above for the components of a risk scenario.

Risk scenarios are most often written as narratives, describing in detail the asset at risk; who or what can act against the asset; their intent or motivation (if applicable); the circumstances and threat actor methods associated with the threat event; the effect on the company if/when it happens; and when or how often the event might occur.

A well-crafted narrative helps the risk analyst scope and perform an analysis, ensuring the critical elements are included and irrelevant details are not. Additionally, it provides leadership with the information they need to understand, analyze, and interpret risk analysis results. For example, suppose a risk analysis reveals that the average annualized risk of a data center outage is $40m. The risk scenario will define an “outage,” which data centers are in scope, the duration required to be considered business-impacting, what the financial impacts are, and all relevant threat actors. The risk analysis results combined with the risk scenario start to paint a complete picture of the event and guide the audience down the path to well-informed decisions.
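
To make the scoping concrete, the outage example above can be captured in a structured form so every assumption is explicit before measurement begins. The sketch below is a minimal illustration; the class, field names, data centers, and thresholds are hypothetical placeholders rather than part of any framework.

```python
# A minimal sketch of the data center outage scenario above captured as a
# structured object, so every scoping assumption is explicit before any
# measurement starts. All field names and values are hypothetical placeholders.
from dataclasses import dataclass, field

@dataclass
class RiskScenario:
    name: str
    asset: str                  # what is at risk
    threat_actors: list         # who or what can act against the asset
    effect: str                 # confidentiality, integrity, availability, or privacy
    event_definition: str       # what counts as the event
    impact_threshold: str       # when the event becomes business-impacting
    loss_categories: list = field(default_factory=list)

outage = RiskScenario(
    name="Data center outage",
    asset="US-West and US-East production data centers",
    threat_actors=["severe weather", "utility power failure", "malicious insider"],
    effect="availability",
    event_definition="Complete loss of customer-facing services from one data center",
    impact_threshold="Outage lasting longer than four hours",
    loss_categories=["lost revenue", "response costs", "SLA penalties"],
)
```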

For more information on risk scenarios and examples, read ISACA’s Risk IT Practitioner’s Guide and Risk IT Framework.


What is a risk statement?

It might not always be appropriate to use a 4-6 sentence narrative-style risk scenario, such as in Board reports or an organizational risk register. The core elements of the forecasted adverse event are often distilled even further into a risk statement.

Risk statements are bite-sized descriptions of risk that everyone from the C-suite to developers can read to get a clear idea of how an event could affect the organization if it were to occur.

Several different frameworks set a format for risk statements. For example, a previous ISACA article uses this format:

[Event that has an effect on objectives] caused by [cause/s] resulting in [consequence/s].

The OpenFAIR standard uses a similar format:

[Threat actor] impacts the [effect] of [asset] via (optional) [method].

The OpenFAIR standard has the distinct advantage of using terms and concepts that are easily identifiable and measurable. Additionally, the risk scenario format from ISACA’s Risk IT was purpose-built to be compatible with OpenFAIR (along with other risk frameworks). The same terms and definitions used in OpenFAIR are also utilized in Risk IT.

The following factors are present in an OpenFAIR compatible risk statement:

  • Threat actor: Describes the individual or group that can act against an asset. A threat actor can be an individual internal to the organization, like an employee. It can also be external, such as a cybercriminal organization. The intent is usually defined here, for example, malicious, unintentional, or accidental actions. Force majeure events are also considered threat actors.

  • Asset: An asset is anything of value to the organization, tangible or intangible. For example, people, money, physical equipment, intellectual property, data, and reputation.

  • Effect: Typically, in technology risk, an adverse event can affect the confidentiality, integrity, availability, or privacy of an asset. The effect could extend beyond these into enterprise risk, operational risk, and other areas.

  • Method: If appropriate to the risk scenario, a method can also be defined. For example, if the risk analysis is specifically scoped to malicious hacking via SQL injection, SQL injection can be included as the method.


Risk Statement Examples

  • Privileged insider shares confidential customer data with competitors, resulting in loss of competitive advantage.

  • Cybercriminals infect endpoints with ransomware, encrypting files and locking workstations, resulting in disruption of operations.

  • Cybercriminals copy confidential customer data and threaten to make it public unless a ransom is paid, resulting in response costs, reputation damage and potential litigation.
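
One way to keep statements consistent is to treat the four factors as fields in a simple template. The sketch below is a hypothetical illustration of the OpenFAIR-style format; the class and example values are assumptions, and nothing in it comes from the standard beyond the wording of the template itself.

```python
# A hypothetical sketch of the "[Threat actor] impacts the [effect] of [asset]
# via [method]" template. The class, field names, and example values are
# illustrative assumptions, not part of the OpenFAIR standard itself.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RiskStatement:
    threat_actor: str
    effect: str                   # confidentiality, integrity, availability, or privacy
    asset: str
    method: Optional[str] = None  # optional, per the template

    def render(self) -> str:
        statement = f"{self.threat_actor} impacts the {self.effect} of {self.asset}"
        if self.method:
            statement += f" via {self.method}"
        return statement + "."

ransomware = RiskStatement(
    threat_actor="Cybercriminals",
    effect="availability",
    asset="endpoint workstations and file shares",
    method="ransomware that encrypts files and locks workstations",
)
print(ransomware.render())
# Cybercriminals impacts the availability of endpoint workstations and file
# shares via ransomware that encrypts files and locks workstations.
```

Filling in the same fields every time makes it harder to write a statement that silently omits the asset or the effect.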


Conclusion

Scenario building is one of the most critical components of the risk assessment process as it defines the scope, depth, and breadth of the analysis. It also helps the analyst define and decompose various risk factors for the next phase: risk measurement. More importantly, it helps paint a clear picture of organizational risk for leadership and other key stakeholders. The step is equally critical in both quantitative and qualitative risk methodologies.

Good risk scenario building is a skill and can take some time to truly master. Luckily, there are plenty of resources available to help both new entrants to the field and seasoned risk managers hone and improve their scenario-building skills.



This article was previously published by ISACA on July 19, 2021. ©2021 ISACA. All rights reserved. Reposted with permission.




Optimizing Risk Response, Unfiltered

We’ve turned risk response into a one-trick pony—mitigate, mitigate, mitigate. This post argues for something smarter: using quant to weigh all your options and finally break free from the tyranny of the risk matrix.

Sisyphus (1548–49) by Titian


I mentioned in a previous blog post that I just wrapped up two fairly large projects for ISACA: a whitepaper titled “Optimizing Risk Response” and a  companion webinar titled “Rethinking Risk Response.”

The whitepaper was peer-reviewed and written in an academic tone. After reviewing my notes one last time, I decided to write up a post capturing some of my thoughts on the topic and the process: unfiltered, of course, and a little saltier than a whitepaper.


Behind the Scenes

I’m a member of ISACA’s Risk Advisory Group - a group that advises on ISACA webinars, blogs, whitepapers, journal articles, projects, and other products on the broad topic of risk. When the opportunity came up to write a whitepaper on the subject of risk response, I jumped at the chance, even though it seemed like a boring old topic that’s been around since the first formal risk management frameworks. I knew I needed to find a unique angle and spin on the topic to make it engaging and give risk managers something new to consider.

First came the literature review. I read the risk response sections of all major risk frameworks from technology, cybersecurity, operational risk, enterprise risk management, and even a few from financial risk. I also read blogs, articles, and project docs that included risk response topics. I came out of the literature review with a book full of notes that I summarized into the following four ideas:

  • The topic of risk response is not settled, especially in technology/IT risk. “Settled” means both standards bodies and practitioners generally agree on what risk response is and how to use it.

  • Risk response is erroneously synonymous with risk mitigation. Risk frameworks don’t make this mistake, but organizational implementations and practitioners do.

  • Most risk response frameworks assume the adoption of qualitative risk techniques, which makes it challenging, sometimes impossible, to weigh the pros and cons of each option. This is probably why most practitioners default to mitigate. Qualitative methods do not allow for the discrete analysis of different response options strategically applied to risk.

  • Employing risk response can be fraught with unintended consequences, such as moral hazard, secondary risk, and cyber insurance policy gaps.

Ah, so the angle became crystal-clear to me. The central themes of the whitepaper are:

  • Focusing on risk mitigation as the sole response option is inefficient.

  • Evaluation of each risk response option is an integral part of the risk management process.

  • Risk response doesn’t exist in a vacuum. It’s all part of helping the organization achieve its strategic objectives, bounded by risk tolerance.

  • Risk quantification is the tool you need to achieve efficient and optimized risk response, including identifying and reacting to unintended consequences.

The themes above gave the whitepaper a fresh take on an old topic. I’m also hoping that the practical examples of using risk quantification to gain efficiencies help practitioners see it as a strategic tool and nudge them closer to it.
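
As a hedged sketch of the kind of practical example meant here: once exposure is expressed in dollars, the response options can be weighed side by side. Every figure below is an invented placeholder, and none of it is reproduced from the whitepaper.

```python
# A hedged sketch of weighing response options side by side once risk is in
# dollars. Every figure is an invented placeholder, not an example from the
# whitepaper.
current_eal = 2_000_000   # expected annual loss if nothing changes

# option name: (residual expected annual loss, annual cost of the option)
options = {
    "accept":   (2_000_000,         0),
    "mitigate": (  400_000,   900_000),   # e.g., a new controls program
    "transfer": (  800_000,   350_000),   # e.g., a cyber insurance premium
    "avoid":    (        0, 1_500_000),   # e.g., exit the risky business line
}

for name, (residual_eal, cost) in options.items():
    net_benefit = (current_eal - residual_eal) - cost
    print(f"{name:<9} net benefit: ${net_benefit:>10,.0f}")
```

In this made-up example, transference comes out ahead, which is precisely the kind of result a mitigate-by-default habit never surfaces.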

Why Risk Response ≠ Risk Mitigation

Reacting and responding to risk is an embedded and innate part of the human psyche. All animals have a “fight or flight” response, which can be thought of as risk mitigation or risk avoidance, respectively. The concept of risk transference started forming in the 1700s BCE with the invention of bottomry, a type of shipping insurance.

Abraham de Moivre, world’s first modern risk analyst


Abraham de Moivre, a French mathematician, changed the world in 1718 with a seemingly simple equation. He created the first definition of risk that paired the chances of something happening with potential losses.

“The Risk of losing any sum is the reverse of Expectation; and the true measure of it is, the product of the Sum adventured multiplied by the Probability of the Loss.” - Abraham de Moivre, The Doctrine of Chances (1718)
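
Applied literally, de Moivre's definition works out in a couple of lines; the figures below are made up purely for illustration.

```python
# De Moivre's definition applied literally: the risk of losing a sum is the sum
# adventured multiplied by the probability of the loss. Figures are made up.
sum_adventured = 100_000       # amount at stake, in dollars
probability_of_loss = 0.05     # chance of the loss in the period considered

risk = sum_adventured * probability_of_loss
print(f"Expected loss: ${risk:,.0f}")   # Expected loss: $5,000
```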

This evolved definition of risk changed the world and the way humans respond to it. Gut checks, “fight or flight,” and rudimentary forms of risk transference like bottomry were given the beginnings of an analytical framework, leading to better quality decisions. New industries were born. First, modern insurance and actuarial science (the first risk managers) sprung up at Lloyd’s of London. Many others followed. Modern risk management and analysis provided the ability to analyze response options and employ the best or a combination of the best options to further strategic objectives. 

All risk management at this time was quantitative, except it wasn’t called “quantitative risk.” It was just called “risk.” Abraham de Moivre used numbers in his risk calculation, not colors. Quantitative methods evolved throughout the centuries, adding Monte Carlo methods as one example, but de Moivre’s definition of risk is unchanged - even today. If you are interested in the history of risk and risk quantification, read the short essay by Peter L. Bernstein, “The New Religion of Risk Management.”

Something changed in the late 1980s and 1990s. Business management diverged from all other risk fields, seeking easier and quicker methods. Qualitative analysis (colors, adjectives, ordinal scales) via the risk matrix was introduced. The new generation of risk managers using these techniques lost the ability to analytically use all options available to strategically react to risk. The matrix allows a risk manager to rank risks on a list, but not much more (see my blog post, The Elephant in the Risk Governance Room). The resulting list is best equipped for mitigation; if you have a list of 20 ranked risks, you mitigate risk #1, then #2, and so on. This is the exact opposite of an efficient and optimized response to risk.

In other words, when all you have is a hammer, everything looks like a nail.

It’s worth noting that other risk fields did not diverge in the 1980s and 1990s and still use quantitative risk analysis. (It’s just called “risk analysis.”)


Two examples of an over-emphasis on mitigation 

The Wikipedia article on IT Risk Management (as of August 16, 2021) erroneously conflates risk mitigation with risk response. According to the article, the way an organization responds to risk is risk mitigation. 

Second, the OWASP Risk Rating methodology also makes the same logical error. According to OWASP, after risk is assessed, an organization will “decide what to fix” and in what order. 

To be fair, neither Wikipedia nor OWASP are risk management frameworks, but they are trusted and used by security professionals starting a risk program. 

There are many more examples, but the point is made. In practice, the default way to react to IT / cyber risk is to mitigate. It’s what we security professionals are programmed to do, but if we blindly do that, we’re potentially wasting resources. It’s certainly not a data-driven, analytical decision.


Where we’re heading

We’re in an age in which cybersecurity budgets are largely approved without thoughtful analysis, primarily due to fear. I believe the day will come when we will lose that final bit of trust the C-Suite has in us, and we’ll have to really perform forecasts, you know, with numbers, like operations, product, and finance folks already do. Decision-makers will insist on knowing how much risk a $10m project reduces, in numbers. I believe the catalyst will be an increase in cyberattacks like data breaches and ransomware, with a private sector largely unable to do anything about it. Lawsuits will start, alleging that companies using poor risk management techniques are not practicing due care to safeguard private information, critical infrastructure, etc.

I hope the whitepaper gives organizations new ideas on how to revive this old topic in risk management programs, and this unfiltered post explains why I think the subject is ripe for disruption. As usual, let me know in the comments below if you have feedback or questions. 


ISACA’s Risk Response Whitepaper Released

Most risk programs treat response as a checkbox and settle for surface-level metrics—but what if we aimed higher? This whitepaper explores how to break out of the mitigation hamster wheel and align risk response with strategy using quant-driven insights.


I recently wrapped up a true labor of love that occupied a bit of my free time in the late winter and early spring of 2021. The project is a peer-reviewed whitepaper I authored for ISACA, “Optimizing Risk Response,” released in July 2021.  Following the whitepaper, I conducted a  companion webinar titled “Rethinking Risk Response,” on July 29, 2021. Both are available at the links above to ISACA members. The whitepaper should be available in perpetuity, and the webinar will be archived on July 29, 2022.

The topic of risk response is admittedly old and has been around since the first technology, ERM, and IT Risk frameworks. Framework docs I read as part of my literature review all assumed qualitative risk analysis (e.g., red/yellow/green, high/medium/low, ordinal scales). Previous writings on the subject also guided the practitioner to pick one response option and move on to the monitoring phase.

In reality, risk response is much more complex. Furthermore, there’s much more quantitative risk analysis being performed than one would be led to believe by risk frameworks. Once I started pulling the subject apart, I found many ideas and opportunities to optimize risk response and management. I did my best to avoid rehashing the topic, instead focusing on the use of risk response to align with organizational strategy and identify inefficiencies.

I had two distinct audiences in mind while researching and writing the paper. 

First is the risk manager. I reflected on all the conversations I’ve had over the years with risk managers who feel like they’re on a hamster wheel of mitigation. An issue goes on the risk register, the analyst performs a risk analysis, assigns a color, finds a risk owner, and finally documents the remediation plan. Repeat, repeat, repeat. There’s a better way, but it requires a shift in thinking. Risk management must be considered an essential part of company strategy and decision-making, not an issue prioritization tool. The whitepaper dives into this shift, with tangible examples of how to uplevel the entire conversation of organizational risk.

The second audience is the consumer of risk data: everyone from individual contributors to the Board and everyone in-between. In most risk programs, consumers of risk data are settling for breadcrumbs. In this whitepaper, my goal is to provide ideas and suggestions to help risk data consumers ask for more. 

If this sounds interesting to you, please download the whitepaper and watch the webinar. I strongly encourage you to join ISACA if you are not a member. ISACA has made a significant investment in the last several years in risk quantification. As a result, there are invaluable resources on the topic of risk, including a recent effort to produce whitepapers, webinars, books, and other products relating to cyber risk quantification (CRQ).

As usual, I am always interested in feedback or questions. Feel free to leave them in the comments below.



SIRAcon 2021 Talk | Baby Steps: Easing your company into a quantitative cyber risk program

Curious about bringing quant into your risk program but overwhelmed by where to start? This talk breaks it down into practical, approachable stages—so you can move beyond heatmaps without losing your team (or your sanity).


I’m pleased to announce that my talk, “Baby Steps: Easing your company into a quantitative cyber risk program”, has been accepted to SIRAcon ‘21. SIRAcon is the annual conference for the Society of Information Risk Analysts (SIRA), a non-profit organization dedicated to advancing quantitative risk analysis. The talk is scheduled for Wednesday, August 4th, 2021 at 11:15 am Eastern US time.

In the talk, I’ll be sharing tips, techniques, successes, failures, and war stories from my career in designing, implementing, and running quantitative risk programs. 

Here’s the talk abstract:

Baby Steps: Easing your company into a quantitative cyber risk program

Risk managers tasked with integrating quantitative methods into their risk programs - or even those just curious about it - may be wondering, Where do I start? Where do I get the mountain of data I need? What if my key stakeholders want to see risk communicated in colors?

Attendees will learn about common myths and misconceptions, learn how to get a program started, and receive tips on integrating analysis rigor into risk culture. When it comes to quant risk, ripping the Band-Aid off is a recipe for failure. Focusing on small wins in the beginning, building support from within, and maintaining a positive bedside manner are the keys to long-term success.

Update: here’s the link to the recording

Additional Resources from the Talk

It was a challenge to cram all the information I wanted to cover into 30 minutes. Sit me down with a few beers in a bar and I could talk about risk all night. This blog post is a companion piece to the talk, linking to the resources I covered and providing extra details. This post matches the flow of the talk so you can follow along.

The One Takeaway

The one takeaway from the talk is: Just be better than you were yesterday.

If you are considering or are in the process of implementing quantitative risk modeling in your risk program and you need to pause or stop for any reason (lack of internal support, competing priorities, your executive sponsor leaves), that's OK. There are no quant risk police who will come yell at you for using a heat map.

We - the royal we - need to move off the risk matrix. The risk matrix has been studied extensively by those who study those sorts of things: data and decision scientists, engineers, statisticians, and many more. It’s not a credible and defensible decision-making tool. Having said that, the use of the risk matrix is an institutional problem. Fixing the deep issues of perverse incentives and “finger in the wind” methodologies canonized in the information security field doesn’t fall on your shoulders. Just do the best you can with what you have. Add rigor to your program where you can and never stop learning.

The Four Steps to a Quant Risk Program

I have four general steps or phases to help build a quant risk program:

  1. Pre-quant: What to expect when you’re expecting a quant risk program - you are considering quantitative risk and this is how you prepare for it.

  2. Infancy: You’ve picked a model and methodology and you’re ready for your first few steps.

  3. Adolescence: You have several quantitative risk assessments done and you’re fully ready to rage against the qualitative machine. Not so fast – don’t forget to bring everyone along!

  4. Grown-up: Your program is mature and you’re making tweaks, making it better, and adding rigor.

(I’ve never made it past grown-up.)

You can follow these phases in your own program, modifying them as you see fit, until your program is all quant-based. Or use as much or as little of this as you want, maturing your program as appropriate for your organization.

Step 1: What to Expect When You’re Expecting a Quant Risk Program

In this phase, you’re setting the foundational groundwork, going to training or self-study, and increasing the rigor of your existing qualitative program.

Training - Self-Study
Reading, of course, plenty of reading.

First, some books.

Even if you don’t plan on adopting Factor Analysis of Information Risk (FAIR), I think it’s worth reading some of the documentation to help you get started. Many aspects of risk measurement covered in FAIR port well to any risk model you end up adopting. Check out the Open Group’s whitepapers on OpenFAIR, webinars, and blogs from the FAIR Institute and RiskLens.

Blogs are also a great way to stay up-to-date on risk topics, often directly from practitioners. Here are my favorites:

Webinars

Structured Training & Classes

Add rigor to the existing program
Risk scenario building is part of every formal risk management/risk assessment methodology. Some people skip this portion or do it informally in their qualitative risk programs. You can’t take this shortcut with quantitative risk; this is the first place the risk analyst scopes the assessment and starts to identify where and how to take risk measurements.

If not already, integrate a formal scenario-building process into your qualitative risk program. Document every step. This will make moving to quantitative risk much easier.

Some frameworks that have risk scoping components are:

Adopt a Model
What model are you going to use? Most people use FAIR, but there are others.


Collect Data Sources
Start collecting data sources in your qualitative program. If someone rates the likelihood and magnitude of a data breach as “high,” where could you go to get more information? Write these sources down, even if you’re not ready to start collecting data. Here are a few starting places:

  • Lists of internal data sources: Audits, previous assessments, incident logs and reports, vuln scans, BCP reports

  • External data: Your ISAC, VERIS / DBIR, Cyentia reports, SEC filings, news reports, fines & judgments from regulatory agencies

  • Subject matter experts: Experts in each area you have a risk scenario for; people that inform the frequency of events and the magnitude (often not the same) 

Step 2: Infancy

You’ve picked a model and methodology and you’re ready for your first few steps.

Perform a risk analysis on a management decision
Find someone that has a burning question and perform a risk assessment, outside of your normal risk management process and outside the risk register. The goal is to help this individual make a decision. Some examples:

Get stakeholders accustomed to numbers

Step 3 – Adolescence

You have several quant risk assessments done and are fully ready to rage against the qualitative machine – but not so fast! Don’t forget to bring everyone along!

Perform more decision-based risk assessments
In this step, perform several more decision-based risk analyses. See the list in Step 2 for some ideas. At this point, you’ve probably realized that quantitative cyber risk analysis is not a sub-field of cybersecurity. It’s a sub-field of decision science.

Check out:

Build a Database of Forecasts

Record the frequency and magnitude forecasts for every single risk assessment you perform. You will find that over time, many assessments use the same data or, at least, can be used as a base rate. Building a library of forecasts will speed up assessments - the more you do, the faster they will be.
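
One low-tech way to start is a flat list of forecasts keyed by scenario, with each range and its source recorded so it can serve as a base rate later. The sketch below is a minimal illustration; the function, field names, scenarios, and ranges are all hypothetical.

```python
# A minimal sketch of a forecast library: every assessment's frequency and
# magnitude forecasts get recorded so later assessments can reuse them as base
# rates. The function, field names, and example values are hypothetical.
import datetime
import json

forecast_library = []

def record_forecast(scenario, freq_low, freq_high, loss_low, loss_high, source):
    forecast_library.append({
        "scenario": scenario,
        "recorded": datetime.date.today().isoformat(),
        "annual_frequency": {"low": freq_low, "high": freq_high},  # events per year (90% CI)
        "single_loss": {"low": loss_low, "high": loss_high},       # dollars per event (90% CI)
        "source": source,
    })

record_forecast("Ransomware on corporate endpoints", 0.1, 0.5, 250_000, 4_000_000,
                source="SME workshop, June 2021")
record_forecast("Third-party data breach", 0.05, 0.3, 500_000, 10_000_000,
                source="Industry reports plus internal incident history")

print(json.dumps(forecast_library, indent=2))
```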


Watch Your Bedside Manner
This is the easiest tip and one that so few people do. It’s an unfortunate fact: the risk matrix is the de facto language of cyber / technology risk. It’s in the CISSP and CRISC, it’s an acceptable methodology to pass organizational audits like SOX, SOC2, and FFIEC, and it’s what’s taught in university curriculums. When moving both organizations and people over to quantitative models, be kind and remember that this is a long game.

Do this:

  • Recognize the hard work people have put into existing qualitative risk programs

  • Focus on improving the rigor and fidelity of analysis

  • Talk about what I can do for you: help YOU make decisions

Don’t do this:

  • Disparage previous work & effort on qualitative programs.

  • Quote Tony Cox in the breakroom, even though he’s right. “[Risk matrices are] worse than useless.”

  • Force all people to consume data one way – your way

Step 4: Grown-up

In this step, the quantitative risk program is mature and you’re making tweaks, making it better, adding rigor, and bringing people along for the risk journey. Work on converting the risk register (if you have one) and the formal risk program over to quantitative as your final big step.

Through your work, risk management and the risk register are no longer busywork, something you show auditors once a year. The register is used as an integral part of business forecasting, helping drive strategic decisions. It’s used by everyone, from the Board down to engineers and everyone in-between.

Here are some references to help with that transition:

Durable Quantitative Risk Programs

The last and final tip is how to make your program durable. You want it to last.

  • Colors and adjectives are OK, but getting stakeholders thinking in numbers will make your program last.

  • Reach far and wide to all areas of the business for SME inputs.

  • Embed your program in all areas of business decision-making.

Final Thoughts

The greatest challenge I’ve found isn’t data (or the lack of it), which is the most common objection to cyber risk quantification. The greatest challenge is that everyone consumes data differently. Some people are perfectly comfortable with a 5-number summary, loss exceedance curve, and a histogram. Others will struggle with decision-making whether the data is presented as colors or numbers. Bias, comfort with quantitative data, personal risk tolerance, and organizational risk culture all play a role.

I recommend the work of Edward Tufte to help risk analysts break through these barriers and present quantitative data in a way that is easily understood and facilitates decision-making.

***

I’m always interested in feedback. It helps me improve. Please let me know what you thought of this talk and/or blog post in the comments below. What would you like to see more of? Less of?




The Elephant in the Risk Governance Room

The risk matrix might feel familiar, but it’s holding your strategy back. This post dives into the loss exceedance curve—a powerful, underused tool that transforms how leaders think about risk, investment, and value.


There is an elephant in the risk governance room.

Effective risk governance means organizations are making data-driven decisions with the best information available at the moment. The elephant, of course, refers to the means and methods used to analyze and visualize risk. The de facto language of business risk is the risk matrix, which enables conversations about threats, prioritization and investments but lacks the depth and rigor needed to be considered a tool for strategic decision-making. However, there is a better option—one that unlocks deeper, more comprehensive conversations not only about risk, but also how risk impedes or enables organizational strategy and objectives.

Risk quantification coupled with results visualized via the loss exceedance curve (LEC) is one tool organizations can adopt to help them make informed risk investment decisions. Adopting risk quantification can help organizations unlock a true competitive advantage.

The Risk Matrix

Figure 1 gives an example of a typical risk matrix with 4 general risk themes plotted. The risk matrix is familiar to many organizations in several different industries. It is effective because it conveys, at a glance, information that helps leaders understand risk. In the example in figure 1, the risk matrix tells leadership the following:

  • Risk #1 seems to be about the same as risk #2.

  • Risk #1 and #2 are red, therefore, they should be prioritized for risk response over #3 and #4.

  • Risk #3 is yellow; therefore, it should be responded to, but not before #1 or #2.

Figure 1— The Risk Matrix


In other words, the matrix enables conversations about the ranking and prioritization of risk. 

That might seem adequate, but it does not inform the inevitable next question: Are the organization’s security expenditures cost effective and do they bring good value for the money? For example, suppose a risk manager made the statement that investing US$1 million in security controls can reduce a red risk to a yellow risk. It may be accurate, but it comes with a level of imprecision that makes determining cost effectiveness, value and cost benefit difficult, if not impossible. With the risk matrix, the conversation about risk ranking is decoupled from the conversation about how much money to spend reducing risk. 

Is there a better way?

Enter the Loss Exceedance Curve

If organizations want to have deeper conversations about risk, they should consider the LEC. Like the risk matrix, it is a visual display of risk, but it has several additional advantages. One advantage is that it enables investment conversations to happen alongside risk ranking.

Figure 2 shows the same risk themes as figure 1, but they are quantified and plotted on an LEC. The LEC may be new to cyberrisk practitioners, but it is a time-tested visualization used in many disciplines, including accounting, actuarial science and catastrophe modelling.

Figure 2— Loss Exceedance Curve


Organizations can follow each risk along the curve and draw conclusions. In this example, practitioners can follow the line for ransomware and draw the following conclusions:

  • If a ransomware attack occurs, there is a 60% probability that losses will exceed US$20 million and a 20% probability losses will exceed US$60 million.

  • There is a less than 10% probability that ransomware losses will exceed US$95 million. This can be considered a worst-case outcome—a widespread, massive ransomware attack in which critical systems are affected.

  • The red dotted line represents the organization’s loss tolerance, which can be thought of as the red quadrants in the risk matrix. It represents more risk than the organization is comfortable with, therefore, leadership should reduce this risk through mitigation, transference, avoidance or some combination of all 3.

LECs are generated from common cyberrisk quantification (CRQ) models. OpenFAIR is one such model, but many others can be used in cyberrisk. In this case, the risk analyst would input probability and magnitude data from internal and external sources into the model and run a set number of simulations. For example, the model can be set to run 100,000 iterations, which is equivalent to simulating 100,000 years of the organization’s operations.
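
To make the mechanics concrete, here is a hedged sketch of the simulation loop behind an LEC. It is not the OpenFAIR model itself, and the frequency and loss parameters are invented placeholders chosen only to produce a recognizable exceedance curve.

```python
# A hedged sketch of the simulation behind an LEC. This is not the OpenFAIR
# model; the frequency and magnitude parameters below are invented placeholders.
import numpy as np

rng = np.random.default_rng(42)
iterations = 100_000                       # "100,000 years" of the organization

# Made-up inputs: the event occurs about 0.3 times per year on average, and
# per-event losses follow a heavy-tailed lognormal (median roughly $15M here).
annual_event_counts = rng.poisson(lam=0.3, size=iterations)
annual_losses = np.array([
    rng.lognormal(mean=16.5, sigma=1.5, size=n).sum() if n else 0.0
    for n in annual_event_counts
])

# Each point on the LEC is the probability that annual losses exceed a threshold.
for threshold in (1e6, 20e6, 60e6, 95e6):
    prob = (annual_losses > threshold).mean()
    print(f"P(annual loss > ${threshold / 1e6:>4.0f}M) = {prob:.1%}")
```

Sorting the simulated annual losses and plotting the exceedance probability at each loss level yields the full curve rather than the handful of points printed here.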

Once organizations have learned how to understand the LEC, a whole new world of data interpretation becomes available. The first step in understanding the LEC is to compare how a single risk is visualized on the risk matrix vs. the LEC. 

In figure 1, the risk matrix leads viewers to believe that there is 1 outcome from a ransomware attack: high risk, which is universally thought of as negative. However, the LEC shows that this is not the case. There is a wide range of possible outcomes, including losses from US$1 thousand to US$100 million. The range aligns with what is known about ransomware attacks. Losses vary greatly depending on many factors, including how many systems are compromised, when defenses detect the malware (e.g., before infection, during the attack, after the ransom demand) and if the attack is caught early enough to allow for an intervention. A single color in the risk matrix cannot communicate these subtleties, and leadership is missing out on essential investment decisions by not considering risk that exists in other colors of the risk matrix.

The LEC also enables meaningful conversations around project planning, investment decisions and deeper discussions on how to best respond to risk.

In this example, the risk matrix led the organization to believe that the risk of ransomware and data compromise is the same (high) and that leadership should treat them equally when planning mitigation. However, the LEC shows that data compromise has higher projected losses than ransomware, and by how much. Worst-case outcomes also occur at different probabilities. This difference is significant when deciding where on the curve to manage risk: most likely outcomes, worst-case outcomes or somewhere in between.

The LEC establishes a financial baseline for further analyses, such as cost/benefit analysis, evaluating capital reserves for significant losses, evaluating insurance, and comparing controls.

Conclusion

Increasingly, organizations are data-obsessed and use analysis and interpretation to make decisions, yet many still use one-dimensional tools such as the risk matrix to manage risk. It is an elephant in the proverbial decision-making room and a problem that is too big to ignore.


This article was previously published by ISACA on July 19, 2021. ©2021 ISACA. All rights reserved. Reposted with permission.



When the Experts Disagree in Risk Analysis

What do you do when one expert’s risk forecast is wildly different from the rest? This post breaks down the four common causes—and what to do when the black sheep in your risk analysis might actually be right.

Imagine this scenario: You want to get away for a long weekend and have decided to drive from Los Angeles to Las Vegas. Even though you live in LA, you've never completed this rite of passage, so you're not sure how long it will take given traffic conditions. Luckily for you, four of your friends regularly make the trek. Anyone would assume that your friends are experts and would be the people to ask. You also have access to Google Maps, which algorithmically makes predictions based on total miles, projected miles per hour, past trips by other Google Maps users, road construction, driving conditions, etc.

You ask your friends to provide a range of how long they think it will take you, specifically, to make the trip. They (plus Google) come back with this:

  • Google Maps: 4 hours, 8 minutes

  • Fred: 3.5 - 4.5 hours

  • Mary: 3 - 5 hours

  • Sanjay: 4 - 5 hours

  • Steven: 7 - 8 hours

Three of the four experts plus an algorithm roughly agree about the drive time. But, what's up with Steven's estimate? Why does one expert disagree with the group?
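
Before digging into the causes, it helps to see how much a single divergent estimate can move the combined answer. The sketch below pools the ranges by naive uniform sampling, purely as an illustration and not as a recommended aggregation method.

```python
# A minimal sketch of pooling the drive-time ranges by uniform sampling, with
# and without the divergent estimate. Uniform sampling is an illustrative
# simplification, not a recommended aggregation method.
import numpy as np

rng = np.random.default_rng(7)
estimates = {              # hours: (low, high)
    "Fred":   (3.5, 4.5),
    "Mary":   (3.0, 5.0),
    "Sanjay": (4.0, 5.0),
    "Steven": (7.0, 8.0),
}

def pooled_mean(experts):
    samples = np.concatenate(
        [rng.uniform(low, high, 10_000) for low, high in experts.values()]
    )
    return samples.mean()

print(f"Pooled mean with Steven:    {pooled_mean(estimates):.1f} hours")
print(f"Pooled mean without Steven: "
      f"{pooled_mean({k: v for k, v in estimates.items() if k != 'Steven'}):.1f} hours")
```

The pooled mean shifts by close to an hour, which is exactly why the divergence needs to be investigated before it is blended in.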

A Common Occurrence

Some variability between experts is always expected and even desired. One expert, or a minority of experts, with a divergent opinion, is a fairly common occurrence in any risk analysis project that involves human judgment. Anecdotally, I'd say about one out of every five risk analyses I perform has this issue. There isn't one single way to deal with it. The risk analyst needs to get to the root cause of the divergence and make a judgment call.

I have two sources of inspiration for this post. First, Carol Williams recently wrote a blog post titled Can We Trust the Experts During Risk Assessments? in her blog, ERM Insights, about the problem of differing opinions from experts when eliciting estimations. Second is the new book Noise by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, covering the topic from a macro level. They define noise as "unwanted variability in judgments."

Both sources inspired and caused me to think about how, when, and where I see variability in expert judgment and what I do to deal with it. I've identified four reasons this may happen and how to best respond.

Cause #1: Uncalibrated Experts

Image credit: AG Pic


All measurement tools need to be calibrated, meaning the instrument is configured to provide accurate results within an acceptable margin of error. The human mind, being a tool of measurement, also needs to be calibrated.

Calibration doesn't make human expert judgment infallible. Calibration training helps experts frame their knowledge and available data into a debiased estimate. The estimates of calibrated experts outperform those of uncalibrated experts because calibrated experts have the training, practice, and awareness of their inherent personal biases needed to provide estimates with a higher degree of accuracy.

Calibration is one of the first things I look for if an individual's estimations are wildly different from the group: have they gone through estimation and calibration training?

Solution

Put the expert through calibration training - self-study or formalized. There are many online sources to choose from, including Doug Hubbard's course and the Good Judgement Project's training.

Cause #2: Simple misunderstanding

Some experts simply misunderstand the purpose, scope of the analysis, research, or assumptions.

I was once holding a risk forecasting workshop, and I asked the group to forecast how many complete power outages we should expect our Phoenix datacenter to have in the next ten years, given that we’ve experienced two in the last decade. All of the forecasts were about the same - between zero and three. One expert came back with 6-8, which is quite different from everyone else. After a brief conversation, it turned out he misheard the question. He thought I was asking for a forecast on outages on all datacenters worldwide (over 40) instead of just the one in Phoenix. Maybe he was multitasking during the meeting. We aligned on the question, and his new estimate was about the same as the larger group.

You can provide the same data to a large group of people, and it's always possible that one or a few people will interpret it differently. If I aggregated this person's estimation into my analysis without following up, it would have changed the result and produced a report based on faulty assumptions.

Solution

Follow up with the expert and review their understanding of the request. Probe and see if you can find if and where they have misunderstood something. If this is the case, provide extra data and context and adjust the original estimate.

Cause #3: Different worldview

Your expert may view the future - and the problem - differently than the group. Consider the field of climate science as a relevant example.

Climate science forecasting partially relies on expert judgment and collecting probability estimates from scientists. The vast majority of active climate scientists (97%) agree that humans are causing global warming; 3% do not. This is an example of a different worldview. Many experts have looked at the same data, same assumptions, same questions, and a small subgroup has a different opinion than the majority.

 Some examples in technology risk:

  • A minority of security experts believe that data breaches at a typical US-based company aren't as frequent or as damaging as generally thought.

  • A small but sizable group of security experts assert that security awareness training has very little influence on the frequency of security incidents. (I'm one of them).

  • Some security risk analysts believe the threat of state-sponsored attacks to the typical US company is vastly overstated.

Solution

Let the expert challenge your assumptions. Is there an opportunity to revise your assumptions, data, or analysis? Depending on the analysis and the level of disagreement, you may want to consider multiple risk assessments that show the difference in opinions. Other times, it may be more appropriate to go with the majority opinion but include information on the differing opinion in the final write-up.

 Keep in mind this quote: 

"Science is not a matter of majority vote. Sometimes it is the minority outlier who ultimately turns out to have been correct. Ignoring that fact can lead to results that do not serve the needs of decision makers."

- M. Granger Morgan

Cause #4: The expert knows something that no one else knows

It's always possible that the expert who has a vastly different opinion from everyone else knows something no one else knows.

I once held a risk workshop with a group of experts forecasting the probable frequency of SQL Injection attacks on a particular build of servers. Using data such as historical compromise rates, industry data, vuln scans, and penetration test reports, all the participants provided forecasts that were generally the same. Except for one guy.

He provided an estimate that forecasted SQL Injection at about 4x the rate of everyone else. I followed up with him, and he told me that the person responsible for patching those systems had quit three weeks earlier, no one was currently doing his job, and a SQL Injection 0-day was being actively used in the wild! The other experts were not aware of these facts. Oops!

 If I ignored or included his estimates as-is, this valuable piece of information would have been lost.

Solution

Always follow up! If someone has extra data that no one else has, this is an excellent opportunity to share it with the larger group and get a better forecast.

Conclusion

Let's go back to my Vegas trip for a moment. What could be the cause of Steven's divergent estimate?

  • Not calibrated: He has the basic data but lacks the training on articulating that into a usable range.

  • Simple misunderstanding: "Oh, I forgot you moved from San Jose to Los Angeles last year. When you asked me how long it would take to drive to Vegas, I incorrectly gave you the San Jose > Vegas estimate." Different assumptions!

  • Different worldview: Steven drives under the speed limit and only in the slow lane. He prefers stopping for meals on schedule and eats at sit-down restaurants - never a drive-through. He approaches driving and road trips differently than you do, and his estimates reflect this view.

  • Knows something that the other experts do not know: Steven remembered that you are bringing your four kids, who need to stop often for food, bathroom, and stretch breaks.

In all cases, a little bit of extra detective work finds the root cause of the divergent opinion.

I hope this gives a few good ideas on how to solve this fairly common issue. Did I miss any causes or solutions? Let me know in the comments below.



The Sweet Spot of Risk Governance

Effective risk governance lives in the sweet spot—between reckless risk-seeking and paralyzing risk aversion. Quantification helps strike that balance, aligning security investments with business value instead of just chasing every red box to green.

In baseball, the “sweet spot” refers to the precise location on a bat where the maximum amount of energy from a batter’s swing is shifted into the ball. It is an equilibrium—the best possible outcome proportional to the amount of effort the batter exerts. A similar concept exists in risk management. IT professionals want to find the best possible balance between risk seeking and risk avoidance. Too much risk seeking causes an organization to take wild leaps of faith on speculative endeavors, potentially leading to business failure. Extreme risk aversion, on the other hand, causes an organization to fall behind emerging trends and market opportunities—cases in point: Polaroid and Blockbuster. Finding the right balance can move an organization’s risk program from an endless cycle of opening and closing entries on a risk register to a program that truly aligns with and promotes business objectives.

Risk Seeking

Risk is not necessarily bad. Everyone engages in risky behavior to achieve an objective, whether it is driving a car or eating a hot dog. Both activities cause deaths every year, but there is a willingness to take on the risk because of the perceived benefits. Business is no different. Having computers connected to the Internet and taking credit card transactions present risk, but not engaging in those activities presents even more risk: the complete inability to conduct business. Seeking new opportunities and accepting the associated level of risk is part of business and life.

Risk Avoidance

Identifying and mitigating risk is an area where risk managers excel, sometimes to the detriment of understanding the importance of seeking risk. This can be seen especially in information security and technology risk, where the impulse is to mitigate all reds to greens, forgetting that every security control comes with an opportunity cost and potential end user friction. Risk, whether sought or avoided, needs to be inexorably linked to the business if a risk management program has any chance of long-term success.

Sweet Spot

Think of risk behavior as a baseball bat. A batter should not hit the ball on the knob or the end cap. It is wasted energy. One also does not want to engage in extreme risk seeking or risk avoidance behaviors. Somewhere in the middle there is an equilibrium. It is the job of the risk manager to help leadership find the balance between risk that enables business and risk that lies beyond an organization’s tolerance.

This can be done by listening to leadership, learning where the organization’s appetite for risk lies and selecting controls in a smart, risk-aware way. Security and controls are very important. They can mitigate serious, costly risk, but balance is needed.

Risk quantification is an indispensable tool in finding and communicating balance as it helps leadership understand the amount of risk exposure in an area, by how much security controls can reduce exposure and, perhaps most important, whether the cost of controls is proportional to the amount of risk reduced. The balance is a crucial part of risk governance and helps leadership connect risk to its effect on business objectives in a tangible and pragmatic way.
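
The proportionality check described above reduces to a few lines of arithmetic once exposure is expressed in dollars. The sketch below uses single-point expected values for brevity; a real analysis would use ranges and simulation, and every figure shown is a hypothetical placeholder.

```python
# A hedged sketch of the proportionality check: compare expected annual loss
# with and without a control against the control's cost. All figures are
# hypothetical placeholders, and single-point values are used only for brevity.
def expected_annual_loss(event_frequency, loss_per_event):
    return event_frequency * loss_per_event

current_eal  = expected_annual_loss(event_frequency=0.4, loss_per_event=5_000_000)
residual_eal = expected_annual_loss(event_frequency=0.1, loss_per_event=3_000_000)
control_cost = 750_000    # annualized cost of the proposed control

risk_reduction = current_eal - residual_eal
net_benefit = risk_reduction - control_cost
print(f"Risk reduction: ${risk_reduction:,.0f}")   # Risk reduction: $1,700,000
print(f"Net benefit:    ${net_benefit:,.0f}")      # Net benefit:    $950,000
```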


This article was previously published by ISACA on April 5, 2021. ©2021 ISACA. All rights reserved. Reposted with permission.


Risk modeling the vulnerability du jour, part 2: Forward-looking risk registers

Risk registers shouldn’t be a graveyard of past incidents—they should be living forecasts of future loss. Here’s how to model emerging threats like ShadowBrokers or Spectre and make your risk register proactive, not reactive.

"extreme horizon" by uair01 is licensed under CC BY 2.0

"extreme horizon" by uair01 is licensed under CC BY 2.0

Strange, unusual, media-worthy vulnerabilities and cyberattacks… they seem to pop up every few months or so and send us risk managers into a fire drill. The inevitable questions follow: Can what happened to Yahoo happen to us? Are we at risk of a Heartbleed-type vuln? And, my personal favorite, Is this on our risk register?

This post is the second of a two-part series on how to frame, scope, and model unusual or emerging risks in your company's risk register. Part 1 covered how to identify, frame, and conceptualize these kinds of risks. Part 2, this post, introduces several tips and steps I use to brainstorm emerging risks and include the results in your risk register.

What’s a “forward-looking risk register”? 

Before we get started, here’s the single most important takeaway of this blog post:

Risk registers should be forecasts, not a big ol’ list of problems that need to be fixed.

It shouldn't be a list of JIRA tickets of all the systems that are vulnerable to SQL injection, don't have working backups, or are policy violations. That's a different list.

 A risk register: 

  • Identifies the bad things that can happen. For example, a threat actor uses SQL injection against your database and obtains your customer list

  • Forecasts the probability of the bad things happening, and

  • Estimates how much it could cost you if they do happen

In other words, risk registers look forward, not back. They are proactive, not reactive.
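
As an illustration of the difference, here is a minimal sketch of what a single forward-looking register entry might carry; the structure and every number in it are hypothetical.

```python
# A minimal sketch of a forward-looking risk register entry: a scenario, a
# forecast of how often it might happen, and a forecast of what it could cost.
# The structure and every number are hypothetical, not a prescribed format.
risk_register_entry = {
    "scenario": "Threat actor uses SQL injection against the customer database "
                "and obtains the customer list",
    "annual_frequency": {"low": 0.05, "most_likely": 0.2, "high": 0.7},  # events per year
    "loss_per_event": {"low": 200_000, "most_likely": 1_500_000, "high": 12_000_000},  # dollars
    "response": "mitigate",   # could also be accept, transfer, or avoid
    "last_reviewed": "2021-08-01",
}
```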

Including new threats and vectors in your risk register


When I revamp a risk program, the first thing I do is make sure the company's risk register - the authoritative list of all risks that we know and care about - is as comprehensive and complete as possible. Next, I look for blind spots and new, emerging risks.

Here's my 4-step process: identify risk register blind spots, brainstorm new risks, integrate them into your register, and implement continuous monitoring.

Step 1: Inventory historical vulns and identify blind spots in your register

Run-of-the-mill risks like data breaches, outages, phishing, and fraud are easily turned into risk scenarios. It's a bit harder to identify risk blind spots. When a security incident story hits the major media and is big enough that I think I'm going to be asked about it, I start to analyze it. I look at who the threat actors are, their motivations, vector of attack, and probable impact. I then compare my analysis with the list of existing risks and ask myself, Am I missing anything? What lessons can I learn from the past to help me forecast the future?

Here are some examples:

  • Solarwinds hack (2020). What happened: SolarWinds' software build process was infiltrated, giving attackers a foothold in Solarwinds customers' networks. Risk register lesson: Software you trust gets delivered or updated with malware or provides access to system resources.

  • Target hack (2013). What happened: A phishing email targeted at a vendor was successful, giving the attackers access to internal Target systems, leading to a breach of cardholder data. Risk register lesson: Vendors you trust are compromised, giving attackers a foothold into your systems.

  • Sony Pictures Entertainment (SPE) (2014). What happened: SPE released a movie that was unflattering to a particular regime, leading to a large-scale cyberattack that included ransom/extortion, massive data leaks, and a prolonged system outage. Risk register lesson: State-sponsored groups or hacktivists are unhappy with a company's positions, products, leadership, or employee opinions and launch a cyber-attack in retaliation.

  • Rowhammer vuln. What happened: Privilege escalation and network-based attacks by causing data leakage in DRAM. Risk register lesson: There are hardware vulnerabilities that are OS-independent; OS/supplier diversification is not a panacea.

  • Spectre / Meltdown. What happened: An attacker exploits a vuln present in most modern CPUs, allowing access to data. Risk register lesson: Same as above.

  • Cold boot attack. What happened: An attacker with physical access to the target computer gains access to the data in memory. Risk register lesson: Assume that a motivated, adequately resourced, and knowledgeable attacker can do anything with physical access to hardware. See the Evil Maid Attack.

  • Heartbleed. What happened: A bug in the OpenSSL library gives attackers access to data or the ability to impersonate sessions. Risk register lesson: Linus’ Law (“given enough eyeballs, all bugs are shallow”) is not a risk mitigation technique. Open-source software has vulnerabilities just like commercial software, and sometimes they’re really bad.

  • Shadowbrokers leak (2016). What happened: A massive leak of NSA hacking tools and zero-day exploits. Risk register lesson: Criminal organizations and state-sponsored groups have some of the scariest tools and exploits, unknown to software vendors and the general public. When these get leaked, adjust associated incident probabilities up.

Step 2: Brainstorm any additional risks

Keep it high-level and focus on resiliency instead of modeling out specific threat actions, methods, state-sponsored versus cybercriminal activity, etc. For example, you don't need to predict the next cold-boot type hardware attack. Focus on what you could do to improve overall security and resilience against scenarios in which the attackers have physical access to your hardware, whoever they may be.

Step 3: Integrate into the risk register 

This step is a bit more complex, and the approach will significantly depend on your company's risk culture and what your audience expects out of a risk register.

One approach is to integrate your new scenarios into existing risk scenarios. For example, suppose you already have an existing data breach risk analysis. In that case, you can revisit the assumptions, probability of occurrence, and the factors that make up the loss side of the equation and ensure that events, such as Shadowbrokers or Target, are reflected.

Another approach is to create new risk scenarios, but this could make the register very busy with hypotheticals. Risk managers at Microsoft, the US State Department, and defense contractors probably would have a robust list of hypotheticals. The rest of us would just build the risks into existing scenarios.

Step 4: Continuous monitoring

As new attacks, methods, vectors, and vulnerabilities are made public, ask the following questions:

  • Conceptually and from a high level, do you have an existing scenario that covers this risk? Part 1 gives more advice on how to determine this.

  • Framing the event as a risk scenario, does it apply to your organization?

  • Is the risk plausible and probable, given your organization's risk profile? I never try to answer this myself; I convene a group of experts and elicit opinions.

In addition to the above, hold yearly emerging risk brainstorming sessions. What's missing, what's on the horizon, and where should we perform risk analyses?

I hope this gives you some good pointers to future-proof your risk register. What do you think? How do you approach identifying emerging risk? Let me know in the comments below.



Risk modeling the vulnerability du jour, part 1: Framing

When the next headline-making exploit drops, your execs will ask, “Was this on our risk register?” This guide walks through how to proactively frame exotic or emerging threats—without falling into the FUD trap.


Every few months or so, we hear about a widespread vulnerability or cyber attack that makes its way to mainstream news. Some get snappy nicknames and their very own logos, like Meltdown, Spectre, and Heartbleed. Others, like the Sony Pictures Entertainment, OPM, and SolarWinds attacks, cause a flurry of activity across corporate America, with executives asking their CISOs and risk managers, “Are we vulnerable?”

I like to call these the vulnerability du jour, and I’m only half sarcastic when I say that. On the one hand, it’s a little annoying how sensationalized these are. Media-worthy vulnerabilities and attacks feel interchangeable: when one runs its course and attention spans drift, there’s another one to take its place, just like a restaurant’s soup of the day. On the other hand, if this is what it takes to get the boardroom talking about risk - I’ll take it.

When the vulnerability du jour lands on an executive’s radar, the third or fourth question is usually, “Was this on our risk register?” Of course, we don’t have crystal balls and can’t precisely predict the next big thing, but we can use brainstorming and thought exercises to ensure our risk registers are well-rounded. A well-rounded, proactive risk register includes as many of these events and vulnerabilities as possible - at a high level.

This is a two-part series, with this post (part 1) setting some basic guidelines on framing these types of risks. Part 2 gives brainstorming ideas on how to turn your risk register into one that’s forward-looking and proactive instead of reactive. 

Building a forward-looking risk register means holding emerging risk workshops at least annually, in which you gather a cross-section of subject matter experts from across the company and brainstorm new or emerging risks to fill gaps and blind spots in the register. I have three golden rules to keep in mind when holding these workshops.

Golden Rules of Identifying Emerging Risk

#1: No specifics

Meteorologists, not Miss Cleo

We’re forecasters, not fortune tellers (think a meteorologist versus Miss Cleo). I don’t think anyone had “State-sponsored attackers compromise SolarWinds’ source code build system, leading to a company data breach” in their list of risks before December 2020. (If you did - message me. I’d love to know if Elvis is alive and how he’s doing.)

Keep it high-level and focused on how the company can prepare for resiliency rather than on a specific vector or method of attack. For example, one can incorporate the SolarWinds incident with a generalized risk statement like the example below. This covers the SolarWinds vector and other supply chain attacks, and it also provides a starting point to future-proof your risk register against similar attacks we will see in the future.

Attacker infiltrates and compromises a software vendor's source code and/or build and update process, leading to company security incidents (e.g. malware distribution, unauthorized data access, unauthorized system access.)

The fallout from the incident can be further decomposed and quantified using FAIR’s 6 forms of loss or a similar model.

#2: Risk Quantification

The risk matrix encourages FUD (fear, uncertainty, doubt)

Communicating hypothetical, speculative, or rare risks is hard to do without scaring people. If a good portion of your company’s risk register is stuff like hypervisor escapes, privilege escalation via Rowhammer, and state-sponsored attacks, you really need to have the data to back up why it needs executive attention. Otherwise, it will just look like another case of FUD.

The key to success is risk quantification: risk articulated in numbers, not colors. A bunch of red high risks (or green low risks) obfuscates the true story you are trying to tell.

All risk, because it’s forward-looking, is filled with uncertainty.

Unique and exotic risks have even more uncertainty. For example, there have been so many data breaches that we have a really good idea of how often they occur and how much they cost. Supply chain attacks like SolarWinds? Not so much. Choose a risk model that can communicate both the analyst’s uncertainty and the wide range of possibilities.

I use FAIR because it’s purpose-built for information and operational risks but you really can use any quantitative model.

#3: Be aware of risk blindness


Every good risk analyst knows the difference between risks that are possible and those that are probable. Without going too deep into philosophy, just about anything is possible, and it’s the risk analyst’s job to rein people in when the risk brainstorming veers into outlandish scenarios. But don’t rein them in too much!

Any risk, no matter how routine, was once unique and a surprise to someone. Ransomware would sound strange to someone in the 1970s; computer crime would seem like black magic to someone in the 15th century.

Try to put yourself in this mindset as you hold emerging risk workshops. I personally love holding workshops with incident responders and red teamers - they have the ability to think outside of the box and are not shy about coming up with hypotheticals and highly speculative scenarios. Don’t discourage them. Yes, we still need to separate out possible from probable, but it is an emerging risk workshop. See what they come up with.

Next Up

I hope this post got you thinking about how to add these types of risks to your register. In part 2, I’m going to give real-life examples of how to further brainstorm and workshop out these risks.


Risk Mythbusters: We need actuarial tables to quantify cyber risk

Think you need actuarial tables to quantify cyber risk? You don’t — actuaries have been pricing rare, high-uncertainty risks for centuries using imperfect data, expert judgment, and common sense, and so can you.

Risk management pioneers: The New Lloyd's Coffee House, Pope's Head Alley, London

The auditor stared blankly at me, waiting for me to finish speaking. Sensing a pause, he declared, “Well, actually, it’s not possible to quantify cyber risk. You don’t have cyber actuarial tables.” If I had a dollar for every time I heard that… you know how the rest goes.

There are many myths about cyber risk quantification that have become so common they border on urban legend. The idea that we need vast and near-perfect historical data is a compelling and persistent argument, enough to discourage all but the most determined risk analysts. Here’s the flaw in that argument: actuarial science is a varied and vast discipline that prices insurance on everything from automobile accidents to alien abduction - much of it without actuarial tables or even historical data. Waiting for “perfect” historical data is a fruitless exercise and prevents the analyst from using the data at hand, no matter how sparse or flawed, to drive better decisions.

Insurance without actuarial tables

Many contemporary insurance products, such as car, house, fire, and life have rich historical data today. However, many insurance products have for decades - in some cases, centuries - been issued without historical data, actuarial tables, or even good information. For those still incredulous, consider the following examples:

  • Auto insurance: Auto insurance was unheard of when the first policy was issued in 1898. Companies had only insured horse-drawn carriages up to that point, and actuaries used data from other types of insurance to set a price.

  • Celebrities’ body parts: Policies on Keith Richards’ hands and David Beckham’s legs are excellent tabloid fodder, but also a great example of how actuaries are able to price rare events.

  • First few years of cyber insurance: Claims data was sparse in the 1970s, when this product was first conceived, but there was money to be made. Insurance companies set initial prices based on estimates and adjacent data, and adjusted them as claims data became available.

There are many more examples: bioterrorism, capital models, and reputation insurance to name a few.

How do actuaries do it?

Many professions, from cyber risk to oil and gas exploration, use the same estimation methods developed by actuaries hundreds of years ago. Find as much relevant historical data as possible - this can be adjacent data, such as the number of horse-drawn carriage crashes when setting a price for the first automobile policy - and bring it to the experts. Experts then apply reasoning, judgment, and their own experience to set insurance prices or estimate the probability of a data breach.

Subjective data encoded quantitatively isn’t bad! On the contrary, it’s very useful when uncertainty is deep, when data is sparse or expensive to acquire, or when a risk is new and emerging.

I’m always a little surprised when people reject better methods altogether, citing the lack of “perfect data,” then swing in the opposite direction to gut checks and wet-finger estimation. The tools and techniques are out there to make cyber risk quantification not only possible but a potential competitive edge for any company. Entire industries have been built on less-than-perfect data, and we as cyber risk professionals should not use the lack of perfect data as an excuse not to quantify cyber risk. If there is a value placed on Tom Jones’ chest hair, then certainly we can estimate the loss from a data incident... go ask the actuaries!


Recipe for passing the OpenFAIR exam

Thinking about the OpenFAIR certification? Here's a practical, no-fluff study guide to help you prep smarter—not harder—and walk into the exam with confidence.


Passing the exam and obtaining the Open Group’s OpenFAIR certification is a big career booster for information risk analysts. Not only does it look good on your CV, it demonstrates your mastery of FAIR to current and potential employers. It also makes you a better analyst, because it deepens your understanding of risk concepts that may not be used often. I passed the exam myself a while back, and I’ve also helped people prepare and study for it. This is my recipe for studying for and passing the OpenFAIR exam.

What to study

The first thing you need to understand in order to pass the exam is that the certification is based on the published OpenFAIR standard, last updated in 2013. Many people and organizations - bloggers, risk folks on Twitter, the FAIR Institute, me, Jack Jones himself - have put their own spin and interpretation on FAIR in the years since the standard was published. Reading this material is important to becoming a good risk analyst but it won’t help you pass the exam. You need to study and commit to memory the OpenFAIR standard. If you find contradictions in later texts, favor the OpenFAIR documentation.

Now, get your materials

The two most important texts are the two Open Group standards the exam is based on: the Risk Taxonomy (O-RT) and Risk Analysis (O-RA) technical standards.

Two more texts are optional, but highly recommended: the FAIR book (Measuring and Managing Information Risk) and the OpenFAIR Foundation Study Guide, both of which come up again below.

How to Study

This is how I recommend you study for the exam:

Thoroughly read the Taxonomy (O-RT) and Analysis (O-RA) standards, cover to cover. Use the FAIR book, blogs, and other papers you find to help answer questions or supplement your understanding, but use the standards PDFs as your main study aid.

Start memorizing - there are only three primary items that require rote memorization; everything else is common sense if you have a mastery of the materials. Those items are:

The Risk Management Stack

You need to know what they are, but more importantly, you need to know them in order.


Accurate models lead to meaningful measurements, which lead to effective comparisons - you get the idea. The test will have several questions like, “What enables well-informed decisions?” Answer: effective comparisons. I never did find a useful mnemonic that stuck like Please Don’t Throw Sausage Pizzas Away, but try to come up with something that works for you.

The FAIR Model

You are probably already familiar with the FAIR model and how it works by now, but you need to memorize it exactly as it appears on the ontology.

The FAIR model (source: FAIR Institute)

It’s not enough to know that Loss Event Frequency is derived from Threat Event Frequency and Vulnerability - you need to know that Threat Event Frequency is in the left box and Vulnerability is on the right. Once a day, draw out 13 blank boxes and fill them in. The test will ask you to match various FAIR elements of risk on an empty ontology. You also need to know if each element is a percentage or a number. This should be easier to memorize if you have a true understanding of the definitions.

Forms of Loss

Last, you need to know the six forms of loss. You don’t need to memorize the order, but you definitely need to recognize these as the six forms of loss and have a firm understanding of the definitions.

  • Productivity Loss

  • Response Loss

  • Replacement Loss

  • Fines and Judgements

  • Competitive Advantage

  • Reputation Damage

Quiz Yourself

I really recommend paying the $29.95 for the OpenFAIR Foundation Study Guide PDF. It has material review, questions and answers at the end of each chapter, and several full practice tests. The practice tests are so similar to the real test (many questions are even the same) that if you ace them, you’re ready. Also, check out FAIR certification flashcards for help in understanding the core concepts.

When you think you’re ready, register for your exam for a couple of weeks out. This gives you time to keep taking practice tests and memorizing terms.

In Closing…

It’s not a terribly difficult test, but you truly need a mastery of the FAIR risk concepts to pass. I think if you have a solid foundation in risk analysis in general, it takes a few weeks to study, as opposed to months for the CRISC or CISSP. 

Good luck with your FAIR journey! As always, feel free to reach out to me or ask questions in the comments below.


Book Review | The Failure of Risk Management: Why It's Broken and How to Fix It, 2nd Edition

Doug Hubbard’s The Failure of Risk Management ruffled feathers in 2012—and the second edition lands just as hard, now with more tools, stories, and real-world tactics. If you’ve ever been frustrated by heat maps, this book is your upgrade path to real, defensible risk analysis.

When the first edition of The Failure of Risk Management: Why It's Broken and How to Fix It by Douglas Hubbard came out in 2012, it made a lot of people uncomfortable. Hubbard laid out well-researched arguments that some of businesses’ most popular methods of measuring risk have failed, and in many cases, are worse than doing nothing. Some of these methods include the risk matrix, heat map, ordinal scales, and other methods that fit into the qualitative risk category. Readers of the 1st edition will know that the fix is, of course, methods based on mathematical models, simulations, data, and evidence collection. The 2nd edition, released in March 2020, builds on the work of the previous edition but brings it into 2020 with more contemporary examples of the failure of qualitative methods and tangible advice on how to incorporate quantitative methods into readers’ risk programs. If you considered the 1st edition required reading, as many people do (including myself), the 2nd edition is a worthy addition to your bookshelf because of the extra content.

The closest I’ll get to an unboxing video

The book that (almost) started it all

I don’t think it would be fair to Jacob Bernoulli’s 1713 book Ars Conjectandi to say that the first edition of The Failure of Risk Management started it all, but Hubbard’s book certainly brought concepts such as probability theory into the modern business setting. Quantitative methodologies have been around for hundreds of years, but in the 1980s and ’90s people started to look for shortcuts around the math, evidence gathering, and critical thinking. Companies started using qualitative models (e.g., red/yellow/green, high/medium/low, heat maps), and these, unfortunately, became the de facto language of risk in most business analysis. Hubbard noticed this and carefully laid out an argument for why these methods are flawed and gave readers tangible examples of how to re-integrate quantitative methodologies into decision and risk analysis.

Hubbard eloquently reminds readers in Part Two of his new book of all the reasons why qualitative methodologies have failed us. Most readers should be familiar with the arguments at this point and will find the “How to Fix It” portion of the book, Part Three, a much more interesting and compelling read. We can tell people all day that they’re using broken models, but if we don’t offer an alternative they can use, I fear the arguments will fall on deaf ears. I can’t tell you how many times I’ve seen a LinkedIn risk argument (yes, we have those) end with, “Well, you should have learned that in Statistics 101.” We’ll never change the world this way.

Hubbard avoids the dogmatic elements of these arguments and gives all readers actionable ways to integrate data-based decision making into risk programs. Some of the topics he covers include calibration, sampling methods for gathering data, an introduction to Monte Carlo simulations, and integrating better risk analysis methods into a broader risk management program. What’s most remarkable isn’t what he covers, but how he covers it. It’s accessible, (mostly) mathless, uses common terminology, and is loaded with stories and anecdotes. Most importantly, the reader can run quantitative risk analysis with Monte Carlo simulations from the comfort of their own computer with nothing more than Excel. I know that Hubbard has received criticism for using Excel instead of more typical data analysis software, such as Python or R, but I see this as a positive. With over 1.2 billion installs of Excel worldwide, readers can get started today instead of learning how to code and struggling with installing new software and packages. Anyone with motivation and a computer can perform quantitative risk analysis.

What’s New?

There are about 100 new pages in the second edition, with most being new content, but some readers will recognize concepts from Hubbard’s newer books, like the 2nd edition of How to Measure Anything and How to Measure Anything in Cybersecurity Risk. Some of the updated content includes:

  • An enhanced introduction that includes commentary on many of the failures of risk management that have occurred since the 1st edition was published, such as increased cyber-attacks and the Deepwater Horizon oil spill.

  • I was delighted to see much more content in Part 1 on how to get started in quantitative modeling. Readers need only a desire to learn, not a ton of risk or math experience, to get started immediately.

  • Much more information is provided on calibration and how to reduce cognitive biases, such as the overconfidence effect.

  • Hubbard beefed up many sections with stories and examples, helping the reader connect even the most esoteric risk and math concepts to the real world.

Are things getting better?

It’s easy to think that things haven’t changed much. After all, most companies, frameworks, standards, and auditors still use qualitative methodologies and models. However, going back and leafing through the 1st edition and comparing it with the 2nd edition made me realize there has been significant movement in the last eight years. I work primarily in the cyber risk field, so I'm only going to speak to that subject, but the growing popularity of Factor Analysis of Information Risk (FAIR) – a quantitative cyber risk model – is proof that we are moving away from qualitative methods, albeit slowly. There are also two national conferences, FAIRcon and SIRAcon, that are dedicated to advancing quantitative cyber risk practices – both of which didn’t exist in 2012.

I'm happy that I picked up the second edition. The new content and commentary are certainly worth the money. If you haven’t read either edition and want to break into the risk field, I would add this to your required reading list and make sure you get the newer edition. The book truly changed the field for the better in 2012, and the latest edition paves the way for the next generation of data-driven risk analysts.

You can buy the book here.


Exploit Prediction Scoring System (EPSS): Good news for risk analysts

Security teams have long relied on CVSS to rank vulnerabilities—but it was never meant to measure risk. EPSS changes the game by forecasting the likelihood of exploitation, giving risk analysts the probability input we’ve been missing.


I'm excited about Exploit Prediction Scoring System (EPSS)! Most Information Security and IT professionals will tell you that one of their top pain points is vulnerability management. Keeping systems updated feels like a hamster wheel of work: update after update, yet always behind. It’s simply not possible to update all the systems all the time, so prioritization is needed. Common Vulnerability Scoring System (CVSS) provides a way to rank vulnerabilities, but at least from the risk analyst perspective, something more is needed. EPSS is what we’ve been looking for.


Hi CVSS. It’s not you, it’s me


Introduced in 2007, CVSS was the first mainstream model to tackle the vulnerability ranking problem, providing an open and easy-to-use way to score vulnerabilities. Security, risk, and IT people could then use the scores as a starting point to understand how vulnerabilities compare with each other and, by extension, to prioritize system management.

CVSS takes a weighted scorecard approach. It combines base metrics (access vector, attack complexity, and authentication) with impact metrics (confidentiality, integrity, availability). Each factor is weighted and added together, resulting in a combined score of 0 through 10, with 10 being the most critical and needing urgent attention.
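To make the weighted-scorecard idea concrete, here is a toy sketch in Python. The factor values and weights are invented for illustration - this is not the actual CVSS formula, which is considerably more involved - but it shows the general shape of the approach: score a handful of factors, weight them, and compress the result onto a 0-10 scale.

    # Toy weighted scorecard -- illustrative only, NOT the real CVSS equations.
    # Each factor is scored 0.0-1.0, weighted, summed, and scaled onto 0-10.
    factors = {
        "access_vector": 0.85, "attack_complexity": 0.77, "authentication": 0.70,  # base metrics
        "confidentiality": 0.66, "integrity": 0.66, "availability": 0.66,          # impact metrics
    }
    weights = {
        "access_vector": 0.2, "attack_complexity": 0.2, "authentication": 0.1,
        "confidentiality": 0.2, "integrity": 0.2, "availability": 0.1,
    }

    raw = sum(factors[k] * weights[k] for k in factors)
    score = round(10 * raw / sum(weights.values()), 1)  # 10 = most critical
    print(f"Scorecard rating: {score} / 10")

The point to notice is that a handful of ordinal inputs gets compressed into a single 0-10 number - useful for ranking, but not a statement about probability or loss.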

CVSS scores and rating

So, what’s the problem? Why do we want to break up with CVSS? Put simply, it’s a little bit of you, CVSS – but it’s mostly me (us). CVSS has a few problems: there are better models than a weighted scorecard ranked on an ordinal scale, and exploit complexity has seriously outgrown the base/impact metrics approach. Despite those problems, it’s a model that has served us well over the years. The real problem lies with us: the way we use it, the way we’ve shoehorned CVSS into our security programs, far beyond what it was ever intended to be. We’ve abused CVSS.

We use it as a de facto vulnerability risk ranking system. Keep in mind that risk, which is generally defined as an adverse event that negatively affects objectives, is made up of two components: the probability of a bad thing happening, and the impact to your objectives if it does. Now go back up and read what the base and impact metrics are: it’s not risk. Yes, they can be factors that comprise portions of risk, but a CVSS score is not risk on its own.

CVSS was never meant to communicate risk.

The newly released v3.1 adds more metrics on the exploitability of vulnerabilities, which is a step in the right direction. But, what if we were able to forecast future exploitability?

Why I like EPSS

If we want to change the way things are done, we can browbeat people with complaints about CVSS and tell them it’s broken, or we can make it easy for people to use a better model. EPSS does just that. I first heard about EPSS after Black Hat 2019, when Michael Roytman and Jay Jacobs gave a talk and released an accompanying paper describing the problem space and how their model solves many issues facing the field. In the time since, an online EPSS calculator has been released. After reading the paper and using the calculator on several real-world risk analyses, I’ve come to the conclusion that EPSS is easier and much more effective than CVSS for prioritizing remediation efforts based on risk. Some of my main takeaways on EPSS are:

  • True forecasting methodology: The EPSS calculation returns a probability of exploit in the next 12 months. This is meaningful, unambiguous – and most importantly – information we can take action on.

  • A move away from the weighted scorecard model. Five inputs into a weighted scorecard is not adequate to understand the full scope of harm a vulnerability can (or can’t) cause, considering system and exploit complexities.

  • Improved measurement: The creators of EPSS built a model that inspects the attributes of a current vulnerability and compares them with the attributes of past vulnerabilities and whether or not those were successfully exploited. This is the best indicator we have of whether something is likely to be exploited in the future, and it should (hopefully) result in better vulnerability prioritization. This is an evolution from CVSS, which measures attributes that may not be correlated with a vulnerability’s chance of exploit.

  • Comparisons: When using an ordinal scale, you can only make comparisons between items on that scale. By using probabilities, EPSS allows the analyst to compare anything: a system update, another risk that has been identified outside of CVSS, etc.

EPSS output (source: https://www.kennaresearch.com/tools/epss-calculator/)

In a risk analysis, EPSS significantly improves the probability side of the equation. In some scenarios, a risk analyst can use this input directly, leaving only magnitude to work on, which makes risk assessments faster than they were with CVSS. Using CVSS as an input to help determine the probability of successful exploit requires a bit of extra work. For example, I would check whether a Metasploit package was available, combine that with past internal incident data, and ask a few SMEs for adjustments. Admittedly crude and time-consuming, but it worked. I don’t have to do this anymore.

There’s a caveat, however: EPSS informs only the probability portion of a risk calculation. You still need to estimate magnitude by cataloging the data types on the system and determining the various ways your company could be impacted if the system were unavailable or the data disclosed.
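To make that division of labor concrete, here is a minimal sketch in Python of how an EPSS-style exploitation probability might feed the frequency side of a simple annualized-exposure calculation. The 9% probability and the dollar range below are assumptions for illustration, not output from the EPSS calculator, and the triangular distribution is just a simple stand-in for a properly calibrated magnitude estimate.

    import random

    p_exploit = 0.09  # assumed EPSS-style probability of exploitation in the next 12 months
    loss_min, loss_ml, loss_max = 50_000, 250_000, 900_000  # assumed magnitude range (USD)

    def simulate_year() -> float:
        # One simulated year: does an exploit-driven loss event occur, and what does it cost?
        if random.random() < p_exploit:
            return random.triangular(loss_min, loss_max, loss_ml)  # args: low, high, mode
        return 0.0

    trials = 100_000
    losses = [simulate_year() for _ in range(trials)]
    print(f"Average annualized exposure: ${sum(losses) / trials:,.0f}")
    print(f"Chance of any loss in a year: {sum(l > 0 for l in losses) / trials:.1%}")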

Determining the probability of a future event is always a struggle, and EPSS significantly reduces the amount of work we have to do. I’m interested in hearing from other people in Information Security – is this significant for you as well? Does this supplement, or even replace, CVSS? If not, why?


Aggregating Expert Opinion: Simple Averaging Method in Excel

Simple averaging methods in Excel, such as mean and median, can help aggregate expert opinions for risk analysis, though each approach has trade-offs. Analysts should remain cautious of the "flaw of averages," where extreme values may hide important insights or errors.


"Expert judgment has always played a large role in science and engineering. Increasingly, expert judgment is recognized as just another type of scientific data …" -Goossens et al., “Application and Evaluation of an Expert Judgment Elicitation Procedure for Correlations

 

Have you ever thought to yourself, if only there were an easy way to aggregate the numerical estimates of experts for use in a risk analysis... then this post is for you. My previous post on this topic, Aggregating Expert Opinion in Risk Analysis: An Overview of Methods, covered the basics of expert opinion and the two main methods of aggregation, behavioral and mathematical. While each method has pros and cons, the resulting single distribution is a representation of all the estimates provided by the group and can be used in a risk analysis. This post focuses on one of several mathematical methods - simple averaging in Excel. I’ll cover the linear opinion pool with expert weighting method, using R, in the next blog post.

But first, a note about averaging…


Have you heard the joke about the statistician that drowned in 3 feet of water, on average? An average is one number that represents the central tendency of a set of numbers. Averaging is a way to communicate data efficiently, and because it's broadly understood, many are comfortable with using it. However – the major flaw with averaging a group of numbers is that insight into extreme values is lost. This concept is expertly covered in Dr. Sam Savage’s book, The Flaw of Averages.

Consider this example. The table below represents two (fictional) companies’ year-over-year ransomware incident data.

Fig 1: Company A and Company B ransomware incident data. On average, it’s about the same. Examining the values separately reveals a different story

After analyzing the data, one could make the following assertion:

Over a 5-year period, the ransomware incident rates for Company A and Company B, on average, are about the same.

This is a true statement.

One could also make a different – and also true – assertion.

Company A’s ransomware infection rates are slowly reducing, year over year. Something very, very bad happened to Company B in 2019.

In the first assertion, the 2019 infection rate for Company B is an extreme value that gets lost in averaging. The story changes when the data is analyzed as a set instead of a single value. The cautionary tale of averaging expert opinion into a single distribution is this: the analyst loses insight into those extreme values. 
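The effect is easy to reproduce with made-up numbers (these are stand-ins, not the values in Fig 1): two incident histories with identical averages that tell very different stories.

    from statistics import mean

    # Made-up year-over-year ransomware incident counts (not the values in Fig 1)
    company_a = [9, 8, 7, 6, 5]    # slowly improving, year over year
    company_b = [2, 2, 3, 2, 26]   # stable, then one very bad year

    print(mean(company_a), mean(company_b))  # both average 7 -- "about the same"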

Those extreme values may represent:

  • An expert misinterpreted data or has different assumptions that skew the distribution and introduce error into the analysis.

  • The person that gave the extreme value knows something that no one else knows and is right. Averaging loses this insight.

  • The “expert” is not an expert after all, and the estimations are little more than made up. This may not even be intentional – the individual may truly believe they have expertise in the area (see the Dunning-Kruger effect). Averaging rolls this into one skewed number.

Whenever one takes a group of distributions and combines them into one single distribution – regardless of whether you are using a simple arithmetic mean or linear opinion pooling with weighting – you are going to lose something. Some methods minimize errors in one area at the expense of others. Be aware of this problem. Overall, the advantages of using group estimates outweigh the drawbacks. My best advice is to be aware of the flaws of averages and always review and investigate extreme values in data sets.

Let’s get to it

To help us conceptualize the method, imagine this scenario:

You are a risk manager at a Fortune 100 company, and you want to update the company’s risk analysis on a significant data breach of 100,000 or more records containing PII. You have last year’s estimate and have performed an analysis of breach probabilities using public datasets. The company’s controls have improved in the previous year and, according to maturity model benchmarking, are above the industry average.

The first step is to analyze the data and fit it to the analysis – as it applies to the company and, more importantly, the question under consideration. It’s clear that while all the data points are helpful, no single data point fits the analysis exactly. Some level of adjustment is needed to forecast future data breaches given the changing control environment. This is where experts come in. They take all the available data, analyze it, and use it to create a forecast.

The next step is to gather some people in the Information Security department together and ask for a review and update of the company's analysis of a significant data breach using the following data:

  • Last year's analysis, which put the probability of a significant data breach at between 5% and 15%

  • Your analysis of data breaches using public data sets, which puts the probability at between 5% and 10%.

  • Status of projects that influence - in either direction - the probability or the impact of such an event.

  • Other relevant information, such as a year-over-year comparison of penetration test results, vulnerability scans, mean-time-to-remediation metrics, staffing levels and audit results.

Armed with this data, the experts provide three estimates. In FAIR terminology, this is articulated - with a 90% confidence interval - as a Minimum value (5th percentile), Most Likely value (50th percentile), and Maximum value (95th percentile). In other words, you are asking your experts to provide a range that, they believe, will include the true value 90% of the time.

The experts return the following:

Fig 2: Data breach probability estimates from company experts

There are differences, but generally, the experts are in the same ballpark. Nothing jumps out at us as an extreme value that might need follow-up with an expert to check assumptions, review the data or see if they know something the rest of the group doesn't know (e.g. a critical control failure).

How do we combine them?

Aggregating estimates improves the inputs to our risk analysis in a few major ways. First, it pools the collective wisdom of our experts; we have a better chance of arriving at an accurate answer than by using the opinion of just one expert. Second, as described in The Wisdom of Crowds by James Surowiecki, opinion aggregation tends to cancel out bias - for example, the overconfident folks will cancel out the under-confident ones. Last, we are able to use a true forecast in the risk analysis that represents a changing control environment. Historical data alone doesn’t reflect the changing control environment and the changing threat landscape.

For this example, we are going to use Microsoft Excel, but any semi-modern spreadsheet program will work. There are three ways to measure the central tendency of a group of numbers: mean, mode, and median. Mode counts the number of occurrences of each value in a data set, so it is not the best choice here. Mean and median are most appropriate for this application. There is no clear consensus on which of the two, mean or median, performs better. However, recent research jointly performed by USC and the Department of Homeland Security, examining median versus mean when averaging expert judgement estimates, indicates the following:

  • Mean averaging corrects for over-confidence better than median, so it performs well when experts are not calibrated. However, mean averaging is influenced by extreme values.

  • Median performs better when experts are calibrated and independent. Median is not influenced by extreme values.

I’m going to demonstrate both. Here are the results of performing both functions on the data in Fig. 2:

Fig 3. Mean and Median values of the data found in Fig. 2

The mean function in Excel is =AVERAGE(number1, number2…)

The median function in Excel is =MEDIAN(number1, number2…)

Download the Excel workbook here to see the results.
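If you would rather script it than use a spreadsheet, the same aggregation is a few lines of Python. The estimates below are hypothetical stand-ins rather than the values shown in Fig 2, but the approach is identical: average each quantile (minimum, most likely, maximum) across the experts.

    from statistics import mean, median

    # Hypothetical (min, most likely, max) probability estimates from four experts --
    # stand-ins for the values shown in Fig 2.
    estimates = {
        "Expert 1": (0.05, 0.10, 0.20),
        "Expert 2": (0.04, 0.08, 0.15),
        "Expert 3": (0.06, 0.12, 0.25),
        "Expert 4": (0.05, 0.09, 0.18),
    }

    # Aggregate each quantile across experts with both central-tendency measures
    for label, agg in (("Mean", mean), ("Median", median)):
        lo, ml, hi = (agg(col) for col in zip(*estimates.values()))
        print(f"{label}: min={lo:.1%}  most likely={ml:.1%}  max={hi:.1%}")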

Next Steps

The results are easily used in a risk analysis. The probability of a data breach is based on external research and internal data, takes in-flight security projects into account, and brings in the opinion of our own experts. In other words, it’s defensible. FAIR users can simply replace the probability percentages with frequency numbers and perform the same math functions.

There’s still one more method to cover – linear opinion pool. This is perhaps the most common and introduces the concept of weighting experts into the mix. Stay tuned – that post is coming soon.


Aggregating Expert Opinion in Risk Analysis: An Overview of Methods

Want a quick way to combine expert estimates into a usable forecast? This post surveys the two main families of aggregation methods - behavioral and mathematical - along with the pros, cons, and guidelines for choosing between them.


Expert elicitation is simple to define but difficult to use effectively, given its complexities. Most of us already use some form of expert elicitation while performing a risk analysis whenever we ask someone their opinion on a particular data point. The importance of using a structured methodology for collecting and aggregating expert opinion is understated in risk analysis, especially in cyber risk, where common frameworks barely touch on the topic, if at all.

There may be instances in a quantitative risk analysis in which expert opinion is needed. For example, historical data on generalized ransomware payout rates may be available, but an adjustment is needed for a particular sector. Another common application is eliciting experts when data is sparse, hard to come by, expensive, or unavailable, or when the analysis does not need precision. Supplementing data with the opinion of experts is an effective, and common, method. The technique is seen across many fields - engineering, medicine, oil and gas exploration, war planning - essentially, anywhere there is uncertainty in decision making, experts are used to generate, adjust, or supplement data.

If asking one expert to make a forecast is good, asking many is better. This is achieved by gathering as many opinions as possible to include a diversity of opinion in the analysis. Once all the data is gathered, however, how does the analyst combine all the opinions to create one single input for use in the analysis? It turns out that there is no single way to do this, and one method is not necessarily better than another. The problem of opinion aggregation has vexed scientists and others who rely on expert judgment, but after decades of research, the field has narrowed to several techniques with clear benefits and drawbacks to each.

The Two Methods: Behavioral and Mathematical

Methods of combining the opinions of experts fall into two categories: behavioral and mathematical. Behavioral methods involve a facilitator working through the question with a group of experts until a consensus is reached; techniques range from anonymous surveys and informal polling to group discussion and facilitated negotiation. The second major category, mathematical aggregation, involves asking experts to estimate a value and using an equation to aggregate all the opinions together.

Each category has its pros and cons, and the one the risk analyst chooses may depend on the analysis complexity, available resources, precision required in the analysis and whether or not the drawbacks of the method ultimately chosen are palatable to both the analyst and the decision maker.

Behavioral Methods

Behavioral methods for combining expert estimates span a wide range of techniques, but all have one thing in common: a facilitator interviews experts in a group setting and asks for estimates, justification, and reasoning. At the end of the session, the group (hopefully) reaches a consensus. The facilitator then has a single distribution, representing the opinion of a majority of the participants, that can be used in a risk analysis.

An example of this would be asking experts for a forecast of future lost or stolen laptops for use in a risk analysis examining stronger endpoint controls. The facilitator gathers people from IT and Information Security departments, presents historical data (internal and external) about past incidents and asks for a forecast of future incidents. 

Most companies already employ some kind of aggregation of expert opinion in a group setting: think of the last time you were in a meeting and were asked to reach a consensus about a decision. If you have ever performed that task, you are familiar with this type of elicitation.

 The most common method is unstructured: gather people in a room, present research, and have a discussion. More structured frameworks exist that aim to reduce some of the cons listed below. The two most commonly used methods are the IDEA Protocol (Investigate, Discuss, Estimate, Aggregate) and some forms of the Delphi Method.

There are several pros and cons associated with the behavioral method.

Pros

  • Agreement on assumptions. The facilitator can quickly get the group using the same assumptions and definitions and interpreting the data in generally the same way. If one member of the group misunderstands a concept or misinterprets data, others in the group can help.

  • Corrects for some bias. If the discussion is structured (e.g., using the IDEA protocol), it allows the interviewer to identify some cognitive biases, such as the over/underconfidence effect, the availability heuristic and anchoring. A good facilitator uses the group discussion to minimize the effects of each in the final estimate.

  • Mathless. Group discussion and consensus building do not require an understanding of statistics or complex equations, which can be a factor for some companies. Some risk analysts may wish to avoid complex math equations if they, or their management, do not understand them.

  • Diversity of opinion: The group, and the facilitator, hears the argument of the minority opinion. Science is not majority rule. Those with the minority opinion can still be right.

  • Consensus: After the exercise, the group has an estimate that the majority agrees with.

Cons:

  • Prone to Bias: While this method controls for some bias, it introduces others. Unstructured elicitation sees bias creep in, such as groupthink, the bandwagon effect, and the halo effect. Participants will subconsciously, or even purposely, adopt the same opinions as their leader or manager. If not recognized by the facilitator, majority rule can quickly take over, drowning out minority opinion. Structured elicitation, such as the IDEA protocol which has individual polling away from the group as a component, can reduce these biases.

  • Requires participant time: This method may take up more participant time than math-based methods, which do not involve group discussion and consensus building.

  • Small groups: It may not be possible for a facilitator to handle large groups, such as 20 or more, and still expect to have a productive discussion and reach a consensus in a reasonable amount of time.

Mathematical Methods

The other category of combining expert judgment is math-based. These methods all include some form of averaging, whether it’s averaging all values in each quantile or creating a distribution from distributions. The most popular method of aggregating many distributions is the classical model developed by Roger Cooke, which is used extensively in many risk and uncertainty analysis disciplines, including health, public policy, bioscience, and climate change.

Simple averaging (e.g., mean, mode, median), in which all participants are weighted equally, can be done in a few minutes in Excel. Other methods, such as the classical model, combine probabilistic opinions using a weighted linear average of individual distributions. The benefit of the linear opinion pool method is that the facilitator can assign weights to different opinions - for example, weighting calibrated experts more heavily than non-calibrated ones. Many tools support this, including two R packages: SHELF and expert.
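As a rough illustration of the weighting idea, here is a sketch of a linear opinion pool applied to single point estimates, with hypothetical experts and weights. A full implementation - Cooke’s classical model, or the SHELF and expert packages mentioned above - pools entire probability distributions rather than single numbers, but the weighted-average mechanics are the same.

    # Linear opinion pool over point estimates -- hypothetical experts and weights.
    experts = [
        {"name": "Expert A", "estimate": 0.10, "weight": 0.5},  # calibrated, weighted most heavily
        {"name": "Expert B", "estimate": 0.20, "weight": 0.3},
        {"name": "Expert C", "estimate": 0.05, "weight": 0.2},  # uncalibrated, weighted least
    ]

    total_weight = sum(e["weight"] for e in experts)
    pooled = sum(e["estimate"] * e["weight"] for e in experts) / total_weight
    print(f"Pooled estimate: {pooled:.1%}")  # (0.10*0.5 + 0.20*0.3 + 0.05*0.2) / 1.0 = 12.0%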

As with the behavioral category, there are numerous pros and cons to using mathematical methods. The risk analyst must weigh each one to find the best that aids in the decision and risk analysis under consideration. 

Pros:

  • May be faster than consensus: The facilitator may find that math-based methods are quicker than group deliberation and discussion, which lasts until a consensus is reached or participants give up.

  • Large group: One can handle very large groups of experts. If the facilitator uses an online application to gather and aggregate opinion automatically, the number of participants is virtually limitless.

  • Math-based: Some find this a con, others find this a pro. While the data is generated from personal opinion, the results are math-based. For some audiences, this can be easier to defend.

  • Reduces some cognitive biases: Experts research the data and give their opinions separately from other experts and can be as anonymous as the facilitator wishes. Groupthink, majority rule, and other associated biases are significantly reduced. Research by Philip Tetlock in his book Superforecasting shows that if one has a large enough group, biases tend to cancel each other out – even if the participants are uncalibrated.

Cons

  • Different opinions may not be heard: Participants do not voice a differing opinion, offer different interpretations of data or present knowledge that the other experts may not have. Some of your “experts” may not be experts at all, and you would never know. The minority outlier opinion that may be right gets averaged in, and with a big enough group, gets lost.

  • Introduces other cognitive biases: If you have an incredibly overconfident group, forecasts that are right less often than the group expects are common. Some participants might let anchoring, the availability heuristic or gambler's fallacy influence their forecasts. Aggregation rolls these biases into one incorrect number. (Again, this may be controlled for by increasing the pool size.)

  • Complex math: Some of the more complex methods may be out of reach for some risk departments.

  • No consensus: It’s possible that the result is a forecast that no one agrees with. For example, if you ask a group of experts to forecast the number of laptops the company will lose next year, and experts return the following most likely values of: 22, 30, 52, 19 and 32. The median of this group of estimations is 30 – a number that more than half of the participants disagree with.

Which do I use?

As mentioned at the beginning of this post, there is not one method that all experts agree upon. You don’t have to choose just one – you may decide to use informal verbal elicitation for a low-precision analysis where you only have access to a handful of experts. The next week, you may choose a math-based method for an analysis in which a multi-million-dollar decision is at stake and you have access to all employees in several departments.

Deciding which one to use depends on many factors, from the facilitator’s comfort level with the techniques, to the number and expertise of the experts, to the geographic locations of the participants (e.g., spread out across the globe or all in the same building), and many others.

 Here are a few guidelines to help you choose:

Behavioral methods work best when:

  • You have a small group, and it’s not feasible to gather more participants

  • You do not want to lose outlier numbers in averaging

  • Reaching a consensus is a goal in your risk analysis (it may not always be)

  • The question itself is ambiguous and/or the data can be interpreted differently by different people

  • You don’t understand the equations behind the math-based techniques and may have a hard time defending the analysis

Math-based methods work best when:

  • You have a large group of experts

  • You need to go fast

  • You don’t have outlier opinions, or you have accounted for them in a different way

  • You just need the opinion of experts – you do not need to reach a consensus

  • The question is focused, unambiguous and the data doesn’t leave much room for interpretation

Conclusion

We all perform some kind of expert judgement elicitation, even if it’s informal and unstructured. Several methods of aggregation exist and are in wide use across many disciplines where uncertainty is high or data is hard to obtain. However, aggregation should never be the end of your risk analysis. Use the analysis results to guide future data collection and future decisions, such as levels of precision and frequency of re-analysis.

Stay tuned for more posts on this subject, including a breakdown of techniques with examples.


Should I buy mobile phone insurance? A Quantitative Risk Analysis

Should you buy mobile phone insurance, or are you better off self-insuring? In this post, I run a full FAIR-based quantitative risk analysis using real-world data, Monte Carlo simulations, and cost comparisons to decide if Verizon's Total Mobile Protection is worth the price.



I am always losing or damaging my mobile phone. I have two small children, so my damage statistics would be familiar to parents and shocking to those without kids. Over the last five years I’ve lost my phone, cracked the screen several times, had it dunked in water (don’t ask me where), and suffered several other mishaps. The costs definitely started to add up over time. When it was time to re-up my contract with my mobile phone provider, Verizon, I decided to consider an upgraded type of insurance called Total Mobile Protection. The insurance covers events such as lost/stolen devices, cracked screens, and out-of-warranty problems.

The insurance is $13 a month, or $156 a year, plus a replacement deductible that ranges from $19 to $199, depending on the model and age of the device. The best way to determine whether insurance is worth the cost, in this instance, is to perform a quantitative risk analysis. A qualitative analysis using adjectives like “red” or “super high” does not provide the right information to make a useful comparison between the level of risk and the additional cost of insurance. If a high/medium/low scale isn’t good enough to understand risk on a $600 iPhone, it shouldn’t be good enough for your company to make important decisions.

To get started, I need two analyses: one that ascertains the current risk exposure without insurance, and another that forecasts potential risk exposure through partial risk treatment via transference (e.g. insurance). I’ll use FAIR (Factor Analysis of Information Risk) to perform the risk analysis because it’s extensible, flexible and easy to use.

The power and flexibility of the FAIR methodology and ontology really shine when you step outside cyber risk analyses. In my day job, I’ve performed all sorts of analyses, from regulatory risk to reputation risk caused by malicious insiders, and just about everything in between. However, I’ve also used FAIR to help make better decisions in my personal life when there was some degree of uncertainty. For example, I did an analysis a few years back on whether to sell my house, an 1879 Victorian home, or sink money into a bevy of repairs and upgrades.

Insurance is also a favorite topic of mine: does my annualized risk exposure of a loss event justify the cost of an insurance policy? I've performed this type of analysis on extended auto insurance coverage, umbrella insurance, travel insurance and most recently, mobile phone insurance – the focus of this post. Quantitative risk analysis is a very useful tool to help decision makers understand the costs and the benefit of their decisions under uncertainty.

This particular risk analysis consists of the following steps:

  • Articulate the decision we want to make

  • Scope the analysis

  • Gather data

  • Perform analysis #1: Risk without insurance

  • Perform analysis #2: Risk with insurance

  • Comparison and decision

Step 1: What’s the Decision?

The first step of any focused and informative risk analysis is identifying the decision. Framing the analysis as an exercise in reducing uncertainty around a decision eliminates several problems: analysis paralysis, over-decomposition, confusing probability with possibility, and more.

Here’s my question: 

Should I buy Verizon’s Total Protection insurance plan that covers the following: lost and stolen iPhones, accidental damage, water damage, and cracked screens?

All subsequent work from here on out must support the decision that answers this question.

Step 2: Scope the Analysis

Failing to scope a risk assessment thoroughly creates problems later on, such as over-decomposition and including portions of the ontology that are not needed - in short, doing more work than is necessary.

Fig. 1: Assessment scope

Asset at risk: The asset I want to analyze is the physical mobile phone, which is an iPhone 8, 64GB presently. 

Threat community: Several threat communities could be scoped in: my kids, myself, and thieves who may steal my phone, either by taking it from me directly or by not returning it should I happen to leave it somewhere.

Go back to the decision we are trying to make and think about the insurance we are considering. The insurance policy doesn’t care how or why the phone was damaged, or whether it was lost or stolen. Therefore, scoping different threat communities into the assessment is over-decomposition.

Threat effect: Good information security professionals would point out the treasure trove of data that’s on a typical phone, and in many cases, is more valuable than the price of the phone itself. They are right. 

However, Verizon's mobile phone insurance doesn't cover the loss of data. It only covers the physical phone. Scoping in data loss or tampering (confidentiality and integrity threat effects) is not relevant in this case and is over-scoping the analysis. 

Step 3: Gather Data

Let’s gather all the data we have. I have solid historical loss data, which fits into the Loss Event Frequency portion of the FAIR ontology. I also know how much each incident cost me, which falls under the Replacement cost category as a Primary Loss.

Fig 2: Loss and cost data from past incidents

After gathering our data and fitting it to the ontology, we can make several assertions about the scoping portion of the analysis:

  • We don’t need to go further down the ontology to perform a meaningful analysis that aids the decision.

  • The data we have is sufficient – we don’t need to gather external data on the average occurrence of mobile device loss or damage. See the concept of the value of information for more on this.

  • Secondary loss is not relevant in this analysis.

(I hope readers by now see the necessity in forming an analysis around a decision – every step of the pre-analysis has removed items from the scope, which reduces work and can improve accuracy.)

Fig 3: Areas of the FAIR ontology scoped into this assessment, shown in green

Keep in mind that you do not need to use all portions of the FAIR ontology; only go as far down as you absolutely need to, and no further. 

Step 4: Perform analysis #1, Risk without insurance

The first analysis we are going to perform is the current risk exposure, without mobile phone insurance. Data has been collected (Fig. 2), and we know where in the FAIR ontology it fits (Fig. 3): Loss Event Frequency and the Replacement portion of Primary Loss. To perform this analysis, I’m going to use the free FAIR-U application, available from RiskLens for non-commercial purposes.

Loss Event Frequency

Refer back to Fig 2. It’s possible that I could have a very good year, such as 2018 with 0 loss events so far. In a bad year, I had 2 loss events, and I don’t believe I would exceed 2 loss events per year. I will use these inputs for the Min, Most Likely, and Max, and set the Confidence at High (this adjusts the curve shape, a.k.a. kurtosis) because I have good historical loss data that only needed a slight adjustment from a Subject Matter Expert (me).

Primary Loss

Forecasting Primary Loss is a little trickier. One could take the minimum loss from a year, $0, the maximum loss, $600, then average everything out for the Most Likely number. However, this method does not accurately capture the full range of what could go wrong in any given year. To get a better forecast, we'll take the objective loss data, give it to a Subject Matter Expert (me) and ask for adjustments.

The minimum loss cost is always going to be $0. The maximum, worst-case scenario is going to be two lost or stolen devices in one year. I reason that it's entirely possible to have two loss events in one year, and it did happen in 2014. Loss events range from a cracked screen to a full device replacement. The worst-case scenario is $1,200 in replacement device costs in one year. The Most Likely scenario can be approached in a few different ways, but I'll choose to take approximately five years of cost data and find the mean, which is $294.

Let’s take the data, plug it into FAIR-U, and run the analysis.
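Before looking at the tool’s output, here is a minimal Monte Carlo sketch in Python of what an analysis like this does under the hood. This is not FAIR-U’s implementation; it simply draws annual loss exposure from a modified-PERT distribution using the inputs above ($0 min, $294 most likely, $1,200 max), so its numbers will differ somewhat from Fig 4, which models frequency and per-event magnitude separately.

```python
import numpy as np

rng = np.random.default_rng(7)

def pert(minimum, most_likely, maximum, size, lam=4.0):
    """Sample a modified-PERT distribution via the Beta distribution."""
    span = maximum - minimum
    alpha = 1 + lam * (most_likely - minimum) / span
    beta = 1 + lam * (maximum - most_likely) / span
    return minimum + rng.beta(alpha, beta, size) * span

# Analysis #1: annual loss exposure without insurance
years = 100_000                        # simulated years
annual_loss = pert(0, 294, 1_200, years)

print(f"Average annual loss exposure: ${annual_loss.mean():,.0f}")
# One point on a loss exceedance curve: chance of losing more than $600 in a year
print(f"P(annual loss > $600): {(annual_loss > 600).mean():.1%}")
```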

Risk Analysis Results

Fig 4. Risk analysis #1 results

FAIR-U uses the Monte Carlo technique to simulate hundreds of years’ worth of scenarios, based on the data we input and confidence levels, to provide the analysis below.

Here's a loss exceedance curve, one of many ways to visualize risk analysis results.

Fig 5: Analysis #1 results in a Loss Exceedance Curve

Step 5: Perform analysis #2: Risk with insurance

The cost of insurance is $156 a year plus the deductible, ranging from $19 to $199, depending on the type, age of the device, and the level of damage. Note that Verizon's $19 deductible is probably for an old-school flip-phone. The cheapest deductible is $29 for an iPhone 8 screen replacement.  The worst-case scenario – two lost/stolen devices – is $554 ($156 for insurance plus $199 * 2 for deductible). Insurance plus the average cost of deductibles is $221 a year. Using the same data from the first analysis, I've constructed the table below which projects my costs with the same loss data, but with insurance. This lets me compare the two scenarios and decide the best course of action.

Fig 6: Projected loss and cost data with insurance

Loss Event Frequency

I will use the same numbers as the previous analysis. Insurance, as a risk treatment or a mitigating control, influences the Loss Magnitude side of the equation but not Loss Event Frequency.

Primary Loss

To be consistent, I’ll use the same methodology to forecast losses as the previous analysis. 

The minimum loss cost is always going to be $0. The maximum, worst-case scenario is two lost or stolen devices in one year, at $554 ($156 for insurance plus $398 in deductibles).

 Most Likely cost is derived from the mean of five years of cost data, which is $221.
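Continuing the hypothetical sketch from analysis #1, the with-insurance scenario is just a second distribution drawn from these inputs, which makes the comparison easy to reproduce (again, the results will only approximate FAIR-U’s output):

```python
# Analysis #2: annual loss exposure with insurance
# (reuses the pert() helper, rng and `years` from the analysis #1 sketch)
with_insurance = pert(0, 221, 554, years)

print(f"Without insurance: ${annual_loss.mean():,.0f} average annual loss")
print(f"With insurance:    ${with_insurance.mean():,.0f} average annual loss")
```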

Risk Analysis Results

Fig 7: Risk analysis #2 results


The second analysis provides a clear picture of what my forecasted losses are.

Visualizing the analysis in a Loss Exceedance Curve:

Fig 8: Analysis #2 results in a Loss Exceedance Curve

Comparison

Without insurance, my average risk exposure is $353, and with insurance, it's $233. The analysis has provided me with useful information to make meaningful comparisons between risk treatment options.

Decision

I went ahead and purchased the insurance on my phone, knowing that I should rerun the analysis in a year. Insurance is barely a good deal in an average year, yet seems like a great value for protecting me during bad years. I also noted that my perceived “value” from insurance is heavily influenced by the fact that I experience a total loss of phones at a higher rate than most people. I may find that as my kids get older, I’ll experience fewer loss events.

I hope readers are able to get some ideas for their own quantitative analysis. The number one takeaway from this should be that some degree of decision analysis needs to be considered during the scoping phase.

Further Analysis

There are many ways this analysis can be extended by going deeper into the FAIR ontology to answer different questions, such as:

  • Does the cost of upgrading to an iPhone XS reduce the loss event frequency? (The iPhone XS is more water resistant than the iPhone 8)

  • Can we forecast a reduction in Threat Capability as the kids get older?

  • Can we find the optimal set of controls that provide the best reduction in loss frequency? For example, screen protectors and cases of varying thickness and water resistance. (Note that I don't actually like screen protectors or cases, so I would also want to measure the utility of such controls and weigh it with a reduction in loss exposure.)

  • If my average loss events per year continues to decrease, at what point does mobile phone insurance cease to be a good value?

Any questions or feedback? Let's continue the conversation in the comments below.


Book Chapter: Cyber Risk Quantification of Financial Technology

Fintech is revolutionizing finance, but it’s also rewriting the rulebook for cybersecurity and risk management. In this chapter from Fintech: Growth and Deregulation, I explore how quantitative methods like FAIR can help risk managers keep up with blockchain, decentralized trust models, and emerging threats—without falling back on outdated controls or red/yellow/green guesswork.


In February 2018, I wrote a chapter in a Risk.net book, titled Fintech: Growth and Deregulation. The book is edited by Jack Freund, who most of you will recognize as the co-author of Measuring and Managing Information Risk.

I’m happy to announce that I’m now able to re-post my book chapter, titled “Cyber-risk Quantification of Financial Technology,” here. If you are interested in blockchain tech, Fintech, risk quantification, and emerging risks, you may find it interesting. It’s also a primer on Factor Analysis of Information Risk (FAIR), one of many risk quantification models. It’s not the only one I use, but it is the one I use most frequently.

I covered the main ideas at the PRMIA Risk Management event in a talk titled Cybersecurity Aspects of Blockchain and Cryptocurrency (slides available in the link.)

You can buy the book here.

Hope you enjoy — and as always, if you have questions, comments or just want to discuss, drop me a line.


Chapter 13: Cyber Risk Quantification of Financial Technology

By Tony Martin-Vegue …from Fintech: Growth and Deregulation

Edited by Diane Maurice, Jack Freund and David Fairman
Published by Risk Books. Reprinted with permission.

Introduction

Cyber risk analysis in the financial services sector is finally catching up with its older cousins in financial and insurance risk. Quantitative risk assessment methodologies, such as Factor Analysis of Information Risk (FAIR), are steadily gaining traction among information security and technology risk departments and the slow, but steady, adoption of analysis methods that stand up to scrutiny means cyber risk quantification is truly at a tipping point. The heat map, risk matrix and “red/yellow/green” as risk communication tools are being recognized as flawed and it truly couldn’t come at a better time. The field’s next big challenge is on the horizon: the convergence of financial services and rapidly evolving technologies – otherwise known as Fintech – and the risks associated with it.

Fintech has, in many ways, lived up to the hype of providing financial services in ways that traditional firms have found too expensive, too risky or too insecure. In addition, many Fintech firms have been able to compete with traditional financial services by offering better products, quicker delivery and much higher customer satisfaction. The rapid fusion of technology with financial services also signals a paradigm shift for risk managers. Many of the old rules for protecting the confidentiality, integrity and availability of information assets are being upended, and the best example of this is how the defence-in-depth model has become an outdated paradigm in some situations. For decades, sound security practices dictated placing perimeter defences, such as firewalls and intrusion detection systems, around assets like a moat of water surrounding a castle: iron gates stopping intruders from getting in, with defenders on the inside at the ready to drop hot oil. This metaphor made sense when assets were deployed in this way – a database server locked in a server rack in a datacentre, surrounded by a ring of protective controls. 

In the mid-2000s, cloud computing became a household name, and risk managers quickly realized that the old defensive paradigms no longer applied. If cloud computing blurs the line between assets and the network perimeter, technologies used in Fintech, such as blockchain, completely obliterate it. Risk managers adapted to new defensive paradigms in 2006, and the same must be done now. For example, the very notion of where data is stored is changing. In the defence-in-depth model, first line, second line and third line defenders worked under the assumption that we want to keep a database away from attackers, and accomplished this using trust models such as role-based access control. New models are being deployed in Fintech in which we actively and deliberately give data and databases to all, including potential attackers, and rely on a radically different trust model, such as the distributed ledger, to ensure the confidentiality and integrity of data.

The distributed ledger and other emerging technologies in Fintech do not pose inherently more or less risk than other technologies, but risk managers must adapt to these new trust and perimeter paradigms to assess risk effectively. Many financial services firms are looking to implement these types of technologies so that business can be conducted faster for the customer, implementation and maintenance are cheaper, and the security posture of the platform improves. If risk managers approach emerging technologies with the same defence-in-depth mentality they would apply to a client-server model, they risk producing an analysis that drastically overstates or understates risk. Ultimately, the objective of a risk assessment is to inform decisions, so we must fully understand the risks and benefits of these new technologies emerging in Fintech, or it may be hard to realize the rewards.

This chapter will explore emerging risks, new technologies and risk quantification in the Fintech sector with the objective of achieving better decisions – and continuing to stay one step ahead of the behemoth banks. The threat landscape is evolving just as fast – and sometimes faster – than the underlying technologies and control environments. The best way to articulate risk in this sector is through risk quantification. Not only is risk quantification mathematically sounder than the softer, qualitative risk methodologies, but it enables management to perform cost-benefit analysis of control implementation and reporting of risk exposure in dollars, euros or pounds, which our counterparts in financial and insurance risk are already doing. The end result is an assessment that prioritises risk in a focused, defensible and actionable way.

Emerging Risks in Fintech

History has demonstrated repeatedly that new innovations breed a new set of criminals, eager to take advantage of emerging technologies. From lightning-fast micropayments to cryptocurrency, some companies that operate in the Fintech sector are encountering a renaissance in criminal activity that is reminiscent of the crime wave Depression-era outlaws perpetrated against traditional banks. Add an ambiguous regulatory environment and it’s clear that risk managers will be on the front line of driving well-informed business decisions to respond to these threats.

Financial services firms are at the forefront of exploring these emerging technologies. An example is blockchain technology designed to enable near real-time payments anywhere in the world. UBS and IBM developed a blockchain payment system dubbed “Batavia,” and many participants have signed on, including Barclays, Credit Suisse, Canadian Imperial Bank of Commerce, HSBC, MUFG, State Street (Reuters 2017), Bank of Montreal, Caixabank, Erste Bank and Commerzbank (Arnold 2017). The consortium is expected to launch its first product in late 2018. Other financial services firms are exploring similar projects. Banks and other firms find blockchain technology compelling because it helps improve transaction speed, security and transparency, thereby strengthening customer trust. Financial regulators are watching the same technologies closely; regulators from China, Europe and the United States are exploring new guidance and regulations to govern them. Banks and regulators are understandably cautious. This is uncharted territory and the wide variety of possible risks is not fully known. The trepidation is justified, as there have been several high-profile hacks involving the banking sector, blockchain and Bitcoin.

Some emerging risks in Fintech are:

Bank Heists, with a new spin

The Society for Worldwide Interbank Financial Telecommunication, also known as SWIFT, provides secure messaging for over 11,000 financial institutions worldwide (Society for Worldwide Interbank Financial Telecommunication 2017), giving them a system to send payment orders. SWIFT, in today’s context, would be considered by many to be anything but “Fintech,” but taking into account that it was developed in 1973, and that the system was until recently thought of as very secure, it is a stellar example of financial technology. 

In 2015 and 2016, dozens of banks suffered cyberattacks that led to massive theft of funds, including a highly publicised incident in which $81 million USD was stolen from a Bangladesh bank (Corkery 2016). The attacks were sophisticated and used a combination of compromised employee credentials, malware and a poor control environment (Zetter 2016) to steal the funds in a matter of hours. Later analysis revealed a link between the SWIFT hack and a shadowy hacking group dubbed “Lazarus” by the FBI. Lazarus is also suspected in the 2014 Sony Pictures Entertainment hack, and both hacks have been linked to the North Korean government (Shen 2016). If the attribution to North Korea is true, it is the first known instance in which a nation-state actor has stolen funds from a financial institution with a cyberattack. Nation-state actors, in the context of threat modelling and risk analysis, are considered to be very well-resourced, sophisticated and trained, and they operate outside the rule of law that may deter run-of-the-mill cybercriminals. As such, assume that nation-states can overcome any set of controls put in place to protect funds and data. State-sponsored attacks against civilian targets are a concerning escalation and should be followed and studied closely by any risk manager in Fintech. The SWIFT hacks are an example of how weaknesses in payment systems can be exploited again and again. The underlying SWIFT infrastructure is also a good case study in how Fintech can improve weak security in payment systems.

Forgetting the Fundamentals

Fintech bank heists aren’t limited to technology first developed in 1973, however. Take the case of the Mt. Gox Bitcoin heist: one of the first and largest Bitcoin exchanges at the time had 850,000 Bitcoin stolen. At the time of the theft, the cryptocurrency was valued at $450 million USD; as of October 2017, the value of 850,000 Bitcoin is roughly $3.6 billion USD. How did this happen? Details are still murky. The ex-CEO of Mt. Gox, Mark Karpeles, blamed hackers for the loss, while others blamed Karpeles himself; he even did time in a Japanese jail for embezzlement (O'Neill 2017). There were other issues, however: according to a 2014 story in Wired Magazine, ex-employees described a company with no code control, no test environment and only one person who could deploy code to the production site – the CEO himself (McMillan 2014). Security fixes were often deployed weeks after they were developed. Fintech’s primary competitive advantage is that these firms have less friction than traditional financial services and are therefore able to innovate and push products to market very quickly. The downside, as the Mt. Gox case shows, is that when moving quickly, one cannot forget the fundamentals. Fundamentals such as code change/version control, segregation of duties and prioritizing security patches should not be set aside in favour of moving quickly. Risk managers need to be aware of and apply these fundamentals to any risk analysis, and also consider that what makes these technologies so appealing, such as the difficulty of tracing cryptocurrency, is also a new, emerging risk. It took years for investigators to locate the stolen Mt. Gox Bitcoin, and even now there’s little governments or victims can do to recover them.

Uncertain regulatory environment

Fintech encompasses many technologies and many products, and as such is subject to different types of regulatory scrutiny that vary by jurisdiction. One example of this ambiguous regulatory environment is the special Fintech charter being considered by the Office of the Comptroller of the Currency (OCC), a banking regulator in the United States. The charter would allow some firms to offer financial products and services without the regulatory requirements associated with a banking charter (Merken 2017). This may be desirable for some firms, as it offers a feeling of legitimacy to customers, shareholders and investors. However, other firms may see this as another regulatory burden that stifles innovation and speed. Additionally, some firms that would like to have a Fintech charter may not have the internal IT governance structure in place to consistently comply with its requirements. This could also result in future risk: loss of market share, regulatory fines and judgements, and bad publicity due to a weak internal control environment.

It is beyond the scope of this chapter to convince the reader to adopt a quantitative risk assessment methodology such as Factor Analysis of Information Risk (FAIR), however, consider this: in addition to Fintech Charters, the OCC also released an “Advanced Notice of Proposed Rulemaking” on Enhanced Cyber Risk Management Standards. The need for improved cyber risk management was argued in the document, and FAIR Institute’s Factor Analysis of Information Risk standard and Carnegie Mellon’s Goal-Question-Indicator-Metric process are specifically mentioned (Office of the Comptroller of the Currency 2016). Risk managers in Fintech should explore these methodologies if their firm has a banking charter, may receive a special-purpose Fintech charter or are a service provider for a firm that has a charter.

Poor risk management techniques

We’re an emerging threat.

As mentioned previously, technology is rapidly evolving and so is the threat landscape. Practices such as an ambiguous network perimeter and distributed public databases were once unthinkable; they are now considered sound and, in many cases, superior methods to protect the confidentiality, integrity and availability of assets. Risk managers must adapt to these new paradigms and use better tools and techniques for assessing and reporting risk. If we fail to do so, our companies will not be able to make informed strategic decisions. One of these methods is risk quantification.

Case Study #1: Assessing risk of Blockchain ledgers

Consider a start-up payments company that is grappling with several issues: payments are taking days to clear instead of minutes; fraud on the platform exceeds their peers; and, a well-publicised security incident several years prior has eroded public trust.

Company leaders have started conversations around replacing the traditional relational database model with blockchain-based technology. Blockchain offers much faster payments, reduces the firm’s foreign exchange risk, helps the business improve compliance with Know Your Customer (KYC) laws, and reduces software costs. Management has requested a risk assessment on the different operating models of the blockchain ledger, expecting enough data to perform a cost-benefit analysis.

After carefully scoping the analysis, three distinct options the firm can take have been identified:

  • Stay with the current client-server database model. This does not solve any of management’s problems, but does not expose the company to any new risk either.

  • Migrate the company’s payments system to a shared public ledger. The trust model completely changes: anyone can participate in transactions, as long as 51% of other participants agree to the transaction (51% principle). Over time, customer perceptions may improve due to the total transparency of transactions, however, the question of securing non-public personal information (NPI) needs to be examined. Furthermore, by making a payments system available to the public that anyone can participate in, the firm may be reducing their own market share and a competitive differentiator needs to be identified.

  • The firm can adopt a private blockchain model: participation by invitation only, and in this case, only other payments companies and service providers can participate. This is a hybrid approach: the firm is moving from a traditional database to a distributed database, and the trust model can still be based on the 51% principle, but participation still requires authentication, and credentials can be compromised. Additionally, in some implementations, the blockchain can have an “owner” and owners can tamper with the blockchain.

It’s clear that this is not going to be an easy risk assessment, and the risk managers involved must do several things before proceeding. First, this is a pivotal moment for the company and make-or-break decisions will be based on the analysis, so red/yellow/green isn’t going to be sufficient. Second, traditional concepts such as defence-in-depth and how trust is established are being upset, so adaptability is key. The current list of controls the company has may not be applicable here, but that does not mean the confidentiality, integrity and availability of data is not being protected.

Applying Risk Quantification to Fintech

Assessing risk in financial services, and in Fintech in particular, requires extra rigor. As a result, quantitative risk assessment techniques are being discussed in the cyber risk field. This chapter focuses on the FAIR Institute’s Factor Analysis of Information Risk because it is in use by many financial institutions worldwide, has many resources available to aid in implementation and is cited by regulators and used by financial institutions as a sound methodology for quantifying cyber risk (Freund & Jones, 2015). It’s assumed that readers do not need a tutorial on risk assessment, risk quantification or even FAIR; this section will walk through a traditional FAIR-based quantitative risk analysis that many readers are already familiar with and specifically highlight the areas Fintech risk managers need to be aware of, such as unique, emerging threats and technologies.

In FAIR, there are four distinct phases of an assessment: scoping the assessment, performing the risk analysis, determining risk treatment and communicating risk (Josey et al., 2014). Each is equally important and has special considerations when assessing risk in Fintech. 

Scoping out the assessment

Scoping is critical to lay a solid foundation for a risk assessment and saves countless hours during the analysis phase. An improperly scoped analysis may lead to examining the wrong variables or spending too much time performing the analysis – a common pitfall for risk managers. Focus on the probable, not the possible (possibilities are infinite – is it possible that an alien invasion can affect the availability of your customer database by vaporizing your datacentre?).

Scoping is broken down into four steps: identifying the asset(s) at risk, identifying the threat agent(s) that can act against those assets, describing the motivation, and lastly, identifying the effect the agent has on business objectives. See Figure 1 for a diagram of the process.

Figure 1: Scoping an Assessment

Step 1: Identify the asset(s) at risk

Broadly speaking, an asset in the cybersecurity sense is anything of value to the firm. Traditionally, hardware assets such as firewalls, servers and routers are included in every risk analysis, but in Fintech – where many of the services provided are cloud-based and run on virtual hardware – uptime/availability is an additional metric, and downtime of critical services can almost always be measured in currency. There are several other assets to consider: money (e.g. customer funds), information assets (e.g. non-public personal information about the customer) and people. People, as an asset, are almost always overlooked but should be included for the sake of modelling threats and designing controls, both of which can impact human life and safety. Keep in mind that each asset requires a separate analysis, so scope in only the elements required to perform the analysis.

Understanding the emerging technologies that enable Fintech is a crucial part of a risk manager’s job. It’s relatively easy to identify the asset – what has value to a company – when thinking about the client-server model, datacentres, databases and currency transactions. This becomes difficult when assets are less tangible than a database operating under the client-server model and running on a physical piece of hardware. Less tangible assets are what we will continue to find in Fintech, such as data created by artificial intelligence, distributed public ledgers and digital identities.

Step 2: Threat Agent Identification

Risk managers, in most cases, will need to break down threat agents further than shown in Figure 1, but the basic attributes that all threat agents possess are illustrated. More detail is given in the “Threat Capability” portion of this section.

All risk must have a threat. Think of a very old house built in the 1880s. It has a crumbling brick foundation sitting on top of sandy dirt, and the load-bearing beams are not connected to the ground. In other words, this house will fall like a house of cards if a strong earthquake hits the area. Some analysts would consider this a significant risk and immediately recommend mitigating controls: replace the brick foundation with reinforced concrete, bolt the house to the new foundation and install additional vertical posts under the load-bearing beams. 

These controls are very effective at reducing the risk, but there is an important data point that the analyst hasn’t asked: What is the threat?

The house is in the US state of Florida, which is tied with North Dakota as having the fewest number of earthquakes in the continental US (USGS n.d.), therefore other sources of threat need to be investigated. 

Without identifying the threat agent before starting a risk analysis, one may go through a significant amount of work just to find there isn’t a credible threat, therefore no risk. Even worse, the analyst may recommend costly mitigating controls, such as an earthquake retrofit, when protection from hurricanes is most appropriate in this situation.

There are generally two steps when identifying threat agents: 1) use internal and external incident data to develop a list of threats and their objectives, and 2) analyse those threats to ascertain which ones pose a risk to Fintech firms and how the threat agents may achieve their objectives.

Firms in the Fintech sector have many of the same threat agents as those that operate in financial services, with a twist: as the portmanteau suggests, Fintech firms often face threat agents that have traditionally targeted financial services firms, such as cybercriminal groups. Cybercriminals have a vast array of methods, resources and targets and are almost always motivated by financial gain. Financial services firms have also been targeted in the past by hacktivist groups, such as Anonymous. Groups like this are motivated by ideology; in the case of Anonymous, one of their (many) stated goals was disruption of the global financial system, which they viewed as corrupt. Distributed Denial of Service (DDoS) attacks were used to disrupt the availability of customer-facing websites, with some effect, but ultimately failed to force banks to enact any policy changes (Goldman 2016). Technology firms are also victims of cybercrime, but unlike financial institutions, many have not drawn the ire of hacktivists. Depending on the type of technology a firm develops, it may be at increased risk of phishing attacks from external sources and intellectual property theft from both internal and external threats.

Step 3: Describe the Motivation

The motivation of the threat actor plays a crucial part in scoping an analysis, and also helps risk managers in Fintech include agents that are traditionally not in a cyber risk assessment. For example, malicious agents include hostile nation-states, cybercriminals, disgruntled employees and hacktivists. As mentioned in the Emerging Risks in Fintech section earlier, risk managers would be remiss not to include government regulators as a threat agent. Consider Dwolla (see Case Study #3): the control environment was probably considered “good enough” and the company had not suffered any loss events due to inadequate security. However, government regulators caused a loss event for the company in the form of a fine, costly security projects to comply with the judgement and bad publicity. Additionally, consider accidental/non-malicious loss events originating from partners and third-party vendors, as many Fintech firms rely heavily on cloud-based service providers.

Step 4: Effect

Some things don’t change: security fundamentals are as applicable today as they were decades ago. Using the CIA Triad (confidentiality, integrity, availability) helps risk managers understand the form a loss event takes and how it affects assets. Threat agents act against an asset with a particular motivation, objective and intent. Walking through these scenarios – and understanding threat agents – helps one understand what the effect is.

Think about a threat agent’s goals, motivations and objectives when determining the effect. Hacktivists, for example, are usually groups of people united by political ideology or a cause. Distributed Denial of Service (DDoS) attacks have been used in the past to cause website outages while demands are issued to the company. In this case, the risk manager should scope in Availability as an effect, but not Confidentiality or Integrity. 

Lastly: Writing Good Risk Statements

The end result is a well-formed risk statement that clearly describes what a loss event would look like to the organization. Risk statements should include all of the elements listed in steps 1-4 and describe the loss event, who the perpetrator is and what asset is being affected.

More importantly, the risk statement must always answer the question: What decision are we making? The purpose of a risk analysis is to reduce uncertainty when making a decision, therefore if at the end of scoping you don’t have a well-formed question that needs to be answered, you may need to revisit the scope, the purpose of the assessment or various sub-elements.

Case Study #2: Asset Identification in Fintech

A large bank has employed several emerging technologies to create competitive differentiators. The bank is moving to application programming interfaces (APIs) to move data to third parties instead of messaging (e.g. SWIFT). The bank is also employing a private blockchain and is innovating in the area of creating digital identities for their customers. A risk assessment of these areas requires inventive analysis to even complete the first step, asset identification.

When performing a risk analysis, the first question to ask is “What is the asset we’re protecting?” Besides the obvious (money, equipment, data containing non-public personal information (NPI)), the assets at firms that employ Fintech may often be less obvious. If the risk analyst is stuck, utilise information security fundamentals and break the problem down into smaller components that are simpler to analyse. In the case of the large bank employing new technologies, consider how confidentiality, integrity and availability (CIA) can be affected if a loss event were to take place.

Confidentiality and integrity in a blockchain ledger can be affected if the underlying technology has a vulnerability. Blockchain technology was built from the ground up with security in mind using secret sharing; all the pieces that make up data are random and obfuscated. In a client-server model, an attacker needs to obtain a key to compromise encryption; with blockchains, an attacker needs to compromise the independent participant servers (depending on the implementation, this can be either 51% of servers or all the servers). The “asset” has shifted from something in a datacentre to something that is distributed and shared.

By design, blockchain technology improves availability. Its distributed, decentralized nature makes it very resilient to outages. The asset in this case has also shifted: if lost customer transactions due to downtime were previously a source of loss, that loss may no longer occur after the bank completes the distributed ledger implementation. Risk may be overstated if this is not considered.

Case Study #3: Government regulators as a threat agent

In addition to the uncertain future of Fintech charters and the regulatory compliance risk they pose, the security practices of Dwolla, a US-based online payments start-up, are an interesting case study in regulatory action resulting in a loss event. The Consumer Financial Protection Bureau (CFPB), a US government agency responsible for consumer protection, took action against Dwolla in 2016 for misrepresenting the company’s security practices. The CFPB found that “[Dwolla] failed to employ reasonable and appropriate measures to protect data obtained from consumers from unauthorized access” (United States Consumer Financial Protection Bureau 2016). 

The CFPB issued a consent order requiring the firm to remediate security issues and pay a $100,000 fine (Consumer Financial Protection Bureau 2016). This was the first action of its kind taken by the CFPB, which was created by the 2010 Dodd-Frank Wall Street Reform and Consumer Protection Act. More interesting, however, is that the action was taken without harm: Dwolla did not have a data breach, loss of funds or any other security incident. The CFPB simply found what the company was claiming about its implemented security practices to be deceptive and harmful to consumers. Risk managers should always build regulatory action into their threat models and consider that regulatory action can originate from consumer protection agencies, not just banking regulators.

 Another interesting piece of information from the CFPB’s consent order is the discovery that “[Dwolla] failed to conduct adequate, regular risk assessments to identify reasonably foreseeable internal and external risks to consumers’ personal information, or to assess the safeguards in place to control those risks.” The risk of having incomplete or inadequate risk assessments should be in every risk manager’s threat list.

Performing the analysis

After the assessment is scoped, take the risk statement and walk through the FAIR taxonomy (figure 2), starting on the left.

Determine the Loss Event Frequency first, which, in the FAIR taxonomy, is the frequency at which a loss event occurs. It is always articulated over a period of time, such as “4x a month.” Advanced analysis uses a range, such as “between 1x a month and 1x a year.” This unlocks key features of FAIR that are not available in some other risk frameworks used in cyber risk: PERT distributions and Monte Carlo simulations. These allow the analyst to articulate risk in the form of ranges instead of a single number or colour (e.g. red).

The Loss Event Frequency is also referred to as “Likelihood” in other risk frameworks. It is derived from the Threat Event Frequency, which is the frequency at which a threat agent acts against an asset, and the Vulnerability, which is a calculation of the asset’s ability to resist the threat agent. The Vulnerability calculation is another key differentiator of FAIR and will be covered in depth shortly.

Loss Magnitude, sometimes called “Impact” in other risk frameworks, is the probable amount of loss that will be experienced after a loss event. Loss Magnitude comprises Primary Loss, which covers immediate losses, and Secondary Loss, which is best described as fallout, or the ongoing costs resulting from the loss event.

Figure 2: The FAIR taxonomy
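To make the relationships concrete, here is a minimal sketch in Python of the decomposition described above. It is an illustration of the taxonomy with invented numbers, not the Open FAIR calculation engine.

```python
def loss_event_frequency(threat_event_frequency, vulnerability):
    """Loss Event Frequency = Threat Event Frequency x Vulnerability (0-1)."""
    return threat_event_frequency * vulnerability

def annualized_loss_exposure(lef, primary_loss, secondary_loss=0.0):
    """Risk = Loss Event Frequency x Loss Magnitude (Primary + Secondary)."""
    return lef * (primary_loss + secondary_loss)

# Hypothetical example: 2 threat events/year, 25% chance each becomes a loss
# event, $50,000 primary loss and $20,000 secondary loss per event.
lef = loss_event_frequency(2.0, 0.25)
print(annualized_loss_exposure(lef, 50_000, 20_000))  # 35000.0
```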

Step 1: Derive the Threat Event Frequency

The scoping portion of the assessment includes a fair amount of work on threat agent modelling, so it is easiest to start there with the analysis. With the threat agent identified, the next step is to ascertain the frequency the threat agent will act against our asset.

FAIR also utilises calibrated probability estimates. When dealing with possible future events, it is not possible to state the frequency of occurrence with exact certainty. After all, we don’t have a crystal ball, nor do we need one: the purpose of a risk assessment is not to tell the future; it is to reduce uncertainty about a decision. Calibrated probability estimates provide a way for subject matter experts to estimate probabilities while expressing their uncertainty. For example, a subject matter expert can state that a web application attack against a Fintech firm will occur between once a year and once every 5 years, with a 90% confidence interval. Confidence interval is a term used in statistics, meaning the analyst is 90% certain the true answer falls within the range provided. Combining calibrated probability estimates with an analysis of past incidents, risk managers can be remarkably effective at forecasting the frequency of future threat events as a range. 

Calibrated probability estimates have been used successfully in other fields for decades. Weather forecasts, for example, use calibrated probability estimates when describing the chance of rain within a period of time. Risk managers working in Fintech will find this method very effective because we are asked to describe risks that may not have happened before. In this case, a calibrated probability estimate allows the risk manager to articulate their level of uncertainty about a future event.
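One common way (not prescribed by FAIR) to turn a calibrated 90% interval into something usable in a simulation is to fit a lognormal distribution whose 5th and 95th percentiles match the expert's range. A hedged sketch using the example range above:

```python
import numpy as np
from scipy import stats

# Expert's calibrated range: between once every 5 years and once a year,
# at 90% confidence -- i.e. 0.2 to 1.0 threat events per year.
lower, upper = 0.2, 1.0
z = stats.norm.ppf(0.95)                 # ~1.645 for a 90% interval

mu = (np.log(lower) + np.log(upper)) / 2
sigma = (np.log(upper) - np.log(lower)) / (2 * z)
tef = stats.lognorm(s=sigma, scale=np.exp(mu))   # threat event frequency per year

print(f"Median TEF: {tef.median():.2f} events/year")
print(f"P(more than 1 event/year): {1 - tef.cdf(1.0):.1%}")   # ~5% by construction
```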

Contact Frequency describes the number of times a threat agent comes into contact with an asset and the Probability of Action describes the probability the threat agent will act against the asset.

Step 2: Derive the Vulnerability

Vulnerability is made up of two components: threat capability and resistance strength. These two concepts are usually discussed and analysed separately, but they are so intertwined that it may be easier to understand them as relational, even symbiotic (figure 3).

Threat Capability is a score, between 1% and 100%, given to a single agent in relation to the total population of threat agents that can cause loss events at your firm. The list of threat agents, often called a Threat Agent Library, can include everything from cybercriminals, nation-states and hacktivists to natural disasters, untrained employees and government regulators. Motivation, resources, objectives, organization and other attributes are considered when giving each agent a threat capability. The entire population of threat agents, with capability ratings, is called a threat continuum.

Resistance strength is also a percentage, between 1% and 100%, and is a way of measuring all the controls in place to protect an asset. The entire threat continuum is used as a benchmark to estimate, as a range, how effective the resistance strength is.
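The interplay between the two can be sketched with a simple simulation: treat both threat capability and resistance strength as uncertain percentile estimates and take Vulnerability as the share of simulated encounters in which capability exceeds resistance. The distributions below are invented for illustration, not derived from any real threat continuum.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Percentile estimates (0-100) with uncertainty, expressed as Beta distributions
threat_capability = rng.beta(a=5, b=3, size=n) * 100    # skews toward capable agents
resistance_strength = rng.beta(a=6, b=4, size=n) * 100  # current control environment

vulnerability = (threat_capability > resistance_strength).mean()
print(f"Estimated vulnerability: {vulnerability:.0%}")
```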

There are special considerations a Fintech risk manager must keep in mind when assessing threat capability in a continuum and the corresponding resistance strength.

The threat landscape is constantly changing and evolving. Think back to one of the first pieces of malware distributed on the Internet, the Morris worm in 1988: a coding error turned what was meant to be an experiment to measure the size of the Internet into a fast-spreading worm that caused denial-of-service events across roughly 10% of Internet-connected systems. Fast forward to today and the Morris worm seems quaint. Militaries train cyber warriors for both offensive and defensive capabilities. Cybercriminal organizations are highly resourced, develop their own tools and operate with a very high level of sophistication. The CIA and NSA have developed offensive tools that, in many ways, outpace commercially available defensive products. In what is now called the Shadow Brokers leaks, those tools were made available to the public, giving threat actors a set of tools with unprecedented offensive capabilities.

How does a risk manager measure and articulate a complex threat landscape that has the following attributes?

  • Nation states have vast resources, operate outside the law, develop exploits for which vendors do not have a patch (zero-day exploits), and launch offensive attacks at each other, resulting in collateral damage to firms.

  • Hostile nation states have attacked firms with the objective of damaging the company or stealing money.

  • Zero day exploits have been leaked and cyber-criminal organizations use them unhampered until vendors release a fix, which takes weeks or months.

  • Rewards for criminal activity have never been greater; monetizing stolen personal information from data breaches is easy and rarely results in legal repercussions.

  • The threat landscape is a constant ebb and flow: there may be an elevated state of activity due to a hostile nation state launching attacks or exploits tools released into the wild. There may also be a period of relative calm, such as when most vendors release patches for Shadow Brokers exploits and firms have applied them.

Not all risk models include an assessment of threat capability, favouring an assessment of the control environment exclusively to determine the likelihood of a loss event. These models miss an important attribute of assessing cyber risk: the likelihood of a loss event growing and shrinking due to external forces, even if the control environment stays exactly the same. To understand this concept, one must understand the relationship between threat actors and controls.

A control is an activity that prevents or detects events that result in risk. Controls can be preventive, detective, corrective, deterrent, recovery or compensating. Other disciplines, such as IT audit, treat controls as something that operates in a vacuum: they are designed to perform a function, and they either operate effectively or they do not. For example, if we designed a flimsy door to be secured with a single lock on the door knob and tested the control, it would pass – as long as the door was locked. The threat actor (a burglar with a strong leg to push the door in) is not considered. Control testing has its place in the enterprise, but it is not effective at articulating risk.

Rather than thinking about controls by themselves, consider the entire control environment as the ability to resist threat agents. In fact, it is for this reason FAIR calls this portion of the risk assessment resistance strength – it’s a holistic view of an ability of an asset to resist the force of a threat agent.

Work with your threat teams to develop a threat actor library. It will help you scope a risk assessment, is reusable and pre-loads much of the work upfront, making risk assessments faster. Plot actors on a threat continuum diagram to make resistance strength identification easier, and update it at least quarterly.

Step 3: Derive Loss Magnitude

Understanding Loss Magnitude – the damage, expenses and harm resulting from an event – is often one of the easier portions of a risk analysis, because other employees in a typical firm have already thought about many of these expenses, although not in the context of cyber risk. Many risk frameworks refer to this step as “Impact.” Loss Magnitude is made up of two components: Primary Loss, the direct costs and damages, and Secondary Loss, which is best thought of as the “fallout” after an event. 

Some considerations for Fintech risk managers when determining the Loss Magnitude:

Productivity can be harmed if an event hampers revenue generation. Emerging technologies, such as artificial intelligence, highly resilient networks and distributed ledgers, can mitigate some of this risk, but risk may present itself in different ways. Business continuity managers and department heads of product lines are good places to start ascertaining this.

Response costs can add up quickly when managing a loss event, such as the cost of outside forensics, auditors, staff augmentation and legal consulting.

The cost of replacing an asset still exists, even with cloud computing and virtual machines that can be allocated in minutes. There may be costs involved for extra computing capacity or restoring from backup.

Fines and judgements may occur when regulatory agencies take action against the firm, or from lawsuits brought by customers, shareholders or employees. The legal landscape can be understood by reading the SEC filings of similar firms, news reports and legal documents. Action in this category is mostly public and is easy to extrapolate to a particular firm.

Competitive advantage describes the loss of customers and/or revenue due to a diminished company position after a loss event. This takes many forms, including the inability to raise new capital, the inability to raise debt financing and a reduction in stock price. Senior management may have this information and an estimate of the number of customers lost due to an event. 

Reputation damage resulting from a loss event can be difficult to quantify, but calibrated estimates can still be made. Focus on the tangible losses that can occur rather than “reputation.” For example, if in the long-term, perceptions about the company are negatively changed, this can result in a reduction in stock price, lenders viewing the company as a credit risk, reduction in market growth and difficulty in recruiting/retaining employees.  
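One way to pull these forms of loss into an analysis is to estimate a calibrated range for each and sum the simulated draws into a per-event Loss Magnitude. The sketch below uses triangular distributions and entirely invented dollar figures, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

forms_of_loss = {            # (min, most likely, max) per loss event, USD
    "productivity":          (10_000, 50_000, 250_000),
    "response":              (25_000, 75_000, 400_000),
    "replacement":           ( 1_000, 10_000,  50_000),
    "fines_and_judgements":  (     0, 25_000, 500_000),
    "competitive_advantage": (     0, 40_000, 750_000),
    "reputation":            (     0, 30_000, 600_000),
}

loss_magnitude = sum(
    rng.triangular(lo, ml, hi, n) for lo, ml, hi in forms_of_loss.values()
)
print(f"Median loss per event:          ${np.median(loss_magnitude):,.0f}")
print(f"95th percentile loss per event: ${np.percentile(loss_magnitude, 95):,.0f}")
```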

Final Steps: Deriving, Reporting and Communicating Risk

The final steps in the risk assessment are beyond the scope of this chapter, which focuses on Fintech, emerging risks and special considerations for risk managers. The final risk is a calculation of the Loss Event Frequency and the Primary/Secondary Loss, and is articulated in the form of a local currency. It is in this phase that the risk manager works with stakeholders to identify additional mitigating controls, if applicable, and another analysis can be performed to determine the expected reduction in loss exposure. Risk reporting and communication is a crucial part of any analysis: stakeholders must receive the results of the analysis in a clear and easy to understand way so that informed decisions can be made. 

Case Study #4: Poor research skews Threat Event Frequency

Supplementing calibrated probability estimates with internal incident data and external research is an effective way to improve accuracy and control bias when conducting risk assessments, particularly the Threat Event Frequency portion of an analysis. 

A medium-sized London-based insurance firm is conducting cutting-edge research in machine learning and cryptocurrency, with the hope of offering more products at very competitive prices. Management is concerned with the threat of insiders (company employees, consultants and contractors) stealing this innovative work and selling it to competitors. The cyber risk management team is tasked with ascertaining the risk to the company and determining how the current security control environment mitigates it. After careful scenario scoping, the team proceeds to the Threat Event Frequency portion of the analysis and encounters the first problem.

The company hasn’t had any security events involving insiders, so internal historical data isn’t available to inform a calibrated probability estimate. Additionally, subject matter experts say they can’t provide a range for the frequency of threat agents acting against the asset – the intellectual property – because they are not aware of an occurrence in the Fintech space. The cyber risk team decides to cast a wider net and incorporate external research on insider threats and intellectual property theft, extrapolating the results to inform the risk scenario under consideration. It is at this point that the risk team encounters the next problem: the available research is contradictory, is sponsored by vendors offering products that mitigate insider threats, and uses dubious methodology.

There is no better example of how poor research can skew risk analysis than how insider threats have been researched, analysed and reported. The risk managers at the insurance firm need to estimate the percentage of data breaches or security incidents caused by insiders and have found several sources.

  • The Clearswift Insider Threat Index reports that 74% of data breaches are caused by insiders (Clearswift 2017).

  • In contrast, the 2017 Verizon Data Breach Investigation Report puts the number at 25% (Verizon 2017).

  • The IBM X-Force 2016 Cyber Security Intelligence Index reports that 60% of data breaches are caused by insiders, but a non-standard definition of “insider” is used (IBM 2016). IBM considers a user clicking on a phishing email to be the threat source, whereas most threat models would consider the user the victim and the sender of the email the threat agent. 

The lesson here is to carefully vet and normalize any data sources. All analysis and research carry some bias and error; risk managers need to be aware of it and, where possible, control for it when using the research in risk assessments. Failure to do so could significantly underreport or overreport threat event frequency, leading to poor decisions.

A good source of incident data is the Verizon Data Breach Investigations Report (DBIR). The DBIR uses real-world incident data from reported data breaches and from partners that span sectors: government, private sector firms, education and many others. The DBIR presents the information with statistical analysis that can easily be incorporated into a risk analysis. Another great source of raw incident data is the Privacy Rights Clearinghouse, which maintains a database of data breaches in the United States. Basic analysis is performed, but risk managers can download all incident data into Microsoft Excel and run their own analysis. Simple analysis is useful, such as the number of data breaches in the last 5 years due to stolen equipment, and more sophisticated analysis can be run, such as Bayesian analysis to generate a probability distribution. 
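As a hedged illustration of the kind of analysis described above, the sketch below uses a simple Beta-Binomial update to estimate the share of breaches caused by stolen equipment. The counts are placeholders; in practice they would come from the downloaded Privacy Rights Clearinghouse data or internal incident records.

```python
from scipy import stats

breaches_total = 400            # breaches reviewed over the last 5 years (placeholder)
breaches_stolen_equipment = 36  # of those, caused by stolen equipment (placeholder)

# Uniform Beta(1, 1) prior, updated with the observed counts
posterior = stats.beta(1 + breaches_stolen_equipment,
                       1 + breaches_total - breaches_stolen_equipment)

lo, hi = posterior.ppf([0.05, 0.95])
print(f"Estimated share of breaches from stolen equipment: {posterior.mean():.1%}")
print(f"90% credible interval: {lo:.1%} to {hi:.1%}")
```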

Other security research is derived from Internet-based surveys and sometimes uses dubious methodologies, often conducted without regard for statistical sampling or survey science. Unless your risk analysis includes opinions about a population of people (many risk analyses can include this with great effectiveness!), it is best to read the disclosures and methodology sections of reports to ascertain whether the research analysed a survey of respondents or actual incident data. The latter is almost always preferable when trying to determine the frequency and probability of attacks and attack characteristics. 

Risk managers should proceed with extreme caution when quoting research based on surveys. The importance of vetting research cannot be overstated. 

Conclusion

Financial technology opens new doors in many ways. It enables disruption in a sector that is ripe for it and offers consumers more choices, often more cheaply and more securely. These new doors also require a shift in thinking for risk managers. Some of the old rules have changed, and building fortress-like defensive security perimeters either doesn’t apply or hampers innovation. Conversely, some security fundamentals, such as the basics of how controls are applied and the security objectives of confidentiality, integrity and availability, have not changed. 

While Fintech has diverged from, and in many ways outpaced, its parent industry, finance, in consumer offerings, speed and innovation, it must be careful not to rely on the same security tools that its other parent, technology, has traditionally relied on. In doing so, Fintech risk management will, in effect, remain in the dark ages. Risk managers in modern finance have relied on quantitative methods to analyse business risk for as long as the industry has existed, whereas technology still largely relies on the “red/yellow/green” paradigm to discuss risk. Fintech risk managers have an opportunity to further the rigor and integrity of our profession by using quantitative methods befitting our trade. The future – including technology, the regulatory environment and the sophistication of criminals – continues to evolve, so we must equip ourselves with tools that help us keep pace. 

Quantitative risk assessments, such as FAIR, are how we are going to best serve our firms, analyse risk and advise on the best return on investment for security controls. 

Works Cited

Arnold, Martin. 2017. Banks team up with IBM in trade finance blockchain. 4 October. Accessed October 6, 2017. https://www.ft.com/content/7dc8738c-a922-11e7-93c5-648314d2c72c.

Clearswift. 2017. Clearswift Insider Threat Index. Accessed October 1, 2017. http://pages.clearswift.com/rs/591-QHZ-135/images/Clearswift_Insider_Threat_Index_2015_US.pdf.

Consumer Financial Protection Bureau. 2016. CFPB Takes Action Against Dwolla for Misrepresenting Data Security Practices. 2 March. https://www.consumerfinance.gov/about-us/newsroom/cfpb-takes-action-against-dwolla-for-misrepresenting-data-security-practices/.

Corkery, Michael. 2016. Once Again, Thieves Enter Swift Financial Network and Steal. 12 May. Accessed June 27, 2017. https://www.nytimes.com/2016/05/13/business/dealbook/swift-global-bank-network-attack.html.

Freund, J., and Jones, J. 2015. Measuring and Managing Information Risk: A FAIR Approach. Waltham, MA, USA: Elsevier.

Goldman, David. 2016. Anonymous attacks Greek Central Bank and vows to take down more banks' sites. 4 May. Accessed July 4, 2017. http://money.cnn.com/2016/05/04/technology/anonymous-greek-central-bank/index.html.

IBM. 2016. IBM X-Force 2016 Cyber Security Intelligence Index. Accessed May 5, 2017. https://www.ibm.com/security/data-breach/threat-intelligence-index.html.

Josey, A., et al. 2014. The Open FAIR Body of Knowledge. Berkshire, UK: The Open Group.

McMillan, Robert. 2014. The Inside Story of Mt. Gox, Bitcoin's $460 Million Disaster. 3 March. https://www.wired.com/2014/03/bitcoin-exchange/.

Mead, Rebecca. 2016. Learn Different. 7 March. Accessed August 9, 2017. https://www.newyorker.com/magazine/2016/03/07/altschools-disrupted-education.

Merken, Sara. 2017. OCC Not Yet Ready to Offer Special Charters to Fintechs. 3 September. Accessed September 14, 2017. https://www.bna.com/occ-not-yet-n57982087846/.

Office of the Comptroller of the Currency. 2016. Enhanced Cyber Risk Management Standards. Washington, D.C.: United States Department of the Treasury.

O'Neill, Patrick Howell. 2017. The curious case of the missing Mt. Gox bitcoin fortune. 21 June. https://www.cyberscoop.com/bitcoin-mt-gox-chainalysis-elliptic/.

Reuters. 2017. Six big banks join blockchain digital cash settlement project. 31 August. Accessed October 6, 2017. https://www.reuters.com/article/us-blockchain-banks/six-big-banks-join-blockchain-digital-cash-settlement-project-idUSKCN1BB0UA.

Shen, Lucinda. 2016. North Korea Has Been Linked to the SWIFT Bank Hacks. 27 May. Accessed October 1, 2017. http://fortune.com/2016/05/27/north-korea-swift-hack/.

Society for Worldwide Interbank Financial Telecommunication. 2017. Introduction to SWIFT. Accessed October 1, 2017. https://www.swift.com/about-us/discover-swift?AKredir=true.

United States Consumer Financial Protection Bureau. 2016. Consent Order, Dwolla Inc. 2 March. Accessed July 2, 2017. http://files.consumerfinance.gov/f/201603_cfpb_consent-order-dwolla-inc.pdf.

USGS. n.d. USGS. Accessed September 1, 2017. https://earthquake.usgs.gov/learn/topics/megaqk_facts_fantasy.php.

Verizon. 2017. Verizon Data Breach Investigations Report. Accessed July 4, 2017. http://www.verizonenterprise.com/verizon-insights-lab/dbir/2017/.

Zetter, Kim. 2016. That Insane, $81M Bangladesh Bank Heist? Here's what we know. 17 May. Accessed July 4, 2017. https://www.wired.com/2016/05/insane-81m-bangladesh-bank-heist-heres-know/.




Black Swans in Risk: Myth, Reality and Bad Metaphors

Think you understand Black Swan events? This post dismantles the myth, exposes how risk pros misuse the term, and introduces a better way to frame extreme risk.

cd5ad-1htkchb-3jnmslay0oy6pmw.jpeg

The term “Black Swan event” has been part of the risk management lexicon since Nassim Taleb coined it in his 2007 book The Black Swan: The Impact of the Highly Improbable. Taleb uses the metaphor of the black swan to describe extreme outlier events that come as a surprise to the observer and that, in hindsight, the observer rationalizes as something they should have predicted.

The metaphor comes from the old European assumption that all swans are white, a belief that held until black swans were discovered in Australia in 1697.

Russell Thomas recently spoke at SIRACon 2018 on this very subject in his presentation, “Think You Know Black Swans — Think Again.” In the talk, and an associated blog post, Thomas deconstructs the metaphor and Taleb’s argument and explores the use and misuse of the term in modern risk management. One of the most illuminating observations in Thomas’ work is that the term “Black Swan” is used in two very different ways: by Taleb to dismiss probabilistic reasoning, and by practitioners to describe certain events in risk management that require extra explanation. In other words, Taleb’s definition of a Black Swan is a condemnation of probabilistic reasoning, i.e., of forecasting future events with some degree of certainty. The more pervasive definition describes certain types of events within risk management, such as loss events commonly found in risk registers and heat maps in boardrooms across the globe. If that seems contradictory and confusing, it is.

From a purely practitioner point of view, it’s worth examining why the term Black Swan is used so often in risk management. It’s not because we’re trying to engage in a philosophical discussion about the unpredictability of tail risks; rather, risk managers feel the need to separately call out extreme impact events, regardless of probability, because they pose an existential threat to a firm. With this goal in mind, risk managers can focus on a) understanding why the term is so pervasive, and b) finding a way to communicate the same intent without logical fallacies.

Black Swan Definition and Misuse

The most common definition of a Black Swan is an event whose probability of occurrence is low but whose impact is high. Contemporary examples are a 1,000-year flood or 9/11. In these and similar events, the impact is so extreme that risk managers have felt the need to classify them separately; to call them out with an asterisk (*) that tells decision makers not to be lulled into a false sense of security just because the annualized risk is low. This is where the office-talk term “Black Swan” was born: it is an attempt to assign a special classification to these types of tail risks.

This isn’t an entirely accurate portrayal of Black Swan events, however, according to both Taleb and Thomas.

According to Taleb, a Black Swan event has these three attributes:

First, it is an outlier, as it lies outside the realm of regular expectations, because nothing in the past can convincingly point to its possibility. Second, it carries an extreme ‘impact’. Third, in spite of its outlier status, human nature makes us concoct explanations for its occurrence after the fact, making it explainable and predictable.

After examining the complex scenarios in which these conditions exist, it’s clear that the concept Taleb is trying to describe goes well beyond anything that would be found in a risk register and is, in fact, a critique of modern risk management techniques. It is a contradiction to include the term in a risk program or even to use it to describe individual risks.

Despite these points, the term has entered the everyday lexicon and, along with Kleenex and Cyber, it’s here to stay. It’s become a generally accepted word to describe low probability, high impact events. Is there something better?

Factor Analysis of Information Risk (FAIR), the risk analysis model developed by Jack Jones, doesn’t engage with Black Swan events on a philosophical level, but it does give risk managers a few extra tools to describe the circumstances around low probability, high impact events. These are called risk conditions.

“Risk Conditions”: The FAIR Way to Treat a Black Swan

Risk is what matters. When scenarios are presented to management, it doesn’t add much to the story that one risk is more or less probable than another, or that one impact is higher than another. FAIR provides a taxonomy to assess, analyze and report risk based on a number of factors (e.g. threat capability, control strength, frequency of a loss event). Most risk managers have just minutes with senior executives, so they avoid an in-depth discussion of individual factors and focus instead on risk. Why, then, do some risk managers single out Black Swan events?

Risk managers use the term because they need to communicate something extra: a way to draw attention to those few extreme tail risks that could outright end a company. There may be something that can be done to reduce the impact (e.g. diversification of company resources in preparation for an earthquake), or perhaps nothing can be done (e.g. market or economic conditions that cause company or sector failure). Either way, risk managers would be remiss not to point these risks out.

Risk conditions go beyond simply calling out low probability, high impact events. They specifically describe low probability, high impact events that have weak mitigating controls or none at all. Categorizing events this way makes sense when communicating risk, because extreme tail risks with no mitigating controls can get lost in annualized risk aggregation, as the back-of-the-envelope comparison below shows.
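The figures here are invented purely to illustrate the aggregation problem: ranked by annualized exposure alone, an event that could end the company sorts below a routine loss.

```python
# Illustrative numbers only: annualized exposure = expected frequency x impact.
tail_event    = (1 / 200) * 500_000_000  # ~once in 200 years, $500M impact -> $2.5M/yr
routine_event = 0.5 * 6_000_000          # ~every other year, $6M impact    -> $3.0M/yr

print(f"Catastrophic tail event: ${tail_event:,.0f} annualized")
print(f"Routine loss event:      ${routine_event:,.0f} annualized")
# Sorted purely by annualized exposure, the existential event ranks below the routine one.
```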

FAIR describes two risk conditions: unstable risk and fragile risk.

Unstable risk conditions describe a situation in which the probability of a loss event is low and there are no mitigating controls in place. It’s up to each organization to define what “low probability” means, but most firms treat events expected no more than once every 100 years as low probability. Examples of an unstable risk condition would be a DBA with unfettered, unmonitored access to personally identifiable information, or a stack of confidential documents sitting in an unlocked room. The annualized loss exposure is probably relatively low, but no controls are in place to lower the loss event frequency.

A fragile risk condition is very similar to an unstable risk condition; the distinction is that there is exactly one control in place to reduce the threat event frequency, with no backup controls behind it. An example would be a critical SQL database that is backed up nightly but has no other controls protecting against an availability event (e.g. disk mirroring, database mirroring).
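A rough way to operationalize the distinction is a simple tagging pass over analyzed scenarios. The sketch below follows the descriptions above; the probability threshold, field names and example scenarios are assumptions for illustration, not part of the FAIR standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Scenario:
    name: str
    annual_loss_event_probability: float      # e.g. 0.01 == roughly once in 100 years
    mitigating_controls: List[str] = field(default_factory=list)

def risk_condition(s: Scenario, low_probability: float = 0.01) -> str:
    """Tag a low-probability scenario as an unstable or fragile risk condition.

    Unstable: low probability and no mitigating controls at all.
    Fragile:  low probability and exactly one control, with no backup.
    The "low probability" threshold is something each firm defines for itself.
    """
    if s.annual_loss_event_probability > low_probability:
        return "none"
    if not s.mitigating_controls:
        return "unstable"
    if len(s.mitigating_controls) == 1:
        return "fragile"
    return "none"

scenarios = [
    Scenario("DBA has unmonitored access to PII", 0.005, []),
    Scenario("Critical SQL database availability loss", 0.008, ["nightly backups"]),
    Scenario("Laptop theft", 0.4, ["disk encryption", "remote wipe"]),
]
for s in scenarios:
    print(f"{s.name}: {risk_condition(s)}")
```

Flagging scenarios this way lets the extreme, thinly controlled ones surface in reporting even when their annualized exposure looks unremarkable.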

Conclusion

Don’t fight the Black Swan battle; leave that to philosophers and risk thinkers. Instead, try to understand why someone is calling something a Black Swan. Provide tools, such as those in the FAIR taxonomy, to help business leaders and your colleagues conceptualize actual risk. Risk conditions describe these types of events, and the unique risks they pose, with greater clarity and without outdated, often misused metaphors.

Originally published at www.fairinstitute.org.


Prioritizing Patches: A Risk-Based Approach


5bc9f-19ernxepah3r1ezffz33iza.jpeg

It’s been a tough few weeks for those of us who are responsible for patching vulnerabilities at the companies we work for. Not only do we have the usual operating system and application patches, we also have patches for VENOM and Logjam to contend with. Both vulnerabilities are serious and deserve extra attention. But where to start, and what to do first? Whether you have hundreds, thousands or hundreds of thousands of systems to patch, you have to start somewhere. Do you test and deploy patches for high-severity vulnerabilities first, or do you continue to deploy routine patches, prioritizing systems critical to the functioning of your business?

It depends. You have to take a risk-based approach to patching, fully considering several factors, including where the system sits on the network, the type of data it holds, what its function is, and whether the vulnerability the patch addresses faces a credible threat.

There’s an old adage in risk management (and business in general): “When everything’s a priority, nothing is a priority.” How true it is. For example, if you scan your entire network for the Heartbleed vulnerability, the tool will return a list of every system the vulnerability has been found on. Depending on the size of your network, remediation could seem like an insurmountable task, because everything looks high risk.

A good habit for all security professionals to get into is to take a risk-based approach whenever you need to make a decision about resources. (“Resources” in this context can be money, personnel, time, re-tasking an application, etc.) Ask yourself the following questions; a rough scoring sketch follows the list:

  • What is the asset I’m protecting? What is the value?

  • Are there any compliance, regulatory or legal requirements around this system? For example, does it store PHI (Protected Health Information), is it in scope for Sarbanes-Oxley, or does it fall under PCI?

  • What are the vulnerabilities on this system?

  • What is the threat? Remember, you can have a vulnerability without a threat: think of a house that does not have a tornado shelter. The house is in California.

  • What is the impact to the company if a threat exploited the vulnerability and acted against the asset? Impact can take many forms, including lost productivity, lost sales, a data breach, system downtime, fines, judgments and reputational harm.
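One way to turn those questions into something repeatable is a simple score that combines asset value, exposure, data sensitivity, vulnerability severity and the presence of a credible threat. The weights, fields and example systems below are entirely made up for illustration; they sketch the approach rather than prescribe a formula.

```python
from dataclasses import dataclass

@dataclass
class System:
    name: str
    asset_value: int        # 1 (low) .. 5 (business critical)
    internet_facing: bool   # exposure: can an outside attacker reach it directly?
    regulated_data: bool    # PCI, Sarbanes-Oxley, PHI, etc.
    cvss_score: float       # severity of the worst unpatched vulnerability
    active_exploits: bool   # is the vulnerability being exploited in the wild?

def patch_priority(s: System) -> float:
    """Toy risk-based priority score; the weights are illustrative assumptions."""
    score = s.asset_value * 2.0
    score += 3.0 if s.internet_facing else 0.0
    score += 2.0 if s.regulated_data else 0.0
    score += s.cvss_score
    score += 4.0 if s.active_exploits else 0.0
    return score

systems = [
    System("Public e-commerce web server", asset_value=5, internet_facing=True,
           regulated_data=True, cvss_score=9.8, active_exploits=True),
    System("Internal test box", asset_value=1, internet_facing=False,
           regulated_data=False, cvss_score=9.8, active_exploits=False),
]
for s in sorted(systems, key=patch_priority, reverse=True):
    print(f"{patch_priority(s):5.1f}  {s.name}")
```

Both example systems carry the same CVSS score, but the risk-based score separates them clearly, which is exactly the point of the two-system comparison that follows.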

A Tale of Two Systems

Take a look at the diagram below. It illustrates two systems with the same web vulnerability but different use cases and impact. A simple vulnerability scan would flag both systems as having high-severity vulnerabilities, but a risk-based approach to vulnerability mitigation reveals very different priorities.

6dfb2-1lktxm3fcnx5liy8ggwmvyg.png

This is not to say that Server #2 could not be exploited. It very much could be, by an insider, a vendor or an outside attacker, and the issue needs to be remediated. However, it is much more probable that Server #1 will be compromised, and in a shorter time frame. Server #2 would also be on the list to get patched, but considering that attackers external to the organization have to work harder to exploit this type of vulnerability, and that the server is not critical to the functioning of the business, the mitigation priority is Server #1.

Your Secret Weapon

Most medium-to-large companies have a separate department dedicated to Business Continuity. Sometimes it sits within IT as part of Disaster Recovery, and sometimes it is a completely separate department focused on enterprise resiliency. Either way, one of the core functions of these departments is to perform a business impact analysis on critical business functions. For example, the core business functions of the Accounting department are analyzed, continuity requirements are identified, and the impact to the company is assessed. Many factors are considered, including financial, revenue-stream, employee and legal/regulatory impact.

This is an excellent place to start if you need data on categorizing and prioritizing your systems. In some cases, the business impact analysis is mapped back to actual server names or application platforms, but even if it’s not, you can start using this data to improve your vulnerability management program.
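If the BIA output exists in any structured form, even a spreadsheet export, it can be joined against scan results so business criticality feeds directly into patching decisions. The sketch below assumes two hypothetical CSV exports and column names; treat every file and field name as a placeholder for whatever your own tools produce.

```python
import csv

def load_csv(path, key):
    """Load a CSV export into a dict keyed by the given column."""
    with open(path, newline="") as f:
        return {row[key]: row for row in csv.DictReader(f)}

# Hypothetical exports, joined on hostname:
#   bia_export.csv   -> hostname, criticality (1-5 from the business impact analysis)
#   scan_results.csv -> hostname, cvss, vuln_name
bia   = load_csv("bia_export.csv", key="hostname")
scans = load_csv("scan_results.csv", key="hostname")

prioritized = []
for host, finding in scans.items():
    criticality = int(bia.get(host, {}).get("criticality", 1))  # unknown hosts default to low
    prioritized.append((criticality * float(finding["cvss"]), host, finding["vuln_name"]))

# Highest combined score first: severe vulnerabilities on business-critical systems.
for score, host, vuln in sorted(prioritized, reverse=True):
    print(f"{score:6.1f}  {host}  {vuln}")
```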

It’s difficult to decide where to deploy scarce resources. The steps outlined above are truly the tip of the iceberg, but they are nonetheless a great first step in helping you prioritize when and where to start implementing mitigating controls. The most successful Information Security departments are those that are able to think in risk-based terms naturally when evaluating control implementation. With practice, it becomes second nature.

About the Author:Tony Martin-Vegue works for a large global retailer leading the firm’s cyber-crime program. His enterprise risk and security analyses are informed by his 20 years of technical expertise in areas such as network operations, cryptography and system administration. Tony holds a Bachelor of Science in Business Economics from the University of San Francisco and holds many certifications including CISSP, CISM and CEH.

Originally published at www.tripwire.com on May 31, 2015.
