
The 2021 Security Outcomes report and better research methods

Most security surveys are junk science — biased, unrepresentative, and built to sell, not inform. Cisco’s 2021 report, in partnership with Cyentia, finally gets it right, showing the industry what statistically sound research actually looks like.


Something extraordinary happened recently in the world of information security research reports. Why it’s so extraordinary might have passed you by, unless you geek out on statistical methods in opinion polling as I do. The report is Cisco’s 2021 Security Outcomes report, produced in collaboration with the Cyentia Institute, and it’s the only report in recent memory that uses sound statistical methods to conduct survey-based opinion research. What does that mean, and why is it so important? Glad you asked! <soapbox on>

The Current Problem

Much information security research is survey-based: researchers ask a group of people their opinion on something, aggregate the results, draw conclusions, and present the findings. This is a very common -- and often illuminating -- way to perform research. It reveals the preferences and opinions of a group of people, which can serve as a valuable input to further analysis. This method is called opinion polling, and we are all familiar with it, mostly from political polls that ask a group of people how they plan to vote.

The single most important thing to keep in mind when evaluating the veracity of an opinion poll is knowing how respondents are selected. 

  • If the people that comprise the group are selected at random, one can extrapolate the results to the general population. 

  • If the people that comprise the group are not selected at random, the survey results only apply to the group itself.

Here’s the problem: most, if not all, survey-based information security reports make no effort to randomize the sample. As a consequence, the results are skewed. If a headline reads “40% of Ransomware Victims Pay Attackers” and the underlying research is non-random opinion polling, like a Twitter poll or emailing a customer list, the headline is misleading. The results only apply to the people that filled out the survey, not the general population. If you’re not following, let me use a more tangible example.

Let’s suppose you are a campaign worker for one of the candidates in the 2020 US Presidential campaign. You want to know who is ahead in California - Trump or Biden. The way to figure this out is to ask a group of people how they plan to vote on Election Day. You want to extrapolate the results to the whole of California - not just one area or demographic. It’s not feasible to ask every single Californian voter how they’re voting; you just need a representative sample. Which of these polling methods do you think will yield the most accurate results?

  1. Stand in front of a grocery store in Anaheim (the most conservative city in California) and ask 2000 people how they are voting

  2. Stand in front of a grocery store in San Francisco (the most liberal city in California) and ask 2000 people how they are voting

  3. Ask 2000 people who follow your Twitter account, offering a $5 Amazon gift card to answer a poll on how they’re voting. It’s the honor system that they’re actually from California.

All three options will result in significant error and bias, especially if the results are applied to how all Californians will vote. This is called selection bias, and it occurs when the group sampled is systematically different from the wider population being studied. Regardless of whether you work for the Biden or Trump campaign, you can’t use these survey results. Option 1 will skew toward Trump and option 2 will skew toward Biden - giving both teams an inaccurate view of how all Californians will vote. Option 3 would yield odd results, with people from all over the world completing the survey just to get the gift card.
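To make the effect concrete, here is a minimal simulation sketch in Python. All of the numbers are made up purely for illustration: assume 60% true statewide support for one candidate, and a single polling location where support runs at only 35%.

```python
import random

random.seed(42)

# Hypothetical electorate: 60% statewide support (a made-up number).
STATEWIDE_SUPPORT = 0.60
population = [random.random() < STATEWIDE_SUPPORT for _ in range(1_000_000)]

# A skewed subpopulation -- think shoppers at one grocery store --
# where support runs at only 35% (also made up).
skewed_subpop = [random.random() < 0.35 for _ in range(50_000)]

def poll(group, n=2000):
    """Ask n randomly chosen members of `group`; return % support."""
    sample = random.sample(group, n)
    return 100 * sum(sample) / n

print(f"True statewide support:     {STATEWIDE_SUPPORT:.0%}")
print(f"Random statewide sample:    {poll(population):.1f}%")    # lands near 60%
print(f"One-location (biased) poll: {poll(skewed_subpop):.1f}%") # lands near 35%
```

Polling 2,000 people very precisely measures the wrong group: no amount of sample size fixes a biased selection mechanism.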

Every survey-based security research report I’ve seen is conducted using non-randomized samples, like the examples above, and is subject to selection bias. If you’re a risk analyst like me, you can’t use reports like this - the results are just too questionable. I won’t use them. Junk research is the death of the defensibility of a risk assessment.

What Cyentia did

Let’s add a 4th option to the list above:

4. Obtain a list of all likely California voters, randomly sample enough of them that you will likely get ~2000 responses, and ask them how they plan to vote.

This method is much better; it gets us closer to an accurate (or, following George Box’s aphorism, less wrong) representation of how the larger population will vote.
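Mechanically, the random sampling in option 4 is simple; the hard, expensive part is obtaining and maintaining the list. Here is a minimal sketch, assuming a hypothetical voter roll and a guessed 5% response rate (both placeholders, not real figures):

```python
import math
import random

def plan_random_sample(voter_roll, target_responses=2000, response_rate=0.05):
    """Randomly select enough invitees that roughly target_responses respond.

    response_rate is an assumption you would estimate from past surveys.
    """
    n_invites = math.ceil(target_responses / response_rate)  # 2000 / 0.05 = 40,000
    return random.sample(voter_roll, n_invites)

# Placeholder stand-in for a real list of likely California voters:
voter_roll = range(22_000_000)
invitees = plan_random_sample(voter_roll)
print(len(invitees))  # 40000
```

Every entry on the roll has an equal chance of being chosen - exactly the property options 1 through 3 lack.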

Here’s the extraordinary thing: Cisco and Cyentia actually did it! They produced a research report based on opinion polling that adheres to sound sampling and statistical methods. They didn’t just lure 500 randos on Twitter with gift cards to answer a survey, like everyone else does. They did the hard work of obtaining a list, randomly sampling it, debiasing the questions, and presenting the results in a transparent, usable way. This is the first time I’ve ever seen this in our industry, and I truly hope it’s not the last.

A sea change?

Nearly all survey-based information security research reports cut corners, use poor methods, and typically use the results to scare and sell rather than inform. The result is, at best, junk and, at worst, actively harmful to your company if used to make decisions. I know that doing the right thing is hard and expensive, but it’s worth doing. Wade Baker of Cyentia wrote a blog post detailing the painstaking complexity of the sampling methodology. As a consumer of this research, I want Cyentia and Cisco to know that their hard work isn’t for nothing: it means I and many others can use the results in a risk analysis.

I truly hope this represents a sea change in how security research is conducted. Thanks to everyone involved for going the extra mile - the results are remarkable.


The Problem with Security Vendor Reports

Most vendor security reports are just glossy marketing in disguise, riddled with bad stats and survey bias. This post breaks down how to spot junk research before it ends up in your board slides — and how to demand better.


The information security vendor space is flooded with research: annual reports, white papers, marketing publications — the list goes on and on. This research is subsequently handed to marketing folks (and engineers who are really marketers), who fan out to security conferences across the world, standing in booths quoting statistics and filling pay-to-play speaking slots, convincing executives to buy their security products.

There’s a truth, however, that security vendors know but most security practitioners and decision makers aren’t quite wise to yet. Much of the research vendors present in reports and marketing brochures isn’t rooted in any defensible, scientific method. It’s an intentional appeal to fear, designed to create enough self-doubt to make you buy their solution.

This is how it’s being done:

  • Most vendor reports are based on surveys, also known as polls

  • Most of the surveys presented by security vendors ignore the science behind surveys, which is based on statistics and mathematics

  • Instead of using statistically sound survey methods, many reports use dubious approaches designed to lead the reader down a predetermined path

This isn’t exactly new. Advertisers have had consumer manipulation down to an art form for decades. Security vendors, however, should be held to a higher standard, because the whole field is based on trust and credibility. Many vendor reports are presented as security research, not advertisements.

What’s a survey?


A survey is a poll. Pollsters ask a small group of people a question, such as “In the last year, how many of your security incidents have been caused by insiders?” The results are then extrapolated to a general population. For example, IBM conducted a survey that found that 59% of CISOs experienced cyber incidents in which the attackers could defeat their defenses. The company that conducted the survey didn’t poll all CISOs — they polled a sample of CISOs and extrapolated a generality about the entire population of CISOs.

This type of sampling and extrapolation is completely acceptable, provided the survey adheres to established methodologies in survey science. Doing so makes the survey statistically sound; not doing so puts the validity of the results in question.
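To see what honest extrapolation looks like, here is a sketch using the standard normal-approximation confidence interval for a polled proportion. The 59% is the IBM figure above; the sample size of 500 is a hypothetical stand-in, since the point is only to illustrate the math:

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """95% confidence interval for a polled proportion (normal approximation)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# "59% of CISOs..." -- assuming a hypothetical sample of 500 respondents.
low, high = proportion_ci(0.59, 500)
print(f"Point estimate 59%, 95% CI: {low:.1%} to {high:.1%}")  # ~54.7% to ~63.3%
```

A report that discloses its sample size and margin of error lets you do this arithmetic yourself; one that doesn’t is asking you to take the headline number on faith.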

All surveys have some error and bias. However, a good survey will attempt to control for this by doing the following:

  • Use established survey science methods to reduce error and bias

  • Disclose the remaining error and bias to readers

  • Disclose the methodology used to conduct the survey

  • Publish the raw data for peer review

Why you should care about statistically sound surveys

Surveys are everywhere in security. They show up in cute infographics, annual reports, journal articles, and academic papers. Security professionals read these reports, learn from them, and quote them in steering committee meetings or to senior executives who ask questions. Managers often ask security analysts to quantify risk with data — and the easiest way is to find a related survey. We rely on this data to enable our firms to make risk-aware business decisions.

When you tell your Board of Directors that 43% of all data breaches are caused by internal actors, you’d better be right. The data you are using must be statistically sound and rooted in fact. If you are quoting vendor FUD or some marketing brochure that’s disconnected from reality, your credibility is at stake. We are trusted advisors, and everything we say must be defensible.

What makes a good survey

Everyone has seen a survey. Election and public opinion polls seem simple on the surface, but they are very hard to do correctly. The science behind surveys is rooted in math and statistics; when a survey follows that science, its results are statistically sound.

There are four main components of a statistically sound survey:

Population

This is a critical first step. What is the group being studied? How big is it? An example would be “CISOs” or “information security decision makers.”

Sample size

The size of the group you are surveying. It’s usually not possible to study an entire population, so a sample is chosen. A good surveyor will do all they can to ensure the sample is as representative of the general population as possible. Most importantly, the sample needs to be randomly selected.

Confidence interval

Also known as the margin of error (the +/- figure reported alongside poll results); the larger the sample size, the lower the margin of error.
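The sample-size relationship is easy to check with the same normal-approximation formula used in the sketch above, evaluated at the worst-case proportion p = 0.5:

```python
import math

for n in (100, 500, 1000, 2000, 5000):
    moe = 1.96 * math.sqrt(0.5 * 0.5 / n)  # worst case: p = 0.5
    print(f"n = {n:>4}: margin of error = +/-{moe:.1%}")
```

Note the diminishing returns: because the margin of error shrinks with the square root of n, quadrupling the sample size only halves it.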

Unbiased Questions

The questions themselves should be crafted by a neutral professional trained in survey science. Otherwise, it is very easy to write biased questions that lead the respondent to answer in a certain way.

What makes a bad survey?

A survey loses credibility as it uses fewer and fewer of the above components. There are many ways a survey can be bad, but here are the biggest red flags:

  • No disclosure of polling methodology

  • No disclosure of the company that conducted the poll

  • The polling methodology is disclosed, but no effort was made to make it random or representative of the population (online polls have this problem)

  • Survey takers are compensated (people will say anything for money)

  • Margin of error not stated

Be Skeptical

Be skeptical of vendor claims. Check for yourself and read the fine print. When you stroll the vendor halls at RSA or Black Hat and a vendor makes some outrageous claim about an imminent threat, dig deeper. Ask hard questions. We can slowly turn the ship away from FUD and toward fact- and evidence-based research.

And if you’re a vendor — think about using reputable research firms to perform your surveys.
