Expert elicitation is simple to define but difficult to use effectively. Most of us already use some form of expert elicitation in a risk analysis whenever we ask someone for their opinion on a particular data point. The importance of a structured methodology for collecting and aggregating expert opinion is underappreciated in risk analysis, especially in cyber risk, where common frameworks barely touch on the topic, if at all.
There are instances in a quantitative risk analysis in which expert opinion is needed. For example, historical data on generalized ransomware payout rates may be available, but an adjustment is needed for a particular sector. Another common application is eliciting experts when data is sparse, hard to come by, expensive, or unavailable, or when the analysis does not need precision. Supplementing data with the opinion of experts is an effective and common method. The technique is seen across many fields: engineering, medicine, oil and gas exploration, and war planning. Essentially, anywhere there is uncertainty in decision making, experts are used to generate, adjust, or supplement data.
If asking one expert to make a forecast is good, asking many is better. Gathering as many opinions as possible brings a diversity of opinion into the analysis. Once all the data is gathered, however, how does the analyst combine the opinions into one single input for use in the analysis? It turns out that there is no single way to do this, and no one method is necessarily better than the others. The problem of opinion aggregation has vexed scientists and others who rely on expert judgment, but after decades of research, the field has narrowed to several techniques, each with clear benefits and drawbacks.
The Two Methods: Behavioral and Mathematical
Methods of combining the opinion of experts fall into two categories: behavioral and mathematical. In behavioral methods, a facilitator works through the question with a group of experts until a consensus is reached. Techniques range from anonymous surveys and informal polling to group discussion and facilitated negotiation. The second major category, mathematical aggregation, involves asking each expert to estimate a value and using an equation to combine the estimates.
Each category has its pros and cons, and the choice may depend on the complexity of the analysis, the available resources, the precision required, and whether the drawbacks of the chosen method are palatable to both the analyst and the decision maker.
Behavioral methods of combining expert estimates span a wide range of techniques, but all have one thing in common: a facilitator interviews experts in a group setting and asks for estimates, justification, and reasoning. At the end of the session, the group (hopefully) reaches a consensus. The facilitator then has a single distribution, representing the opinion of a majority of the participants, that can be used in a risk analysis.
An example of this would be asking experts for a forecast of future lost or stolen laptops for use in a risk analysis examining stronger endpoint controls. The facilitator gathers people from IT and Information Security departments, presents historical data (internal and external) about past incidents and asks for a forecast of future incidents.
Most companies already employ some kind of aggregation of expert opinion in a group setting: think of the last time you were in a meeting and were asked to reach a consensus about a decision. If you have ever performed that task, you are familiar with this type of elicitation.
The most common method is unstructured: gather people in a room, present research, and have a discussion. More structured frameworks exist that aim to reduce some of the cons listed below. The two most commonly used methods are the IDEA Protocol (Investigate, Discuss, Estimate, Aggregate) and some forms of the Delphi Method.
There are several pros and cons associated with the behavioral method.
Pros
Agreement on assumptions: The facilitator can quickly get the group to use the same assumptions and definitions and to interpret the data in generally the same way. If one member of the group misunderstands a concept or misinterprets data, others in the group can help.
Corrects for some bias: If the discussion is structured (e.g., using the IDEA protocol), the facilitator can identify some cognitive biases, such as the over/underconfidence effect, the availability heuristic, and anchoring. A good facilitator uses the group discussion to minimize the effect of each on the final estimate.
Mathless: Group discussion and consensus building do not require an understanding of statistics or complex equations, which can be a factor for some companies. Some risk analysts may wish to avoid complex equations if they, or their management, do not understand them.
Diversity of opinion: The group and the facilitator hear the argument for the minority opinion. Science is not majority rule; those with the minority opinion can still be right.
Consensus: After the exercise, the group has an estimate that the majority agrees with.
Cons
Prone to bias: While this method controls for some biases, it introduces others. In unstructured elicitation, biases such as groupthink, the bandwagon effect, and the halo effect creep in. Participants may subconsciously, or even purposely, adopt the opinions of their leader or manager. If the facilitator does not recognize this, majority rule can quickly take over, drowning out minority opinion. Structured elicitation, such as the IDEA protocol, which includes individual polling away from the group, can reduce these biases.
Requires participant time: This method may take up more participant time than math-based methods, which do not involve group discussion and consensus building.
Small groups: A facilitator may not be able to handle a large group, such as 20 or more people, and still expect a productive discussion and a consensus in a reasonable amount of time.
The other method of combining expert judgment is math based. These methods all involve some form of averaging, whether averaging the experts' values at each quantile or building one distribution from many. The most popular method of aggregating many distributions is the classical model developed by Roger Cooke, which is used extensively in many risk and uncertainty analysis disciplines, including health, public policy, bioscience, and climate change.
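As a rough illustration of averaging values at each quantile, here is a minimal Python sketch. The expert names and the 5th, 50th, and 95th percentile values are made up for the example, and equal weighting is an assumption; this is not a full treatment of quantile averaging.

```python
from statistics import mean

# Hypothetical elicited quantiles (5th, 50th, 95th percentile) for the number
# of laptops lost next year. The values are illustrative, not real data.
expert_quantiles = {
    "expert_a": {0.05: 10, 0.50: 22, 0.95: 40},
    "expert_b": {0.05: 15, 0.50: 30, 0.95: 60},
    "expert_c": {0.05: 11, 0.50: 32, 0.95: 50},
}


def average_quantiles(estimates):
    """Average the experts' values at each quantile level, equally weighted."""
    # Take the quantile levels from the first expert; all experts are
    # assumed to have answered the same levels.
    levels = sorted(next(iter(estimates.values())))
    return {q: mean(e[q] for e in estimates.values()) for q in levels}


pooled_quantiles = average_quantiles(expert_quantiles)
for level, value in pooled_quantiles.items():
    print(f"{level:.0%} quantile: {value}")
```

The result is a single set of quantiles that can be fitted to a distribution and used as one input in the analysis.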
Simple averaging (e.g., taking the mean, median, or mode), in which all participants are weighted equally, can be done in a few minutes in Excel. Other methods, such as the classical model, combine probabilistic opinions using a weighted linear average of individual distributions. The benefit of this linear opinion pool method is that the facilitator can assign weights to different opinions. For example, one can weight calibrated experts more heavily than uncalibrated ones. Many tools support this function, including two R packages: SHELF and expert.
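To make the linear opinion pool concrete, here is a minimal Python sketch. It assumes each expert's opinion is expressed as a triangular distribution (minimum, most likely, maximum), and the weights are assigned by hand for illustration. It is not Cooke's classical model, which derives its weights from how experts perform on calibration questions, and it does not reproduce the SHELF or expert packages.

```python
def triangular_cdf(x, low, mode, high):
    """CDF of a triangular distribution defined by (low, mode, high)."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    if x <= mode:
        return (x - low) ** 2 / ((high - low) * (mode - low))
    return 1.0 - (high - x) ** 2 / ((high - low) * (high - mode))


def linear_opinion_pool(x, experts, weights):
    """Weighted average of the experts' CDFs at x (weights are normalized)."""
    total = sum(weights)
    return sum(
        (w / total) * triangular_cdf(x, *params)
        for params, w in zip(experts, weights)
    )


# Hypothetical (min, most likely, max) forecasts of laptops lost next year.
experts = [(10, 22, 40), (15, 30, 60), (11, 32, 50)]
equal_weights = [1, 1, 1]
calibration_weights = [3, 1, 1]  # assumption: the first expert is best calibrated

# Pooled probability that 30 or fewer laptops are lost.
p_equal = linear_opinion_pool(30, experts, equal_weights)
p_weighted = linear_opinion_pool(30, experts, calibration_weights)
print(f"equal weights:    {p_equal:.3f}")
print(f"weighted experts: {p_weighted:.3f}")
```

Because the first expert expects fewer losses than the others, giving that expert more weight raises the pooled probability that losses stay at or below 30.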
As with the behavioral category, there are numerous pros and cons to using mathematical methods. The risk analyst must weigh each one to find the method that best aids the decision and risk analysis under consideration.
Pros
May be faster than consensus: The facilitator may find that math-based methods are quicker than group deliberation and discussion, which last until a consensus is reached or participants give up.
Large group: One can handle very large groups of experts. If the facilitator uses an online application to gather and aggregate opinion automatically, the number of participants is virtually limitless.
Math-based: Some find this a con, others find this a pro. While the data is generated from personal opinion, the results are math-based. For some audiences, this can be easier to defend.
Reduces some cognitive biases: Experts research the data and give their opinion separately from other experts and can be as anonymous as the facilitator wishes. Groupthink, majority rule, and other associated biases are significantly reduced. Research described by Philip Tetlock in his 2015 book Superforecasting shows that if one has a large enough group, biases tend to cancel each other out, even if the participants are uncalibrated.
Cons
Different opinions may not be heard: Participants have no opportunity to voice a differing opinion, offer a different interpretation of the data, or present knowledge that the other experts may not have. Some of your “experts” may not be experts at all, and you would never know. The minority outlier opinion that may be right gets averaged in and, with a big enough group, gets lost.
Introduces other cognitive biases: If the group is highly overconfident, its forecasts will be right less often than the group expects. Some participants might let anchoring, the availability heuristic, or the gambler's fallacy influence their forecasts. Aggregation rolls these biases into one incorrect number. (Again, this may be controlled for by increasing the pool size.)
Complex math: Some of the more complex methods may be out of reach for some risk departments.
No consensus: The result may be a forecast that few participants, or none, agree with. For example, suppose you ask a group of experts to forecast the number of laptops the company will lose next year, and the experts return the following most likely values: 22, 30, 52, 19, and 32. The median of these estimates is 30, a number that more than half of the participants disagree with.
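The laptop example can be checked in a few lines of Python. This is only a sketch using the five values quoted above, and it treats any estimate other than the median as disagreement, which is a simplifying assumption.

```python
from statistics import median

# Most likely values returned by the five experts in the example above.
estimates = [22, 30, 52, 19, 32]

pooled_estimate = median(estimates)

# Simplifying assumption: an expert "disagrees" if their own estimate
# differs from the pooled value.
disagreeing = sum(1 for e in estimates if e != pooled_estimate)

print(f"median: {pooled_estimate}")
print(f"{disagreeing} of {len(estimates)} experts gave a different value")
```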
Which do I use?
As mentioned at the beginning of this post, there is no single method that all experts agree upon. You don’t have to choose just one. You may decide to use informal verbal elicitation for a low-precision analysis when you have access to only a handful of experts. The next week, you may choose a math-based method for an analysis in which a multi-million dollar decision is at stake and you have access to all employees in several departments.
The choice depends on many factors, including the facilitator’s comfort level with the techniques, the number and expertise of the experts, and the geographic locations of the participants (e.g., whether they are spread across the globe or all work in the same building).
Here are a few guidelines to help you choose:
Behavioral methods work best when:
You have a small group, and it’s not feasible to gather more participants
You do not want to lose outlier numbers in averaging
Reaching a consensus is a goal in your risk analysis (it may not always be)
The question itself is ambiguous and/or the data can be interpreted differently by different people
You don’t understand the equations behind the math-based techniques and may have a hard time defending the analysis
Math-based methods work best when:
You have a large group of experts
You need to go fast
You don’t have outlier opinions, or you have accounted for them in a different way
You just need the opinion of experts – you do not need to reach a consensus
The question is focused, unambiguous and the data doesn’t leave much room for interpretation
We all perform some kind of expert judgment elicitation, even if it's informal and unstructured. Several methods of aggregation exist and are in wide use across many disciplines where uncertainty is high or data is hard to obtain. However, aggregation should never be the end of your risk analysis. Use the analysis results to guide future data collection and future decisions, such as levels of precision and frequency of re-analysis.
Stay tuned for more posts on this subject, including a breakdown of techniques with examples.
Further Reading
The Wisdom of Crowds by James Surowiecki
Superforecasting: The Art and Science of Prediction by Philip E. Tetlock and Dan Gardner
Thinking, Fast and Slow by Daniel Kahneman
Behavioral Aggregation Methods
Is It Better to Average Probabilities or Quantiles? by Kenneth C. Lichtendahl, Jr., Yael Grushka-Cockayne, and Robert L. Winkler
Expert Elicitation: Using the Classical Model to Validate Experts’ Judgments by Abigail R. Colson and Roger M. Cooke