
New publication | Predicting the replicability of social and behavioural science claims in COVID-19 preprints

Can non-experts predict which scientific claims will hold up under scrutiny? A new study co-authored by Magnus Johannesson, Professor at the Department of Economics at SSE, suggests they can, and about as well as the experts. The findings could help guide smarter research investments during crises.

During the COVID-19 pandemic, scientific studies emerged at unprecedented speed. With peer review unable to keep up, preprints—early versions of research papers—played a critical role in informing urgent decisions by policymakers and the public alike. But how reliable were these early findings?

A new study published in Nature Human Behaviour tackled this question by testing whether groups of both experienced researchers and non-experts could predict which social and behavioural science claims about COVID-19 would replicate successfully. Replication, often seen as the gold standard for verifying scientific results, is time-consuming and expensive. The study offers an alternative: structured human judgment.

The researchers asked participants to evaluate the likelihood that 100 claims from COVID-19 preprints would replicate. Participants interacted in small groups and provided their best estimates before and after discussion. The research team then conducted 29 new high-powered replications of claims from this set to compare predictions with outcomes.

Beginners held their own against experts

Interestingly, non-experts ("beginners") performed just as well as experienced participants in predicting replication outcomes. In fact, beginners correctly classified a somewhat larger share of the claims, 69% compared with 61% for the experienced group, and their average accuracy was marginally higher (0.58 versus 0.57). These differences were not statistically significant, but they challenge assumptions about the importance of formal expertise in such tasks.
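
As a rough illustration only, and not the study's own code or scoring rule, the short Python sketch below scores a set of made-up replication forecasts against binary replication outcomes, counting a claim as correctly classified when the forecast falls on the same side of 0.5 as the observed result; the forecasts, outcomes, and threshold are all assumptions for the example.

```python
# Illustrative sketch, not the study's code or data: score probabilistic
# replication forecasts against binary replication outcomes.

def classification_rate(forecasts, outcomes, threshold=0.5):
    """Share of claims whose forecast lands on the same side of the threshold as the outcome."""
    correct = sum((f > threshold) == bool(o) for f, o in zip(forecasts, outcomes))
    return correct / len(forecasts)

# Hypothetical forecasts for five claims and whether each one replicated.
example_forecasts = [0.8, 0.3, 0.6, 0.2, 0.7]  # elicited probabilities of replication
example_outcomes = [1, 0, 0, 0, 1]             # 1 = replicated, 0 = did not replicate

print(f"Correctly classified: {classification_rate(example_forecasts, example_outcomes):.0%}")
# prints "Correctly classified: 80%" for this made-up example
```

In the study itself, judgments were elicited through a structured group protocol and compared with the 29 completed replications, so the published figures rest on a richer procedure than this toy threshold rule.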

Potential to inform real-world decision making

While both groups showed only modest predictive ability, their judgments were better than chance and could still be useful in prioritizing which findings to investigate further. This kind of forecasting could be especially valuable during health crises or other urgent policy challenges, where quick yet reliable assessments are crucial.

The study also revealed that non-experts were more open to adjusting their judgments after group discussions, suggesting they might benefit more from collaborative forecasting settings. Researchers say these findings raise new questions about how best to harness collective intelligence for science governance.

Abstract

Replications are important for assessing the reliability of published findings. However, they are costly, and it is infeasible to replicate everything. Accurate, fast, lower-cost alternatives such as eliciting predictions could accelerate assessment for rapid policy implementation in a crisis and help guide a more efficient allocation of scarce replication resources. We elicited judgements from participants on 100 claims from preprints about an emerging area of research (COVID-19 pandemic) using an interactive structured elicitation protocol, and we conducted 29 new high-powered replications. After interacting with their peers, participant groups with lower task expertise ('beginners') updated their estimates and confidence in their judgements significantly more than groups with greater task expertise ('experienced'). For experienced individuals, the average accuracy was 0.57 (95% CI: [0.53, 0.61]) after interaction, and they correctly classified 61% of claims; beginners' average accuracy was 0.58 (95% CI: [0.54, 0.62]), correctly classifying 69% of claims. The difference in accuracy between groups was not statistically significant and their judgements on the full set of claims were correlated (r(98) = 0.48, P < 0.001). These results suggest that both beginners and more-experienced participants using a structured process have some ability to make better-than-chance predictions about the reliability of 'fast science' under conditions of high uncertainty. However, given the importance of such assessments for making evidence-based critical decisions in a crisis, more research is required to understand who the right experts in forecasting replicability are and how their judgements ought to be elicited.
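
For readers unfamiliar with the r(98) notation in the abstract, it denotes a Pearson correlation computed over the 100 claims, with 98 degrees of freedom (n minus 2). A minimal sketch of that kind of check, using placeholder forecast vectors rather than the study's data, might look like this:

```python
# Minimal sketch with placeholder data (not the study's): correlate two
# groups' replication forecasts across the same 100 claims.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Made-up forecasts for 100 claims from the "experienced" and "beginner" groups.
experienced = rng.uniform(0.2, 0.9, size=100)
beginners = np.clip(experienced + rng.normal(0.0, 0.2, size=100), 0.0, 1.0)

r, p = pearsonr(experienced, beginners)  # degrees of freedom = 100 - 2 = 98
print(f"r(98) = {r:.2f}, P = {p:.3g}")
```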