How Many Replicators Does It Take to Achieve Reliability? Investigating Researcher Variability in a Crowdsourced Replication

Welfare State
Political Sociology
Methods
Quantitative
Mixed Methods
Public Opinion
Nate Breznau
Universität Bremen

Abstract

The paper reports findings from a crowdsourced replication. Eighty-five replicator teams attempted to verify numerical results reported in an original study by running the same models with the same data. The replication involved an experimental condition: a randomly assigned transparent group received the original study and code, while the other, opaque group received only a methods section, a rough description of the results, and no code. The transparent group mostly verified the original study with the same sign and p-value threshold (95.7%), while the opaque group had less success (89.3%); exact numerical reproductions to the second decimal place were far less common (76.9% and 48.1%, respectively). These results suggest that a single verification replication is only reliable when the original study is highly transparent; otherwise, it would take a minimum of three replicators to achieve the same level of confidence. Qualitative investigation of the replicators’ workflows reveals many causes of error, including mistakes and variation in process. To add a second layer of ecological validity, the principal investigators corrected mistakes where possible, improving the verification rate to 98.1% in the transparent group and 92.3% in the opaque group. If these results are ecologically valid, only highly transparent research would lead to reliable replications. This study sheds light on inter-researcher reliability and offers a baseline quantification. Although it may not be surprising that researchers make minor errors, we show here that these occur where they are least expected and are large enough to undermine between-researcher reliability to some extent, possibly offering quantifiable sources of the larger “reliability crisis” in social science. The more serious implication is that research involving more decision-making will inevitably produce more idiosyncratic variation in results across researchers. The obvious remedy is greater transparency, something sociology embraces only at the margins. This study reminds us of the problematic knowledge being produced by a discipline where a lack of transparency is the status quo.
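
One way to read the “minimum of three replicators” figure (an illustrative reconstruction under stated assumptions, not a calculation reported in the abstract): if replication attempts are treated as independent and the exact-reproduction rates are taken as the success probabilities, the chance that at least one of k opaque-condition replicators exactly reproduces the results, 1 − (1 − 0.481)^k, only exceeds the single transparent-condition rate of 76.9% once k reaches three.

\[
k = 2:\; 1 - (1 - 0.481)^{2} \approx 0.73 < 0.769,
\qquad
k = 3:\; 1 - (1 - 0.481)^{3} \approx 0.86 > 0.769.
\]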