Friends don't let friends copy-paste

Conceptual introduction

Frederik Aust & Marius Barth

21.04.2023

Agenda

  • Computational reproducibility
    • Definition
    • The problem of non-reproducibility
  • Dynamic documents

Computational reproducibility

NSF subcommittee on replicability in science:

[Computational] Reproducibility is a minimum necessary condition for a finding to be believable and informative. (p. 4, Cacioppo, Kaplan, Krosnick, Olds, & Dean, 2015; also see Peng, 2011)

Is it just me?

  1. Full reproduction
    • Access to data
    • Complete analysis plan
    • Lots of (detective) work
  2. Consistency of reported results
    • Reported statistics sufficient
    • In some cases, automatable

Full reproduction

Journal                                 Source                        Failure rate
Journal of Cognition                    Hardwicke et al. (2018)       37%
Quarterly Journal of Political Science  Eubank (2016)                 58%
Strategic Management Journal            Bergh et al. (2017)           30%
Science                                 Stodden, Seiler, & Ma (2018)  41%

Field                  Source                          Failure rate
Psychology             Artner et al. (2020)            30%
Psychology (RR)        Obels et al. (2020)             42%
Economics              Vilhuber (2020)                 39-51%
Organismal biology     Andrew et al. (2015)            35%
Genetics               Gilbert et al. (2012)           30%
Geosciences            Konkol, Kray & Pfeiffer (2019)  56%
RCT (primary outcome)  Naudet et al. (2018)            12%

Consistency of reported results

Field                              Source                    Failure rate
Psychology                         Nuijten et al. (2020)     30%
Psychology                         Brown & Heathers (2017)   51%
Personality and social psychology  Petrocelli et al. (2013)  31%

  • Test statistics and p values (Bakker & Wicherts, 2011)
  • Descriptive statistics of integer data (Brown & Heathers, 2017)
  • Equivalence of paths in mediation analysis (Petrocelli et al., 2013)
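
A minimal sketch of one such automatable check, in the spirit of the consistency test for descriptive statistics of integer data (Brown & Heathers, 2017): a reported mean is only possible if some integer sum, divided by the sample size, rounds to it. The function name `grim_consistent` and the example values are illustrative assumptions, not taken from the cited paper's materials. The sketch assumes non-negative means of integer-valued responses and accepts either rounding direction for ties.

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Return True if some integer sum of n responses rounds to the reported mean."""
    target = Decimal(reported_mean)
    # Number of decimals the mean was reported with (taken from the string itself).
    decimals = max(0, -target.as_tuple().exponent)
    quantum = Decimal(1).scaleb(-decimals)
    # Only integer sums close to mean * n can possibly round to the target.
    nearest_sum = int(target * n)
    for total in range(max(0, nearest_sum - 1), nearest_sum + 3):
        mean = Decimal(total) / Decimal(n)
        for rounding in (ROUND_HALF_UP, ROUND_HALF_DOWN):
            if mean.quantize(quantum, rounding=rounding) == target:
                return True
    return False


# Hypothetical reported values: 28 participants, integer response scale.
print(grim_consistent("5.18", 28))  # True: 145 / 28 rounds to 5.18
print(grim_consistent("5.19", 28))  # False: no integer sum yields 5.19
```

Passing the mean as a string keeps the reported number of decimals, which a float would lose (e.g., "5.10" vs. 5.1).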

(Nuijten et al., 2016)

"Non-gross" inconsistencies can be consequential

there may have been an error in the statistics reported in the original article [...] the standard deviations reported in a similar study [...] were approximately 6 times as large and the effect size was substantially smaller (p. 323, McCarthy et al., 2018)

"exposing participants to hostility-related stimuli caused them subsequently to interpret ambiguous behaviors as more hostile."

Are we just sloppy?

The Quarterly Journal of Political Science checks data & code for each submission (24 papers)

  • 58% produced discrepancies
  • 54% would not run without errors
  • 33% did not produce all results

(Eubank, 2016)

Interim summary

Computational non-reproducibility

  • is widespread
  • undermines credibility of the literature
  • wastes resources
  • is unethical
  • is not caused by simple sloppiness


Common causes

  • Typos
  • Copy-paste errors
  • Incorrect rounding
  • "Outdated" results

(Artner et al., 2020; Eubank, 2016)


Friends don't let friends Ctrl + C, Ctrl + V

Rounding: In reporting and calculations

Automation: Adopt seasoned tools from the computer sciences
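
A minimal sketch of why rounding matters both in calculations and in reporting. The numbers and the helper `report` are hypothetical, chosen only for illustration. The first part rounds intermediate values before calculating and gets a different result than rounding once at the end. The second part shows that Python's built-in `round` resolves ties to the nearest even digit, which may differ from the rounding rule a reader expects.

```python
from decimal import Decimal, ROUND_HALF_UP


def report(value: float, digits: int = 2) -> str:
    """Format a number for reporting; ties in its decimal text round away from zero."""
    quantum = Decimal(1).scaleb(-digits)
    return str(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


# Hypothetical group means and pooled standard deviation.
mean_a, mean_b, pooled_sd = 10.449, 9.951, 1.0

# Round once, at the reporting step.
d_late = (mean_a - mean_b) / pooled_sd

# Round the means first, then calculate with the rounded values.
d_early = (round(mean_a, 1) - round(mean_b, 1)) / pooled_sd

print(report(d_late))   # 0.50
print(report(d_early))  # 0.40

# Tie handling: built-in round() vs. half-up rounding of the decimal text.
print(round(0.125, 2), report(0.125))  # 0.12 0.13
```

Carrying full precision through the calculation and rounding only when the number is written into the report avoids the first discrepancy; stating the rounding rule avoids the second.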

Common causes

  • File paths
  • Incomplete data
  • Incomplete scripts
  • Outdated libraries

(Eubank, 2016; Konkol, Kray & Pfeiffer, 2019)

Solutions

  • Automation
  • Code review
  • Archiving
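
Two of the causes above, file paths and outdated libraries, can be addressed in code. The following is a minimal sketch under stated assumptions: `project_path` and `session_info` are hypothetical helper names, and the sketch assumes that scripts are run from the project's root directory. It builds paths relative to that root instead of hard-coding absolute ones, and it records the interpreter and package versions so they can be archived alongside data and scripts.

```python
import platform
import sys
from importlib import metadata
from pathlib import Path
from typing import Optional


def project_path(*parts: str, root: Optional[Path] = None) -> Path:
    """Build a path relative to the project root instead of an absolute one."""
    base = root if root is not None else Path.cwd()
    return base.joinpath(*parts)


def session_info() -> dict:
    """Collect interpreter, platform, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


# Hypothetical file layout: <project root>/data/raw.csv
print(project_path("data", "raw.csv"))
print(session_info()["python"])
```

Writing the returned dictionary to a file (for example with `json.dumps`) and archiving it with the analysis gives later readers a record of the software versions that produced the results.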

Dynamic documents

A quick demonstration!

No guarantee that results are "correct"

Data error prompts U-turn on study of sex differences in school

the 0/1 coding for "boy" and "girl" that we had on the paper questionnaires was opposite to the one in the SPSS labels
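
A reproducible pipeline will faithfully reproduce a miscoded variable, so checks on the coding itself remain necessary. The following is a minimal sketch of such a check. The codebooks and the helper names `decode` and `codebooks_agree` are hypothetical; which source used which coding in the case quoted above is not stated, so the values below are assumptions for illustration only.

```python
def decode(values, codebook):
    """Translate numeric codes into labels; fail loudly on codes the codebook lacks."""
    unknown = sorted(set(values) - set(codebook))
    if unknown:
        raise ValueError(f"Codes missing from codebook: {unknown}")
    return [codebook[value] for value in values]


def codebooks_agree(first, second):
    """Return True only if both sources assign the same label to every code."""
    return first == second


# Hypothetical codebooks: one from the paper questionnaire, one from the data file.
questionnaire = {0: "boy", 1: "girl"}
data_file_labels = {0: "girl", 1: "boy"}

print(codebooks_agree(questionnaire, data_file_labels))  # False
print(decode([0, 1, 1], questionnaire))  # ['boy', 'girl', 'girl']
```

Comparing the codebooks before any analysis turns a silent reversal into a visible failure.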

Various packages for different languages and outputs

rmarkdown / knitr / Quarto

  • Widely used and actively developed
  • Supports many output formats
    (e.g., HTML, PDF, DOCX, ODT)
  • Customizable and extendable

Let's get some exercise!
