Using automation to combat the replication crisis: A case study from controlled-rearing studies of newborn chicks

The accuracy of science depends on the precision of its methods. When fields produce precise measurements, the scientific method can generate remarkable gains in knowledge. When fields produce noisy measurements, however, the scientific method is not guaranteed to work: in fact, noisy measurements are now regarded as a leading cause of the replication crisis in psychology. Scientists should therefore strive to improve the precision of their methods, especially in fields with noisy measurements. Here, we show that automation can reduce measurement error by ∼60% in one domain of developmental psychology: controlled-rearing studies of newborn chicks. Automated studies produce measurements that are 3–4 times more precise than non-automated studies and produce effect sizes that are 3–4 times larger than non-automated studies. Automation also eliminates experimenter bias and allows replications to be performed quickly and easily. We suggest that automation can be a powerful tool for improving measurement precision, producing high-powered experiments, and combating the replication crisis.

Scatterplots and boxplots of the (A) measurement error and (B) effect sizes from samples of automated (blue points) and non-automated (red points) controlled-rearing studies. Each point represents the (A) standard deviation or (B) Cohen's d from a single condition (only statistically significant conditions are included). The boxplots show the range from the 25th to the 50th percentile and from the 50th to the 75th percentile. Points with an asterisk are beyond the range of the graph (greater than 50% standard deviation or greater than 2.5 Cohen's d). Across a wide range of studies, the effect sizes obtained with automated methods were much larger than the effect sizes obtained with non-automated methods. Automated methods also produced far more precise measurements than non-automated methods. (C) The number of subjects needed to achieve 80% power for a range of true population performance values for studies with low measurement error (standard deviation = 10%) and high measurement error (standard deviation = 33%). These standard deviations match those from our samples of automated and non-automated studies, respectively. Low measurement error massively reduces the number of subjects needed to achieve adequate experimental power. When measurement error is high, small decreases in true population performance require large increases in the number of subjects needed to achieve 80% power. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
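The relationship in panel (C) can be illustrated with a rough power calculation. The sketch below is not the authors' procedure: it assumes a two-sided one-sample test of mean performance against a chance level of 50%, an alpha of 0.05, and a normal approximation to the test statistic. The function name `subjects_needed` and the example performance values are hypothetical; only the two standard deviations (10% and 33%) and the 80% power target come from the caption.

```python
from math import ceil
from statistics import NormalDist


def subjects_needed(true_mean, sd, chance=50.0, alpha=0.05, power=0.80):
    """Approximate sample size for a two-sided one-sample test against chance.

    Uses the normal approximation n = ((z_alpha + z_power) / d)^2, where
    d = |true_mean - chance| / sd is the standardized effect size (Cohen's d).
    The chance level and alpha are assumptions, not values from the source.
    """
    if true_mean == chance:
        raise ValueError("true_mean must differ from chance")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    effect_size = abs(true_mean - chance) / sd
    return ceil(((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical true performance values (percent), compared at the two
# standard deviations reported in the caption.
for true_mean in (55, 60, 65, 70):
    n_low_error = subjects_needed(true_mean, sd=10.0)
    n_high_error = subjects_needed(true_mean, sd=33.0)
    print(true_mean, n_low_error, n_high_error)
```

Under this approximation the required sample size grows with the square of the standard deviation, so moving from a standard deviation of 10% to 33% multiplies the required number of subjects by roughly (33/10)^2, or about 11, at any fixed true performance value. A t-based calculation would give slightly larger numbers for small samples.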
Automation can significantly (A) increase the effect size and (B) decrease the measurement error of an experiment by allowing large amounts of data to be collected from each subject. For example, for Wood (2013) and Wood (2014), increasing the amount of data collected from each chick increased the observed effect size by a factor of 4 and reduced the measurement error by a factor of 2.6. Thus, collecting more data per subject significantly increases the precision of the data, leading to high-powered experiments.
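One way to see why more data per subject helps is a simple measurement model, sketched below. It assumes that each subject's score is the average of independent trials with equal noise, so the spread of subject scores combines true between-subject variation with trial noise that shrinks as trials are added. All names and numbers are hypothetical illustrations; they are not taken from Wood (2013) or Wood (2014).

```python
from math import sqrt


def observed_sd(between_sd, trial_sd, n_trials):
    """SD of per-subject mean scores under the assumed model.

    Variance of a subject's mean = between-subject variance
    + single-trial noise variance / number of trials.
    """
    return sqrt(between_sd ** 2 + trial_sd ** 2 / n_trials)


def cohens_d(true_mean, chance, sd):
    """Standardized effect size for mean performance against chance."""
    return (true_mean - chance) / sd


# Hypothetical values, in percent correct.
true_mean, chance = 65.0, 50.0
between_sd, trial_sd = 5.0, 40.0

for n_trials in (4, 100, 1000):
    sd = observed_sd(between_sd, trial_sd, n_trials)
    print(n_trials, round(sd, 2), round(cohens_d(true_mean, chance, sd), 2))
```

In this model the observed standard deviation falls and Cohen's d rises as trials per subject increase, until between-subject variation becomes the limiting factor. The model forces the two ratios to be equal, which real data need not satisfy.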
Movie 1. A simulated view of a controlled-rearing chamber.