
Introducing benchmarks for the evaluation of psychological models

Quantitative research in psychology and neighboring fields emphasizes explanation and in-sample effect sizes over demonstrating a model’s ability to predict unseen data (generalization).
In a methods paper that interleaves theoretical arguments with empirical demonstrations (code available in this repo), we show how psychology would benefit from adopting benchmarking as a consensus paradigm for model evaluation.
We discuss how psychology can learn from both the strengths and the known weaknesses (e.g., biases, overfitting) of benchmarking in ML, outline first steps for introducing these practices in the field, and argue that they can increase the practical utility of the outputs of psychological research.
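
The contrast between in-sample explanation and out-of-sample prediction can be illustrated with a small simulation. The sketch below is illustrative only: it uses simulated data and scikit-learn, not the analyses from the paper, and compares in-sample R² with cross-validated R² for an ordinary linear regression.

```python
# Minimal illustrative sketch (simulated data, not the paper's analyses):
# in-sample fit can overstate how well a model predicts unseen data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Small sample, many weakly informative predictors -- a setting where
# in-sample effect sizes are especially prone to overfitting.
n, p = 60, 20
X = rng.normal(size=(n, p))
y = 0.3 * X[:, 0] + rng.normal(size=n)  # only the first predictor matters

model = LinearRegression()

# In-sample R^2: how well the model explains the data it was fit on.
in_sample_r2 = model.fit(X, y).score(X, y)

# Cross-validated R^2: estimated predictive performance on held-out data.
cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

print(f"In-sample R^2:       {in_sample_r2:.2f}")  # typically optimistic
print(f"Cross-validated R^2: {cv_r2:.2f}")         # typically much lower
```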
This article has been published in Advances in Methods and Practices in Psychological Science and is available at: https://journals.sagepub.com/doi/full/10.1177/25152459211026864
research methods evaluation machine learning