A Meta-Meta-Analysis: Empirical Review of Statistical Power, Type I Error Rates, Effect Sizes, and Model Selection of Meta-Analyses Published in Psychology

Document Type


Publication Date


Digital Object Identifier (DOI)



This article uses meta-analyses published in Psychological Bulletin from 1995 to 2005 to describe meta-analyses in psychology, including examination of statistical power, Type I errors resulting from multiple comparisons, and model choice. Retrospective power estimates indicated that univariate categorical and continuous moderators, individual moderators in multivariate analyses, and tests of residual variability within individual levels of categorical moderators had the lowest and most concerning levels of power. Using methods of calculating power prospectively for significance tests in meta-analysis, we illustrate how power varies as a function of the number of effect sizes, the average sample size per effect size, effect size magnitude, and level of heterogeneity of effect sizes. In most meta-analyses many significance tests were conducted, resulting in a sizable estimated probability of a Type I error, particularly for tests of means within levels of a moderator, univariate categorical moderators, and residual variability within individual levels of a moderator. Across all surveyed studies, the median effect size and the median difference between two levels of study level moderators were smaller than Cohen's (1988) conventions for a medium effect size for a correlation or difference between two correlations. The median Birge's (1932) ratio was larger than the convention of medium heterogeneity proposed by Hedges and Pigott (2001) Hedges, L. V. and Pigott, T. D. 2001. The power of statistical tests in meta-analysis.. Psychological Methods, 6: 203–217. [Crossref], [PubMed], [Web of Science ®], [Google Scholar] and indicates that the typical meta-analysis shows variability in underlying effects well beyond that expected by sampling error alone. Fixed-effects models were used with greater frequency than random-effects models; however, random-effects models were used with increased frequency over time. Results related to model selection of this study are carefully compared with those from Schmidt, Oh, and Hayes (2009), who independently designed and produced a study similar to the one reported here. Recommendations for conducting future meta-analyses in light of the findings are provided.

Was this content written or created while at USF?


Citation / Publisher Attribution

Multivariate Behavioral Research, v. 45, issue 2, p. 239-270