Note: I wrote this ages ago back in 2013, and it lives on the now defunct open science collaboration blog. Storing it here on my own website for archiving purposes. Original link: http://osc.centerforopenscience.org/2013/11/03/Increasing-statistical-power/
What is statistical power?
As scientists, we strive to avoid errors. Type I errors are false positives: finding a relationship when none actually exists. Much has been said about these kinds of errors, and the field of psychology is sometimes accused of excessive Type I error rates through publication biases, p-hacking, and failing to account for multiple comparisons (Open Science Collaboration, in press). Type II errors are false negatives: failing to find a relationship when one actually exists. Type II errors are related to statistical power. Statistical power is the probability that a test will reject the null hypothesis when the null hypothesis is false. Many authors suggest a statistical power rate of at least .80, which corresponds to an 80% probability of not committing a Type II error. Accuracy in parameter estimation (AIPE) is closely related to statistical power, and it refers to the width of the confidence interval for the effect size (Maxwell et al., 2008): the narrower this interval, the more precise your results. This means that low statistical power not only increases Type II errors; because underpowered studies have wide confidence intervals, any significant results they do produce are also less trustworthy. Simply put, underpowered studies are imprecise and are unlikely to replicate (Button et al., 2013).
Studies in psychology are grossly underpowered
Psychological research has been grossly underpowered for a long time. Fifty years ago, Cohen (1962) estimated that the statistical power to detect a medium effect size in abnormal psychology research was about .48. The situation has improved slightly, but it's still a serious problem today. For instance, one review suggested that only 52% of articles in the applied psychology literature achieved .80 power for a medium effect size (Mone et al., 1996). This is in part because psychologists are studying small effects. One massive review of 322 meta-analyses including 8 million participants suggested that the average effect size in social psychology is relatively small (r = .21). To put this into perspective, you'd need about 175 participants to have .80 power for a simple correlation between two variables at this effect size. Things get even worse when we're studying interaction effects. One review suggests that the average effect size for interaction effects is even smaller (f² = .009), which means that sample sizes of around 875 people would be needed to achieve .80 power.
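If you want to check the correlation figure yourself, it's easy to approximate. Here's a minimal sketch (mine, not part of the original post) using the standard Fisher-z approximation for a two-sided correlation test at α = .05; it reproduces the roughly 175-participant figure for r = .21. The interaction-effect figure requires noncentral-F machinery, so it's omitted here.

```python
# Minimal sketch: approximate N needed to detect a correlation r with .80 power,
# using the Fisher r-to-z transformation (two-sided test at alpha = .05).
import math
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)              # critical value, two-sided test
    z_beta = norm.ppf(power)                       # quantile for desired power
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))   # Fisher r-to-z transformation
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

print(n_for_correlation(0.21))  # ~176, in line with the ~175 figure above
```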
What can we do to increase power?
Traditional recommendations for increasing statistical power suggest (a) increasing the sample size, (b) maximizing the effect size, or (c) using a more liberal p-value criterion. However, increasing the effect size has no impact on the width of the confidence interval (Maxwell et al., 2008), and using a more liberal p-value comes at the expense of increased Type I error. Thus, most people assume that increasing the sample size is the only consistent way to increase statistical power. This isn't always feasible due to funding limitations, or because researchers are studying rare populations (e.g., people with autism spectrum disorder). Fortunately, there are other solutions. Below, I list three ways you can increase statistical power without increasing sample size. You might also check out Hansen and Collins (1994) for a lengthier discussion.
Recommendation 1: Decrease the mean square error
Decreasing the mean square error will have the same impact as increasing sample size (if you want to see the math, check out McClelland, 2000). Okay. You've probably heard the term "mean square error" before, but the definition might be kind of fuzzy. Basically, your model makes a prediction for what the outcome variable (Y) should be, given certain values of the predictor (X). Naturally, it's not a perfect prediction, because you have measurement error and because there are other important variables you probably didn't measure. The mean square error is the average squared difference between what your model predicts and the values actually observed in the data. So, anything that improves the quality of your measurement or accounts for potential confounding variables will reduce the mean square error, and thus improve statistical power. Let's make this concrete. Here are three specific techniques you can use:
- Reduce measurement error by using more reliable measures (i.e., better internal consistency, test-retest reliability, inter-rater reliability, etc.). You've probably read that .70 is the "rule-of-thumb" for acceptable reliability. Okay, sure. That's publishable. But consider this: say you want to test a correlation coefficient. If both measures have a reliability of .70, the correlation you observe will be attenuated by a factor of about 1.43 relative to the true population value (via Spearman's correction for attenuation; see the sketch after this list). Because your observed effect size is smaller, you end up with less statistical power. Why do this to yourself? Reduce measurement error.
- Control for confounding variables. With correlational research, this means including control variables that predict the outcome variable but are relatively uncorrelated with the other predictor variables. In experimental designs, this means taking great care to control for as many potential confounds as possible. In both cases, this reduces the mean square error and improves the overall predictive power of the model, and thus improves statistical power.
- Use repeated-measures designs. Repeated-measures designs reduce the mean square error by partitioning out the variance due to subjects. Depending on the kind of analysis you do, they can also substantially increase the degrees of freedom for the analysis (e.g., in some multilevel models). I'm a big fan of repeated-measures designs because they allow researchers to collect a lot of data from fewer participants.
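To make the reliability point concrete, here is a minimal sketch (my own illustration, not the original post's) of Spearman's correction for attenuation. With reliabilities of .70 on both measures, an assumed true correlation of .30 is observed as about .21, a factor of roughly 1.43, and the sample size needed for .80 power roughly doubles (from about 85 to about 176).

```python
# Minimal sketch: how unreliable measures attenuate an observed correlation and
# inflate the sample size needed for .80 power (Fisher-z approximation again).
import math
from scipy.stats import norm

def attenuated_r(true_r, rel_x, rel_y):
    """Spearman's attenuation formula: observed r given the two reliabilities."""
    return true_r * math.sqrt(rel_x * rel_y)

def n_for_correlation(r, alpha=0.05, power=0.80):
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    return math.ceil(((norm.ppf(1 - alpha / 2) + norm.ppf(power)) / fisher_z) ** 2 + 3)

true_r = 0.30                              # hypothetical true correlation
obs_r = attenuated_r(true_r, 0.70, 0.70)   # 0.30 * 0.70 = 0.21
print(round(true_r / obs_r, 2))            # 1.43: the attenuation factor
print(n_for_correlation(true_r))           # ~85 participants for the true r
print(n_for_correlation(obs_r))            # ~176 participants for the observed r
```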
Recommendation 2: Increase the variance of your predictor variable
Another lesser-known way to increase statistical power is to increase the variance of your predictor variables (X).
- In correlational research, use more comprehensive continuous measures. That is, the measure should allow for a wide range of values that participants actually endorse. However, the measure should also capture many different aspects of the construct of interest; artificially increasing the range of X by adding redundant items (i.e., simply re-phrasing existing items to ask the same question) will actually hurt the validity of the analysis. Also, avoid dichotomizing your measures (e.g., median splits), because this reduces the variance and typically reduces power.
- In experimental research, unequally allocating participants across conditions can improve statistical power. For example, say you were designing an experiment with three conditions (low, medium, or high self-esteem). Most of us would assign participants equally to all three groups, right? Well, as it turns out, that reduces statistical power. The optimal design for detecting a linear relationship would be 50% low, 50% high, omitting the medium condition entirely; the optimal design for a quadratic relationship would be 25% low, 50% medium, and 25% high (see the sketch below for the intuition behind the linear case). The optimal proportions vary widely depending on the design and the kind of relationship you expect, so I recommend checking out McClelland (1997) for more information on efficient experimental designs. You might be surprised.
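Here is a minimal sketch of the allocation point (my own illustration; the sample size, slope, and error SD are assumed values, not from the post). The standard error of a regression slope is σ/√(n·Var(X)), so for a fixed n, concentrating participants at the extreme conditions increases Var(X) and therefore increases power for a linear effect.

```python
# Minimal sketch: power for a linear effect under equal vs. extreme allocation,
# using a normal approximation for the test of the slope.
import math
from scipy.stats import norm

def slope_power(slope, sigma, n, var_x, alpha=0.05):
    se = sigma / math.sqrt(n * var_x)            # SE of the estimated slope
    z_crit = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z_crit - slope / se)     # approximate power

n, slope, sigma = 60, 0.4, 1.0                   # hypothetical experiment
var_equal_thirds = 2 / 3    # X in {-1, 0, +1}, one third of participants each
var_extremes = 1.0          # X in {-1, +1}, half of participants at each extreme
print(round(slope_power(slope, sigma, n, var_equal_thirds), 2))  # ~0.72
print(round(slope_power(slope, sigma, n, var_extremes), 2))      # ~0.87
```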
Recommendation 3: Make sure predictor variables are uncorrelated with each other
A final way to increase statistical power is to increase the proportion of the variance in X not shared with other variables in the model. When predictor variables are correlated with each other, this is known as collinearity. For example, depression and anxiety are positively correlated with each other; including both as simultaneous predictors (say, in multiple regression) means that statistical power will be reduced, especially if one of the two variables doesn't actually predict the outcome variable. Lots of textbooks suggest that we should only worry about this when collinearity is extremely high (e.g., correlations above .70). However, studies have shown that even modest intercorrelations among predictor variables will reduce statistical power (Mason & Perreault, 1991). Bottom line: if you can design a model where your predictor variables are relatively uncorrelated with each other, you can improve statistical power.
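As a rough illustration (mine, with assumed numbers, not from the original post): the variance of a regression coefficient is inflated by 1/(1 − R_j²), where R_j² is the proportion of that predictor's variance shared with the other predictors. Even a modest correlation of .40 between two predictors noticeably widens the standard error and cuts power.

```python
# Minimal sketch: collinearity inflates a coefficient's standard error by the
# variance inflation factor (VIF), which lowers power for that coefficient.
import math
from scipy.stats import norm

def coef_power(beta, sigma, n, var_x, r_between_predictors, alpha=0.05):
    vif = 1 / (1 - r_between_predictors ** 2)    # variance inflation factor
    se = sigma * math.sqrt(vif / (n * var_x))    # SE of the coefficient
    z_crit = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z_crit - beta / se)      # approximate power

n, beta, sigma, var_x = 100, 0.25, 1.0, 1.0      # hypothetical two-predictor model
print(round(coef_power(beta, sigma, n, var_x, 0.0), 2))  # ~0.71, uncorrelated predictors
print(round(coef_power(beta, sigma, n, var_x, 0.4), 2))  # ~0.63, modest collinearity
print(round(coef_power(beta, sigma, n, var_x, 0.7), 2))  # ~0.43, strong collinearity
```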
Conclusion
Increasing statistical power is one of those rare cases where what is good for science and what is good for your career actually coincide. It increases the accuracy and replicability of results, so it's good for science. It also increases your likelihood of finding a statistically significant result (assuming the effect actually exists), making it more likely that you'll get something published. You don't need to torture your data with obsessive re-analysis until you get p < .05. Instead, put more thought into research design in order to maximize statistical power. Everyone wins, and you can use the time you used to spend sweating over p-values to do something more productive. Like volunteering with the Open Science Collaboration.
References
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376. doi:10.1038/nrn3475
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. The Journal of Abnormal and Social Psychology, 65, 145-153. doi:10.1037/h0045186
Hansen, W. B., & Collins, L. M. (1994). Seven ways to increase power without increasing N. In L. M. Collins & L. A. Seitz (Eds.), Advances in data analysis for prevention intervention research (NIDA Research Monograph 142, NIH Publication No. 94-3599, pp. 184–195). Rockville, MD: National Institutes of Health.
Mason, C. H., & Perreault, W. D. (1991). Collinearity, power, and interpretation of multiple regression analysis. Journal of Marketing Research, 28, 268-280. doi:10.2307/3172863
Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology, 59, 537-563. doi:10.1146/annurev.psych.59.103006.093735
McClelland, G. H. (1997). Optimal design in psychological research. Psychological Methods, 2, 3-19. doi:10.1037/1082-989X.2.1.3
McClelland, G. H. (2000). Increasing statistical power without increasing sample size. American Psychologist, 55, 963-964. doi:10.1037/0003-066X.55.8.963
Mone, M. A., Mueller, G. C., & Mauland, W. (1996). The perceptions and usage of statistical power in applied psychology and management research. Personnel Psychology, 49, 103-120. doi:10.1111/j.1744-6570.1996.tb01793.x
Open Science Collaboration. (in press). The Reproducibility Project: A model of large-scale collaboration for empirical research on reproducibility. In V. Stodden, F. Leisch, & R. Peng (Eds.), Implementing Reproducible Computational Research (A Volume in The R Series). New York, NY: Taylor & Francis. doi:10.2139/ssrn.2195999