Are Inferential Statistics Really What Psychology Needs?

In psychology, the statistics we publish are generally inferential: we estimate population parameters and test null hypotheses. I have been thinking about the intersection of equity and statistics for a while now, ever since reading the history of statistics.

Tracing these methods back to Galton, Fisher, and Pearson, it quickly becomes apparent that they were designed specifically to identify group differences. That is, they begin with the assumption that different populations have different distributions, and statistical methods were devised to prove those differences without needing to measure an entire population. The same logic generalizes to correlations, where we assume the relationship between X and Y is a single “true” value in the population that can be approximated within certain bounds of uncertainty. This was certainly revolutionary and changed the scope of what psychologists studied – we moved from intensive studies of individuals (e.g., Wundt, Ebbinghaus) to using statistical methods to estimate population parameters.

But it has always gnawed at the back of my mind that these might be the wrong questions, driven by the goal of ranking and stratifying the human population. Properly conducted, inferential statistics help us answer certain questions about populations, but my abiding interest in humans has always been about individual people, not population parameters.

I’ve begun to wonder if the field’s almost obsessive focus on population parameters – and weak tests of differences between them via null hypothesis significance testing – is fundamentally misguided in many areas. There are obviously plenty of cases where conventional statistics have generated important knowledge. But given that much of my research interest over the years has centered on human subjectivity (e.g., well-being, motivation, self-reports of personality, anxiety, depression), I question whether conventional statistical methods can really help us understand these sorts of things. In no particular order, some problems:

1. Samples of humans willing to participate in research are virtually never random or representative, given the pragmatic constraints of actual data collection. Thus, we are estimating population parameters for specialized sub-populations that are quite often not defined or known. Outside of census research and very well-funded research programs, obtaining anything close to representativeness is typically unachievable.

2. Galton popularized the idea that most human attributes fall along the “normal” distribution, with most people near the average and the rest falling toward the tails. Somewhere along the line, the idea of normally distributed ERRORS became generalized into the idea that TRAITS in humans fall along a normal distribution. That is, a belief took hold that population distributions of most measured human capacities, like personality and intelligence, should be normal. For Galton this interest was natural: the argument served his aim of ranking humans by mental ability. However, many things we measure in humans are not distributed this way (well-being, for instance, typically shows negative skewness; alcohol consumption shows positive skewness). And if distributions of university grades are any indication, most such measures are not “normally” distributed either.

3. The focus on “population differences,” while not in and of itself racist, guides research in ways that emphasize and reify categories of people. The tools we have promote categorizing humans into sub-populations and comparing those groups (e.g., t-tests, ANOVA). The bulk of our statistical methods, from Galton, Fisher, and Pearson onward, were deliberately and explicitly designed to prove that populations differ in their mental abilities, and in many cases were strongly motivated by eugenics. This constrains our inquiry to questions like: does group A have a different population mean than group B?

4. Can there even truly be a population parameter for some latent constructs, which are themselves co-constructed through human interaction with psychological tests? Moreover, much personality data is not truly numerical (Likert-type responses are ordinal), so should we really be applying techniques designed for numerical data?
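The skewness point above (problem 2) is easy to see in simulated data. Here is a minimal, hypothetical sketch – the rating distribution below is invented for illustration, not real survey data – showing how a ceiling effect on a 1–7 well-being item produces a clearly non-normal, negatively skewed distribution:

```python
# Hypothetical simulated 1-7 "life satisfaction" ratings; the weights
# below are invented to mimic the ceiling effect commonly described in
# well-being research (most people report high satisfaction).
import random
import statistics

random.seed(42)

ratings = random.choices(
    population=[1, 2, 3, 4, 5, 6, 7],
    weights=[2, 3, 5, 10, 20, 35, 25],  # mass piled at the high end
    k=1000,
)

def sample_skewness(xs):
    """Fisher-Pearson coefficient of skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# The long left tail makes the third central moment negative,
# so the skewness statistic comes out below zero.
print(round(sample_skewness(ratings), 2))
```

A negative value here is exactly the “negative skewness” pattern mentioned for well-being: the mean sits below the mode, and methods that assume symmetric, bell-shaped distributions start from a false premise.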

More questions than answers, I suppose. What I want to understand are within-person processes. That is, what is going on in the mind of a single person and can we actually predict their thoughts, feelings, and behavior? The intensive study of individual people has never been overly popular, but I wonder if it is something uniquely psychological that might need methods other than inferential stats.

I also think we could produce much richer descriptions of humans using data visualization, descriptive statistics, and qualitative approaches. I’ve always found the detailed analysis of a human life in biography, or the idea of a “quantified self,” fascinating. There’s so much to learn about individual human variation; psychology could learn a lot by taking a step back and thoroughly observing and describing the humans we study.
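As a sketch of what that descriptive, within-person approach might look like computationally – all numbers below are invented for illustration – one could simply summarize and visualize a single person’s daily ratings without estimating any population parameter at all:

```python
# Hypothetical sketch: describing ONE person's daily mood ratings (1-10)
# over two weeks. No inference to a population is made or needed.
import statistics

daily_mood = [7, 8, 6, 7, 9, 5, 6, 7, 8, 8, 4, 6, 7, 8]  # 14 invented days

summary = {
    "mean": round(statistics.fmean(daily_mood), 2),
    "sd": round(statistics.stdev(daily_mood), 2),
    "min": min(daily_mood),
    "max": max(daily_mood),
}

# A crude text "sparkline" -- one way to visualize a single life in data.
blocks = "▁▂▃▄▅▆▇█"
spark = "".join(blocks[(m - 1) * (len(blocks) - 1) // 9] for m in daily_mood)

print(summary)
print(spark)
```

Nothing here ranks this person against anyone else; the description is the analysis, which is the spirit of the intensive single-person study.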

I might explore some of these ideas more later. For now, I’m mostly just trying to give myself some time to really think and discuss ideas with other people during sabbatical, a precious commodity that gets lost in the regular bustle of day-to-day academics!