It’s no great secret that a great deal of research is impeded from yielding deeper insight and stronger inferences by a lack of data; i.e., low statistical power. It is also no great secret that a great many things happen for reasons we never suspected, because we could never “drill down” that far, and large corpora of data can sometimes reveal those informative patterns.
However, facilitating insight and furnishing insight are two different things. Larger bodies of data will not necessarily provide insight and stronger inferences on their own, nor are the patterns that spuriously emerge from great masses of data necessarily meaningful. With great statistical power comes great statistical responsibility: it is the data-user who must possess some glimmer of anticipation in order to see the patterns, and also the data-user who must distinguish between “statistical significance” and meaningfulness/relevance.
That’s one of the traps of massive data-sets: even things that predict .01% of the variance can end up being hugely “significant”. Of course, many of us in the social sciences had our statistical training predicated on datasets (whether supplied or our own) several orders of magnitude smaller than what we see with big data, and it is difficult to suppress our tendency to be distracted by tiny p-values, even when they are just an epiphenomenon of a large dataset. In that respect, I’m a fan of “enough” data, rather than “big” data.
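To make that trap concrete, here's a minimal back-of-the-envelope sketch (the numbers are hypothetical, chosen only for illustration): a predictor explaining just .01% of the variance corresponds to a correlation of r = .01, and at a "big data" sample size the standard t-test for that correlation produces a vanishingly small p-value despite the trivial effect.

```python
import math

# Hypothetical effect: r^2 = 0.0001, i.e. the predictor explains .01% of variance
r = 0.01
n = 1_000_000  # a "big data" sample size

# Standard t-statistic for testing a correlation against zero
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# Two-sided p-value via the normal approximation (df is enormous, so this is fine)
p = math.erfc(abs(t) / math.sqrt(2))

print(f"r^2 = {r**2:.4%}, t = {t:.1f}, p = {p:.1e}")
```

With n = 1,000,000 the t-statistic lands around 10 and the p-value is astronomically small, so the result clears any conventional alpha, even though the effect is almost certainly too small to matter substantively.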
But in any event, very good article with some great points. Thanks for the heads-up, David.