5 Myths About Big Data


This topic contains 2 replies, has 2 voices, and was last updated by  Mark Hammer 5 years, 3 months ago.

  • Author
  • #179746

    David B. Grinberg

    Great article from the Washington Post by:

    • Samuel Arbesman, an applied mathematician, network scientist, and senior scholar at the Ewing Marion Kauffman Foundation.

    Full article here

    1. “Big data” has a clear definition.

    “The term ‘big data’ has been in circulation since at least the 1990s, when it is believed to have originated in Silicon Valley…But the term is thrown around so often, in so many contexts — science, marketing, politics, sports — that its meaning has become vague and ambiguous.”

    2. Big data is new.

    “By many accounts, big data exploded onto the scene quite recently…But big data has been around for a long time. It’s just that exhaustive datasets were more exhausting to compile and study in the days when ‘computer’ meant a person who performed calculations.”

    3. Big data is revolutionary.

    “In their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schonberger and Kenneth Cukier compare “the current data deluge” to the transformation brought about by the Gutenberg printing press.”

    4. Bigger data is better.

    “In science, some admittedly mind-blowing big-data analyses are being done. In business, companies are being told to “embrace big data before your competitors do.” But big data is not automatically better. Really big datasets can be a mess…”

    5. Big data means the end of scientific theories.

    Chris Anderson argued in a 2008 Wired essay that big data renders the scientific method obsolete: Throw enough data at an advanced machine-learning technique, and all the correlations and relationships will simply jump out. We’ll understand everything.

    “But you can’t just go fishing for correlations and hope they will explain the world.”



  • #179750

    Mark Hammer

    It’s no great secret that a great deal of research is annoyingly impeded from providing more insight and stronger inferences by a lack of data; i.e., low statistical power. It is also no great secret that a great many things happen for reasons we never suspected, because we could never “drill down” that far, and large corpora of data can sometimes reveal those informative patterns.

    However, facilitating and furnishing insight are two different things. It is not the case that larger bodies of data will, by themselves, necessarily provide insight and stronger inferences, or that the patterns that spuriously emerge from great masses of data are necessarily meaningful. With great statistical power comes great statistical responsibility: it is the data-user who must possess some glimmer of anticipation in order to see the patterns, and also the data-user who must distinguish between “statistical significance” and meaningfulness/relevance.

    That’s one of the traps of massive datasets: even things that predict 0.01% of the variance can end up being hugely “significant”. Of course, so many of us who may be social scientists have our statistical training predicated on datasets (whether supplied or our own) that are several orders of magnitude smaller than what we see with big data, and it is difficult to suppress our tendency to be distracted by tiny p-values, even if they are just an epiphenomenon of large datasets. In that respect, I’m a fan of “enough” data, rather than “big” data.
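    The trap Mark describes can be made concrete with a quick simulation. The sketch below (a minimal stdlib-only illustration, not from the article; the helper name `pearson_t` and the chosen sample sizes are my own) generates a predictor with a genuinely tiny true effect, then computes the correlation and its t-statistic at two sample sizes. The same negligible effect is statistically invisible at n = 1,000 but produces an enormous t-statistic, and hence a vanishingly small p-value, at n = 1,000,000.

    ```python
    import math
    import random

    random.seed(42)

    def pearson_t(n, slope):
        """Simulate y = slope*x + noise and return (r, t) for the correlation."""
        xs = [random.gauss(0, 1) for _ in range(n)]
        ys = [slope * x + random.gauss(0, 1) for x in xs]
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        r = sxy / math.sqrt(sxx * syy)
        t = r * math.sqrt((n - 2) / (1 - r * r))  # t-statistic for H0: r = 0
        return r, t

    # Identical tiny true effect (r^2 on the order of 0.01% of variance),
    # evaluated at a "social science" n and a "big data" n.
    for n in (1_000, 1_000_000):
        r, t = pearson_t(n, 0.01)
        print(f"n={n:>9,}  r^2={r*r:.6f}  t={t:.2f}")
    ```

    The effect size (r²) barely moves between the two runs; only the t-statistic explodes, which is exactly why "significant" and "meaningful" come apart at big-data scale.
    
    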

    But in any event, very good article with some great points. Thanks for the heads-up, David.

  • #179748

    David B. Grinberg

    Thanks for your always illuminating comments, Mark.

    FYI, the article appeared again in today’s Washington Post Sunday Business Section. Yes, I still have a print subscription in addition to a digital one.
