How Many “Vs” are There for Data?


Capturing the value of “Big Data” has come to be expressed through an alliterative series of “v’s.” The growing list of “v’s” has usually been applied to “Big Data,” but it applies to all data. As in cosmology, questions arise about how big “Big Data” is and whether there is a limit to how much it can expand. So far, the gravity of hardware and software seems to regulate how it can be managed. The “v’s” tell us what data is, but not how big it has to be to qualify. How big does data analysis have to become?

Three v’s comprised the seminal characteristics of the “Big Data” universe. First there were two: volume and velocity. These refer to the size of data and the speed at which it changes. The bigger data is and the faster it changes, the harder it is to store and use. Then “variety” was added, addressing how much data can vary by type, e.g., structured, unstructured (columnar, text, imagery), and semantically organized (RDF) data. Sometimes “geospatial” is mistaken for unstructured data, but more on that story later. Then came number four, “veracity.” Only after hardware and software development (for example, Hadoop by Apache, MapReduce, Netezza by IBM, Exadata by Oracle, and HANA by SAP) did a concern grow about whether the resulting manipulation of data had some probability of being true. Massively parallel processing was implemented with little concern about the probable truth of the results. Occasionally, there were references to “validity” as though that guaranteed veracity. However, syllogisms can be valid without having veracity or providing new information. The syllogism “Dennis is an unmarried man; all unmarried men are bachelors; therefore, Dennis is a bachelor” is a tautology that is valid and true, but provides little information.
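The point about validity without new information can be sketched in a few lines of code. This is only an illustration of the syllogism in the text, with “bachelor” hypothetically encoded as a function; the conclusion follows purely from the definition, which is exactly why it tells us nothing new.

```python
# A valid argument can be an uninformative tautology:
# "bachelor" is *defined* as "unmarried man," so the conclusion
# merely restates the premise in other words.

def is_bachelor(is_man: bool, is_married: bool) -> bool:
    """A bachelor is, by definition, an unmarried man."""
    return is_man and not is_married

# Premise: Dennis is an unmarried man.
dennis_is_man, dennis_is_married = True, False

# Conclusion: Dennis is a bachelor -- true by definition alone.
print(is_bachelor(dennis_is_man, dennis_is_married))  # prints: True
```

The deduction is airtight, yet running it adds no information that the premises did not already contain, which is the essay’s point about confusing validity with veracity.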

Now “value” has been offered as a fifth “v.” Of course, measuring and controlling data would not be worth doing if it were not of some “business value.” Generating personalized advertising is based on some probability that it will be of value to the business that pays for it. Another hidden assumption is that a user will be the same person with the same “likes” over time. Maybe there is a “Big Data” solution for calculating that probability, which would require constant surveillance of what you “like.”

Of course, “visualization” has always been part of the picture as another “v.” The word conjures up the expectation that “seeing is believing.” The assumption is that, whether as “dashboards,” “graphs,” or “maps,” people will “intuitively” know what data means. I hate the word “intuitive” in this context. Nothing involving reason is intuitive. Corporations and professional journals tout the power of visualization with “R” or SAS or ArcGIS as though people will know it when they see it. I’ve grown skeptical of dynamic BI applications with maps, even though advocating for them is part of my profession.

Finally, I’d like to add two more v’s: verisimilitude and virtue. The progress of data architecture assumes that “data science” leads to something resembling reality or the “real world.” However, it is possessed by a hidden epistemology that assumes, instead, that information, preferences, hypotheses, conclusions, and reports are entirely subjective. Creating personalized ads or “recommendations” via “analytics” depends on that assumption. Oddly, cultural relativism (all ideas are equally viable, and so are all “likes”) and conformity (everyone “likes” more or less everything to be faster, thinner, trendier, etc., just as everyone else does within their “cluster”) occur at the same time.

Most of all, there is no concern for such an old-fashioned idea as virtue. A relevant definition of virtue can be summed up by Gregory Bateson’s and Jürgen Habermas’s views on communicative competence. Communication is based on norms under which people try to communicate sincerely with each other, no party is lying, and all parties are trying to come to a consensus, if not the truth. The virtue lies in not spewing selfies disguised as sentences.

These “v’s” apply to all data, not just so-called “Big Data.” “Analytics” has always been practiced and would not be possible without traditional logical and statistical methods. One could ask whether “Big Data,” not unlike the universe, is infinitely expanding or will someday contract. If it contracts, language and logic would be revealed as the long-forgotten foundation of the rest. This mirrors debates about the “inflationary” theory of the universe; see Sheldon on the “Big Bang Theory” TV show for an explanation. My personal preference is for contraction theory.

Dennis Crow is part of the GovLoop Featured Blogger program, where we feature blog posts by government voices from all across the country (and world!). To see more Featured Blogger posts, click here.
