In 2001, Doug Laney of META Group (now Gartner) wrote a report identifying three Vs for Big Data that have since been widely adopted: Volume, Velocity and Variety. Volume is the most basic and probably the most widely recognized component: Big Data consists of large amounts of data. But that data is also arriving faster than ever before (Velocity) and from a wider range of sources (Variety). Since then, some groups have added other Vs.
IBM apparently identified Veracity as an important factor at the IBM Information on Demand conference in October 2012. The idea is that Big Data must be correct to be useful, yet as data sources multiply the data becomes noisier and less trustworthy. But as the blog post about it points out:
"This dimension of Big Data has less to do with the inherent characteristics of the data – but with how it needs to be used."
It seems that the company PROS has coined two new Vs: Viability and Value. In practice, they mean that once data has been gathered it must be processed to find the parts that are actually useful for making a specified prediction (what they call Viability), and then models must be built to generate Value. These are not characteristics of the data but things to do with it. Moreover, it seems to me that Viability and Value are just new terms for Feature Selection/Extraction and Machine Learning.
Doug Laney, who originally coined the first three Vs, left the following comment on the Wired article:
"...However, as clever as people may be in thinking-up superfluous "V"s for Big Data, we contend that they are not definitional. Increasing volume, variety and velocity are the characteristics of Big Data, whereas value, veracity, verisimilitude and velociraptors or whatever are not. Yes, some are important aspects of all data but are not at all definitional characteristics of Big Data."
He is, of course, correct: the only Vs we need are those that differentiate Big Data from other forms of data and in some way explain why it is a challenge. Vs that merely describe the steps needed to make the data usable or valuable do not define the data itself.