Over the last couple of years, the term Big Data has been bandied about to describe, in general terms, systems that can process vast arrays of semi-structured and fully unstructured data. IT vendors and market research firms, particularly IBM and Gartner, have used the three “V’s” (volume, velocity and variety) to describe various attributes of Big Data. IDC uses the three “V’s” as well, but it added a fourth: “value.”
Unfortunately, the three “V’s” offer little insight into what Big Data actually does. Yes, they describe the information involved in Big Data, but they do not explain its purpose. IDC’s addition of “value” brings the definition much closer to being meaningful.
At Neuralytix, we believe that it is now appropriate to refine the definition of Big Data to “a set of technologies that creates strategic organization value by leveraging contextualized complete data sets.”
Like previous definitions, Neuralytix recognizes that Big Data encompasses a set of technologies; it is not itself a technology. In an era of convergence, where hardware (compute, storage and network) and software are often delivered in a prescribed manner, Big Data is most likely to be delivered as a predefined cluster of industry-standard (commodity) x86-based servers, storage and networking infrastructure, with proprietary software layered on top.
Unlike other definitions, Neuralytix’s definition allows non-x86 platforms to be considered. This is deliberate. While the concept of Big Data obviously involves some volume of data, it does not define how much. This is sensible, since “big” is a relative term. To a large multinational conglomerate, “big” could mean multiple petabytes of data; to a small business, it could mean a handful of gigabytes.
More importantly, this wide range means that different organizations will require different kinds and amounts of infrastructure to support their Big Data activities. The largest organizations may need more compute capacity to process the necessary data sets. While the x86 platform has been significantly enhanced through multi-core processors, multi-socket designs, multi-level caching and so on, there are likely to be situations in which non-x86 platforms (such as IBM’s Power Systems) are more appropriate and more cost-effective in the value equation.
Supporting IDC’s view that “value” is critical, Neuralytix’s definition goes further, stipulating that Big Data should generate strategic value. The strategic element is significant here. Many traditional analytics and data warehouse processes provide operational (read: tactical) efficiencies. In short, they continuously improve existing processes to curtail costs while supporting growth. While Neuralytix’s definition emphasizes strategic value, we in no way diminish the importance of tactical value. In essence, we believe Big Data is less about refining current processes than about spawning opportunities for business process re-engineering that can have a long-term impact on an organization’s core competitiveness.
Unlike most traditional definitions of Big Data, Neuralytix’s definition also places less weight on the variety of data, since leveraging the right data in the right context is ultimately what will generate strategic value. As such, contextualization is a critical concept. Irrespective of the source or type of data, strategic value can only be generated with the most appropriate data sets. In some cases, a data set may consist of a homogeneous stream of data. In other cases, heterogeneous sets of data may be required. Either way, cross-referencing multiple data sets is a basic aspect of contextualizing data. Data has no context without references to other data. Data that persists independently of other data is useless.
Finally, Neuralytix’s Big Data definition specifies “complete data sets,” which is perhaps the most differentiated aspect of our approach. Historically, limits on computational, storage and networking capacity, performance and cost have restricted data analytics, search and discovery to summarized or sampled data sets. Summarized or sampled data sets are, by definition, skewed. Now that vast computational, storage and networking capacity is affordable, available and accessible, organizations can perform analytics, search and discovery on complete data sets.
Ultimately, Neuralytix believes that Big Data is less about the type or amount of data involved. Instead, Big Data is about creating organizational value: value that can be derived by putting data into context.
Updated July 9, 2012, with thanks to Charles King, Pund-IT