
Lowering the Barriers to Entry for Big Data

Author(s)

Ben Woo

Current Overview

For most enterprises, Big Data has moved from being just a buzzword or a science project to a recognized source of tactical and strategic competitive advantage, and a creator of value for the business.

So many businesses have heeded the advice from Neuralytix with respect to Big Data:

If you’re not doing it, your competitors are!™

However, recognition does not always translate to realization. For most enterprises, the major impediment is not the infrastructure. There are plenty of choices in terms of the underlying hardware technology, a variety of Hadoop distributions and NoSQL databases, and numerous technologies that help enterprises integrate internal datasets with external datasets. For most enterprises, the lack of Big Data skills is the single biggest impediment to embarking on the Big Data journey.

There are several ways of addressing this challenge:

  • Hire data scientists and Big Data experts;
  • Outsource the Big Data processing to service providers; or
  • Create custom code for each project.

Each of these options has its own pros and cons. But all of them typically isolate Big Data work from the data used by the current processes, applications, and other intelligence that have been purpose-built for the enterprise.

This makes Big Data not only a skills challenge, but also a fiscal one.

Ideally, enterprises are looking for an opportunity to integrate existing, and perhaps even proprietary, intelligence with the new datasets that Hadoop and other Big Data technologies provide.

Many enterprises have made considerable investments in developing and retaining relational database administrators (DBAs). Many of these DBAs have coded custom queries to support the enterprise.

This naturally raises the question: what would be better than to protect and extend the talent that already exists in the enterprise? Unlike Big Data experts and data scientists, who typically manage unstructured data, DBAs and business intelligence (BI) specialists are more accustomed to working with traditional Structured Query Language (SQL).

One way to achieve this is to provide extensions to the Hadoop framework that allow existing ANSI-standard SQL to act on data within Hadoop, and that allow the framework to extend its reach beyond unstructured data into structured and relational databases.

Neuraspective™

The Hadoop ecosystem consists of a menagerie of projects and tools: HDFS, HBase, Hive, Pig, ZooKeeper, MapReduce, and R, just to name a few. However, these projects do not enable enterprises to leverage the assets and resources already in place without migrating data from existing datasets to the Hadoop framework.

To address these challenges, EMC’s new Hadoop distribution, Pivotal HD, brings together its Greenplum MPP database, its new “Dynamic Pipelining” technology, and the ability to perform true SQL processing for Hadoop.

Pivotal HD addresses the key barriers to entry for Big Data. It provides libraries that allow existing business intelligence and practically any ANSI-standard SQL query to operate within the Hadoop framework.

Pivotal HD also provides libraries that connect tables from relational databases, such as Oracle, as external tables, and enable Hadoop operations to be performed against the data in these tables.
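To make the external-table idea concrete, here is a minimal sketch in Python. It is an analogy only and does not use Pivotal HD or any of its libraries: SQLite's `ATTACH DATABASE` stands in for connecting a relational table as an external table, and every table name, column name, and value below is hypothetical. The point it illustrates is that a single, ordinary SQL query can join data held in two separate stores.

```python
import sqlite3

# Analogy only: SQLite stands in for both sides. Nothing here is Pivotal HD's API.
# The main database plays the role of the "Big Data" store.
conn = sqlite3.connect(":memory:")

# A second, separate database plays the role of the existing relational system
# (for example, Oracle). Attaching it is our stand-in for an external table.
conn.execute("ATTACH DATABASE ':memory:' AS rdbms")

# Hypothetical "Big Data" side: raw clickstream events.
conn.execute("CREATE TABLE clickstream (customer_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO clickstream VALUES (?, ?)",
    [(1, "/pricing"), (1, "/signup"), (2, "/pricing"), (3, "/docs")],
)

# Hypothetical relational side: the customer master table.
conn.execute("CREATE TABLE rdbms.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO rdbms.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex"), (3, "Initech")],
)
conn.commit()

# One standard SQL statement that spans both stores.
QUERY = """
SELECT c.name, COUNT(*) AS page_views
FROM clickstream AS s
JOIN rdbms.customers AS c ON c.id = s.customer_id
GROUP BY c.name
ORDER BY page_views DESC, c.name
"""


def page_views_by_customer(connection):
    """Run the cross-store query and return (name, page_views) rows."""
    return connection.execute(QUERY).fetchall()


rows = page_views_by_customer(conn)
for name, views in rows:
    print(name, views)
```

The query itself never needs to know where each table physically lives. That is the property that lets existing SQL skills and existing queries carry over when new data sources are connected.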

So what?

The introduction of Pivotal HD means that enterprises have the opportunity to evolve their existing database environments into Big Data environments organically. Existing SQL-compliant processes are not lost. Instead, these processes can now address any data that can be managed within the Hadoop framework.

Many enterprises think of investment protection purely in terms of hardware. However, the intellectual property investment can be as great (or in many cases, greater). Unlike hardware, which suffers from depreciation and deterioration, intellectual property such as queries, processes and concepts developed from within an enterprise appreciates in value.

By bringing the capabilities of a true parallel SQL database to Hadoop, Pivotal HD gives enterprises the ability to create new potential from their existing data and information, and protects the investment made in infrastructure.


Cross published at greenplum.com
