In this Insight, Neuralytix analyzes the Strata + Hadoop World 2016 conference held in New York, NY from September 27-29, 2016.
This year’s Strata + Hadoop World 2016 conference had a distinct note of maturity about it, one that had not been witnessed at previous conferences. Speakers, exhibitors, and attendees alike no longer talked about the virtues of Hadoop and Big Data; instead, they began to quantify the human and economic value of Big Data.
Mike Olson from Cloudera, in his keynote, highlighted Thorn, an organization that uses digital technology (Hadoop in particular) to fight child sexual exploitation. Olson also mentioned that Cloudera has approved the first set of applications received through its own Cloudera Precision Medicine Initiative, in support of President Obama’s national Precision Medicine Initiative to advance the use of data and analytics in precision medicine. Successes such as these are important, as they are prime examples of the maturity of the Hadoop ecosystem.
The Hadoop ecosystem today is no longer discussed as a hip or cool technology worth checking out because of the buzz; instead, it is a serious, maturing set of technologies that drives new value in the digital age.
Three distinct and consistent messages emerged from this conference: streaming, security, and scale.
While previous conferences focused heavily on turning passive data warehouses into active ones, and on analyses of aggregate data, Apache Spark and Apache Kafka were the two projects consistently mentioned and discussed by all.
Both of these technologies, and their commercial counterparts (Databricks commercializes Spark, and Confluent commercializes Kafka), are used for streaming data. Such data includes social media feeds, as well as sensor readings and other event-triggered inputs.
Discussions at the conference addressed how technologies like Spark and Kafka enable edge analyses that react more quickly to triggers, improving value and competitive advantage for customers.
While turning passive data warehouses into active ones can generate value at a broader level, streaming technologies allow value to be gained in near real-time at the edge of the Hadoop ecosystem (i.e., closest to the customer).
Streaming analyses do not replace aggregate analyses. Both play a critical role in the value and innovation creation process. The availability of Spark and Kafka as open source projects has opened up a new world to organizations of all sizes.
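The complementary relationship between the two styles of analysis can be sketched in a few lines of plain Python. This is purely an illustrative example (not Spark or Kafka code): it contrasts a streaming computation, which updates its answer as each event arrives, with an aggregate computation over the same simulated sensor feed.

```python
def streaming_mean(events):
    """Update a running mean as each event arrives (edge-style analysis)."""
    count, total = 0, 0.0
    for value in events:
        count += 1
        total += value
        yield total / count  # an answer is available immediately per event

def aggregate_mean(events):
    """Compute the mean once, after all data has landed (warehouse-style)."""
    events = list(events)
    return sum(events) / len(events)

# Hypothetical sensor feed, e.g. temperature readings
sensor_feed = [21.0, 22.5, 19.8, 20.7]

running = list(streaming_mean(sensor_feed))
batch = aggregate_mean(sensor_feed)
assert abs(running[-1] - batch) < 1e-9  # both converge on the same answer
```

The streaming path yields a usable result after every event, which is what makes near-real-time reaction at the edge possible; the aggregate path waits for the complete dataset but arrives at the same answer, which is why the two approaches complement rather than replace each other.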
In one discussion, an automotive industry customer who recently implemented Kafka reported that edge analysis dramatically improved customer satisfaction and optimized the utilization of available inventory – both of which resulted in improved economic value to the customer.
The second indicator of maturity is in the area of security, especially cybersecurity. Hadoop is used in many use cases that involve sensitive personal information. At the conference, Cloudera and Intel announced that they have donated a new open source project named Apache Spot to the Apache Software Foundation with a focus on using big data analytics and machine learning for cybersecurity.
Olson told the audience at the conference that “the idea [behind Spot] is, let’s create a common data model that any application developer can take advantage of to bring new analytic capabilities to bear on cybersecurity problems.”
There was also extensive discussion, along with numerous vendors, focused on securing the Hadoop ecosystem.
By far, the best indicator of the maturity of the Hadoop Big Data ecosystem is the scale at which it is being deployed. Many customers have already deployed clusters with hundreds of nodes. With two to three years of semi-production to production implementations behind them, many customers are being asked what was gained from Hadoop; in Neuralytix’s opinion, the majority of customers can easily quantify the benefit and value derived from Big Data.
The next challenge for the Hadoop ecosystem is not “what can Hadoop do,” but “what else can Hadoop do?”
The Strata + Hadoop World 2016 conference is a true indicator of where Hadoop stands today. There were more exhibitors than ever before, the largest attendance to date, and only a few “tire-kickers.” By far, the majority of the attendees were there to find out how they can expand upon their existing implementations.
Our opinion is that the hype and buzz around Big Data (and by extension, Hadoop) has eased, but the conversations taking place are more serious and deliberate. Streaming will take Hadoop to the next level of buzz and hype, but in a more measured and mature way. At the same time, streaming opens up the opportunity for customers to be truly inspirational in how and what they do with data. While larger companies tend to have data warehouses, companies of all sizes now have the opportunity to accelerate their time-to-market, time-to-resolve, and ultimately time-to-innovation through streaming.
Projects such as Apache Spot enhance the maturity conversation by tackling the big issues around security with respect to using Hadoop in a highly insecure Internet context.
Neuralytix is excited about the next exponential growth curve for the Hadoop ecosystem. The major challenge we see is the highly fragmented market. With hundreds of vendors, many of whom have superior point solutions, Hadoop remains a less-than-straightforward technology to deploy.
Between now and 2020, Neuralytix believes there will be significant consolidation of the market. Larger vendors are likely to consume smaller vendors in an attempt to produce an optimal software stack for various vertical applications.
What is clear from this conference is that legacy data alone is insufficient for the digital age; but simply using the latest data, without integrating legacy data, is deficient too. Orthogonally, customers cannot ignore the role the cloud plays when integrating with on-premises corporate data. In other words, the maturation currently under way with Hadoop results from the recognition that the greatest value is derived from balancing old and new data, and on-premises and in-cloud sources.
Neuralytix believes that this will be the beginning of a great many innovations in the way we interact with data, and the value that can be derived from doing so.