Enterprises have a lot of data. This data is generally distributed over many data stores – relational databases (RDBMS), enterprise data warehouses (EDW), or even the increasingly popular Hadoop framework.
However, what they do with that data, and how they leverage it, could mean the difference between being a market leader and a laggard.
In this White Paper, Neuralytix examines the challenges enterprises face when they try to integrate all these datasets for analytic purposes. Appropriate analysis can create strategic business value and competitive advantage resulting in dramatically improved revenue, better managed costs and the associated improvement in profits.
But to achieve these advantages, enterprises must be able to process, analyze and visualize this data quickly; timeliness is the key. Traditional approaches are encumbered by the exponential increases in data and by the overhead associated with integrating datasets, analyzing them, and efficiently producing consumable information that enables more informed decision making. Enterprises must also be able to do this in an ad hoc manner to take advantage of opportunistic circumstances.
Neuralytix believes that Kognitio, with its Kognitio Analytical Platform, provides a cost-effective way of reducing the time-to-insight for enterprises.
Business Intelligence (BI) and Business Analytics (BA) software has been around for quite some time. Equally, Enterprise Data Warehousing (EDW) has been the central data workhorse for enterprises for many years for the analysis of large datasets. However, as the size and number of datasets grow, along with the variety of repositories in which data resides, providing timely access to and analysis of the data for more and more users becomes increasingly complex. This complexity translates into an increase in the time-to-insight of the enterprise, a form of data systems friction that slows down potential innovation and competitive advantages.
Big Data (defined as a set of technologies that creates strategic organization value by leveraging contextualized complete datasets) helps to ease both time-to-value and complexity by being able to curate through and process full-detail datasets. The biggest challenge with using Big Data is that, in many cases, a new infrastructure typically has to be architected, deployed, and data imported or ingested into the new environment before any analysis can even begin. All of these activities often translate into an increase in the time-to-insight for enterprise users. Again, this increase is likely to slow innovation, and disrupt potential competitive, economic or efficiency advantages.
Ideally, the preferred method is to leave data in its native location. Minimizing data migration and extraction maintains not only the source integrity of the data, saving time and effort, but also provides the best opportunity to work with the most recent data without introducing costly processes and assumptions. All that is required is an ability to reach in and take data as and when required without the need for complex ETL. Avoiding complex manipulations (including any form of sampling, e.g., abstraction through assumptions) helps reduce the risk of sub-optimal data. Sub-optimal data restricts the potential value derived from the analytics process.
Leaving data in its native location saves time and effort, while maintaining the purity and integrity of the source datasets. The most profound, informed and reliable discoveries become possible only when enterprises bring together and combine these datasets; in combination, some datasets multiply each other’s value to the business.
Once all the desired (and in some cases, extraneous) data can be co-located, analysis can then be quickly developed, tested, executed and verified. This needs to be done with minimal effort and steps and without extensive involvement of DBAs and programmers. Performance must be simply on-tap without recourse to the complexities of indexing, partitioning, cubing, etc. End-users with standard business applications must be able to readily use and exploit the combined data. The goal is to co-locate the data with lots of low-cost computing power – let the CPUs do the work, not the users. The trick is to remove the slowest component – the disks! The datasets involved with these queries are generally very large. Paging and swapping data on and off disk-based systems introduce latency and delays into the process, and conflict with the time-to-insight objective.
The temporary home for these combined datasets is an analytical platform, an engine that can simply acquire, pull the data in, and support complex queries and analytics. It is not a store, more a workspace where difficult and/or demanding analysis can be completed without impacting core systems like the Data Warehouse. This platform will leverage not only significant computing resources, but must be able to import substantial data sets of at least dozens of terabytes, then also execute the entire query or analytics process in-memory, thus avoiding the overhead of constantly reading and writing data to disk subsystems. This, of course, requires a massively parallel processing (MPP) environment, enabling users to “pin” the data to be queried into the system’s memory for analysis.
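The idea of a temporary in-memory workspace can be illustrated with a minimal sketch. It is not Kognitio's software or API; it uses Python's built-in SQLite engine in `:memory:` mode as a single-machine stand-in, and the two CSV extracts, the table names and the `revenue_by_region` function are hypothetical examples. Data from two separate silos is pulled into memory, joined and aggregated, and the workspace is then discarded without ever being written to disk.

```python
import csv
import io
import sqlite3

# Hypothetical extracts from two separate silos (illustrative values only).
CRM_CSV = "customer_id,region\n1,EMEA\n2,APAC\n3,EMEA\n"
SALES_CSV = "customer_id,amount\n1,120.0\n2,75.5\n1,30.0\n3,10.0\n"


def load_csv(conn, table, text, schema):
    """Create `table` from `schema` (name -> SQL type) and copy CSV rows into it."""
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in schema.items())
    conn.execute(f"CREATE TABLE {table} ({cols})")
    reader = csv.DictReader(io.StringIO(text))
    rows = [tuple(row[name] for name in schema) for row in reader]
    marks = ", ".join("?" for _ in schema)
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)


def revenue_by_region():
    """Combine both silos in an in-memory workspace and aggregate sales by region."""
    conn = sqlite3.connect(":memory:")  # nothing is persisted to disk
    load_csv(conn, "customers", CRM_CSV, {"customer_id": "INTEGER", "region": "TEXT"})
    load_csv(conn, "sales", SALES_CSV, {"customer_id": "INTEGER", "amount": "REAL"})
    query = """
        SELECT c.region, SUM(s.amount)
        FROM sales AS s
        JOIN customers AS c ON c.customer_id = s.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """
    result = conn.execute(query).fetchall()
    conn.close()  # the workspace disappears; the source data is untouched
    return result


print(revenue_by_region())
```

The source extracts are only read, never modified, which mirrors the point that the platform is a workspace rather than a store.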
Neuralytix believes that a solution that maximizes the business value and enables the delivery of innovative competitive advantages includes:
- Leveraging data with minimal change;
- Combining data from silos to enrich the analytical workspace;
- Increasing tactical on-demand analysis to improve quality of strategic insight;
- Exploiting the low-cost multi-core, multi-CPU industry standard servers; and
- Executing queries in-memory where the CPUs are never held back.
Smaller enterprises still need Big Data
While large enterprises may be able to justify an investment in an on-premise accelerated analytical platform, smaller enterprises increasingly also benefit from Big Data analytics. When it comes to analytics, large and small enterprises generally differ in scale of budget, not in function.
Neuralytix is aware of the emergence of a number of smaller enterprises whose value proposition is built around the third-party analysis of large amounts of data. While not Big Data as such, at least in the context of the petabytes of information owned by some enterprises, this information is key to their very existence. Their ability to rapidly analyze data for their clients is at the core of their success. As early-stage companies, they are frequently less likely to be in a financial position or have the need to invest in, and own their own analytical platform or to deploy their own EDW. For these enterprises, using a cloud-based analytical platform can provide an optimal way of benefiting from Big Data strategies, and sustain cost efficiency at the same time.
THE KOGNITIO ANALYTICAL PLATFORM
The Kognitio Analytical Platform is described as “a massively parallel, in-memory analytical engine.” The engine is complementary to existing infrastructure. With more than 20 years of development, the Kognitio Analytical Platform is a highly mature solution to analytical problems.
Most recently, Kognitio has taken its platform and integrated it into an in-cloud offering, simply named Kognitio Cloud. With this, Kognitio now offers its clients and partners the flexibility of performing analytics using on-premise infrastructure, or in-cloud. On-premise infrastructure is where the end-user customer acquires the necessary hardware infrastructure along with the Kognitio Analytical Platform software and integrates them within the “four walls” of its organization. An in-cloud implementation is where the hardware infrastructure and the Kognitio Analytical Platform are physically housed in a modern, secure data center (i.e., in the cloud) and the process is offered “as-a-Service.”
The Kognitio Analytical Platform uses industry-standard x86 servers. This helps minimize the infrastructure investment necessary. The platform also has a range of connectors, of which one that is particularly relevant today is the one for Hadoop. Linking directly to data with agents embedded in Hadoop, the Kognitio Analytical Platform delivers analytics on rich data volumes, stored across a variety of data sources. This “external tables” capability, coupled with its Hadoop connector, enables companies to reach into discrete volumes to rapidly gather the information needed to perform a given analytical query. In doing so, Kognitio has slashed the time needed to perform this function (a shortcoming in the current iteration of Hadoop), thus speeding response time, and restoring that time frame to one more closely associated with EDW performance. By using an in-memory design, Kognitio has the necessary performance to deal with changes in datasets and associated analytics with minimal fuss and effort, manageable directly at the business user level.
The Kognitio Analytical Platform gives business users who are familiar with standard business applications and visualization tools the ability to continue using them, while enabling them to access data from its native location more easily, with performance that is comparable to having data local to the end user.
RDBMS (relational database management systems) typically provide two critical functions: storing data and processing data. Data persistence has tended to be disk-based. As noted earlier, disk-based storage tends to slow down the overall performance of an analytical system. Kognitio solves this problem through its row-based, shared-nothing, massively parallel processing (MPP) in-memory RDBMS specialized for analytics. It does not require complex ETL processes. It brings row-based data directly into memory (DRAM) from a range of external sources via simple connectors and processes it in-memory – any fixing of the data can be executed on the fly. By leveraging an in-memory architecture, Kognitio is able to avoid slow, spinning disk and avoid the I/O contention issues that often accompany traditional hard disk drive (HDD) based data stores.
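The shared-nothing, massively parallel pattern described above can be sketched in a few lines. This is an illustration of the general technique only, under stated assumptions, and not Kognitio's implementation: the function names are hypothetical, and Python threads stand in for separate servers (they do not provide true CPU parallelism, but they show the data flow). Rows are hash-distributed so that each "node" owns a disjoint slice in its own memory, each node aggregates only its slice, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Hash-distribute rows on the grouping key so each node owns a disjoint slice."""
    slices = [[] for _ in range(n_nodes)]
    for key, value in rows:
        slices[hash(key) % n_nodes].append((key, value))
    return slices


def local_sum(node_rows):
    """Aggregate one node's slice using only that node's in-memory rows."""
    totals = Counter()
    for key, value in node_rows:
        totals[key] += value
    return totals


def parallel_sum(rows, n_nodes=4):
    """Run the per-node aggregation concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, partition(rows, n_nodes)))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # adds each node's partial totals
    return dict(merged)


# Illustrative rows: (region, sale amount)
rows = [("EMEA", 100), ("APAC", 50), ("EMEA", 25), ("AMER", 10)]
print(parallel_sum(rows))
```

Because every row for a given key lands on the same node, no node needs to read another node's memory during aggregation; only the small partial totals are exchanged.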
Since Kognitio’s platform is designed specifically for analytics, it is superior to conventional RDBMSs, whose architectural limitations prevent them from doing truly parallel, scalable analytical processing on Big Data sets in real time. This includes extensive and cohesive SQL and NoSQL processing.
By using data in its most natural form, the Kognitio platform also avoids the administrative and software overhead of data manipulation, difficulty in programming, data schema management, etc. Reducing overhead, latency and the number of people involved, and so shortening the whole process between the raw data and the business user, is ultimately the key to creating competitive advantage and improving business value in a timely fashion.
Figure 1: Kognitio Analytical Platform
Source: Kognitio, 2012
NEURALYTIX™ PERSPECTIVE & BUSINESS VALUE ASSESSMENT
In the information age in which we live, making decisions quickly is paramount to creating and sustaining competitive advantage. Business leaders have the tools and the desire to engage in business analytics at their own desks.
Current business processes are not timely, nor do they fully reflect a business leader’s vision. No longer is the business user dealing with a local Excel spreadsheet. The scale is different. Business users are now able to interact with very large and distributed data sources. Users are now empowered to make better, more informed and more accurate decisions more quickly.
Neuralytix believes that an optimal analytical platform must be:
- Elastic and scalable to take into account changing data and performance requirements of an enterprise;
- Capable of leveraging industry standard x86 hardware, so that there is no lock-in, in terms of hardware; and
- Accessible from anywhere within an organization (on-premise) or from outside the organization (in-cloud).
This combination would allow organizations to abstract data from any data source, and optimize performance for data consumers.
With its in-memory MPP approach, Kognitio has delivered solid benefits to clients using its system for more than a generation. While, in the past, cost constraints sharply limited the number of companies that could take advantage of in-memory processing, those constraints have been largely wiped out today. The result is that more companies than ever can access the advantages of performing high-end data analytics. In fact, as noted above, there has been a significant growth of early-stage companies whose entire business proposition is based around the rapid analysis of Big Data sets.
Kognitio has been among the market leaders in positioning its offerings to help these companies succeed. Companies around the world have successfully deployed analytical environments through Kognitio Cloud in a fraction of the time and cost it would have taken them to do it themselves with more traditional EDW settings.
In addition, Neuralytix believes that Kognitio has already begun preparing for the emergence of non-volatile RAM (NVRAM) into mainstream computing environments, which will enable companies to keep Big Data sets “pinned” in-memory, and not require them to dump the data when machines are rebooted or additional processing capability is required.
Finally, Kognitio appears to have solidly thought out its Hadoop strategy, through its implementation of external tables and a connector, to enable fast import of data from multiple sources. This solves a “here and now” challenge for companies seeking to deploy a Hadoop-based environment, yet wanting to speed the time to response from the initial query. Future iterations of Hadoop, we believe, will address this challenge; Kognitio has delivered a solution that works today.
The bottom line is that for companies seeking to extend their Big Data capabilities, or those seeking to establish them, Kognitio offers a range of cost-effective and practical solutions that they should consider, for their immediate needs and as they grow.