We hear the term “real-time” a lot. But “real-time” can mean anything from instantaneous to several seconds or even minutes. The industry has never properly defined how timely “real-time” actually has to be, which can make the term very confusing for business users. If you’re in the financial industry, even sub-second “real-time” could mean the difference between a profit and a loss.
A lot of the difference in “real-time” comes not from how we perceive “real-time”, but from how we process it.
Some “real-time” systems collate data and act on it based on a sample size. A good example is a utilities company. It collects statuses from smart meters within a community, and when enough smart meters report an outage, the given area may be deemed to have an outage. Similarly, some fraud detection systems work this way: they look at groups of transactions and generate a pattern that can be matched against known fraudulent activity, resulting in those transactions (and similar transactions) being marked for further investigation.
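To make the sample-based approach concrete, here is a minimal Python sketch of the smart meter example. Everything in it is an assumption made for illustration: the report format, the function name `areas_with_outage`, and the 50% threshold are hypothetical and do not come from any real utility system.

```python
from collections import defaultdict

# Hypothetical threshold: the share of meters in an area that must
# report an outage before the whole area is deemed to have one.
OUTAGE_FRACTION = 0.5

def areas_with_outage(reports, meters_per_area, threshold=OUTAGE_FRACTION):
    """Return the areas deemed to have an outage.

    reports: iterable of (area, meter_id, status), status "ok" or "outage".
    meters_per_area: dict mapping area -> total number of meters installed.
    """
    down = defaultdict(set)
    for area, meter_id, status in reports:
        if status == "outage":
            down[area].add(meter_id)
        else:
            # A later "ok" report clears an earlier outage from the same meter.
            down[area].discard(meter_id)
    return sorted(
        area for area, meters in down.items()
        if len(meters) / meters_per_area[area] >= threshold
    )

# Illustrative data only.
sample_reports = [
    ("north", "m1", "outage"),
    ("north", "m2", "outage"),
    ("north", "m3", "ok"),
    ("south", "m4", "outage"),
    ("south", "m5", "ok"),
    ("south", "m6", "ok"),
    ("south", "m7", "ok"),
]
meter_counts = {"north": 3, "south": 4}

print(areas_with_outage(sample_reports, meter_counts))  # ['north']
```

A single failing meter in the south area is not enough to flag it. The decision depends on the collected sample, not on any one event, which is why this style of system has to wait for enough data to arrive.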
On the other hand, some “real-time” systems have to react immediately to an event. A trading system may react to any form of network delay (or increased latency) and deploy remedial processes to ensure that the system has maximum bandwidth, so that trades can be completed in minimal time.
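By contrast, an event-driven system decides on each event as it arrives. The Python sketch below illustrates that idea only; the 5 ms budget, the handler name `on_latency_sample` and the remediation callback are hypothetical, and a real trading platform would be far more involved.

```python
# Hypothetical latency budget for a single measurement, in milliseconds.
LATENCY_BUDGET_MS = 5.0

def on_latency_sample(latency_ms, remediate, budget_ms=LATENCY_BUDGET_MS):
    """Handle one latency measurement right away, with no batching or sampling.

    Calls remediate(latency_ms) and returns True when the budget is exceeded.
    """
    if latency_ms > budget_ms:
        remediate(latency_ms)
        return True
    return False

# Illustrative use: record which samples triggered remediation.
actions = []
for sample in [1.2, 0.9, 7.5, 1.1]:
    on_latency_sample(sample, actions.append)

print(actions)  # [7.5]
```

Here the reaction happens on the third sample, before the fourth is even seen. There is no waiting for a group of events to accumulate.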
The challenge with either approach is that the edge “real-time” systems and the core data warehousing systems may not be compatible in terms of schemas, messaging or even update policies. A core system may update its machine-learned patterns on a daily basis, while an edge system may update its machine-learned patterns hourly. Also, the core (enterprise) data warehouse may be running a relational database, while the edge systems may run a non-relational database that requires an ETL process to update the core system.
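As a rough illustration of the ETL step between a non-relational edge store and a relational core, the Python sketch below flattens nested documents into rows and loads them into SQLite, which stands in for the enterprise data warehouse. The document shape, the `meter_events` table and the function names are all assumptions made for this example.

```python
import sqlite3

# Hypothetical edge records, shaped the way a document store might hold them.
# Note that the second record carries an extra field the core schema lacks.
edge_events = [
    {"id": 1, "meter": {"id": "m1", "area": "north"}, "status": "outage"},
    {"id": 2, "meter": {"id": "m4", "area": "south"}, "status": "ok",
     "firmware": "2.1"},
]

def transform(doc):
    """Flatten a nested document into the fixed column order of the core table."""
    return (doc["id"], doc["meter"]["id"], doc["meter"]["area"], doc["status"])

def load(conn, docs):
    """Create the core table if needed and upsert the transformed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meter_events ("
        "event_id INTEGER PRIMARY KEY, meter_id TEXT, area TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO meter_events VALUES (?, ?, ?, ?)",
        [transform(d) for d in docs],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the core system
load(conn, edge_events)
rows = conn.execute("SELECT * FROM meter_events ORDER BY event_id").fetchall()
print(rows)  # [(1, 'm1', 'north', 'outage'), (2, 'm4', 'south', 'ok')]
```

The extra `firmware` field is silently dropped because the relational schema has no column for it. That is one small example of the schema incompatibility described above, and every such transform-and-load pass takes time.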
All of these processes add latency. All of these processes change the real-timeliness of a “real-time” system.
So, in order to optimize, it is necessary to have data integration at the edge system, at the core system, and between the edge and core systems. (The term “optimize” is used here with no specific context, as each business and process will have its own concept of optimization.)
Recently, Informatica, best known for its data integration solutions for core systems, released its real-time Big Data capabilities. Using a combination of its products including RulePoint CEP, Ultra Messaging, B2B Data Exchange, Vibe Data Stream, PowerExchange and PowerCenter, Informatica has adapted its solutions to enable data integration for streaming data analytics.
In essence, it allows edge and core systems to interchange data seamlessly. It allows data to flow from edge systems performing streaming analytics to core systems that provide historic, strategic and predictive analytics across a wide range of datasets. In developing these solutions, Informatica has focused on reducing latency. Informatica claims to be able to deal with a desired latency spectrum that ranges from 50 ns through 500 ms.
Users can now have a consistent interface for data integration, event monitoring and policy enforcement all the way from the point of data capture through to archive. By allowing business users to take a self-service approach to developing rules, Informatica has removed the need for them to learn to code.
“Real-time”, however a business defines it, needs to create transparency and, more importantly, standardization. This goes for everything from infrastructure through to the data. With “real-time” systems increasingly playing a role in generating competitive advantage and customer satisfaction, data integration that starts at the edge and is consistent through to the core data warehouse becomes critical for any business that understands the value of data.