Batch data processing is an efficient way of processing high volumes of data in which a group of transactions is collected over a period of time. Data is collected, entered and processed, and then the batch results are produced (Hadoop is focused on batch data processing). Batch processing requires separate programs for input, process and output. Payroll and billing systems are examples.

In contrast, real time data processing involves a continual input, process and output of data. Data must be processed within a short time period (real time or near real time). Radar systems, customer services and bank ATMs are examples.
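To make the contrast concrete, here is a minimal Python sketch of the two styles. It is only an illustration: the account names, the amounts and the function names `process_batch` and `process_stream` are hypothetical and not taken from any particular product.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical transaction shape for this sketch: (account, amount).
Transaction = tuple[str, float]


def process_batch(transactions: list[Transaction]) -> dict[str, float]:
    """Batch style: the whole group is collected first, then totaled in one run."""
    totals: dict[str, float] = defaultdict(float)
    for account, amount in transactions:
        totals[account] += amount
    return dict(totals)


def process_stream(transactions: Iterable[Transaction]) -> Iterator[tuple[str, float]]:
    """Real time style: emit an updated running total as each transaction arrives."""
    totals: dict[str, float] = defaultdict(float)
    for account, amount in transactions:
        totals[account] += amount
        yield account, totals[account]


collected = [("alice", 100.0), ("bob", 40.0), ("alice", 25.0)]

# One result for the whole batch, available only after the batch has run.
batch_result = process_batch(collected)

# One result per incoming transaction, available immediately.
stream_results = list(process_stream(iter(collected)))
```

Both functions arrive at the same final totals. The difference is when results become available: once at the end of the batch, or after every single transaction.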

While most organizations use batch data processing, sometimes an organization needs real time data processing. Real time data processing and analytics give an organization the ability to take immediate action when acting within seconds or minutes is significant. The goal is to obtain the insight required to act prudently at the right time - which increasingly means immediately.

Complex event processing (CEP) combines data from multiple sources to detect patterns and attempt to identify either opportunities or threats. The goal is to identify significant events and respond fast. Sales leads, orders and customer service calls are examples of such events.
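As a rough illustration of the idea, the Python sketch below merges events from two sources and flags one simple pattern: a customer who placed an order and then made several service calls within a time window. The rule, the thresholds and the names `Event` and `ChurnRiskDetector` are assumptions made up for this example, and the sketch assumes events arrive in timestamp order. A real CEP engine supports far richer patterns.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some reference point
    source: str       # hypothetical feeds: "orders" or "service_calls"
    customer: str


class ChurnRiskDetector:
    """Flags a customer with an order plus repeated service calls in one window."""

    def __init__(self, window_seconds: float = 3600.0, call_threshold: int = 2):
        self.window = window_seconds
        self.threshold = call_threshold
        # Recent events per customer, oldest first.
        self.recent: dict[str, deque] = defaultdict(deque)

    def observe(self, event: Event) -> bool:
        """Record one event and report whether the pattern is now present."""
        events = self.recent[event.customer]
        events.append(event)
        # Drop events that have fallen out of the time window.
        while event.timestamp - events[0].timestamp > self.window:
            events.popleft()
        has_order = any(e.source == "orders" for e in events)
        calls = sum(1 for e in events if e.source == "service_calls")
        return has_order and calls >= self.threshold


detector = ChurnRiskDetector()
feed = [
    Event(0.0, "orders", "c1"),
    Event(600.0, "service_calls", "c1"),
    Event(1200.0, "service_calls", "c1"),
]
alerts = [detector.observe(e) for e in feed]
```

Here the third event completes the pattern, so only the last call to `observe` returns `True`.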

Operational Intelligence (OI) uses real time data processing and CEP to gain insight into operations by running query analysis against live feeds and event data. OI is near real time analytics over operational data and provides visibility over many data sources. The goal is to obtain near real time insight using continuous analytics to allow the organization to take immediate action. Contrast this with operational business intelligence (BI) - descriptive or historical analysis of operational data. Real time OI analysis of operational data has much greater value.

For example, Rose Business Technologies designs and builds real time OI systems for our retail clients to optimize customer service processes. The ROI is improved customer satisfaction and reduced churn. OI is used to detect and remedy problems immediately - often before the customer knows of the problem.

Real time OI is used in customer service centers for customer experience optimization. Recommendation applications can assist agents in providing personalized service based on each customer's experience. An organization can collect data about customers on the phone and how they previously interacted with the organization. The goal is to analyze the total customer experience and recommend scripts or rules that guide the agent on the phone to provide an optimal customer interaction with the organization - leading to more sales, efficient problem solving and happy customers.

Rose retail clients are starting to use real time OI to detect customer buying patterns - first discovering buying patterns from historical data, then monitoring live customer activity to optimize the customer experience. This leads to more sales and happier customers.

Point of Sale (POS) systems use real time data processing to update inventory, provide inventory history and report sales of a particular item - allowing an organization to run payments in real time.
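A toy Python sketch of that behavior follows. The class `PosInventory` and its methods are hypothetical names for this example only. The point is that each sale updates stock and history the moment it happens, so inventory queries are always current.

```python
from dataclasses import dataclass, field


@dataclass
class PosInventory:
    stock: dict[str, int]
    # Each history entry: (item, quantity sold, stock remaining after the sale).
    history: list[tuple[str, int, int]] = field(default_factory=list)

    def record_sale(self, item: str, quantity: int) -> int:
        """Apply one sale immediately and return the remaining stock."""
        remaining = self.stock.get(item, 0) - quantity
        if remaining < 0:
            raise ValueError(f"insufficient stock for {item}")
        self.stock[item] = remaining
        self.history.append((item, quantity, remaining))
        return remaining

    def units_sold(self, item: str) -> int:
        """Total units of one item sold so far, taken from the history."""
        return sum(qty for name, qty, _ in self.history if name == item)


inventory = PosInventory(stock={"widget": 10})
after_first = inventory.record_sale("widget", 3)
after_second = inventory.record_sale("widget", 2)
```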

Assembly lines use real time processing to reduce time, cost and errors: when a certain process is completed, the item moves to the next process for the next step - if errors are detected in the previous process, they are easier to pinpoint.

Real time OI can also monitor social media, giving an organization the ability to react to negative activity (e.g., tweets or posts) and mitigate its effects in a timely fashion, before it snowballs into something ugly and potentially damaging.

Other examples include real time retail dynamic pricing, real time supply chain management, social analytics for dynamic selling and brand management, and smart utility grid management.

In a Hadoop environment, the trick to providing near real time analysis is a scalable in-memory layer between Hadoop and CEP. Storm is an open source distributed real time computation system that processes streams of data. Storm can help with real time analytics, online machine learning, continuous computation, distributed RPC and ETL. Hadoop MapReduce processes "jobs" in batch while Storm processes streams in near real time. The idea is to reconcile real time and batch processing when dealing with large data sets. An example is detecting transaction fraud in near real time while incorporating data from the data warehouse or Hadoop clusters.
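The fraud example can be sketched in plain Python. This does not use the Storm or Hadoop APIs. It only illustrates the division of labor under some loud assumptions: a batch step summarizes historical amounts per account, and a streaming step scores each new transaction against that summary. The function names, the z-score rule and the threshold are hypothetical choices for this sketch, not a recommended fraud model.

```python
from statistics import mean, pstdev


def build_profiles(history: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Batch step: summarize each account's past amounts as (mean, std dev)."""
    return {account: (mean(amounts), pstdev(amounts))
            for account, amounts in history.items()}


def is_suspicious(profiles: dict[str, tuple[float, float]],
                  account: str, amount: float, z_threshold: float = 3.0) -> bool:
    """Streaming step: flag an amount far above the account's batch profile."""
    if account not in profiles:
        # Assumption for this sketch: accounts with no history are flagged.
        return True
    average, std_dev = profiles[account]
    if std_dev == 0:
        return amount > average
    return (amount - average) / std_dev > z_threshold


# Batch side: would run periodically over warehouse or Hadoop data.
profiles = build_profiles({"alice": [20.0, 25.0, 30.0, 22.0, 28.0]})

# Streaming side: would run once per incoming transaction.
normal = is_suspicious(profiles, "alice", 27.0)
outlier = is_suspicious(profiles, "alice", 500.0)
```

The batch profile can be rebuilt as often as the batch system allows, while the streaming check stays cheap enough to run on every transaction.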

Below is a list of batch and real time data processing solutions:

Batch and real time data processing both have advantages and disadvantages. The best data processing system for the job at hand depends on the types and sources of data, the processing time needed to get the job done, and whether the organization needs the ability to take immediate action.

See: http://bit.ly/13Fi03G

Comment by Fari Payandeh on September 10, 2013 at 1:17pm

Excellent!

Comment by Ron Wolf on August 20, 2013 at 10:51am

Wow, that's a really clear architecture post.

However, it seems that the topic is much more about a preferred architecture for "real time" DP than it is about batch. Just saying, as with a more appropriate title it might reach a more appreciative audience.

Anyway, I can attest that the architecture that you lay out works and is becoming generally accepted as best of breed. My team pioneered this real time data warehouse architecture at Keynote.com in the late 1990s and the system (with many upgrades, of course) continues to provide exceptional value.

© 2017 Data Science Central

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service