28 Dec Data Streaming vs Batch Processing
Large enterprises have data scattered across different systems, such as ERP platforms, SAP, Salesforce, and so on. To make key decisions that grow the business or serve customers efficiently, enterprises need a single unified view of their data instead of going through each system to gather information.
To get this unified view, enterprises build a data warehouse solution that holds data from across these systems. Data analytics is then built on top of it to give the business the information it needs to make key decisions.
This raises two questions: what is a data warehouse, and how does data get from the source systems into it?
What is a Data Warehouse?
A data warehouse is a system designed to run analytical queries quickly in order to get insights from the data. To move data from the different source systems into the data warehouse, data engineers have two approaches:
- Batch Processing
- Data Streaming
Which one should be used when? It depends on the business use case. Many enterprises use both at the same time.
Batch processing deals with non-continuous data: it processes a chunk of data at a predefined interval, which can be hourly, daily, weekly, and so on. Because of this, it cannot deliver near-real-time data from the source systems. And since each run handles a large volume of data, it takes longer for data to get from sources to destinations. Batch processing is well suited to use cases such as producing monthly payroll information.
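To make the idea of a predefined interval concrete, here is a minimal Python sketch of an hourly batch job. The function name `batch_by_hour`, the record layout (a timestamp and an amount), and the sample values are all assumptions made for illustration. A real pipeline would read from the source systems and load the result into the data warehouse.

```python
from collections import defaultdict
from datetime import datetime


def batch_by_hour(records):
    """Group (timestamp, amount) records into hourly batches and total each one.

    Hypothetical helper: the record layout is an assumption for this sketch.
    """
    batches = defaultdict(list)
    for ts, amount in records:
        # Truncate the timestamp to the start of its hour to find its batch.
        window_start = ts.replace(minute=0, second=0, microsecond=0)
        batches[window_start].append(amount)
    # Each batch is processed as a whole, only after it has been collected.
    return {window: sum(amounts) for window, amounts in sorted(batches.items())}


# Made-up sample data standing in for rows pulled from a source system.
records = [
    (datetime(2024, 1, 1, 9, 5), 100.0),
    (datetime(2024, 1, 1, 9, 40), 50.0),
    (datetime(2024, 1, 1, 10, 15), 25.0),
]

hourly_totals = batch_by_hour(records)
for window, total in hourly_totals.items():
    print(window.isoformat(), total)
```

Note that the 09:05 record is not reflected in any result until the whole 09:00 batch is processed, which is the latency trade-off described above.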
On the other hand, stream processing handles data in motion, so the destination is always close to real time with the source systems. It is useful where the business wants to be proactive in addressing customer issues. Example use cases include fraud detection, IoT sensors, and e-commerce websites.
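As a rough illustration of handling data in motion, the Python sketch below reacts to each event as it arrives instead of waiting for a batch. The names `event_source` and `detect_fraud`, the event fields, and the fixed threshold rule are hypothetical. In practice the events would come from a streaming platform, and the fraud logic would be far more involved.

```python
def event_source():
    """Simulated stream of payment events (made-up data for illustration)."""
    yield {"account": "A1", "amount": 20.0}
    yield {"account": "B2", "amount": 5000.0}
    yield {"account": "A1", "amount": 35.0}


def detect_fraud(events, threshold=1000.0):
    """Yield an alert as soon as a single event exceeds the threshold.

    Hypothetical rule: a fixed amount threshold stands in for a real model.
    """
    for event in events:
        # Each event is inspected immediately, one at a time.
        if event["amount"] > threshold:
            yield {"account": event["account"], "amount": event["amount"]}


alerts = []
for alert in detect_fraud(event_source()):
    # The alert is available before the rest of the stream has been read.
    alerts.append(alert)
    print("possible fraud:", alert)
```

Because both functions are generators, an alert is raised the moment the suspicious event is seen, which is what lets the business respond proactively.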