Posts

Showing posts from October, 2022

Modern Modular Data Pipelines Example

I would like to share with you my favourite example of a modern data pipeline. It's amazing. The first cool thing we see is that this pipeline uses a full range of cloud services, each built for a different use case. Choosing the correct tool for each use case can be one of the key factors in the success of your idea, allowing you to get things running as fast as possible without reinventing the wheel. Another cool thing is that if we want to pull data from any non-trivial data source, like Twitter, Jira, or GitHub, Azure Databricks is our best friend. However, the most noticeable advantage of this pipeline is that, instead of being one monolithic data flow, it is actually multiple pipelines running in parallel. Short, simple, and independent pipelines. Multiple independent pipelines can work in parallel and at different frequencies. A failure in one pipeline does not impact the others. This is an easy way to scale each pipeline separately to speed up only
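The idea of short, independent pipelines that run in parallel and fail in isolation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline from the post: the pipeline functions, their names, and their return values are hypothetical stand-ins for real ingestion jobs, and scheduling at different frequencies is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipelines(pipelines):
    """Run each pipeline independently and return (results, failures) keyed by name."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max(len(pipelines), 1)) as pool:
        # Submit every pipeline up front so they run in parallel.
        futures = {name: pool.submit(fn) for name, fn in pipelines.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # A failure is recorded for this pipeline only; the others continue.
                failures[name] = exc
    return results, failures


# Hypothetical pipelines standing in for real data sources.
def github_pipeline():
    return "github: 3 rows loaded"


def jira_pipeline():
    raise RuntimeError("Jira API unavailable")


def twitter_pipeline():
    return "twitter: 5 rows loaded"


PIPELINES = {
    "github": github_pipeline,
    "jira": jira_pipeline,
    "twitter": twitter_pipeline,
}

if __name__ == "__main__":
    results, failures = run_pipelines(PIPELINES)
    print("succeeded:", sorted(results))
    print("failed:", sorted(failures))
```

Here the simulated Jira failure is captured on its own, while the GitHub and Twitter pipelines still complete. In a real deployment each pipeline would more likely be a separate job or cloud service than a thread in one process.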

When should you use Azure Databricks?

Once upon a time, SQL Server was our central tool for data management, for both OLTP (online transaction processing) and OLAP (online analytical processing) database systems. We used SQL Server Agent jobs to pull the data from FTP or any other source. We used SQL Server stored procedures to load the data into the staging database. We used SQL Server stored procedures to enrich and aggregate the data. And we used SQL Server as the data serving layer. These days we need to consider utilizing various cloud services. Attempts to lift and shift existing systems into the cloud often end up being quite expensive if we keep SQL Server in charge of every data pipeline stage. There are multiple great services in the Azure cloud, and Microsoft tends to build each product with multiple features, allowing it to take care of multiple pipeline stages. This does not mean that we need to go back to the monolithic architecture; let's find out where each service fits. Data