Modern Modular Data Pipelines Example

 I would like to share with you my favourite example of the modern data pipeline.

It's amazing.

The first cool thing that we see is that this great pipeline is utilizing a full range of cloud services built for diverse use cases. Choosing the correct tool for each use case can be one of the key factors for the success of your idea, allowing you to get things running as fast as possible without reinventing the wheel.
Another cool thing is if we want to pull data from any non-trivial data source, like Twitter or Jira or GitHub, Azure Databricks is our first friend.
However, the most noticeable advantage of this pipeline, is that instead of having a monolithic data flow, this pipeline is actually multiple pipelines that are running in parallel. Short, simple, and independent pipelines. Multiple independent pipelines can work in parallel and on different frequencies. One pipeline failure would not impact others. This is an easy way to scale each pipeline separately to speed up only specific tasks and save money on keeping other operations on lower compute.

Comments

Popular posts from this blog

SQL Awesomeness: Finding a QUALIFY query clause in the depths of the database ocean

Look back and realize how far you came

The Greatest Reasons to use or not to use a Centralized Data Access Architecture