Modern Modular Data Pipelines Example

October 25, 2022

I would like to share my favourite example of a modern data pipeline.

It's amazing.

The first cool thing we see is that this pipeline uses a full range of cloud services, each built for a different use case. Choosing the right tool for each use case can be a key factor in the success of your idea. It lets you get things running as fast as possible without reinventing the wheel.
Another cool thing: whenever we need to pull data from a non-trivial data source, such as Twitter, Jira, or GitHub, Azure Databricks is our first friend.
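As an illustration of that kind of extraction step, here is a minimal Python sketch of pulling GitHub issues from a notebook. It is an assumption, not the pipeline's actual code. It calls GitHub's public REST endpoint for listing repository issues and keeps a few flat columns. It does no authentication or pagination, so it is subject to the unauthenticated rate limit. The function names and the chosen columns are hypothetical. The fetch function is injectable so the parsing logic can be tested without network access.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# GitHub REST endpoint for listing a repository's issues (first page only).
GITHUB_ISSUES_URL = "https://api.github.com/repos/{owner}/{repo}/issues?state=all&per_page=100"


def http_get_json(url: str) -> list:
    """Fetch a URL and decode its JSON body (no auth, no pagination)."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_issues(owner: str, repo: str,
                   fetch: Callable[[str], list] = http_get_json) -> List[Dict[str, object]]:
    """Hypothetical extract step: pull issues and flatten them into simple rows."""
    url = GITHUB_ISSUES_URL.format(owner=owner, repo=repo)
    rows = []
    for item in fetch(url):
        rows.append({
            "number": item["number"],
            "title": item["title"],
            "state": item["state"],
            "created_at": item["created_at"],
            # This endpoint also returns pull requests; they carry a "pull_request" key.
            "is_pull_request": "pull_request" in item,
        })
    return rows
```

In a Databricks notebook, the returned rows could then be turned into a Spark DataFrame with `spark.createDataFrame(rows)` and appended to a table of your choosing. A real source would also need a token and pagination.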
The most noticeable advantage, though, is that instead of one monolithic data flow, this pipeline is actually multiple short, simple, and independent pipelines running in parallel. Because they are independent, they can run on different frequencies, and a failure in one pipeline does not impact the others. This also makes it easy to scale each pipeline separately: you can speed up only the specific tasks that need it and save money by keeping the other operations on smaller compute.
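The isolation idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how the pipeline in the diagram is built. The three ingest functions, their results, the simulated Jira failure and the interval table are all hypothetical. In practice, scheduling would more likely be handled by a managed scheduler or orchestrator than by a thread pool. The sketch shows two things: each pipeline has its own frequency, and each one runs on its own, so one failure is recorded without stopping the others.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for three small, independent ingestion pipelines.
def ingest_twitter() -> str:
    return "twitter: 120 rows"


def ingest_jira() -> str:
    # Simulated failure, to show that the other pipelines are unaffected.
    raise RuntimeError("Jira API timed out")


def ingest_github() -> str:
    return "github: 45 rows"


PIPELINES: Dict[str, Callable[[], str]] = {
    "twitter": ingest_twitter,
    "jira": ingest_jira,
    "github": ingest_github,
}

# Hypothetical run intervals in minutes: each pipeline keeps its own frequency.
SCHEDULE_MINUTES: Dict[str, int] = {"twitter": 15, "jira": 60, "github": 1440}


def due_pipelines(schedule: Dict[str, int], minute: int) -> List[str]:
    """Return the pipelines whose interval divides the given minute of the day."""
    return [name for name, interval in schedule.items() if minute % interval == 0]


def run_independently(pipelines: Dict[str, Callable[[], str]],
                      max_workers: int = 4) -> Dict[str, Tuple[str, str]]:
    """Run each pipeline in its own worker thread and record each outcome separately."""
    results: Dict[str, Tuple[str, str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(job) for name, job in pipelines.items()}
        for name, future in futures.items():
            try:
                results[name] = ("ok", future.result())
            except Exception as exc:  # the failure is recorded, not propagated
                results[name] = ("failed", str(exc))
    return results


if __name__ == "__main__":
    minute = 60
    due = {name: PIPELINES[name] for name in due_pipelines(SCHEDULE_MINUTES, minute)}
    for name, (status, detail) in run_independently(due).items():
        print(f"{name}: {status} ({detail})")
```

At minute 60 only the Twitter and Jira pipelines are due. The Jira run fails while Twitter still reports success, and the daily GitHub pipeline waits for its own slot. Scaling one pipeline would mean giving only that job more resources, without touching the others.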

Pipeline Image Source: https://devblogs.microsoft.com/cse/2018/12/12/databricks-ci-cd-pipeline-using-travis/

Maria's journey in the data fields