Posts

Showing posts with the label data engineering

Are you familiar with DATAIKU?

If you want to make DATA a part of EVERYDAY decision-making, you should try this Data Analysis Platform. Dataiku is a tool for everyone: it has Notebooks and Python for Coders, visual data flows for Clickers, and relationships, statistics and visual data forecasting for Decision Makers. It's technology agnostic: you can install it on a public cloud, use it as a SaaS offering, or install it on-premises. You can also choose the DATA PROCESSING ENGINE that runs your workload, for example Azure Synapse, Spark or SQL Server, and analyze the data WITHOUT ANY DATA MOVEMENT, in a spreadsheet-like manner. Dataiku has many enterprise-scale features, such as built-in flow audit, Data Quality features, easy deployments between Dataiku environments, and much more. https://www.dataiku.com/

Modern Modular Data Pipelines Example

I would like to share with you my favourite example of a modern data pipeline. The first cool thing we see is that this pipeline uses a full range of cloud services built for diverse use cases. Choosing the right tool for each use case can be one of the key factors in the success of your idea, allowing you to get things running as fast as possible without reinventing the wheel. Another cool thing is that when we want to pull data from a non-trivial data source, like Twitter, Jira or GitHub, Azure Databricks is our first friend. However, the most noticeable advantage of this pipeline is that instead of one monolithic data flow, it is actually multiple short, simple, and independent pipelines. Independent pipelines can run in parallel and at different frequencies, and a failure in one pipeline does not impact the others. This is an easy way to scale each pipeline separately to spe...
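To make the "independent pipelines" idea concrete, here is a minimal sketch in plain Python. It is not the architecture from the diagram and does not use any Azure or Databricks API. The three pipeline functions and the `run_isolated` and `run_all` helpers are hypothetical names invented for illustration. The point is only that each pipeline runs on its own, so a failure in one does not stop the others.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real pipelines; each one is short and independent.
def twitter_pipeline():
    return "loaded tweets"


def jira_pipeline():
    # Simulated failure, to show that the other pipelines still finish.
    raise RuntimeError("Jira API unavailable")


def github_pipeline():
    return "loaded commits"


def run_isolated(name, pipeline):
    """Run one pipeline and capture its failure instead of propagating it."""
    try:
        return name, "ok", pipeline()
    except Exception as exc:
        return name, "failed", str(exc)


def run_all(pipelines):
    """Run every pipeline in parallel and collect one result per pipeline."""
    with ThreadPoolExecutor(max_workers=max(1, len(pipelines))) as pool:
        futures = [
            pool.submit(run_isolated, name, fn) for name, fn in pipelines.items()
        ]
        outcomes = [future.result() for future in futures]
    return {name: (status, detail) for name, status, detail in outcomes}


PIPELINES = {
    "twitter": twitter_pipeline,
    "jira": jira_pipeline,
    "github": github_pipeline,
}

results = run_all(PIPELINES)
for name, (status, detail) in results.items():
    print(f"{name}: {status} ({detail})")
```

In a real setup each pipeline would more likely be a separately scheduled job with its own trigger and frequency; the thread pool here only stands in for that separation.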