Posts

Showing posts from 2023

The Greatest Reasons to use or not to use a De-centralized Data Management Architecture

Image
Imagine having a dancing party for your data. Everyone in harmony waltzing, stepping on their partner's toes from time to time.  Distributed data management is no less amusing. It's chaotic, occasionally hair-raising but with the right approach can even be perfect.  A central huge data warehouse is a struggle to scale efficiently and hard to innovate. There is no clear ownership of the data domains and it is a single point of failure. During peak usage times, data access and processing can be slow. Even implementing updates or upgrades can be quite complex and time-consuming. Centralized databases are attractive targets for cyberattacks and successful breaches can compromise a large amount of sensitive data.  As an alternative to a centralized Data Warehouse, data can be owned and managed by the domains, producing it. When considering a decentralized approach, we need to make sure there is a self-serve data infrastructure platform that allows different domains or teams to acces

SQL Awesomeness: Finding a QUALIFY query clause in the depths of the database ocean

Image
After so many years of writing SQL queries, I cannot believe I never came across a QUALIFY query clause. It feels like discovering a treasure in the depths of the database ocean. Even ChatGPT was not aware of it! QUALIFY query clause was invented by Teradata and is not a part of SQL Standart but apparently, multiple database vendors are supporting it, for instance, Snowflake and Databricks. QUALIFY clause solves the challenge when applying a window function as a query filter. SQL Standart and most database vendors will not allow to run of a window function as a part of a WHERE clause because window functions get evaluated after the HAVING clause. We typically create a CTE or subquery adding a window function to be able to filter on a it at a later query stage. Here is an example of how you can use QUALIFY to piece-of-cake filter on the window function.  I will use the Snowflake database, as an example. Let us consider that we have a list of queries and we need to find the last query ex

Data Operations Demystified: mastering 7 core principals

Image
Last week I have attended an Enterprise Data World summit. This is one of my favourite summits as they always have a broad choice of sessions on any level for data professionals. Unfortunately, due to work arrangements, I was only present on the second day of the summit. The first session's original title was “Accelerating Data Management with DataOps” but the speaker didn’t show up. Summit organizers have asked another speaker, Doug Needham from DataOps.live, to fill up and he made up a replacement session just in a few minutes before the session was supposed to start. That takes a lot of knowledge of the topic and courage to stand up for the disappointed audience that was expecting someone else. Doug is a senior data solutions architect and was doing Data Ops long before the term was invented. This amazing hour was full of exciting aspects of data operations, examples from Doug’s own rich experience and an overview of the Data Ops product his company is working on. I have very m

The Greatest Reasons to use or not to use a Centralized Data Access Architecture

Image
When developing a modern Data Platform Layer, one of the main decisions is whether to opt for centralized or decentralized data access architecture.  There is no “one-fits-all” solution, both have advantages and disadvantages. A Centralized Data Access Architecture would usually mean duplicating data from the operational layer into the analytical layer and applying various transformations to data to support and speed up data analytics. Operational online transaction processing Layer , where all microservices and their operational databases are located. Analytical Data Layer , where we would have data lakes that support data Scientists' work and a data warehouse, that supports Business Intelligence. Transformations, ETL or ELT data pipelines , which are moving data from the operational layer into the analytical layer. I f we opt for a Centralized Data Access architecture, what would be the benefits and the drawbacks? CONSISTENCY : consolidating data into a central location can

Truth is multidimensional and depends on the perspective you observe it from

Image
Just as your Power BI visualizations. I often want to build a dashboard with some sleek elegant visualizations and at the same time show as much data as possible.  Tooltip is a great feature when you want to show additional data without making visualizations huge and cumbersome. Charts in Power BI have out-of-the-box boring text tooltips.  But if we want to add a tooltip to Tables and Matrix visuals, the process is not straightforward. I have recently figured out how to build a custom tooltip in a Power BI and got so excited about it, that now I show it to just everyone willing to see it.  Today I will share it with you. My favorite dataset I will use a yellow taxi dataset . This free dataset contains billions of trips in New York City for the last 10 years and is provided by the NYC Taxi and Limousine Commission (TLC). It contains information on the pickup and dropoff locations, number of passengers, trip distance, fare amount, tip amount, etc. Using Power BI, I am loading the January