Posts

Showing posts with the label snowflake

SQL Awesomeness: Finding a QUALIFY query clause in the depths of the database ocean

After so many years of writing SQL queries, I cannot believe I never came across the QUALIFY query clause. It feels like discovering a treasure in the depths of the database ocean. Even ChatGPT was not aware of it! The QUALIFY clause was invented by Teradata and is not part of the SQL standard, but apparently multiple database vendors support it, for instance Snowflake and Databricks. QUALIFY solves the problem of using a window function as a query filter. The SQL standard and most database vendors do not allow a window function inside a WHERE clause, because window functions are evaluated after the HAVING clause. We typically create a CTE or subquery that adds the window function as a column so that we can filter on it at a later query stage. Here is an example of how you can use QUALIFY to filter on a window function with no extra effort. I will use the Snowflake database as an example. Let us consider that we have a list of queries and we need to find the last quer...
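The excerpt is cut off before its own example, so here is a minimal sketch of the two approaches it describes. It makes several assumptions: the `query_log` table, its columns, and the sample rows are hypothetical, and the task shown (latest query per user) is only one plausible reading of the truncated sentence. SQLite, run through Python's `sqlite3` module, stands in for Snowflake. SQLite does not support QUALIFY, so the QUALIFY form is kept as a string for comparison and only the CTE workaround is executed.

```python
import sqlite3

# Hypothetical sample data: (user_name, query_text, started_at).
ROWS = [
    ("alice", "select 1", "2023-01-01 10:00:00"),
    ("alice", "select 2", "2023-01-01 11:00:00"),
    ("bob", "select 3", "2023-01-01 09:30:00"),
]

# QUALIFY form, for engines that support the clause (for example Snowflake).
# Shown for comparison only; it is NOT executed here because SQLite lacks QUALIFY.
QUALIFY_SQL = """
SELECT user_name, query_text, started_at
FROM query_log
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_name ORDER BY started_at DESC) = 1
"""

# Portable workaround: compute the window function in a CTE, then filter on it.
CTE_SQL = """
WITH ranked AS (
    SELECT user_name, query_text, started_at,
           ROW_NUMBER() OVER (
               PARTITION BY user_name ORDER BY started_at DESC
           ) AS rn
    FROM query_log
)
SELECT user_name, query_text, started_at
FROM ranked
WHERE rn = 1
ORDER BY user_name
"""


def last_query_per_user(rows):
    """Return the most recent row per user using the CTE workaround."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE query_log (user_name TEXT, query_text TEXT, started_at TEXT)"
    )
    conn.executemany("INSERT INTO query_log VALUES (?, ?, ?)", rows)
    result = conn.execute(CTE_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in last_query_per_user(ROWS):
        print(row)
```

The QUALIFY version expresses the same filter without the extra `rn` column or the wrapping CTE, which is the convenience the post is excited about.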

Doing things right or doing the right things? How to find the row count of every table in a database efficiently

One Data Engineer had to replicate data from one well-known database vendor to a less-known database vendor. He used select count(*) to validate that the table row counts were equal on the source and the target. It ran so slowly that he got fired without ever knowing whether the tables' content was equal or not. Often laziness is the first step towards efficiency. Rather than running count(*) on each table, the unfortunate engineer could have used the internal statistics that every decent database vendor maintains and exposes in its system views. Keep in mind that these statistics will never be 100% accurate; their accuracy depends on a few things: how often the database objects change; the internal or manual schedule for statistics refresh (for many database vendors, statistics are refreshed automatically only when the changed data exceeds 10% of the total table rows, though this is usually configurable per table); and the percentage of rows used to calculate the statistics. The most accu...
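To make the trade-off concrete, here is a small sketch that uses SQLite as a stand-in engine; it is not the vendor-specific system views the post refers to. The `orders` table and its index are hypothetical. In SQLite, `ANALYZE` stores statistics in the `sqlite_stat1` table, and the first integer of the `stat` column for an index is the number of rows seen at analysis time. Rows inserted after `ANALYZE` are not reflected until the statistics are refreshed, which illustrates why a statistics-based count is fast but only approximate.

```python
import sqlite3


def estimated_vs_exact(rows_before, rows_after):
    """Return (row count from statistics, exact COUNT(*)) for a demo table.

    rows_before rows are inserted before ANALYZE, rows_after rows afterwards,
    so the statistics become stale in a controlled way.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and index, used only for illustration.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute("CREATE INDEX orders_customer ON orders (customer)")

    conn.executemany(
        "INSERT INTO orders (customer) VALUES (?)",
        [(f"c{i}",) for i in range(rows_before)],
    )
    conn.commit()

    # Collect statistics; SQLite writes them to the sqlite_stat1 table.
    conn.execute("ANALYZE")

    # Change the table after the statistics were collected.
    conn.executemany(
        "INSERT INTO orders (customer) VALUES (?)",
        [(f"late{i}",) for i in range(rows_after)],
    )
    conn.commit()

    # Cheap lookup: the first integer in stat is the row count at ANALYZE time.
    stat = conn.execute(
        "SELECT stat FROM sqlite_stat1 "
        "WHERE tbl = 'orders' AND idx = 'orders_customer'"
    ).fetchone()[0]
    estimated = int(stat.split()[0])

    # Expensive lookup: scans the table, but is always exact.
    exact = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

    conn.close()
    return estimated, exact


if __name__ == "__main__":
    estimated, exact = estimated_vs_exact(100, 20)
    print(f"statistics say {estimated} rows, COUNT(*) says {exact} rows")
```

Other engines expose similar figures through their own catalog or information-schema views, with their own refresh rules, so the lookup query has to be adapted per vendor.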