SQL Awesomeness: Finding a QUALIFY query clause in the depths of the database ocean


After so many years of writing SQL queries, I cannot believe I never came across a QUALIFY query clause. It feels like discovering a treasure in the depths of the database ocean.

Even ChatGPT was not aware of it!



QUALIFY query clause was invented by Teradata and is not a part of SQL Standart but apparently, multiple database vendors are supporting it, for instance, Snowflake and Databricks.

QUALIFY clause solves the challenge when applying a window function as a query filter. SQL Standart and most database vendors will not allow to run of a window function as a part of a WHERE clause because window functions get evaluated after the HAVING clause. We typically create a CTE or subquery adding a window function to be able to filter on a it at a later query stage.

Here is an example of how you can use QUALIFY to piece-of-cake filter on the window function. 

I will use the Snowflake database, as an example. Let us consider that we have a list of queries and we need to find the last query executed by each user. Here is an example dataset:


WITH queries as (SELECT *
FROM (VALUES('Maria',1,'select * from testtable;','2023-10-26 04:00:30.642 +0000'::timestamp),
('Maria',20,'select * from customers where name = ''Nike''; ','2023-10-26 05:40:30.642 +0000'),
('Nika',33,'select distinct col1 from tableB','2023-10-26 01:00:30.642 +0000'),
('Nika',56,'select distinct col1 from tableB where cola != 1','2023-10-27 04:00:30.642 +0000'),
('Nika',85,'describe testtable','2023-10-28 04:00:30.642 +0000')
) t(user_name,query_id, query_text,query_time)
)
SELECT *
FROM queries
WHERE user_name IN ('Maria','Nika')


Here is how elegantly QUALIFY helps to get the last query by each user!!!

WITH queries as (SELECT *
FROM (VALUES('Maria',1,'select * from testtable;','2023-10-26 04:00:30.642 +0000'::timestamp),
('Maria',20,'select * from customers where name = ''Nike''; ','2023-10-26 05:40:30.642 +0000'),
('Nika',33,'select distinct col1 from tableB','2023-10-26 01:00:30.642 +0000'),
('Nika',56,'select distinct col1 from tableB where cola != 1','2023-10-27 04:00:30.642 +0000'),
('Nika',85,'describe testtable','2023-10-28 04:00:30.642 +0000')
) t(user_name,query_id, query_text,query_time)
)
SELECT *
FROM queries
WHERE user_name IN ('Maria','Nika')
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_name ORDER BY query_time DESC NULLS LAST ) = 1


I find the QUALIFY clause very convenient and elegant and I wish all database vendors have supported it.

 

Here is an equivalent query using subquery:


WITH queries as (SELECT *
FROM (VALUES('Maria',1,'select * from testtable;','2023-10-26 04:00:30.642 +0000'::timestamp),
('Maria',20,'select * from customers where name = ''Nike''; ','2023-10-26 05:40:30.642 +0000'),
('Nika',33,'select distinct col1 from tableB','2023-10-26 01:00:30.642 +0000'),
('Nika',56,'select distinct col1 from tableB where cola != 1','2023-10-27 04:00:30.642 +0000'),
('Nika',85,'describe testtable','2023-10-28 04:00:30.642 +0000')
) t(user_name,query_id, query_text,query_time)
)
SELECT *
FROM ( SELECT *,ROW_NUMBER() OVER (PARTITION BY user_name ORDER BY query_time DESC NULLS LAST ) as rn
FROM queries ) tbl
WHERE user_name IN ('Maria','Nika')
AND rn = 1


The result is the same




May all your queries sprint like caffeinated cheetahs


Yours

Maria

Comments

Popular posts from this blog

Unlocking Microsoft Fabric: A Simple Guide when you only have a personal account.

ETL to ELT journey: Break free your Transformations and discover Happiness and Data Zen

The backbone your data pipelines have been waiting for.