SQL Awesomeness: Finding a QUALIFY query clause in the depths of the database ocean


After so many years of writing SQL queries, I cannot believe I never came across a QUALIFY query clause. It feels like discovering a treasure in the depths of the database ocean.

Even ChatGPT was not aware of it!



QUALIFY query clause was invented by Teradata and is not a part of SQL Standart but apparently, multiple database vendors are supporting it, for instance, Snowflake and Databricks.

QUALIFY clause solves the challenge when applying a window function as a query filter. SQL Standart and most database vendors will not allow to run of a window function as a part of a WHERE clause because window functions get evaluated after the HAVING clause. We typically create a CTE or subquery adding a window function to be able to filter on a it at a later query stage.

Here is an example of how you can use QUALIFY to piece-of-cake filter on the window function. 

I will use the Snowflake database, as an example. Let us consider that we have a list of queries and we need to find the last query executed by each user. Here is an example dataset:


WITH queries as (SELECT *
FROM (VALUES('Maria',1,'select * from testtable;','2023-10-26 04:00:30.642 +0000'::timestamp),
('Maria',20,'select * from customers where name = ''Nike''; ','2023-10-26 05:40:30.642 +0000'),
('Nika',33,'select distinct col1 from tableB','2023-10-26 01:00:30.642 +0000'),
('Nika',56,'select distinct col1 from tableB where cola != 1','2023-10-27 04:00:30.642 +0000'),
('Nika',85,'describe testtable','2023-10-28 04:00:30.642 +0000')
) t(user_name,query_id, query_text,query_time)
)
SELECT *
FROM queries
WHERE user_name IN ('Maria','Nika')


Here is how elegantly QUALIFY helps to get the last query by each user!!!

WITH queries as (SELECT *
FROM (VALUES('Maria',1,'select * from testtable;','2023-10-26 04:00:30.642 +0000'::timestamp),
('Maria',20,'select * from customers where name = ''Nike''; ','2023-10-26 05:40:30.642 +0000'),
('Nika',33,'select distinct col1 from tableB','2023-10-26 01:00:30.642 +0000'),
('Nika',56,'select distinct col1 from tableB where cola != 1','2023-10-27 04:00:30.642 +0000'),
('Nika',85,'describe testtable','2023-10-28 04:00:30.642 +0000')
) t(user_name,query_id, query_text,query_time)
)
SELECT *
FROM queries
WHERE user_name IN ('Maria','Nika')
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_name ORDER BY query_time DESC NULLS LAST ) = 1


I find the QUALIFY clause very convenient and elegant and I wish all database vendors have supported it.

 

Here is an equivalent query using subquery:


WITH queries as (SELECT *
FROM (VALUES('Maria',1,'select * from testtable;','2023-10-26 04:00:30.642 +0000'::timestamp),
('Maria',20,'select * from customers where name = ''Nike''; ','2023-10-26 05:40:30.642 +0000'),
('Nika',33,'select distinct col1 from tableB','2023-10-26 01:00:30.642 +0000'),
('Nika',56,'select distinct col1 from tableB where cola != 1','2023-10-27 04:00:30.642 +0000'),
('Nika',85,'describe testtable','2023-10-28 04:00:30.642 +0000')
) t(user_name,query_id, query_text,query_time)
)
SELECT *
FROM ( SELECT *,ROW_NUMBER() OVER (PARTITION BY user_name ORDER BY query_time DESC NULLS LAST ) as rn
FROM queries ) tbl
WHERE user_name IN ('Maria','Nika')
AND rn = 1


The result is the same




May all your queries sprint like caffeinated cheetahs


Yours

Maria

Comments

Popular posts from this blog

Look back and realize how far you came

The Greatest Reasons to use or not to use a Centralized Data Access Architecture