SQL Awesomeness: Finding a QUALIFY query clause in the depths of the database ocean
After so many years of writing SQL queries, I cannot believe I never came across a QUALIFY query clause. It feels like discovering a treasure in the depths of the database ocean.
Even ChatGPT was not aware of it!
QUALIFY query clause was invented by Teradata and is not a part of SQL Standart but apparently, multiple database vendors are supporting it, for instance, Snowflake and Databricks.
QUALIFY clause solves the challenge when applying a window function as a query filter. SQL Standart and most database vendors will not allow to run of a window function as a part of a WHERE clause because window functions get evaluated after the HAVING clause. We typically create a CTE or subquery adding a window function to be able to filter on a it at a later query stage.
Here is an example of how you can use QUALIFY to piece-of-cake filter on the window function.
I will use the Snowflake database, as an example. Let us consider that we have a list of queries and we need to find the last query executed by each user. Here is an example dataset:
WITH queries as (SELECT *
FROM (VALUES('Maria',1,'select * from testtable;','2023-10-26 04:00:30.642 +0000'::timestamp),
('Maria',20,'select * from customers where name = ''Nike''; ','2023-10-26 05:40:30.642 +0000'),
('Nika',33,'select distinct col1 from tableB','2023-10-26 01:00:30.642 +0000'),
('Nika',56,'select distinct col1 from tableB where cola != 1','2023-10-27 04:00:30.642 +0000'),
('Nika',85,'describe testtable','2023-10-28 04:00:30.642 +0000')
) t(user_name,query_id, query_text,query_time)
)
SELECT *
FROM queries
WHERE user_name IN ('Maria','Nika')
Here is how elegantly QUALIFY helps to get the last query by each user!!!
WITH queries as (SELECT *
FROM (VALUES('Maria',1,'select * from testtable;','2023-10-26 04:00:30.642 +0000'::timestamp),
('Maria',20,'select * from customers where name = ''Nike''; ','2023-10-26 05:40:30.642 +0000'),
('Nika',33,'select distinct col1 from tableB','2023-10-26 01:00:30.642 +0000'),
('Nika',56,'select distinct col1 from tableB where cola != 1','2023-10-27 04:00:30.642 +0000'),
('Nika',85,'describe testtable','2023-10-28 04:00:30.642 +0000')
) t(user_name,query_id, query_text,query_time)
)
SELECT *
FROM queries
WHERE user_name IN ('Maria','Nika')
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_name ORDER BY query_time DESC NULLS LAST ) = 1
I find the QUALIFY clause very convenient and elegant and I wish all database vendors have supported it.
Here is an equivalent query using subquery:
WITH queries as (SELECT *
FROM (VALUES('Maria',1,'select * from testtable;','2023-10-26 04:00:30.642 +0000'::timestamp),
('Maria',20,'select * from customers where name = ''Nike''; ','2023-10-26 05:40:30.642 +0000'),
('Nika',33,'select distinct col1 from tableB','2023-10-26 01:00:30.642 +0000'),
('Nika',56,'select distinct col1 from tableB where cola != 1','2023-10-27 04:00:30.642 +0000'),
('Nika',85,'describe testtable','2023-10-28 04:00:30.642 +0000')
) t(user_name,query_id, query_text,query_time)
)
SELECT *
FROM ( SELECT *,ROW_NUMBER() OVER (PARTITION BY user_name ORDER BY query_time DESC NULLS LAST ) as rn
FROM queries ) tbl
WHERE user_name IN ('Maria','Nika')
AND rn = 1
The result is the same
May all your queries sprint like caffeinated cheetahs
Yours
Maria
Comments
Post a Comment