How would you do dynamic pivot SQL in Amazon Redshift? (This elegant script might be used in many other databases as well)

- November 07, 2017

Several days ago I have spent a few hours of my life figuring out how to do dynamic pivot in Amazon Redshift. To tell you the truth, I have expected much more from this DBMS SQL language.

Redshift is based on Postgre SQL 8.0.2 ( which was released in 2005 !!!! )

Anything you would want for this, not too difficult task, does not exits. No stored procedures. No JSON datatype. No variables outside of UDF, no queries inside UDFs. “UDF can be used to calculate values but cannot be used to call SQL functions”. Python UDFs also cannot query the data, only perform calculations.

Finally I have found one useful function LISTAGG that helped me to get distinct values of all pivoted columns.

For each group in a query, the LISTAGG aggregate function orders the rows for that group according to the ORDER BY expression, then concatenates the values into a single string.

LISTAGG( [DISTINCT] aggregate_expression [, 'delimiter' ] )

[ WITHIN GROUP (ORDER BY order_list) ]

As a next step, I have used Python UDF to dynamically build pivoting query ( I love Python more and more, this is such a nice language :-) . In addition, I have used Common Table Expression CTE ( yes, they have it! ) to pass a list of pivoted columns to Python UDF.

The last step, execution the SQL query, I am still trying to figure out – there is no dynamic query execution in Redshift SQL dialect. But my customer can execute this query from reporting tool.

Here is a setup and all queries.

Creating a table:

create table maria_deleteme ( id int, yearint, customer varchar, amount int);

Loading some test values:

insert into maria_deleteme values

( 12, 2010, 'customer1', 4),

( 2, 2010,'customer1', 80),

( 3, 2010,'customer2', 41),

( 4, 2011,'customer1', 45),

( 5, 2011,'customer2', 15),

( 6, 2011,'customer3', 18),

( 7, 2012,'customer1', 23),

( 8, 2012,'customer1', 1),

( 9, 2012,'customer1', 8),

( 10, 2012,'customer3', 98),

( 11, 2013,'customer2', 1);

This query gives us a list of pivoted columns:

select listagg(distinct year, ',') as years

from maria_deleteme

The result will be : 2011,2013,2012,2010

Using CTE over the above query and passing its result to Python UDF:

with vars

as (

select listagg(distinctyear, ',') as years

from maria_deleteme

)

select maria_pivoting ('maria_deleteme',years,'year','customer','amount')

from vars;

The above query will print out this:

select sum (casewhenyear = 2010 then amount else0end) as"2010" ,

sum (casewhenyear = 2012 then amount else0end) as"2012" ,

sum (casewhenyear = 2011 then amount else0end) as"2011" ,

sum (casewhenyear = 2013 then amount else0end) as"2013" ,

customer

from maria_deleteme groupby customer ;

Which results in:

#2010 2012 2011 2013 customer

1098180 customer3

2410151 customer2

38832450 customer1

Python UDF:

DROP FUNCTION maria_pivoting (varchar,varchar,varchar,varchar)

CREATE FUNCTION maria_pivoting(tablename varchar, list_vals varchar, pivot_col varchar, groupby_col varchar, counted_col varchar )

RETURNS varchar(4000) IMMUTABLE AS $$

vals = list_vals.split(",")

pivot_query = " select "

for value in vals:

pivot_query = pivot_query + ' sum (case when {} = {} then {} else 0 end) as "{}" ,'.format(pivot_col,str(value),counted_col,str(value))

pivot_query = pivot_query + ' {} from {} group by {} ;'.format(groupby_col,tablename,groupby_col)

return pivot_query

$$ LANGUAGE plpythonu;

I will be glad to hear what do you think about Redshift SQL language ( Of course I know that this is very powerful database for data processing. And VERY expensive.)

Yours, Maria

Comments

Mani BharatarajuFebruary 24, 2020 at 3:55 AM
Wow great one!
ReplyDelete
Replies

Add comment

Search This Blog

Maria's journey in the data fields

How would you do dynamic pivot SQL in Amazon Redshift? (This elegant script might be used in many other databases as well)

Comments

Post a Comment

Popular posts from this blog

Unlocking Microsoft Fabric: A Simple Guide when you only have a personal account.

The backbone your data pipelines have been waiting for.

Understanding the Pillars of Data Quality