Snowflake — LISTAGG (&Copilot)

Johan
3 min readMay 14, 2024

We have a requirement where we want to see the top customers with the largest orders. This could lead to a query like this:

select o_custkey customer
, sum(o_totalprice) total_sales
, row_number() over (order by total_sales desc) rank
from snowflake_sample_data.tpch_sf1.orders o
join snowflake_sample_data.tpch_sf1.customer c
on o.o_custkey = c.c_custkey
group by all;
Customers ranked

And we want to see the size of their last 5 orders. This makes it a bit challenging, since the granularity/grouping will change. We will end up with many records (n orders * n customers). To collect and rank the orders, we will do this:

select o_custkey customer
, o_orderkey orderk
, o_totalprice total_sales_o
, row_number() over (partition by customer order by sum(total_sales_o) desc) rank_o
from snowflake_sample_data.tpch_sf1.orders o
join snowflake_sample_data.tpch_sf1.customer c
on o.o_custkey = c.c_custkey
group by all;
Orders ranked

When we combine the 2 queries (for instance using CTE’s), the result will show multiple lines per customer. Not what we want:

--

--

Johan

I am a freelance data engineer. My main focus is Snowflake, but I am always eager to learn new things. Python and AWS are my main side interests