Optimizing Snowflake (2) — investigating expensive queries

3 min readSep 5, 2024

Photo by Yaroslav Muzychenko on Unsplash

After being able to identify the most expensive queries, the next step is to zoom in on individual cases. In the previous article, we navigated to the overview with the most expensive queries:

From this overview, we can look at the details. Hover over a query and you will see this popup:

Click the “View Query” button from the top entry of the Most Expensive Queries. You will then go to the query history screen.

Next, click the Query Profile:

This can look quite impressive, so let’s not look at that in too much detail. The interesting thing is at the right:

Let’s try to summarize some of the conclusions we can draw from these screens:

Tablescan (partitions scanned versus partitions total) as the most expensive node is always interesting, meaning that a lot of data has to be read from storage
Bytes spilled to local storage, when the query is to big for the warehouse capacity

For those 2 particular items, we have solutions like clustering or warehouse scaling, which will be adressed in future articles.

My pragmatic approach to optimizing Snowflake is:

start with the top entry of the list of most expensive query. Easy to find this one.
start with the most expensive operations within this query. Perhaps a bit more complex, as several bottlenecks might be identified, such as the ones above.

In the next article, I will describe what to do when a Snowflake stored procedure is one of the most expensive queries. How to debug them requires some special setup.

Optimizing Snowflake (2) — investigating expensive queries

Written by Johan

No responses yet