Optimizing Snowflake (2) — investigating expensive queries

Johan
3 min read · Sep 5, 2024


Photo by Yaroslav Muzychenko on Unsplash

After identifying the most expensive queries, the next step is to zoom in on individual cases. In the previous article, we navigated to the overview with the most expensive queries:

From this overview, we can look at the details. Hover over a query and you will see this popup:

Click the “View Query” button on the top entry of the Most Expensive Queries list. This takes you to the Query History screen.
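
If you prefer SQL over clicking around, you can approximate this list yourself from the account usage views. A minimal sketch, using elapsed time as a rough cost proxy (Snowsight may rank cost differently, and ACCOUNT_USAGE data lags behind by up to about 45 minutes):

```sql
-- Rough approximation of the "Most Expensive Queries" list,
-- ordered by elapsed time over the last 7 days.
SELECT query_id,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_s,
       LEFT(query_text, 80)      AS query_snippet
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 10;
```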

Next, click the Query Profile:

This can look quite overwhelming, so let’s not dwell on every detail. The interesting information is in the panel on the right:
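
The same profile data can also be pulled programmatically with the GET_QUERY_OPERATOR_STATS table function, which is handy when you want to sort the operators by time yourself. A sketch; the query ID below is a placeholder for the one you copied from Query History:

```sql
-- Operator-level statistics for one query, the programmatic
-- counterpart of the Query Profile screen.
SELECT operator_id,
       operator_type,
       operator_statistics,
       execution_time_breakdown
FROM TABLE(GET_QUERY_OPERATOR_STATS('01b2c3d4-0000-1234-0000-000000000000'));
```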

Let’s try to summarize some of the conclusions we can draw from these screens:

  • A TableScan (partitions scanned versus partitions total) as the most expensive node is always interesting: it means a lot of data has to be read from storage
  • Bytes spilled to local storage, which means the query is too big for the warehouse’s memory capacity (both signals can be checked with the sketch after this list)
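
These two signals also live in the QUERY_HISTORY view, so you can check them without opening the profile at all. A minimal sketch, again with a placeholder query ID:

```sql
-- Pruning and spilling statistics for one suspect query.
SELECT partitions_scanned,
       partitions_total,
       ROUND(100 * partitions_scanned / NULLIF(partitions_total, 0), 1)
           AS pct_scanned,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage
FROM snowflake.account_usage.query_history
WHERE query_id = '01b2c3d4-0000-1234-0000-000000000000';
```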

For those two particular items, we have solutions such as clustering and warehouse scaling, which will be addressed in future articles.
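To give a taste of what those articles will cover, the fixes boil down to statements like the following. This is only a sketch; the table, column, and warehouse names are placeholders:

```sql
-- 1. Clustering, so TableScans can prune more partitions
--    (the table and column are illustrative).
ALTER TABLE my_schema.big_table CLUSTER BY (event_date);

-- 2. Scaling the warehouse up, so large queries stop spilling
--    (the warehouse name is illustrative).
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';
```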

My pragmatic approach to optimizing Snowflake is:

  • Start with the top entry of the list of most expensive queries. This one is easy to find.
  • Within that query, start with the most expensive operations. This can be a bit more complex, as several bottlenecks might be identified, such as the ones above.

In the next article, I will describe what to do when a Snowflake stored procedure is one of the most expensive queries. Debugging those requires some special setup.



Johan

I am a freelance data engineer. My main focus is Snowflake, but I am always eager to learn new things. Python and AWS are my main side interests.