Optimizing Snowflake (1)— overview & strategy

Johan
3 min readJul 15, 2024

--

This is the first part of a new series of articles I will be posting here: how to optimize Snowflake for cost and performance.

Image created using Copilot: “can you create a picture of a person saving money using his computer”

Let’s first start with two strategies that you can follow here:

  • use third party services to help you optimize Snowflake. Options here are Snowflake native apps or ‘external’ tools.
  • build your own expertise using some of Snowflake’s out of the box options

The alternatives for the ‘third party’ strategy can be those:

I will work on reviews of some of those third party options in the near future. Please leave a comment when I missed some tools.

Let’s now dive into the other strategy, a DIY approach to optimizing Snowflake. Following this strategy means that you will use tools that are available out of the Snowflake-box, so no external tooling. This also means that you need to invest in building up knowledge about this, so you need to understand:

  • how to identify expensive queries
  • how to do a root cause analysis
  • how to implement a fix

These steps have to be repeated over and over for each bottleneck you identify.

The first step to set is to identify which queries are the most expensive ones. Snowflake determines this by taking the average execution time and the number of runs into account. The queries are grouped using the query hash.

We can use some very interesting out-of-the box features from Snowflake to identify those expensive queries:

Go to Admin/Cost Management/Account Overview in Snowsight, you will see an overview of the “most expensive queries”: average query execution time and the number of runs within the timeframe:

You can choose all those date ranges but typically when you are in the optimization phase, only the last couple of days will be relevant, assuming that the heavy queries you found in the past, are now optimized and therefore have gone from the “most expensive queries” overview.

But, what if you try the above and get this message:

Let’s assume that ACCOUNTADMIN is not available, we need to request some grants:

grant database role snowflake.USAGE_VIEWER to role public;
grant database role snowflake.GOVERNANCE_VIEWER to role public;

With these grants, you can have a look at the most expensive queries.

In my trial account, these are the candidates for optimization. Since I am only working on small stuff in this trial (Iceberg :)), no big queries yet.

In the next article, we will take a first step in identifying root causes for expensive queries. But as a next step, I will also have a look at the NTT native app.

--

--

Johan

I am a freelance data engineer. My main focus is Snowflake, but I am always eager to learn new things. Python and AWS are my main side interests