This is the first part of a new series of articles I will be posting here: how to optimize Snowflake for cost and performance.
Let’s first start with two strategies that you can follow here:
- use third party services to help you optimize Snowflake. Options here are Snowflake native apps or ‘external’ tools.
- build your own expertise using some of Snowflake’s out of the box options
The alternatives for the ‘third party’ strategy can be those:
- Use third party ‘external tools’: Select, Espresso
- Use native apps from the Snowflake Marketplace: NTT, Select, Keebo, Kibi, Ava
I will work on reviews of some of those third party options in the near future. Please leave a comment when I missed some tools.
Let’s now dive into the other strategy, a DIY approach to optimizing Snowflake. Following this strategy means that you will use tools that are available out of the Snowflake-box, so no external tooling. This also means that you need to invest in building up knowledge about this, so you need to understand:
- how to identify expensive queries
- how to do a root cause analysis
- how to implement a fix
These steps have to be repeated over and over for each bottleneck you identify.
The first step to set is to identify which queries are the most expensive ones. Snowflake determines this by taking the average execution time and the number of runs into account. The queries are grouped using the query hash.
We can use some very interesting out-of-the box features from Snowflake to identify those expensive queries:
Go to Admin/Cost Management/Account Overview in Snowsight, you will see an overview of the “most expensive queries”: average query execution time and the number of runs within the timeframe:
You can choose all those date ranges but typically when you are in the optimization phase, only the last couple of days will be relevant, assuming that the heavy queries you found in the past, are now optimized and therefore have gone from the “most expensive queries” overview.
But, what if you try the above and get this message:
Let’s assume that ACCOUNTADMIN is not available, we need to request some grants:
grant database role snowflake.USAGE_VIEWER to role public;
grant database role snowflake.GOVERNANCE_VIEWER to role public;
With these grants, you can have a look at the most expensive queries.
In my trial account, these are the candidates for optimization. Since I am only working on small stuff in this trial (Iceberg :)), no big queries yet.
In the next article, we will take a first step in identifying root causes for expensive queries. But as a next step, I will also have a look at the NTT native app.