Migrating to Snowflake? Here Is What You Need To Know
You’ve tried it out, and decided to make the move to Snowflake. But now what? Where should you start?
When in 2020 Snowflake burst into the stock market with a boom, I started noticing a growing interest on the job market for professionals with Snowflake experience to help with migrations. Snowflake was no longer the “dark horse”, and companies started to embrace it as a natural alternative to other long well established technologies, like Azure DW or Amazon Redshift.
So, maybe you’re one of those companies who finally decided to take the leap and move to Snowflake, and now you’re looking at this “almost the same, but not really the same” technology and you’re not really figuring out where to start.
This article is for you! Here is what you need to know and what you need to watch out for when starting your journey into Snowflake.
Prepare For The “Pay-Per-Use” Pricing Model
Data professionals are not used to think about their queries/usage in a “pay-per-use” way. We have always seen databases as “always available” resources, and that means we don’t have a link in our heads between usage and cost. However, with pay-per-use models like the one Snowflake has there is a direct link between usage and cost: you pay for the time you use a node, and cost scales linearly with cluster size. But don’t think for a second this is not a good thing, this is one of the big selling points of Snowflake for me: you can instantly match your warehouse size to your work requirements.
For all these reasons, it’s important that at least your heavier users understand how their usage impact costs, and to keep an eye on it for the first couple months.
To read more about this pay-per-use model, its consequences and how to adapt to it, go to Using Snowflake? Don’t Make These Expensive Mistakes.
Say Goodbye To Database Maintenance
It may feel weird at first, but you’ll get used to it: Snowflake just handles all the tuning and database maintenance for you. If you’re migrating from AWS Redshift, this means you no longer need to select a restrictive distribution key, nor sorting keys, and no more vacuuming. Also, you don’t need to worry about encoding/compression, it’s all handled by Snowflake.
Having nothing to tune can feel quite an odd for someone coming from an AWS Redshift cluster, but don’t be tempted, it’s very unlikely that you’ll do it better than Snowflake.
And just in case you read about clustered tables in Snowflake, be sure to read the large disclaimer which says “Clustering keys are not intended for all tables”. I would actually say it’s very unlikely you’ll need to use this at all, as it only serves very limited use cases, so be sure to have a clear before vs after comparison before you decide to use it.
Moving The Data
Just like other MPP platforms, Snowflake excels at loading data from distributed blob storage, so that should be fairly straightforward. However, if you have a very large amount of data, you may want to consider an alternative to the COPY statement: loading data using external tables is actually much quicker than a COPY, so maybe it’s something to keep in mind.
Syntax Conversion (from Redshift)
Most of the SQL syntax is fully compatible, so you shouldn’t need to change too much of your old scripts, but you may find some differences on the DDL though. In this case, clearly “google is your friend”, and don’t forget to check on Snowflake’s own forum: https://snowflakecommunity.force.com/
Get Familiar With The Good Stuff!
You’d be surprised how many analysts and engineers only find about the good stuff after 3 or 4 months of using Snowflake! Don’t be that guy :) Here is a good summary of where to start:
Understand the Snowflake Access Control Model
Coming from other databases, the Snowflake security model, more specifically roles, can be a bit confusing… but once you get them, you’ll see it’s quite handy! Anyway, and should you use roles or not, understand the security model early on! It’s really easy to start firing grants here and there, but the longer you go without a plan the harder it will be to move into a different model. Some important things to know about roles:
- if you “grant role child_role to role parent_role” , the parent role will inherit all of the child’s permissions;
- users can be granted multiple roles, but can only use one of them at any point in time;
- granting permission on “all objects” (ie. all existing object) is a separate permission to “all future” (ie. any objects created in the future);
You can find all about it in Snowflake’s Access Control Documentation.
Understand How Warehouses Work, So You Can Use Them Optimally
Ephemeral (compute) warehouses are probably something new for you, and there are a few common doubts/questions that new Snowflake users have about how they work. Here is a good summary of “What if … , what happens then?” answers.
* What if I suspend a warehouse while someone else is using it?
Suspending a warehouse does not automatically suspend it, and any queries which are already ongoing will be fully fulfilled, and same will happen with any newly incoming queries. See documentation here.
* What if I resize a warehouse while queries are in-flight?
Queries which are already ongoing, will always finish running in the warehouse size they started running in. Queries that are started after a warehouse resize, will be run in the new warehouse size. This means that you may end up having more than one warehouse running at the same time fulfilling different queries at different warehouse sizes. See documentation here.
* What if I have a query running in a small warehouse, can I resize the warehouse to make it quicker?
No, you can’t. The only way would be to stop the query, resize the warehouse, and then run it again. Changing a warehouse has no effect on queries which are already running.
* Why wouldn’t I change warehouse size or suspend warehouses all the time? What are the downsides?
There are 2 main downsides of changing warehouse sizes/suspending frequently:
1 — Each time you suspend/change size, you lose the cache, which may have performance implications (see blog post here)
2 — You always get charged for the first 1 minute of any warehouse (also applicable to the 1 minute after resizing). This means that starting a warehouse in “x-small” and change to “large” afterwards, incurs a minimum charge of 1 minute for both warehouses.
* Do Snowflake Warehouses always scale linearly (ie. double the warehouse size takes half the time to run a query)?
Most of the times, yes, but not always. In cases where the main performance driver is not compute (like in a COPY statement), you may not get linear scaling.
Get Used to Monitor Usage
The best time to establish good standards is… early! As I covered on this article, the pay-per-use model can be either a friend or foe, depending on how you use it, so keep an eye on your usage and tune your warehouse setup in a way that optimises “run-time” and reduces idle-time.
To help you with that, Snowflake has a shared schema Account Usage Shared Tables which you can use to track all sorts of usage, query execution, etc.
Set Up Integrated Authentication If You Can
As data professionals, we know that we hold the keys to the kingdom in so many ways: we are sitting on top of a lot of data. For that reason, we can’t allow space for error. As I covered in this other article, integrated authentication works amazingly well in Snowflake, so why make your users use unsafe passwords or having to put up with annoying MFA setups.
If you can set up integrated authentication, don’t think twice.
Don’t Overlook These Snowflake Configurations
Snowflake is such a different database breed it requires some adapting to, and it takes time to understand all the little differences. However, some of these parameters are best chosen as early as possible, because once you are in production you will be less inclined to change this type of platform-wide configurations. For more on this topic, feel free to check out this other article of mine.
A Final Word
Snowflake is an amazing technology with a lot of nice features worth looking into. Some of these features will save you time, others will save you headaches, but if you’re not aware of them they won’t be more than a missed opportunity. This is not an exhaustive list of everything you should know about Snowflake, but I dare saying if you get these right, then you’ve covered a lot of ground already!
The Snowflake Series
This article is part of my (ever growing) Snowflake Series. If you are curious to learn more about Snowflake, these articles may also be of interest to you:
- What makes Snowflake so much better than others
- Snowflake and Pay-Per-Use: Don’t make these expensive mistakes
- Snowflake vs Redshift RA3 — The need for (more than just) speed
- Near-realtime Data Ingestion in Snowflake (and how to implement a hot/cold/frozen data storage strategy)
- Performance Considerations when using External Tables
- Snowflake: Single Sign On Done The Right Way