By Kris Corrin (Principal BI Engineer) & Chris Austin
Here at synvert TCM (formerly Crimson Macaw) HQ we love a good feature, and Snowflake’s Time Travel feature is a contender for the best of them all. I know, that’s a bold claim, and the rest of this blog will be spent making a case for it. Hopefully, we’ll convince you by the end!
This article will be of particular interest if you have ever:
- Accidentally deleted data that can’t be easily restored
- Updated a table incorrectly and have no idea what it looked like before
- Missed a snapshot window for a dataset, and can’t recover the view you need
All the above scenarios fall firmly under the category of ‘not good’ as a custodian of data, and this article will give you a solution for each of them.
But first, a public service announcement…
Time Travel is automatically set to 1 day and is applied to all persistent databases, schemas and tables by default. It cannot be used with transient or temporary objects.
1 day is the maximum value for Snowflake Standard Edition. For Snowflake Enterprise Edition and above you can configure up to 90 days. This can be done when creating or altering an object using the data_retention_time_in_days
function.
For example:
create table mytable (col1number, col2 date) data_retention_time_in_days=30; alter table mytable data_retention_time_in_days=90;
If you’re feeling lucky or have reached the very pinnacle of efficiency, then you can turn Time Travel off by configuring the above values to 0. This is not a recommendation – even having the default 1-day setting can get you out of some sticky situations!
Finally, this is not a free feature! With each day added to the Time Travel period, you will incur additional storage costs, which you can read more about here.
It is worth mentioning, though, that the ability to configure Time Travel at the object level is useful for managing costs.
How does it work?
I was surprised to find out that it’s not magic, but it is very clever.
Time Travel works because of a layered storage system in Snowflake. They are three distinct layers. They are; Current Data Storage, Time Travel Retention and Fail-Safe.
Current Data Storage is where the data is right now. It is also where all your regular querying completes.
Time Travel Retention captures and holds a record of each change made to your database objects for the span of your configured data_retention_time_in_days
setting (remember the default is 1 day). Time Travel-specific querying is done against the data here.
For the purposes of this blog, we will ignore the Fail-Safe storage layer, as it is not accessible to users or relevant to Time Travel functionality.
How do I query my Time Travel data?
Where Time Travel really excels is its ease of use. Data retained in your Time Travel Retention storage can be easily queried with a select statement, qualified with ‘at’ or ‘before’. The qualifiers will allow you to select data based on a specific timestamp, time in seconds and a query ID (this can be retrieved quickly by filtering on the History tab in the Snowflake web interface).
Below we will run through some quick examples of each.
Querying by timestamp
To select data by timestamp, use the below syntax. Notice the ‘at’ after the initial selection. This takes our query to the Time Travel Retention storage.
select * from mytable at(timestamp => 'Fri, 14 Jan 2022 16:20:00 -0000'::timestamp_tz);
Querying by time in seconds
This one requires a bit of maths and the use of an offset. In the example below, we’re viewing data as it was 5 minutes (or 300 seconds) ago.
select * from mytable at(offset => -300);
Note the ‘-‘ before the 300, we’re travelling back in time not forwards! That would be a nice feature to add to the Snowflake backlog though…
Querying by query ID
This is a great one for erroneous table updates, once you have the query ID, you can view the data exactly as it was before the query executed – giving you the ability to compare the new and old data side by side.
As we want to know the state of our data before an event took place, we will use ‘before’ rather than ‘at’ as we’ve seen in the last 2 examples.
select * from my_table before(statement => '01a1a01f-0b02-8126-0001-6b3a037244e2');
Cloning data
Sometimes we need to take snapshots of data as they were at a point in time. For example, changing hierarchies or end-of-period financial reporting. If you miss the window for a snapshot, then you can find yourself in a whole world of sadness.
Not anymore!
In this example, our new period starts on Saturday 2nd April, so we want to create a snapshot of the data as it was immediately before. Remember, depending on your data retention setting, you can take this snapshot up to 90 days after the event has taken place!
create table restoredtable clone mytable at(timestamp => 'Fri, 01 Apr 2022 23:59:59 -0000'::timestamp_tz);
You can also use query ID and an offset by seconds to clone my data, it’s all about what works best for you in a given scenario. This is also useful when looking to clone entire schemas and databases. Just swap create table
for create
database
|
schema
and change your target to the relevant schema/database.
Undropping data
Have you ever got your environments mixed up and dropped the wrong table, schema or database? If so, you can now replace that long, soul-searching walk to the coffee machine followed by a carefully worded email to your boss with a simple command!
It is:
undrop mytable; undrop myschema; undrop mydatabase;
That’s it, that’s all you have to type. Again, depending on the retention periods you have up to 90 days to recover any lost database objects.
I hope you’ve found this blog useful. If you have any questions on the topics covered here or suggestions for topics you’d like us to cover in the future, please contact us.