Snowflake loader
Snowflake Loader can run on both AWS and GCP.
Running on AWS
There are two ways to set up the necessary Snowflake resources and run the loader on AWS:
- using our dedicated open source Terraform modules to create the resources and deploy the loader on EC2
- creating the resources and running the application manually.
We recommend the first option.
Running on GCP
At the moment, there are no Terraform modules for deploying Snowflake Loader on GCP. Therefore, the necessary Snowflake resources need to be created and the application needs to be deployed manually.
Using the Terraform modules (for AWS)
Requirements
- Terraform >= 1.0.0
- terraform-snowflake-target module
- terraform-aws-snowflake-loader-setup module
- terraform-aws-snowflake-loader-ec2 module
Usage
The terraform-snowflake-target and terraform-aws-snowflake-loader-setup modules create the necessary Snowflake resources to run the loader. The outputs of these modules become inputs to the terraform-aws-snowflake-loader-ec2 module.
Stitching these modules together is described here.
We also have full pipeline deployment examples here, including a deployment example for a pipeline with Snowflake as destination. This lets you see how all the Terraform modules are used in a full pipeline deployment.
Manual setup and deployment
Setting up Snowflake
The following resources need to be created:
- Snowflake loader user
- Snowflake loader role
- Snowflake warehouse
- Snowflake database
- Snowflake schema
- events table in the same schema (see here for the schema)
- Snowflake storage integration
- Snowflake file format
- Snowflake stage to load transformed events.
Creating Snowflake stage for transformed events
Of the resources listed above, the Snowflake stage is the most involved to create.
To create a Snowflake stage, you need a Snowflake database, Snowflake schema, Snowflake storage integration, Snowflake file format, and the blob storage (S3 or GCS) path to the transformed events bucket.
You can follow this tutorial to create the storage integration.
Assuming you created the other required resources for it, you can create the Snowflake stage by following this document.
Downloading the artifact
The asset is published as a jar file attached to the GitHub release notes for each version.
It's also available as a Docker image on Docker Hub under snowplow/rdb-loader-snowflake:5.4.1.
Configuring rdb-loader-snowflake
The loader takes two configuration files:
- a config.hocon file with application settings
- an iglu_resolver.json file with the resolver configuration for your Iglu schema registry.
Minimal Configuration | Extended Configuration |
---|---|
aws/snowflake.config.minimal.hocon | aws/snowflake.config.reference.hocon |
gcp/snowflake.config.minimal.hocon | gcp/snowflake.config.reference.hocon |
For details about each setting, see the configuration reference.
See here for details on how to prepare the Iglu resolver file.
All self-describing schemas for events processed by RDB Loader must be hosted on Iglu Server 0.6.0 or above. Iglu Central is a registry containing Snowplow-authored schemas. If you want to use them alongside your own, you will need to add Iglu Central to your resolver file. Keep in mind that it could override your own private schemas if you give it higher priority. For details on this see here.
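To illustrate the priority trade-off, the sketch below writes a resolver file that lists a private registry ahead of Iglu Central. The function name `write_resolver`, the registry URL `https://iglu.example.com/api`, the `com.example` vendor prefix and the API key placeholder are all hypothetical. The sketch also assumes that a lower `priority` number is consulted first, so confirm this against the resolver documentation before relying on it.

```shell
# Write an example resolver file to the given path
# (defaults to iglu_resolver.json in the current directory).
# The first repository entry is a hypothetical private registry;
# replace its uri, apikey and vendorPrefixes with your own values.
write_resolver() {
  out="${1:-iglu_resolver.json}"
  cat > "$out" <<'EOF'
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "cacheTtl": 600,
    "repositories": [
      {
        "name": "Private registry (hypothetical)",
        "priority": 0,
        "vendorPrefixes": ["com.example"],
        "connection": {
          "http": {
            "uri": "https://iglu.example.com/api",
            "apikey": "REPLACE_WITH_YOUR_API_KEY"
          }
        }
      },
      {
        "name": "Iglu Central",
        "priority": 10,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": { "uri": "http://iglucentral.com" }
        }
      }
    ]
  }
}
EOF
}
```

Calling `write_resolver iglu_resolver.json` produces a file you can then edit. Giving the private registry the smaller number is one way to keep Iglu Central from taking precedence over your own schemas, under the assumption stated above.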
Running the Snowflake loader
The two config files need to be passed in as base64-encoded strings:
$ docker run snowplow/rdb-loader-snowflake:5.4.1 \
--iglu-config $RESOLVER_BASE64 \
--config $CONFIG_BASE64
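The command above expects both files as base64 strings in `RESOLVER_BASE64` and `CONFIG_BASE64`. A minimal sketch of producing them in a POSIX shell follows. It assumes the files are named config.hocon and iglu_resolver.json in the current directory; the helper name `encode_b64` and the `CONFIG_FILE` / `RESOLVER_FILE` variables are hypothetical.

```shell
# Encode a file as a single-line base64 string.
# Some base64 implementations wrap long output; tr removes the line breaks
# so the value can be passed as one command-line argument.
encode_b64() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical local paths; override them to match your setup.
CONFIG_FILE="${CONFIG_FILE:-config.hocon}"
RESOLVER_FILE="${RESOLVER_FILE:-iglu_resolver.json}"

# Only set the variables when both files are present.
if [ -f "$CONFIG_FILE" ] && [ -f "$RESOLVER_FILE" ]; then
  CONFIG_BASE64=$(encode_b64 "$CONFIG_FILE")
  RESOLVER_BASE64=$(encode_b64 "$RESOLVER_FILE")
  export CONFIG_BASE64 RESOLVER_BASE64
fi
```

With both variables exported in the same shell session, the docker run command shown above can be used unchanged.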
Telemetry notice
By default, Snowplow collects telemetry data for Snowflake Loader (since version 5.0.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!).
This data is anonymous and minimal, and since our code is open source, you can inspect what’s collected.
If you wish to help us further, you can optionally provide your email (or just a UUID) in the telemetry.userProvidedId configuration setting.
If you wish to disable telemetry, you can do so by setting telemetry.disable to true.
See our telemetry principles for more information.