Quick Start on GCP
This guide will take you through how to spin up an open source pipeline using the Snowplow Terraform modules. Learn more about infrastructure as code with Terraform here.
Before you begin
Sign up on Discourse! If you run into any problems or have any questions, we are here to help.
If you are interested in receiving the latest updates from Product & Engineering, such as critical bug fixes, security updates and new features, then join our mailing list.
You can find more details on the infrastructure and applications that will be deployed in your cloud here.
Would you rather run Snowplow locally?
Check out Snowplow Micro! Not what you are looking for? Let us know.
Prerequisites
- Google Cloud SDK installed
- The necessary APIs enabled in your GCP account (note: the set of required APIs is subject to change as GCP APIs evolve)
- A Google cloud service account
- See details on using the service account with the Cloud SDK
- You will need to:
- Navigate to your service account on Google Cloud Console
- Create a new JSON key and store it locally
- Create the environment variable by running the following in your terminal:
export GOOGLE_APPLICATION_CREDENTIALS="KEY PATH"
- Terraform 1.0.0 or higher installed
- Follow the instructions to make sure the terraform binary is available on your PATH. You can also use tfenv to manage your Terraform installation
- Download the latest igluctl which allows you to publish schemas for your custom events and entities to Iglu (your schema registry)
- Clone the repository at https://github.com/snowplow/quickstart-examples to your local machine
git clone https://github.com/snowplow/quickstart-examples.git
Select which example you want to use
The Quickstart Examples repository contains two different deployment strategies:
- default
- secure (recommended for production use cases)
The main difference is around the VPC that the components are deployed within. In default you will deploy everything into a public subnet; this is the easiest route if you want to try out Snowplow, as you can use your default network (auto mode VPC). However, to increase the security of your components, it is recommended and best practice to deploy them into private subnets, which ensures they are not available publicly. To use the `secure` configuration you will need your own custom VPC network with public and private subnets. You can follow this guide for steps on how to create networks and subnetworks on GCP.
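If you prefer to manage that network with Terraform as well, the sketch below shows what a custom-mode VPC with one public and one private subnetwork could look like. It is illustrative only and not part of the quickstart modules; all resource names, the region and the CIDR ranges are placeholders.
# Illustrative sketch: a custom-mode VPC with a public and a private subnetwork.
# Names, region and CIDR ranges are placeholders; adjust them to your environment.
resource "google_compute_network" "custom_vpc" {
  name                    = "snowplow-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "public" {
  name          = "snowplow-public"
  ip_cidr_range = "10.0.0.0/24"
  region        = "europe-west2"
  network       = google_compute_network.custom_vpc.id
}

resource "google_compute_subnetwork" "private" {
  name                     = "snowplow-private"
  ip_cidr_range            = "10.0.1.0/24"
  region                   = "europe-west2"
  network                  = google_compute_network.custom_vpc.id
  private_ip_google_access = true # lets private instances reach Google APIs
}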
Setting up your Iglu Server
The first step is to set up your Iglu Server stack. This means that you can create and evolve your own custom events & entities. Iglu enables you to store the schemas for your events & entities and fetch them as your events are processed by your pipeline.
We will go into more detail on why this is valuable and how to create your custom events & entities later, but for now you need to set this up first so that your pipeline (specifically the Enrich application and your loader) can communicate with Iglu.
Step 1: Update your input variables
Once you have cloned the quickstart-examples repository, you will need to navigate to the /gcp/iglu_server directory to update the input variables in terraform.tfvars.
git clone https://github.com/snowplow/quickstart-examples.git
cd quickstart-examples/terraform/gcp/iglu_server/default #or secure
nano terraform.tfvars #or other text editor of your choosing
To update your input variables, you'll need to know a few things (an example terraform.tfvars sketch follows this list):
- Your IP Address. Help.
- A UUID for your Iglu Server's API key. Help.
- If you have opted for secure, the network and subnetworks you will deploy your Iglu Server into.
  - If you are deploying to your default network, then set network = default and leave subnetworks empty.
- How to generate an SSH key.
  - On most systems you can generate an SSH key with:
ssh-keygen -t rsa -b 4096
  - This will output where your public key is stored, for example: ~/.ssh/id_rsa.pub
  - You can get the value with:
cat ~/.ssh/id_rsa.pub
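Putting this together, your terraform.tfvars will end up looking roughly like the sketch below. This is illustrative only: the exact variable names and structure come from the file you cloned, so edit the values in place rather than copying this verbatim.
# Illustrative sketch of gcp/iglu_server/default/terraform.tfvars.
# Variable names here are examples; keep the ones already defined in your file.
project_id = "my-gcp-project"             # hypothetical GCP project
region     = "europe-west2"

ssh_ip_allowlist = ["1.2.3.4/32"]         # your IP address

# The public key produced by ssh-keygen above (cat ~/.ssh/id_rsa.pub)
ssh_key_pairs = [
  {
    user_name  = "snowplow"
    public_key = "ssh-rsa AAAA... user@host"
  }
]

# default deployment: use the auto mode VPC and leave subnetworks empty
network = "default"

# The UUID you generated for the Iglu Server API key
iglu_super_api_key = "00000000-0000-0000-0000-000000000000"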
Telemetry notice
By default, Snowplow collects telemetry data for each of the Quick Start Terraform modules. Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!).
This data is anonymous and minimal, and since our code is open source, you can inspect what’s collected.
If you wish to help us further, you can optionally provide your email (or just a UUID) in the user_provided_id variable.
If you wish to disable telemetry, you can do so by setting telemetry_enabled to false.
See our telemetry principles for more information.
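For example, in your terraform.tfvars (the values below are placeholders):
# Optionally identify yourself to us: an email address or just a UUID
user_provided_id  = "someone@example.com"

# Or opt out of telemetry entirely
telemetry_enabled = false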
Step 2: Run the terraform script to set up your Iglu stack
You can now use terraform to create your Iglu Server stack.
terraform init
terraform plan
terraform apply
This will output your iglu_server_dns_name. Make a note of this; you'll need it when setting up your pipeline. If you have attached a custom SSL certificate and set up your own DNS records, then you don't need this value.
Step 3: Seed your Iglu Server from Iglu Central
For your pipeline to work, you'll need to seed your Iglu Server with the standard Snowplow schemas that are hosted in Iglu Central. To do this you will need igluctl, your Iglu Server's DNS and the Iglu API key that you created for your terraform.tfvars. Update the igluctl command below with the correct values for your Iglu Server.
git clone https://github.com/snowplow/iglu-central
cd iglu-central
igluctl static push --public schemas/ http://CHANGE-TO-MY-IGLU-IP 00000000-0000-0000-0000-000000000000
Setting up your pipeline
In this section you will update the input variables for the terraform module, and then run the terraform script to set up your pipeline. At the end you will have a working Snowplow pipeline that you can send your web, mobile or server-side data to.
Step 1: Update your input variables
Once you have cloned the quickstart-examples repository, you will need to navigate to the pipeline directory to update the input variables in either postgres.terraform.tfvars or bigquery.terraform.tfvars, according to the chosen destination. How to choose the destination and configure it will be explained in detail in the next section.
git clone https://github.com/snowplow/quickstart-examples.git
cd quickstart-examples/terraform/gcp/pipeline/default #or secure
nano <destination>.terraform.tfvars #or other text editor of your choosing
To update your input variables, you'll need to know a few things (an example tfvars sketch follows this list):
- Your IP Address. Help.
- Your Iglu Server's DNS from Setting up your Iglu Server.
- The UUID for your Iglu Server's API key. Help.
- If you have opted for secure, the network and subnetworks you will deploy your pipeline into.
  - If you are deploying to your default network, then set network = default and leave subnetworks empty.
- How to generate an SSH key.
  - On most systems you can generate an SSH key with:
ssh-keygen -t rsa -b 4096
  - This will output where your public key is stored, for example: ~/.ssh/id_rsa.pub
  - You can get the value with:
cat ~/.ssh/id_rsa.pub
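As with the Iglu Server, the resulting tfvars file will look roughly like the sketch below. It is illustrative only: keep the variable names already present in the <destination>.terraform.tfvars file you are editing.
# Illustrative sketch of a pipeline <destination>.terraform.tfvars.
# Variable names are examples; use the ones already defined in your file.
project_id = "my-gcp-project"             # hypothetical GCP project
region     = "europe-west2"

ssh_ip_allowlist = ["1.2.3.4/32"]         # your IP address

# DNS name output by the Iglu Server stack, and the API key you generated
iglu_server_dns_name = "http://CHANGE-TO-MY-IGLU-IP"
iglu_super_api_key   = "00000000-0000-0000-0000-000000000000"

# default deployment: use the auto mode VPC and leave subnetworks empty
network = "default"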
As mentioned above, there are two options for the pipeline's destination database: Postgres and BigQuery. Your chosen database needs to be specified with the postgres_db_enabled or bigquery_db_enabled variables, and the respective <destination>.terraform.tfvars file should be filled in according to the chosen database. Only the database-specific variables differ between the two tfvars files.
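For example, to load into BigQuery you might set the following in bigquery.terraform.tfvars (and, for Postgres, the equivalent postgres_db_enabled flag in postgres.terraform.tfvars):
# Choose BigQuery as the pipeline's destination
bigquery_db_enabled = true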
Postgres
If you choose Postgres as the destination, there is no additional step. The respective variables need to be filled in according to the desired setup; the necessary resources (Postgres instance, database, table and user) will be created by the pipeline Terraform module.
BigQuery
If you choose BigQuery as the destination, there are no additional steps or variables that need to be set up.
Step 2: Run the terraform script to set up your Pipeline stack
You can now use terraform to create your Pipeline stack.
terraform init
terraform plan -var-file=<destination>.terraform.tfvars
terraform apply -var-file=<destination>.terraform.tfvars
This will output your collector_dns_name, db_address, db_port, bigquery_db_dataset_id, bq_loader_dead_letter_bucket_name and bq_loader_bad_rows_topic_name. Depending on your destination, some of these outputs will be empty. Make a note of these; you'll need them when sending events and connecting to your database. If you have attached a custom SSL certificate and set up your own DNS records, then you don't need your collector_dns_name, as you will use your own DNS record to send events from the Snowplow trackers.
Now let's send some events to your pipeline!