What is deployed?
Let's take a look at what's deployed on GCP upon running the quick start example script.
Note: you can very easily edit the script by removing certain modules, giving you the flexibility to design the topology of your pipeline according to your needs.
The exact set of resources depends on which destination you chose when running the script: Postgres or BigQuery.
Collector load balancer
This is an application load balancer for your inbound HTTP/S traffic. Traffic is routed from the load balancer to the collector.
For further details on the resources, default and required input variables, and outputs, see the terraform-google-lb Terraform module.
Stream Collector
This is a Snowplow event collector that receives raw Snowplow events over HTTP, serializes them to a Thrift record format, and then writes them to Pub/Sub. More details can be found here.
For further details on the resources, default and required input variables, and outputs, see the collector-pubsub-ce Terraform module.
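As a quick smoke test, you can send an event to the collector yourself. Below is a minimal sketch in Python, assuming COLLECTOR_URL is replaced with the address of your load balancer; the /health and /i endpoints are standard collector endpoints.

```python
# A minimal sketch, assuming COLLECTOR_URL is replaced with the address of
# your collector load balancer.
import requests

COLLECTOR_URL = "http://<your-lb-address>"  # placeholder

# The collector exposes a health endpoint, useful for verifying the deployment.
health = requests.get(f"{COLLECTOR_URL}/health")
print(health.status_code)  # expect 200 when the collector is healthy

# Send a minimal page-view event via the GET pixel endpoint (/i).
params = {
    "e": "pv",                      # event type: page view
    "url": "https://example.com/",  # page URL
    "p": "web",                     # platform
    "tv": "manual-0.0.1",           # tracker version label
}
resp = requests.get(f"{COLLECTOR_URL}/i", params=params)
print(resp.status_code)  # expect 200; the event is now on the raw topic
```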
Stream Enrich
This is a Snowplow app written in Scala which:
- Reads raw Snowplow events off a Pub/Sub topic populated by the Scala Stream Collector
- Validates each raw event
- Enriches each event (e.g. infers the location of the user from their IP address)
- Writes the enriched Snowplow event to the enriched topic
It is designed to be used downstream of the Scala Stream Collector. More details can be found here.
For further details on the resources, default and required input variables, and outputs, see the enrich-pubsub-ce Terraform module.
Pub/Sub topics
Your Pub/Sub topics are a key component of a non-lossy pipeline: they provide crucial back-up between applications, and the enriched topic also serves as a mechanism for driving real-time use cases from the enriched stream.
For further details on the resources, default and required input variables, and outputs, see the pubsub-topic Terraform module.
Raw stream
Collector payloads are written to this raw Pub/Sub topic before being picked up by the Enrich application.
Enriched topic
Events that have been validated and enriched by the Enrich application are written to this enriched stream.
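Because the enriched topic is a regular Pub/Sub topic, you can consume it directly to drive real-time use cases. Here is a minimal sketch using the google-cloud-pubsub client, assuming you have created a subscription on the enriched topic; the project and subscription IDs are placeholders.

```python
# A minimal sketch of a real-time consumer, assuming a subscription exists
# on the enriched topic; the IDs below are placeholders to replace.
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

PROJECT_ID = "my-gcp-project"           # placeholder
SUBSCRIPTION_ID = "enriched-topic-sub"  # placeholder: a subscription you created

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # Enriched events are tab-separated (TSV) strings.
    fields = message.data.decode("utf-8").split("\t")
    print(f"received enriched event with {len(fields)} fields")
    message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
with subscriber:
    try:
        streaming_pull.result(timeout=30)  # listen for 30 seconds
    except TimeoutError:
        streaming_pull.cancel()            # stop the background threads
        streaming_pull.result()            # block until shutdown completes
```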
Bad 1 topic
This bad topic is for events that the collector or Enrich fail to process. An event can fail at the collector, for instance because it is too large for the stream (creating a size violation bad row), or during enrichment, due to a schema violation or an enrichment failure. More details can be found here.
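To inspect failed events, you can pull from this topic. A minimal sketch with a synchronous pull, assuming a subscription exists on the bad topic (IDs are placeholders); bad rows are self-describing JSON, so their schema field tells you the failure type.

```python
# A minimal sketch using a synchronous pull, assuming a subscription exists
# on the bad topic; the project and subscription IDs are placeholders.
import json
from google.cloud import pubsub_v1

PROJECT_ID = "my-gcp-project"  # placeholder
SUBSCRIPTION_ID = "bad-1-sub"  # placeholder

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

response = subscriber.pull(request={"subscription": sub_path, "max_messages": 10})
for received in response.received_messages:
    bad_row = json.loads(received.message.data)
    print(bad_row["schema"])  # the schema URI indicates the failure type

if response.received_messages:
    subscriber.acknowledge(request={
        "subscription": sub_path,
        "ack_ids": [r.ack_id for r in response.received_messages],
    })
```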
If you chose Postgres as your destination, no other Pub/Sub topics are deployed. If you chose BigQuery, one more topic is created:
Bad Rows topic
This bad topic contains events that could not be inserted into BigQuery by the loader. This includes data that is not valid against its schema, or that is somehow corrupted in a way the loader cannot handle.
BigQuery datasets
There will be one new dataset available with the suffix _snowplow_db, within which there will be a table called events. All of your collected events will be available here, generally within a few seconds of being sent into the pipeline.
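For example, you can query the events table with the google-cloud-bigquery client. A minimal sketch, assuming a hypothetical project ID and dataset prefix that you would replace with your own; app_id, event_name and collector_tstamp are standard columns of the Snowplow events table.

```python
# A minimal sketch, assuming placeholder project and dataset names.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project

query = """
    SELECT app_id, event_name, COUNT(*) AS n
    FROM `my-gcp-project.prefix_snowplow_db.events`  -- placeholder dataset
    WHERE collector_tstamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
    GROUP BY app_id, event_name
    ORDER BY n DESC
"""
for row in client.query(query).result():
    print(row.app_id, row.event_name, row.n)
```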
GCS buckets
For any data that cannot be loaded at all, we have deployed a dead letter bucket which will have the suffix -bq-loader-dead-letter. All events that are re-tried by the repeater and still fail to be inserted into BigQuery end up here.
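You can check whether anything has landed in the dead letter bucket with the google-cloud-storage client. A minimal sketch, with placeholder project and bucket names:

```python
# A minimal sketch, assuming a placeholder project and bucket prefix.
from google.cloud import storage

client = storage.Client(project="my-gcp-project")  # placeholder
bucket_name = "prefix-bq-loader-dead-letter"       # placeholder prefix

# An empty listing means no events have been dead-lettered.
for blob in client.list_blobs(bucket_name):
    print(blob.name, blob.size, blob.time_created)
```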
Iglu
Iglu allows you to publish, test and serve schemas via an easy-to-use RESTful interface. It is split into a few services.
Iglu load balancer
This load balancer routes inbound traffic to the Iglu Server.
For further details on the resources, default and required input variables, and outputs, see the google-lb Terraform module.
Iglu Server
The Iglu Server serves requests for Iglu schemas stored in your schema registry.
For further details on the resources, default and required input variables, and outputs, see the iglu-server-ce Terraform module.
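For example, you can fetch a schema from the server over REST. A minimal sketch, assuming the server address and API key below are replaced with your own values, and that the requested schema exists in your registry:

```python
# A minimal sketch, assuming placeholder server address and API key, and
# that the requested schema exists in your registry.
import requests

IGLU_SERVER_URL = "http://<your-iglu-lb-address>"  # placeholder
API_KEY = "<your-iglu-api-key>"                    # placeholder

# Schemas are addressed by vendor/name/format/version coordinates.
resp = requests.get(
    f"{IGLU_SERVER_URL}/api/schemas/com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0",
    headers={"apikey": API_KEY},
)
resp.raise_for_status()
print(resp.json()["self"])  # self-describing metadata for the schema
```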
Iglu CloudSQL
This is the Iglu Server database where the Iglu schemas themselves are stored.
For further details on the resources, default and required input variables, and outputs, see the cloud-sql Terraform module.
Finally, one of the following loaders is deployed, depending on the destination you chose:
Postgres loader
The Snowplow application responsible for reading the enriched and bad data and loading it into Postgres.
For further details on the resources, default and required input variables, and outputs, see the postgres-loader-pubsub-ce Terraform module.
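Once events are flowing, you can query them in Postgres. A minimal sketch using psycopg2, assuming the loader's default atomic.events table and placeholder connection details:

```python
# A minimal sketch using psycopg2, assuming the loader's default
# atomic.events table and placeholder connection details.
import psycopg2

conn = psycopg2.connect(
    host="<cloud-sql-ip>",  # placeholder
    dbname="snowplow",      # placeholder
    user="snowplow",        # placeholder
    password="<password>",  # placeholder
)
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT app_id, event_name, COUNT(*)
        FROM atomic.events
        WHERE collector_tstamp > now() - interval '1 hour'
        GROUP BY app_id, event_name
    """)
    for app_id, event_name, n in cur.fetchall():
        print(app_id, event_name, n)
conn.close()
```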
BigQuery loader
The Snowplow application responsible for reading the enriched data and loading it into BigQuery.
For further details on the resources, default and required input variables, and outputs, see the bigquery-loader-pubsub-ce Terraform module.
Next, start tracking events from your own application.