
What is deployed?

Let's take a look at what is deployed on GCP when you run the quick start example script.

Note: you can edit the script to remove modules you don't need, which lets you design the topology of your pipeline according to your needs.

Collector load balancer

This is an application load balancer for your inbound HTTP/S traffic. Traffic is routed from the load balancer to the collector.

For further details on the resources, default and required input variables, and outputs, see the terraform-google-lb Terraform module.

Stream Collector

This is a Snowplow event collector that receives raw Snowplow events over HTTP, serializes them to a Thrift record format, and then writes them to Pub/Sub. More details can be found here.

For further details on the resources, default and required input variables, and outputs, see the collector-pubsub-ce Terraform module.

Stream Enrich

This is a Snowplow app written in Scala that:

  • Reads raw Snowplow events off a Pub/Sub topic populated by the Scala Stream Collector
  • Validates each raw event
  • Enriches each event (e.g. infers the location of the user from their IP address)
  • Writes the enriched Snowplow event to the enriched topic

It is designed to be used downstream of the Scala Stream Collector. More details can be found here.
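
The read, validate, enrich, and write steps above can be sketched roughly as follows. This is an illustration only, not the Enrich implementation: the in-memory `raw_topic` and `enriched_topic`, the `GEO_TABLE` lookup, the required fields, and every function name are assumptions made for the example.

```python
from collections import deque

# Hypothetical IP-to-country table standing in for a real geolocation lookup.
GEO_TABLE = {"203.0.113.7": "GB", "198.51.100.4": "US"}


def validate(raw_event: dict) -> None:
    """Raise ValueError if a field this sketch requires is missing."""
    for field in ("event_id", "ip"):
        if field not in raw_event:
            raise ValueError(f"missing field: {field}")


def enrich(raw_event: dict) -> dict:
    """Return a copy of the event with an inferred country added."""
    enriched = dict(raw_event)
    enriched["geo_country"] = GEO_TABLE.get(raw_event["ip"])
    return enriched


def run_enrich(raw_topic: deque, enriched_topic: list, failed: list) -> None:
    """Drain the raw topic, validating and enriching each event in turn."""
    while raw_topic:
        raw_event = raw_topic.popleft()
        try:
            validate(raw_event)
        except ValueError as err:
            # Keep the failed event instead of dropping it silently.
            failed.append({"error": str(err), "payload": raw_event})
            continue
        enriched_topic.append(enrich(raw_event))


raw_topic = deque([
    {"event_id": "e1", "ip": "203.0.113.7"},
    {"event_id": "e2"},  # no IP address, so validation fails
])
enriched_topic, failed = [], []
run_enrich(raw_topic, enriched_topic, failed)
```

In this sketch, events that fail validation are collected in `failed` rather than discarded, which mirrors the role the bad topic plays in the deployed pipeline.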

For further details on the resources, default and required input variables, and outputs, see the enrich-pubsub-ce Terraform module.

Pub/Sub topics

Your Pub/Sub topics are key to keeping the pipeline non-lossy: they buffer data between applications and provide a back-up. The enriched topic also serves as a mechanism to drive real-time use cases.

For further details on the resources, default and required input variables, and outputs, see the pubsub-topic Terraform module.

Raw topic

Collector payloads are written to this raw Pub/Sub topic before being picked up by the Enrich application.

Enriched topic

Events that have been validated and enriched by the Enrich application are written to this enriched topic.

Bad 1 topic

This bad topic is for events that the collector or Enrich fails to process. An event can fail at the collector if, for instance, it is too large for the stream, which creates a size violation bad row. It can also fail during enrichment because of a schema violation or an enrichment failure. More details can be found here.
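
As a rough illustration of the size violation case, the sketch below routes a payload to either the raw or the bad topic based on its size. The limit `MAX_PAYLOAD_BYTES`, the function `route_payload`, and the shape of the bad row are all hypothetical and do not reflect the collector's actual limits or bad row format.

```python
import json

# Assumed size limit for this sketch only; the real limit depends on the stream.
MAX_PAYLOAD_BYTES = 1_000_000


def route_payload(payload: bytes, max_bytes: int = MAX_PAYLOAD_BYTES):
    """Return (topic_name, message_bytes) for a collector payload."""
    if len(payload) > max_bytes:
        # Too large for the stream: describe the failure instead of the payload.
        bad_row = {
            "failure": "size_violation",
            "actual_bytes": len(payload),
            "max_bytes": max_bytes,
        }
        return "bad", json.dumps(bad_row).encode("utf-8")
    return "raw", payload


small_topic, _ = route_payload(b"x" * 10, max_bytes=100)
large_topic, large_message = route_payload(b"x" * 101, max_bytes=100)
```

Here the oversized payload is replaced by a small record describing the failure, so the message written to the bad topic stays within the limit.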

No other Pub/Sub topics are deployed.

Iglu

Iglu allows you to publish, test and serve schemas via an easy-to-use RESTful interface. It is split into a few services.

Iglu load balancer

This load balancer receives inbound traffic and routes it to the Iglu Server.

For further details on the resources, default and required input variables, and outputs, see the google-lb Terraform module.

Iglu Server

The Iglu Server serves requests for Iglu schemas stored in your schema registry. 

For further details on the resources, default and required input variables, and outputs, see the iglu-server-ce Terraform module.

Iglu CloudSQL

This is the Iglu Server database where the Iglu schemas themselves are stored.

For further details on the resources, default and required input variables, and outputs, see the cloud-sql Terraform module.

Postgres loader

This is the Snowplow application that reads the enriched and bad data and loads it into Postgres.

For further details on the resources, default and required input variables, and outputs, see the postgres-loader-pubsub-ce Terraform module.

Next, start tracking events from your own application.
