
What is deployed?

Let’s take a look at what is deployed on AWS upon running the quick start example script.

Note: you can easily edit the script or run each of the Terraform modules independently, which gives you the flexibility to design your pipeline's topology according to your needs.

Collector load balancer

This is an Application Load Balancer (ALB) for your inbound HTTP/S traffic. Traffic is routed from the load balancer to the collector.

For further details on the resources, default and required input variables, and outputs, see the terraform-aws-alb module GitHub repository.

Stream Collector

This is a Snowplow event collector that receives raw Snowplow events over HTTP, serializes them to a Thrift record format, and then writes them to Kinesis. More details can be found here.

Find out more about the Collector Terraform module and explore the full set of variables here: https://registry.terraform.io/modules/snowplow-devops/collector-kinesis-ec2/aws/latest.

Stream Enrich

This is a Snowplow app written in Scala which:

  • Reads raw Snowplow events off a Kinesis stream populated by the Scala Stream Collector
  • Validates each raw event
  • Enriches each event (e.g. infers the user's location from their IP address)
  • Writes the enriched Snowplow event to another stream

It is designed to be used downstream of the Scala Stream Collector. More details can be found here.

Find out more about the Enrich modules and explore the full set of variables available here: https://registry.terraform.io/modules/snowplow-devops/enrich-kinesis-ec2/aws/latest.
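To make the read, validate, enrich and write steps concrete, here is a toy Python sketch. It is not how Stream Enrich is implemented (the real app is written in Scala and validates events against schemas); the `REQUIRED_FIELDS` rule, the `GEO_LOOKUP` table, the `Streams` container and the bad-row fields are hypothetical stand-ins, and in-memory lists play the role of the Kinesis streams.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an IP geolocation database.
GEO_LOOKUP = {"203.0.113.7": "GB"}

# Hypothetical validation rule: a raw event must carry these fields.
REQUIRED_FIELDS = ("event_id", "user_ipaddress")


@dataclass
class Streams:
    """In-memory stand-ins for the enriched and bad-1 Kinesis streams."""
    enriched: list = field(default_factory=list)
    bad_1: list = field(default_factory=list)


def enrich_one(raw: dict, streams: Streams) -> None:
    """Validate one raw event, enrich it, and write it to the right stream."""
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        # Failed validation: write a bad row instead of dropping the event.
        streams.bad_1.append({"failure": "validation", "missing": missing, "payload": raw})
        return
    enriched = dict(raw)
    # Enrichment step: infer the user's country from their IP address.
    enriched["geo_country"] = GEO_LOOKUP.get(raw["user_ipaddress"])
    streams.enriched.append(enriched)


streams = Streams()
for event in [{"event_id": "e1", "user_ipaddress": "203.0.113.7"}, {"event_id": "e2"}]:
    enrich_one(event, streams)

print(len(streams.enriched), len(streams.bad_1))  # 1 1
```

The design point the sketch mirrors is that an invalid event is written to a bad stream rather than discarded, so nothing is silently lost.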

Kinesis streams

Your Kinesis streams are key to a non-lossy pipeline: they provide crucial back-up and also act as a mechanism for driving real-time use cases from the enriched stream.

Find out more about the Kinesis stream module and explore the full set of variables available here: https://registry.terraform.io/modules/snowplow-devops/kinesis-stream/aws/latest.

Raw stream

Collector payloads are written to this raw Kinesis stream before being picked up by the Enrich application. The S3 loader (raw) also reads from this raw stream and writes to the raw S3 folder.

Enriched stream

Events that have been validated and enriched by the Enrich application are written to this enriched stream. The S3 loader (enriched) reads from this enriched stream and writes to the enriched folder on S3.

Bad 1 stream

This bad stream receives events that the collector, Enrich or S3 loader (raw and enriched) applications fail to process. For instance, an event can fail at the collector if it is too large for the stream, which produces a size violation bad row, or it can fail during enrichment because of a schema violation or an enrichment failure. More details can be found here.
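As an illustration of the size-violation case, the sketch below checks a serialized payload against a maximum record size and substitutes a small bad row when the payload is too large. `MAX_RECORD_BYTES` is an assumed value chosen to sit under Kinesis's 1 MB per-record payload limit (the real collector threshold is configurable), and `route_payload` and the bad-row field names are hypothetical.

```python
import json

# Assumed maximum record size in bytes; the real threshold is a collector
# setting and is bounded by Kinesis's ~1 MB per-record payload limit.
MAX_RECORD_BYTES = 1_000_000


def route_payload(payload: bytes) -> tuple[str, bytes]:
    """Return (stream_name, record) for a serialized collector payload.

    Oversized payloads are replaced by a small size-violation bad row so the
    failure is recorded in the bad-1 stream instead of being lost.
    """
    if len(payload) <= MAX_RECORD_BYTES:
        return "raw", payload
    bad_row = {
        "type": "size_violation",  # hypothetical field names
        "max_bytes": MAX_RECORD_BYTES,
        "actual_bytes": len(payload),
        # Keep only a prefix of the oversized payload for debugging.
        "payload_prefix": payload[:100].decode("utf-8", errors="replace"),
    }
    return "bad-1", json.dumps(bad_row).encode("utf-8")


print(route_payload(b"small event")[0])    # raw
print(route_payload(b"x" * 1_500_000)[0])  # bad-1
```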

Bad 2 stream

This bad stream is for failed events generated by the S3 loader as it tries to write from the bad 1 stream to the bad folder on S3.

Iglu

Iglu allows you to publish, test and serve schemas via an easy-to-use RESTful interface. It is split into a few services.

Iglu load balancer

This load balancer receives inbound traffic and routes it to the Iglu Server.

Find out more about the application load balancer module and explore the full set of variables available here: https://registry.terraform.io/modules/snowplow-devops/alb/aws/latest.

Iglu Server

The Iglu Server serves requests for Iglu schemas stored in your schema registry. 

Find out more about the Iglu Server module and explore the full set of variables available here: https://registry.terraform.io/modules/snowplow-devops/iglu-server-ec2/aws/latest.
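Iglu clients locate a schema by its Iglu URI, of the form `iglu:<vendor>/<name>/<format>/<version>`. The sketch below turns such a URI into the HTTP URL a client would request from the Iglu Server behind the load balancer. It assumes schemas are served under `<api base>/schemas/<vendor>/<name>/<format>/<version>`; check this against the API documentation for your Iglu Server version. The host `iglu.example.com` and the `schema_url` function are hypothetical.

```python
import re

# Iglu URI: iglu:<vendor>/<name>/<format>/<model>-<revision>-<addition>
IGLU_URI = re.compile(r"^iglu:([^/]+)/([^/]+)/([^/]+)/(\d+-\d+-\d+)$")


def schema_url(iglu_uri: str, api_base: str) -> str:
    """Build the URL a client would GET to fetch the schema (assumed layout)."""
    match = IGLU_URI.match(iglu_uri)
    if match is None:
        raise ValueError(f"not a valid Iglu URI: {iglu_uri!r}")
    vendor, name, fmt, version = match.groups()
    return f"{api_base.rstrip('/')}/schemas/{vendor}/{name}/{fmt}/{version}"


print(schema_url(
    "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1",
    "https://iglu.example.com/api",  # hypothetical load balancer address
))
# https://iglu.example.com/api/schemas/com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1
```

Requests for private schemas may also need an API key; that part is left out of the sketch.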

Iglu RDS

This is the Iglu Server database where the Iglu schemas themselves are stored. 

Find out more about the RDS module and explore the full set of variables available here: https://registry.terraform.io/modules/snowplow-devops/rds/aws/latest.

S3 loader

The Snowplow S3 Loaders consume records from the relevant Amazon Kinesis streams (as outlined above) and write them to S3.

Find out more about the S3 loader module and explore the full set of variables available here: https://registry.terraform.io/modules/snowplow-devops/s3-loader-kinesis-ec2/aws/latest.

S3 loader raw

Responsible for reading from the raw stream (i.e. events from the collector that have not yet been validated or enriched) and writing to the raw folder on S3. Any events that have failed to be processed by the raw S3 loader get written to your bad-1 stream.

S3 loader bad

Responsible for reading from the bad-1 stream and writing to the bad folder on S3. Any events that fail to be processed by the bad S3 loader get written to the bad-2 stream.

S3 loader enriched

Responsible for reading from the enriched stream and writing to your enriched folder on S3. Any events that fail to be processed by the enriched S3 loader get written to the bad-1 stream.
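The three loaders above can be summarised as a small routing table. This Python sketch only restates the text (the loader, stream and folder names are illustrative labels, not configuration keys) and adds a check that no loader writes its failures back into the stream it reads from.

```python
# Loader -> (input stream, S3 folder written on success, stream receiving failures),
# restating the descriptions above. Names are illustrative labels only.
S3_LOADERS = {
    "raw": ("raw", "raw", "bad-1"),
    "enriched": ("enriched", "enriched", "bad-1"),
    "bad": ("bad-1", "bad", "bad-2"),
}


def s3_loaders_reading(stream: str) -> list[str]:
    """S3 loaders that consume a given stream (Enrich and Postgres loaders not included)."""
    return sorted(name for name, (source, _, _) in S3_LOADERS.items() if source == stream)


def failure_stream(loader: str) -> str:
    """Stream a loader writes to when it fails to process a record."""
    return S3_LOADERS[loader][2]


def self_feeding_loaders() -> list[str]:
    """Loaders whose failures would go back into the stream they read from."""
    return [name for name, (source, _, failed) in S3_LOADERS.items() if source == failed]


print(s3_loaders_reading("bad-1"), failure_stream("bad"), self_feeding_loaders())
# ['bad'] bad-2 []
```

The last check comes back empty because the bad loader sends its own failures to bad-2 rather than to bad-1; otherwise a record the bad loader could never write would presumably be re-read and fail again indefinitely.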

S3 loader bucket

This is the S3 bucket to which the S3 loaders write the raw, enriched and bad data.

Find out more about the S3 bucket module and explore the full set of variables available here: https://registry.terraform.io/modules/snowplow-devops/s3-bucket/aws/latest.

Postgres loader

This is the Snowplow application responsible for reading the enriched and bad data and loading it into Postgres.

Find out more about the Postgres loader module and explore the full set of variables available here: https://registry.terraform.io/modules/snowplow-devops/postgres-loader-kinesis-ec2/aws/latest.

DynamoDB

On the first run of each application that consumes from Kinesis (e.g. Enrich), the Kinesis Client Library (KCL) creates a DynamoDB table to keep track of how far that application has read in the stream. Each Kinesis consumer therefore maintains its own checkpoint information.
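The sketch below models per-consumer checkpointing with plain dictionaries in place of DynamoDB tables. It shows why two applications reading the same raw stream (Enrich and the raw S3 loader) can sit at different positions without interfering. The application names, the shard id and the integer sequence numbers are simplifications; a real Kinesis Client Library (KCL) lease table stores more than a single number per shard.

```python
from collections import defaultdict
from typing import Optional

# One simulated "table" per consuming application: shard id -> last processed
# sequence number (simplified to an int; DynamoDB stores the real thing).
checkpoint_tables: dict = defaultdict(dict)


def checkpoint(app_name: str, shard_id: str, sequence_number: int) -> None:
    """Record that app_name has processed shard_id up to sequence_number."""
    checkpoint_tables[app_name][shard_id] = sequence_number


def resume_position(app_name: str, shard_id: str) -> Optional[int]:
    """Last checkpoint for app_name on shard_id (None: no checkpoint yet)."""
    return checkpoint_tables.get(app_name, {}).get(shard_id)


SHARD = "shardId-000000000000"
# Both applications read the raw stream, but each tracks its own progress.
checkpoint("enrich", SHARD, 120)
checkpoint("s3-loader-raw", SHARD, 95)

print(resume_position("enrich", SHARD), resume_position("s3-loader-raw", SHARD))  # 120 95
```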

The DynamoDB autoscaling module enables autoscaling for a target DynamoDB table. Note that the kcl_write_max_capacity variable can be set to your expected RPS; a higher value allows more write capacity to be provisioned and can therefore increase cost.

You can find further details here: https://registry.terraform.io/modules/snowplow-devops/dynamodb-autoscaling/aws/latest.
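To see how `kcl_write_max_capacity` bounds cost, here is a rough model of target-tracking autoscaling: capacity is sized so that consumed writes sit at a target utilization, then clamped between a minimum and a maximum. The 70% target, the minimum of 1 and the `desired_write_capacity` function are assumptions for illustration; real Application Auto Scaling reacts to CloudWatch metrics over time and applies cooldowns, which this ignores.

```python
def desired_write_capacity(consumed_wcu: int, target_pct: int,
                           min_capacity: int, max_capacity: int) -> int:
    """Rough target-tracking model for a table's provisioned write capacity.

    Capacity is sized so that consumption sits at target_pct percent of it
    (rounded up), then clamped to [min_capacity, max_capacity]. Demand above
    max_capacity is not scaled for and may be throttled.
    """
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be in (0, 100]")
    wanted = -(-consumed_wcu * 100 // target_pct)  # ceiling division
    return max(min_capacity, min(max_capacity, wanted))


# Assumed 70% target; max_capacity plays the role of kcl_write_max_capacity.
for consumed in (0, 35, 200):
    print(consumed, desired_write_capacity(consumed, 70, 1, 100))
# 0 1
# 35 50
# 200 100
```

Because provisioned DynamoDB capacity is billed whether or not it is used, the maximum caps the worst-case write cost; the trade-off is that a maximum set too low can throttle checkpoint writes during traffic spikes.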
