Set up the stream collector

Available on Terraform Registry

A Terraform module is available that deploys the stream collector on an AWS EC2 instance, without the need for this manual setup.

Run the collector

The stream collector is published on Docker Hub in several flavours, one per sink. Pull the image that matches the sink you are using:

docker pull snowplow/scala-stream-collector-kinesis:2.9.0
docker pull snowplow/scala-stream-collector-pubsub:2.9.0
docker pull snowplow/scala-stream-collector-kafka:2.9.0
docker pull snowplow/scala-stream-collector-rabbitmq-experimental:2.9.0
docker pull snowplow/scala-stream-collector-nsq:2.9.0
docker pull snowplow/scala-stream-collector-sqs:2.9.0
docker pull snowplow/scala-stream-collector-stdout:2.9.0

The application is configured by passing a HOCON file on the command line. Replace ${flavour} with the name of your sink, for example kinesis:

docker run --rm \
  -v "$PWD/config.hocon:/snowplow/config.hocon" \
  -p 8080:8080 \
  snowplow/scala-stream-collector-${flavour}:2.9.0 --config /snowplow/config.hocon
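
As a minimal sketch of how the ${flavour} placeholder might be filled in, the helper below builds the image name from a flavour and rejects anything that is not in the list of images above. The function name collector_image is hypothetical and not part of Snowplow; the version defaults to 2.9.0 only because that is the tag used throughout this page.

```shell
# Hypothetical helper: print the Docker image name for a given sink flavour.
# Usage: collector_image <flavour> [version]
collector_image() {
  flavour="$1"
  version="${2:-2.9.0}"
  case "$flavour" in
    # The flavours listed in the docker pull commands above
    kinesis|pubsub|kafka|rabbitmq-experimental|nsq|sqs|stdout)
      echo "snowplow/scala-stream-collector-${flavour}:${version}"
      ;;
    *)
      echo "unknown flavour: ${flavour}" >&2
      return 1
      ;;
  esac
}
```

The result can then be passed to docker run in place of the image name, for example "$(collector_image kinesis)".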

Alternatively, you can download a jar file from the GitHub release and run it:

java -jar scala-stream-collector-kinesis-2.9.0.jar --config /path/to/config.hocon

Telemetry notice

By default, Snowplow collects telemetry data for Collector (since version 2.4.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!).

This data is anonymous and minimal, and since our code is open source, you can inspect what’s collected.

If you wish to help us further, you can optionally provide your email (or just a UUID) in the collector.telemetry.userProvidedId configuration setting.

If you wish to disable telemetry, you can do so by setting collector.telemetry.disable to true.
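
As an illustration, the two settings named above could be appended to an existing configuration file from the shell. This is a sketch that assumes the file is called config.hocon and sits in the current directory (the CONFIG_FILE variable is introduced here only for the example). It relies on HOCON accepting dotted paths and merging them into the existing collector block.

```shell
# Assumed file name; override by exporting CONFIG_FILE before running.
CONFIG_FILE="${CONFIG_FILE:-config.hocon}"

# Append the telemetry settings; the quoted 'EOF' prevents shell expansion.
cat >> "$CONFIG_FILE" <<'EOF'

# Opt out of telemetry entirely
collector.telemetry.disable = true

# Or, to keep telemetry on and identify yourself, use this line instead:
# collector.telemetry.userProvidedId = "you@example.com"
EOF
```

Only one of the two settings is useful at a time: providing an ID has no purpose once telemetry is disabled.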

See our telemetry principles for more information.

Health check

Pinging the collector on the /health path should return a 200 OK response:

curl http://localhost:8080/health
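
Because the collector may take a moment to start, a deployment script might poll the endpoint rather than check it once. The following is a sketch only: wait_for_collector is a hypothetical helper, and the defaults of 30 attempts and a 1 second delay are arbitrary assumptions.

```shell
# Hypothetical helper: poll a health URL until it returns HTTP 200.
# Usage: wait_for_collector [url] [attempts] [delay_seconds]
wait_for_collector() {
  url="${1:-http://localhost:8080/health}"
  attempts="${2:-30}"
  delay="${3:-1}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s hides progress, -o discards the body, -w prints only the status code.
    # "|| true" keeps a refused connection from aborting scripts run with set -e.
    status="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$status" = "200" ]; then
      echo "collector healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "collector not healthy after $attempts attempt(s)" >&2
  return 1
}
```

Calling wait_for_collector with no arguments checks http://localhost:8080/health and returns a non-zero exit status if the collector never responds with 200.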
