Set up the stream collector
Available on Terraform Registry
A Terraform module is available that deploys the stream collector on an AWS EC2 instance, so this manual setup is not needed.
Run the collector
The stream collector is published on Docker Hub in several flavours. Pull the image that matches the sink you are using:
docker pull snowplow/scala-stream-collector-kinesis:2.9.0
docker pull snowplow/scala-stream-collector-pubsub:2.9.0
docker pull snowplow/scala-stream-collector-kafka:2.9.0
docker pull snowplow/scala-stream-collector-rabbitmq-experimental:2.9.0
docker pull snowplow/scala-stream-collector-nsq:2.9.0
docker pull snowplow/scala-stream-collector-sqs:2.9.0
docker pull snowplow/scala-stream-collector-stdout:2.9.0
The application is configured by passing a HOCON file on the command line:
docker run --rm \
  -v "$PWD/config.hocon:/snowplow/config.hocon" \
  -p 8080:8080 \
  snowplow/scala-stream-collector-${flavour}:2.9.0 --config /snowplow/config.hocon
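The ${flavour} placeholder in the command above stands for one of the sink names from the list of images. As a minimal sketch, the image name could be built with a small helper. The function name collector_image and its variable names are illustrative only and not part of the collector tooling.

```shell
# Print the Docker image name for a given sink flavour.
# collector_image is a hypothetical helper, not something shipped with the collector.
collector_image() {
  ci_flavour="$1"
  ci_version="${2:-2.9.0}"   # defaults to the version used on this page
  case "$ci_flavour" in
    kinesis|pubsub|kafka|rabbitmq-experimental|nsq|sqs|stdout)
      printf 'snowplow/scala-stream-collector-%s:%s\n' "$ci_flavour" "$ci_version"
      ;;
    *)
      # Reject anything that is not one of the flavours listed on this page.
      echo "unknown flavour: $ci_flavour" >&2
      return 1
      ;;
  esac
}
```

It could then be used as, for example, docker pull "$(collector_image kinesis)".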
Alternatively, you can download a JAR file from the GitHub release and run it:
java -jar scala-stream-collector-kinesis-2.9.0.jar --config /path/to/config.hocon
Telemetry notice
By default, Snowplow collects telemetry data from the collector (since version 2.4.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!).
This data is anonymous and minimal, and since our code is open source, you can inspect what’s collected.
If you wish to help us further, you can optionally provide your email (or just a UUID) in the collector.telemetry.userProvidedId configuration setting.
If you wish to disable telemetry, you can do so by setting collector.telemetry.disable to true.
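As a rough illustration, the two telemetry settings map onto a nested HOCON block like the one printed by the helper below. The function name telemetry_fragment and the example email address are assumptions made for this sketch. Only the two setting paths come from this page.

```shell
# Print an example HOCON fragment for the telemetry settings.
# telemetry_fragment is a hypothetical helper used only for illustration.
telemetry_fragment() {
  cat <<'EOF'
collector {
  telemetry {
    # Set to true to switch telemetry off.
    disable = true

    # Alternatively, keep telemetry on and identify yourself (email or UUID):
    # userProvidedId = "you@example.com"
  }
}
EOF
}
```

Running telemetry_fragment prints the fragment, whose telemetry block can then be copied into the collector section of config.hocon.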
See our telemetry principles for more information.
Health check
Pinging the collector on the /health path should return a 200 OK response:
curl http://localhost:8080/health
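When the collector has only just started, the first request may arrive before it is listening. The sketch below polls the /health path until it returns 200 or a number of attempts is exhausted. The function name wait_for_health and its default retry values are assumptions for this example, and the curl options used (-s, -o, -w) are standard curl flags.

```shell
# Poll a health endpoint until it answers 200 or the attempts run out.
# wait_for_health is a hypothetical helper; arguments: URL, attempts, delay in seconds.
wait_for_health() {
  wfh_url="$1"
  wfh_attempts="${2:-10}"
  wfh_delay="${3:-2}"
  wfh_i=1
  while [ "$wfh_i" -le "$wfh_attempts" ]; do
    # -s hides progress output, -o discards the body, -w prints only the status code.
    wfh_status="$(curl -s -o /dev/null -w '%{http_code}' "$wfh_url" || true)"
    if [ "$wfh_status" = "200" ]; then
      echo "collector is healthy"
      return 0
    fi
    sleep "$wfh_delay"
    wfh_i=$((wfh_i + 1))
  done
  echo "collector did not become healthy after $wfh_attempts attempts" >&2
  return 1
}
```

For the setup on this page it would be called as wait_for_health http://localhost:8080/health.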