
Set up Validation and Enrich

Stream Enrich is an application which:

  1. Reads raw Snowplow events off a stream populated by the Stream Collector
  2. Validates each raw event
  3. Enriches each event (e.g. infers the location of the user from their IP address)
  4. Writes the enriched Snowplow event to another stream
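The four steps above can be pictured as a simple read, validate, enrich, write loop. The following is a minimal in-memory sketch, not the actual enrich-kinesis implementation. The `RawEvent` and `EnrichedEvent` classes, the toy validation rule and the `run` function are all hypothetical, Python lists stand in for the Kinesis streams, and the routing of invalid events to a separate list is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class RawEvent:
    # Hypothetical, heavily simplified stand-in for a raw collector payload.
    ip: str
    useragent: str
    event_type: Optional[str]


@dataclass
class EnrichedEvent:
    # Hypothetical subset of enriched-event fields.
    user_ipaddress: str
    useragent: str
    event: str
    derived_contexts: List[dict] = field(default_factory=list)


Enrichment = Callable[[EnrichedEvent], None]


def validate(raw: RawEvent) -> bool:
    # Toy rule standing in for real validation: the event type must be present.
    return bool(raw.event_type)


def enrich(raw: RawEvent, enrichments: Iterable[Enrichment]) -> EnrichedEvent:
    event = EnrichedEvent(
        user_ipaddress=raw.ip,
        useragent=raw.useragent,
        event=raw.event_type or "",
    )
    for step in enrichments:
        step(event)  # each enrichment mutates the event in place
    return event


def run(raw_stream: Iterable[RawEvent], enrichments: List[Enrichment],
        enriched_sink: List[EnrichedEvent], invalid_sink: List[RawEvent]) -> None:
    # 1. read each raw event, 2. validate it, 3. enrich it, 4. write it out.
    for raw in raw_stream:
        if validate(raw):
            enriched_sink.append(enrich(raw, enrichments))
        else:
            invalid_sink.append(raw)  # assumption: invalid events kept separately
```

In the real application the input and output are Kinesis streams rather than Python lists, and validation is far richer than a single presence check.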

This guide covers how to set up enrich-kinesis.

Install, configure and run enrich-kinesis

The enrich-kinesis reference describes how to install, run, and configure the application.

Add any desired enrichments

Snowplow offers a large number of enrichments that can be used to enhance your event data. An enrichment either updates or populates fields of the atomic event or adds a self-describing context to derived_contexts.
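The two ways an enrichment can act on an event are easier to see side by side. The following is a minimal sketch that treats an event as a plain dictionary, which is an assumption made for brevity. `anonymize_ip` overwrites the atomic `user_ipaddress` field in the spirit of the IP anonymization enrichment. `attach_device_context` leaves atomic fields alone and appends a self-describing context to `derived_contexts`. Its schema URI `iglu:com.example/device_hint/jsonschema/1-0-0` is hypothetical.

```python
from typing import Any, Dict

Event = Dict[str, Any]


def anonymize_ip(event: Event, octets: int = 1) -> None:
    """Style 1: overwrite an atomic field in place.

    Replaces the last `octets` octets of an IPv4 address with "x".
    """
    if not 0 <= octets <= 4:
        raise ValueError("octets must be between 0 and 4")
    parts = event["user_ipaddress"].split(".")
    if len(parts) != 4:
        return  # this sketch only handles dotted IPv4 addresses
    event["user_ipaddress"] = ".".join(parts[: 4 - octets] + ["x"] * octets)


def attach_device_context(event: Event) -> None:
    """Style 2: leave atomic fields alone and append a self-describing context."""
    context = {
        # Hypothetical schema URI, for illustration only.
        "schema": "iglu:com.example/device_hint/jsonschema/1-0-0",
        "data": {"is_mobile": "Mobile" in event.get("useragent", "")},
    }
    event.setdefault("derived_contexts", []).append(context)
```

For example, calling `anonymize_ip(event, 2)` on an event whose `user_ipaddress` is `37.157.33.123` leaves `37.157.x.x` in that field.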

The order in which enrichments run is hard-coded and cannot be changed. The table below lists the available enrichments in the order they are executed.

Enrichment | Description
--- | ---
IAB | Use the IAB/ABC International Spiders and Bots List to determine whether an event was produced by a user or by a robot/spider, based on its IP address and user agent.
User Agent utils | Deprecated. Please consider switching to YAUAA.
UA parser | Parse the user agent and attach detailed user agent information to each event.
Currency conversion | Convert the values of all transactions to a specified base currency using Open Exchange Rates. To use it, you need an Open Exchange Rates account.
Referer parser | Extract attribution data from referer URLs.
Campaign attribution | Choose which query string parameters are used to generate the marketing campaign fields. If you do not enable the campaign_attribution enrichment, those fields are not populated.
Event fingerprint | Generate a fingerprint for the event using a hash of client-set fields. Helpful for deduplicating events.
Cookie extractor | Specify cookies that you want to extract, if found.
HTTP Header extractor | Specify headers that you want to extract via a regex pattern. Each extracted header that is found is attached to your event.
Weather Enrichment | Unavailable since Enrich 1.4.x.
YAUAA | Parse and analyze the user agent string of an event and extract as many relevant attributes as possible using the YAUAA API.
IP lookups | Look up useful data based on a user's IP address using the MaxMind database.
JavaScript | Write a JavaScript function that is executed for each event.
SQL Query | Perform dimension widening on a Snowplow event via your own internal relational database.
API Request | Perform dimension widening on a Snowplow event via your own or a third-party proprietary HTTP(S) API.
IP anonymization | Anonymize the IP addresses found in the user_ipaddress field by replacing a certain number of octets with "x"s.
PII Pseudonymization | Better protect the privacy rights of data subjects by pseudonymizing collected data.
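Because the execution order is fixed, the order in which you enable enrichments has no effect on the order in which they run. The following minimal sketch models that rule. The identifiers in `EXECUTION_ORDER` are hypothetical labels that follow the table's order, with the Weather enrichment left out because it is no longer available. `execution_plan` is an illustrative helper, not part of enrich-kinesis.

```python
from typing import Iterable, List

# Hypothetical identifiers in the execution order given by the enrichments table.
# The Weather enrichment is left out because it is no longer available.
EXECUTION_ORDER: List[str] = [
    "iab", "user_agent_utils", "ua_parser", "currency_conversion",
    "referer_parser", "campaign_attribution", "event_fingerprint",
    "cookie_extractor", "http_header_extractor", "yauaa", "ip_lookups",
    "javascript", "sql_query", "api_request", "anon_ip",
    "pii_pseudonymization",
]


def execution_plan(enabled: Iterable[str]) -> List[str]:
    """Return the enabled enrichments in the fixed order, ignoring input order."""
    enabled_set = set(enabled)
    unknown = enabled_set - set(EXECUTION_ORDER)
    if unknown:
        raise ValueError(f"unknown enrichments: {sorted(unknown)}")
    return [name for name in EXECUTION_ORDER if name in enabled_set]
```

One practical consequence of the table's ordering is that IP lookups runs before IP anonymization. `execution_plan(["anon_ip", "ip_lookups"])` therefore still yields `["ip_lookups", "anon_ip"]`.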

Each enrichment is enabled by its own JSON configuration file. Load these files into DynamoDB, then pass their DynamoDB location to Stream Enrich with the --enrichments argument when you run it, as documented in the enrich-kinesis reference.
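The following minimal sketch writes one JSON configuration file per enrichment before you load them. The file layout (`schema`, plus `data` holding `name`, `vendor`, `enabled` and `parameters`) is modelled on the IP anonymization enrichment's configuration. Treat the schema version `anon_ip/jsonschema/1-0-0` and the `anonOctets` parameter as assumptions, and check them against that enrichment's documentation. The helpers `enrichment_config` and `write_config` are hypothetical, and the sketch does not cover uploading the files to DynamoDB.

```python
import json
from pathlib import Path
from typing import Any, Dict


def enrichment_config(name: str, schema: str, parameters: Dict[str, Any],
                      enabled: bool = True) -> Dict[str, Any]:
    # Self-describing JSON: an Iglu schema URI plus the configuration data.
    return {
        "schema": schema,
        "data": {
            "name": name,
            "vendor": "com.snowplowanalytics.snowplow",
            "enabled": enabled,
            "parameters": parameters,
        },
    }


def write_config(directory: Path, config: Dict[str, Any]) -> Path:
    # One file per enrichment, named after the enrichment.
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{config['data']['name']}.json"
    path.write_text(json.dumps(config, indent=2))
    return path


anon_ip = enrichment_config(
    "anon_ip",
    "iglu:com.snowplowanalytics.snowplow/anon_ip/jsonschema/1-0-0",  # assumed version
    {"anonOctets": 1},  # assumed parameter: number of octets to replace
)
```

Keeping one file per enrichment means you can toggle a single enrichment by flipping its `enabled` flag without touching the others.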

Sink the enriched data to S3 from Kinesis

Now that you have Stream Enrich running, you should have validated, enriched data being output into a Kinesis stream.

The next step is to set up the Snowplow S3 loader to sink this data to S3.

Instructions for loading the data into other data stores, such as Redshift, Snowflake and Elastic, can be found under Destinations.
