Writing your enrichment
Your JavaScript enrichment code should contain a function called process:
- This function will receive each event as its only argument
- It can optionally return an array of entities to be added to the event (under derived_contexts)
- Any uncaught exceptions will result in failed events
function process(event) {
  // do something with the event...
  ...
  // add entities to the event
  return [ ... ];
}
Remember that you can only have one JavaScript enrichment, and hence a single process function, for your pipeline. However, you can split more complex logic into multiple helper functions and variables as you see fit, as long as you comply with the above interface.
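As a rough sketch of that structure, the snippet below keeps a single process function and delegates to two helpers. The helper names (isInternalTraffic, buildTrafficEntity) and the app_id value 'internal-tools' are hypothetical, and the schema URI is only an illustration.

```javascript
// Hypothetical helper: decides whether an event comes from internal traffic.
// 'internal-tools' is an assumed app_id value, not one defined by this guide.
function isInternalTraffic(appId) {
  return appId !== null && String(appId) === 'internal-tools';
}

// Hypothetical helper: builds one entity in the { schema, data } shape.
function buildTrafficEntity(source) {
  return {
    schema: 'iglu:com.my-company/traffic-source/jsonschema/1-0-0',
    data: { traffic_source: source }
  };
}

// The single entry point required by the interface.
function process(event) {
  const source = isInternalTraffic(event.getApp_id()) ? 'internal' : 'external';
  return [buildTrafficEntity(source)];
}
```

Keeping the helpers free of event access where possible (as buildTrafficEntity is here) makes them easier to exercise outside the pipeline.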
JavaScript enrichment uses the Nashorn Engine and since version 3.0.0 of Enrich, many features of ECMAScript 6 are supported. For a list of those features, please refer to this OpenJDK proposal. Regarding the features the proposal says “might be feasible” in the future, as of 2023 our testing shows that classes and generators don't work, but tail calls do.
Best practices
Before we dive into it, here are a few general tips:
- Make sure your code works for all of your events, not just the particular types of events you are interested in. Remember, unhandled exceptions will result in failed events.
- Don’t try to share state across multiple enriched events. Enrichments are run inside a highly parallel application with multiple independent instances, so this will not work.
- In your enrichment code, avoid CPU-intensive tasks (e.g. encryption) and IO-intensive tasks (e.g. requests to an external service) without thoroughly benchmarking the impact they might have on your event processing time.
- The enrichment code has access to the Java standard library and therefore to the filesystem of the machine it’s running on. Proceed with caution when copying code from untrusted sources.
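The first tip can be illustrated with a minimal sketch of defensive helpers. Both helper names (isMobile, withFallback) and the 'Mobile' check are hypothetical. Whether swallowing an error is appropriate depends on how critical the logic is, so treat withFallback as an option for non-essential logic rather than a general recommendation.

```javascript
// Hypothetical helper: true when the useragent mentions "Mobile".
// Events without a useragent (null) are handled instead of throwing.
function isMobile(event) {
  const useragent = event.getUseragent();
  if (useragent === null || useragent === undefined) {
    return false;
  }
  return String(useragent).indexOf('Mobile') !== -1;
}

// Hypothetical helper: runs non-critical logic and returns a fallback value
// if that logic throws, so the event itself does not fail.
function withFallback(fn, fallback) {
  try {
    return fn();
  } catch (e) {
    return fallback;
  }
}
```

Inside process, a call such as withFallback(function () { return buildEntities(event); }, []) would return an empty array when the assumed buildEntities function throws.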
Inspecting the event fields
Regardless of what you want to do with the event, you will likely want to inspect some of the data within. For instance, to get the app_id field:
function process(event) {
  const appId = event.getApp_id();
  ...
There are getter methods available for each of the standard event fields: capitalize the first letter of the field name and prefix it with get, for example event.getUser_ipaddress() or event.getGeo_country().
One exception is refr_device_tstamp, where the getter method is called getRefr_dvce_tstamp and not getRefr_device_tstamp.
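The naming rule and its one exception can be written down as a small helper. This is only a sketch: getterName and readField are hypothetical names, and readField assumes that bracket-style method lookup works on the event object in your engine.

```javascript
// Fields whose getter does not follow the general naming rule.
const GETTER_EXCEPTIONS = { refr_device_tstamp: 'getRefr_dvce_tstamp' };

// Hypothetical helper: derives the getter name for a standard event field.
function getterName(field) {
  if (Object.prototype.hasOwnProperty.call(GETTER_EXCEPTIONS, field)) {
    return GETTER_EXCEPTIONS[field];
  }
  return 'get' + field.charAt(0).toUpperCase() + field.slice(1);
}

// Hypothetical helper: reads a field by name, e.g. readField(event, 'app_id')
// calls event.getApp_id(). Assumes event[name]() resolves the getter.
function readField(event, field) {
  return event[getterName(field)]();
}
```

For fields known in advance, calling the getter directly (event.getApp_id()) stays simpler and easier to read.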
Inspecting self-describing events and entities
If your event is a self-describing event, you might want to access your custom fields. Here’s how to do it:
function process(event) {
  ...
  const myEvent = JSON.parse(event.getUnstruct_event());
  if (myEvent) {
    // the schema of your self-describing event
    const mySchema = myEvent.data.schema;
    // the custom data in your self-describing event
    const myData = myEvent.data.data;
    ...
For events other than self-describing events, getUnstruct_event() will return null. The pattern above still works, because null is a valid input for JSON.parse, which then returns null.
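The same pattern can be wrapped in a helper that always returns either the unpacked event or null. This is a sketch under assumptions: readSelfDescribingEvent is a hypothetical name, and the outer envelope is assumed to nest schema and data under data, as the accesses above imply.

```javascript
// Hypothetical helper: returns { schema, data } for a self-describing event,
// or null when the event is not self-describing.
function readSelfDescribingEvent(event) {
  const parsed = JSON.parse(event.getUnstruct_event());
  if (!parsed) {
    return null; // getUnstruct_event() returned null
  }
  return {
    schema: parsed.data.schema, // schema of the self-describing event
    data: parsed.data.data      // custom fields of the self-describing event
  };
}
```

Callers then need only one null check before reading custom fields.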
You can access any entities in the event in a similar fashion:
function process(event) {
  ...
  const entities = JSON.parse(event.getContexts());
  if (entities) {
    // loop through the entities
    for (const entity of entities.data) {
      if (entity.schema.startsWith('iglu:org.my-company/my-schema/jsonschema/1')) {
        // work with the entity
        const myField = entity.data.myField;
        ...
      }
    }
  }
  ...
For events with no entities attached, getContexts() will return null. The pattern above still works, because null is a valid input for JSON.parse, which then returns null.
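If several parts of your code need to look up entities, the loop above can be factored into a reusable helper. A minimal sketch, where findEntityData is a hypothetical name:

```javascript
// Hypothetical helper: returns the data of every attached entity whose schema
// starts with schemaPrefix. Returns an empty array when nothing matches or
// when the event has no entities at all.
function findEntityData(event, schemaPrefix) {
  const entities = JSON.parse(event.getContexts());
  const matches = [];
  if (entities) {
    for (const entity of entities.data) {
      if (entity.schema.startsWith(schemaPrefix)) {
        matches.push(entity.data);
      }
    }
  }
  return matches;
}
```

Matching on a prefix that stops at the major version (for example .../jsonschema/1) picks up every 1-x-x revision of the schema; pass the full URI if you need an exact version.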
Currently, derived entities (added by other enrichments) cannot be fetched. event.getDerived_contexts() will always return null.
Adding extra entities to the event
Adding entities is the preferred way of augmenting your events with extra information, because it preserves the original event fields intact.
In some cases, you might choose to update existing fields instead of adding entities. However, keep in mind that if you overwrite a field, you won’t have access to its original value in your data warehouse or lake.
The (optional) return value of the process function is an array of extra entities to add to the event. So adding entities is as simple as returning them!
function process(event) {
  ...
  return [
    {
      schema: 'iglu:com.my-company/traffic-source/jsonschema/1-0-0',
      data: {
        traffic_source: 'internal'
      }
    }
  ];
}
The entities you add with this method will be derived entities, similar to what other enrichments add. You will find them in the derived_contexts field of the event.
Behavior for special values (e.g. NaN)
Your array of entities will be passed to JSON.stringify() before being attached to the event. This only matters if your entities contain NaN values (they will become null), undefined values (they will be dropped), or circular references (an exception will be thrown).
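The snippet below shows these three cases with plain JSON.stringify(). The entity fields ratio and campaign, and the helper canSerialize, are hypothetical and exist only to trigger each case.

```javascript
// Hypothetical entity with values that JSON.stringify() treats specially.
const entity = {
  schema: 'iglu:com.my-company/traffic-source/jsonschema/1-0-0',
  data: {
    traffic_source: 'internal',
    ratio: NaN,          // serialized as null
    campaign: undefined  // key is dropped entirely
  }
};

const serialized = JSON.stringify([entity]);
// serialized is:
// [{"schema":"iglu:com.my-company/traffic-source/jsonschema/1-0-0","data":{"traffic_source":"internal","ratio":null}}]

// Hypothetical helper: reports whether a value can be serialized at all.
function canSerialize(value) {
  try {
    JSON.stringify(value);
    return true;
  } catch (e) {
    return false; // e.g. a circular reference
  }
}

const circular = { data: {} };
circular.data.self = circular;
// canSerialize(circular) is false, canSerialize([entity]) is true
```

Because a NaN silently becomes null, a schema that requires a number for that field would likely reject the entity, so it can be worth checking numeric values before returning them.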
If you are still iterating on the schema while writing the JavaScript code, you might find the setup described in the testing guide very useful.
Make sure that the schemas of your entities are defined and accessible to your pipeline.
Modifying event fields directly
Sometimes you will want to modify the original event fields directly.
Keep in mind that the old value of a modified field will not be available in your data warehouse or lake. However, that might be your goal.
Just like with getters, there are setter methods available for each of the standard event fields:
function process(event) {
  event.setMkt_source('Facegoog');
  event.setGeo_latitude(null);
  event.setGeo_longitude(null);
  ...
One exception is refr_device_tstamp, where the setter method is called setRefr_dvce_tstamp and not setRefr_device_tstamp.
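As an illustration of combining a getter with its matching setter, here is a minimal sketch that zeroes the last octet of an IPv4 address in user_ipaddress. The helper names (maskIpv4, maskUserIp) and the masking rule are hypothetical, and the original value is lost once the field is overwritten.

```javascript
// Hypothetical helper: replaces the last octet of an IPv4 address with 0.
// Null stays null; anything that is not four dot-separated parts is returned unchanged.
function maskIpv4(ip) {
  if (ip === null || ip === undefined) {
    return null;
  }
  const parts = String(ip).split('.');
  if (parts.length !== 4) {
    return String(ip);
  }
  parts[3] = '0';
  return parts.join('.');
}

// Hypothetical helper: reads the field, masks it, and writes it back.
function maskUserIp(event) {
  event.setUser_ipaddress(maskIpv4(event.getUser_ipaddress()));
}
```

Keeping the string manipulation in maskIpv4 separate from the event access makes the rule easy to check on its own.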
Modifying self-describing events and entities
If you want to modify the self-describing event fields or the entities attached to the event, you will need to reverse the steps you took to fetch them.
For self-describing events:
function process(event) {
  ...
  // unpack the self-describing event
  const myEvent = JSON.parse(event.getUnstruct_event());
  if (myEvent && myEvent.data.schema === ...) {
    // update a field inside
    myEvent.data.data.myField = 'new value';
    // pack the self-describing event back
    event.setUnstruct_event(JSON.stringify(myEvent));
  }
  ...
For entities:
function process(event) {
  ...
  // unpack the entities
  const entities = JSON.parse(event.getContexts());
  if (entities) {
    // loop through the entities
    for (const entity of entities.data) {
      if (entity.schema === ...) {
        // update a field inside
        entity.data.myField = entity.data.myField + 1;
      }
    }
    // pack the entities back
    event.setContexts(JSON.stringify(entities));
  }
  ...
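The unpack, update, and pack-back steps for entities can be collected into one helper. This is a sketch rather than a prescribed approach: updateEntities is a hypothetical name, and writing back only when something changed is a design choice, not a requirement.

```javascript
// Hypothetical helper: applies `update` to the data of every entity whose
// schema equals `schema`, then writes the entities back to the event.
// Returns the number of entities that were updated.
function updateEntities(event, schema, update) {
  const entities = JSON.parse(event.getContexts());
  if (!entities) {
    return 0; // no entities attached, nothing to write back
  }
  let updated = 0;
  for (const entity of entities.data) {
    if (entity.schema === schema) {
      update(entity.data); // mutate the entity's data in place
      updated += 1;
    }
  }
  if (updated > 0) {
    event.setContexts(JSON.stringify(entities));
  }
  return updated;
}
```

From process, it could be called as updateEntities(event, mySchemaUri, function (data) { data.myField = data.myField + 1; }), where mySchemaUri stands in for the full schema URI of your entity.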
Discarding the event
Sometimes you don’t want the event to appear in your data warehouse or lake, e.g. because you suspect it comes from a bot and not a real user. In this case, you can throw an exception in your JavaScript code, which will send the event to failed events:
const botPattern = /.*Googlebot.*/;
function process(event) {
  const useragent = event.getUseragent();
  if (useragent !== null && botPattern.test(useragent)) {
    throw "Filtered event produced by Googlebot";
  }
}
This will create an “enrichment failure” failed event, which may be tricky to distinguish from genuine failures in your enrichment code, e.g. due to a mistake. In the future, we might provide a better mechanism for discarding events.
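One possible way to make intentional discards easier to spot is to give every such message a fixed, recognizable prefix. The sketch below assumes that the thrown message is carried into the failed event, which this guide does not confirm. The prefix text and the helper name discardIfBot are hypothetical.

```javascript
// Hypothetical marker that identifies intentional discards.
const DISCARD_PREFIX = '[intentional-discard] ';
const googlebotPattern = /Googlebot/;

// Hypothetical helper: throws a prefixed message for bot traffic and
// returns normally for everything else, including events with no useragent.
function discardIfBot(event) {
  const useragent = event.getUseragent();
  if (useragent !== null && googlebotPattern.test(useragent)) {
    throw DISCARD_PREFIX + 'Filtered event produced by Googlebot';
  }
}
```

Calling discardIfBot(event) at the top of process keeps the filtering rule in one place; errors thrown by mistake elsewhere in the code would not carry the prefix.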
Accessing Java methods
Because the JavaScript enrichment runs inside the Enrich application, it has access to the Java standard library, as well as some Java libraries (the ones used by Enrich). You can call Java methods via their fully qualified path, for example:
function process(event) {
  ...
  const salt = 'pepper';
  const hashedIp = org.apache.commons.codec.digest.DigestUtils.sha256Hex(event.getUser_ipaddress() + salt);
  ...
}