Fractribution Quickstart
Requirements
In addition to dbt being installed and a web events dataset being available in your database:
- have the `snowplow_web_page_views` derived table available as a source (generated by the snowplow_web package)
- have a table with revenue data by users (`domain_userid`, `user_id`) that serves as another source for the fractribution calculations; you can choose any of the following options:
  - your `atomic.events` table with any self-describing event that captures revenue data
  - the `snowplow_ecommerce_transaction_interactions` derived table generated by the snowplow_ecommerce package
  - any custom incremental table that is built on top of the `snowplow_web` model and results in an aggregated revenue dataset
- have `python` or `docker` installed
In addition to the standard privileges required by dbt, our packages by default write to additional schemas beyond just your profile schema. If your connected user does not have `create schema` privileges, you will need to ensure that the following schemas exist in your warehouse and that the user can create tables in them:

- `<profile_schema>_derived`
- `<profile_schema>_scratch`
- `<profile_schema>_snowplow_manifest`
Alternatively, you can override the output schemas our models write to; see the relevant package configuration page for how to do this.
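As an illustration, one common dbt pattern for overriding an output schema is a model-level `+schema` config. Treat this as a sketch only (the schema name below is hypothetical, and the exact config keys supported are listed on the package configuration page):

```yaml
# dbt_project.yml — illustrative sketch; check the package configuration page
# for the exact keys the package supports
models:
  snowplow_fractribution:
    +schema: my_custom_schema  # hypothetical schema name
```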
Installation
Check dbt Hub for the latest installation instructions, or read the dbt docs for more information on installing packages. If you are using multiple packages you may need to up/downgrade a specific package to ensure compatibility.
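Installation follows the standard dbt pattern of adding an entry to your `packages.yml` and running `dbt deps`. The version range below is illustrative only; check dbt Hub for the current one:

```yaml
# packages.yml — version range is illustrative; check dbt Hub for the latest
packages:
  - package: snowplow/snowplow_fractribution
    version: [">=0.2.0", "<0.3.0"]
```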
Setup
1. Override the dispatch order in your project
To take advantage of the optimized upsert that the Snowplow packages offer, you need to ensure that certain macros are called from `snowplow_utils` before `dbt-core`. This can be achieved by adding the following to the top level of your `dbt_project.yml` file:
# dbt_project.yml
...
dispatch:
- macro_namespace: dbt
search_order: ['snowplow_utils', 'dbt']
If you do not do this the package will still work, but the incremental upserts will become more costly over time.
2. Set variables
The package has some variables that need to be set before it can be run; you should edit these in your `dbt_project.yml` file. Further customization can be done via the variables listed in the configuration page.

- `snowplow__conversion_window_start_date`: The start date in UTC for the window of conversions to include
- `snowplow__conversion_window_end_date`: The end date in UTC for the window of conversions to include
- `snowplow__conversion_hosts`: The `url_hosts` to process
- `snowplow__path_transforms`: A dictionary of path transforms and their arguments (see the Path Transform Options section)
# dbt_project.yml
...
vars:
snowplow_fractribution:
snowplow__conversion_window_start_date: '2022-01-01'
snowplow__conversion_window_end_date: '2023-02-01'
snowplow__conversion_hosts: ['mysite.com']
snowplow__path_transforms: {'exposure_path' : null}
3. Configure macros
All of the macros below are created with the intention of letting users modify them to fit their own use case. If you wish to change one, copy the macro from the macros folder in the `snowplow_fractribution` package (e.g. at `[dbt_project_name]/dbt_packages/snowplow_fractribution/macros/conversion_clause.sql`) and add it to the macros folder of your own dbt project, where you are free to make any alterations. You will find a detailed guide / illustration with sample code within the individual macros themselves.
`conversion_clause` macro
The `conversion_clause` macro specifies how to filter Snowplow events to only successful conversion events. How this is filtered will depend on your definition of a conversion. The default is filtering to events where `tr_total > 0`, but this could instead filter on `event_name = 'checkout'`, for example. If you are using the e-commerce model, you will still need to set this for the fractribution code to run (even though all events are conversions in the e-commerce model); in this case, change it to `transaction_revenue > 0`.
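As a sketch, an overridden macro in your own project's macros folder might look like the following. The macro name and dispatch pattern shown are illustrative; copy the real macro from the package, as described above, to get the exact signature:

```sql
-- macros/conversion_clause.sql (sketch; the packaged macro is the authoritative template)
{% macro default__conversion_clause() %}
    -- count an event as a conversion when it is a completed checkout
    event_name = 'checkout'
{% endmacro %}
```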
`conversion_value` macro
The `conversion_value` macro specifies either a single column or a calculated value that represents the value associated with that conversion. The default is `tr_total`, but `revenue`, or a calculation using `revenue` and `discount_amount` from the default e-commerce schema, for example, could similarly be used.
`channel_classification` macro
The `channel_classification` macro is used to perform channel classifications. This can be altered to generate your expected channels if they differ from the channels generated in the default macro. It is highly recommended that you examine and configure this macro when using your own data, as the ROAS and attribution calculations will run against these channel definitions, and the default values will not consider any custom marketing parameters.
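For instance, a simplified custom classification might map marketing parameters to channels with a CASE expression. This is a sketch only: the channel names are hypothetical, and you should confirm against the default macro which columns (such as `mkt_medium` and `mkt_source`) are available at this point in the model:

```sql
-- macros/channel_classification.sql (simplified, illustrative sketch)
{% macro default__channel_classification() %}
    case
        when lower(mkt_medium) in ('cpc', 'ppc') then 'Paid_Search'
        when lower(mkt_medium) = 'email' then 'Email'
        when lower(mkt_source) = 'direct' then 'Direct'
        else 'Other'
    end
{% endmacro %}
```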
`channel_spend` macro
The `channel_spend` macro is used to query the spend by channel. It requires a user-supplied SQL script to extract the total ad spend by channel.

Required output format:

- `channel`: STRING NOT NULL
- `spend`: FLOAT64 (or other warehouse equivalent; use the same monetary units as conversion revenue, and NULL if unknown)
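A minimal sketch of such a user-supplied query, assuming a hypothetical `ad_spend_by_channel` table in your warehouse, returning the two required columns:

```sql
-- macros/channel_spend.sql (sketch; replace with a query over your own spend data)
{% macro default__channel_spend() %}
    select
        channel,             -- STRING NOT NULL
        sum(spend) as spend  -- same monetary units as conversion revenue
    from my_db.ad_spend_by_channel  -- hypothetical source table
    group by 1
{% endmacro %}
```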
4. Run the model
Execute the following either through your CLI, within dbt Cloud, or within Snowplow BDP:
dbt run --select snowplow_fractribution
5. Run the python script to generate the final models
Locally run python
To run the fractribution script locally in Python, we recommend using a virtual environment such as one in `conda` or `pyenv`.
Example using conda:
conda create --name fractribution_env -c https://repo.anaconda.com/pkgs/snowflake python=3.8 absl-py
conda activate fractribution_env
I. Install packages
You can install the packages using `pip install -r dbt_packages/snowplow_fractribution/utils/requirements.txt` (or the appropriate path from your terminal working directory).
Please note that some of the libraries are adapter specific. These are listed in the `requirements` file, and you can also find the necessary list for each adapter below:
- BigQuery: `absl-py==1.2.0`, `google-cloud-bigquery==3.5.0`
- Databricks: `absl-py==1.2.0`, `databricks-sql-connector==2.1.0`, `pandas`
- Snowflake: `absl-py==1.2.0`, `snowflake-snowpark-python==0.11.0`
M1 Instructions (for Snowflake only)
CONDA_SUBDIR=osx-64 conda create -n fractribution_env python=3.8 absl-py -c https://repo.anaconda.com/pkgs/snowflake
conda activate fractribution_env
conda config --env --set subdir osx-64
II. Set the connection parameters in your terminal
- BigQuery:

export project_id=project_id
export bigquery_dataset=bigquery_dataset
export google_application_credentials=google_application_credentials

- Databricks:

export databricks_schema=derived_schema_name
export databricks_server_hostname=hostname
export databricks_http_path=http_path
export databricks_token=token

- Snowflake:

export snowflake_account=my_account
export snowflake_user=sf_user
export snowflake_password=password
export snowflake_user_role=special_role
export snowflake_warehouse=warehouse_name
export snowflake_database=database_name
export snowflake_schema=derived_schema_name
III. Run the fractribution script
Run the adapter-specific main fractribution script by specifying the conversion window start and end dates, and the attribution model (if you are not using the default, `shapley`, see here for more options). Example:
- BigQuery:

python main_snowplow_bigquery.py --conversion_window_start_date '2022-06-03' --conversion_window_end_date '2022-08-01' --attribution_model last_touch

- Databricks:

python main_snowplow_databricks.py --conversion_window_start_date '2022-06-03' --conversion_window_end_date '2022-08-01' --attribution_model last_touch

- Snowflake:

python main_snowplow_snowflake.py --conversion_window_start_date '2022-06-03' --conversion_window_end_date '2022-08-01' --attribution_model last_touch
Run python with docker
I. Pull the docker image
You can pull the latest docker image from Docker Hub: `docker pull snowplow/fractribution:latest`. Alternatively, you can pull it based on the package version: `docker pull snowplow/fractribution:0.2.0`
II. Set the environment variables
Add the necessary environment variables to an environment file, e.g. `configs.env`. The necessary variables will differ depending on the data warehouse you are using. The easiest way to determine the variables you need to set is to check the Dockerfile in the fractribution dbt package: `dbt-snowplow-fractribution/utils/Dockerfile`. Please note that in the case of BigQuery, the `google_application_credentials` env var is not needed for Docker, as you mount this as a volume at run time.
Below is an example of the `configs.env` file (set up for Snowflake). You do not need to specify the attribution model if using the default, `shapley`:
snowflake_account=youraccount.ap-southeast-2
snowflake_user=user
snowflake_password=abc123
snowflake_user_role=DBT
snowflake_warehouse=WH
snowflake_database=snowplow
snowflake_schema=FRACTRIBUTION_DERIVED
conversion_window_start_date=2022-06-03
conversion_window_end_date=2022-08-01
attribution_model=last_touch
warehouse=snowflake
III. Run the docker container
Run the docker container:

- BigQuery:

With BigQuery you need to mount your service account keyfile when running the docker image:

docker run --rm --env-file /path/to/env/file/configs.env -v /path/to/yourkeyfile.json:/keyfile.json -it snowplow/fractribution:latest

- Databricks:

docker run --rm --env-file /path/to/env/file/configs.env -it snowplow/fractribution:latest

- Snowflake:

docker run --rm --env-file /path/to/env/file/configs.env -it snowplow/fractribution:latest