Snowplow Optimized Materialization
This package makes use of the standard dbt incremental materialization with an optimization applied for incremental models. Its key advantage is that it limits table scans on the target table when updating or inserting based on the new data, which improves performance and reduces cost. We do this by overriding the macro that generates the SQL for the merge and insert_delete incremental methods.
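To make the idea concrete, here is a rough sketch of what a scan-limited merge could look like. It is not the SQL the package generates: the table and column names are hypothetical, and the assumption that the limit takes the form of a date-range predicate on the target (with bounds derived from the new data) is an illustration only. The overridden macro is the authoritative source.

```sql
-- Illustrative sketch only, not the package's generated SQL.
-- analytics.sessions, analytics.sessions__dbt_tmp and all column names are hypothetical.
-- The timestamp bounds stand in for limits that would be derived from the new data.
merge into analytics.sessions as DBT_INTERNAL_DEST
using analytics.sessions__dbt_tmp as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_DEST.session_id = DBT_INTERNAL_SOURCE.session_id
   -- Assumed extra predicate: only target rows in this date range are considered,
   -- so the warehouse can prune the rest of the table instead of scanning it.
   and DBT_INTERNAL_DEST.start_tstamp
       between timestamp '2024-01-01 00:00:00' and timestamp '2024-01-03 00:00:00'
when matched then update set
    start_tstamp = DBT_INTERNAL_SOURCE.start_tstamp,
    page_views = DBT_INTERNAL_SOURCE.page_views
when not matched then insert (session_id, start_tstamp, page_views)
values (
    DBT_INTERNAL_SOURCE.session_id,
    DBT_INTERNAL_SOURCE.start_tstamp,
    DBT_INTERNAL_SOURCE.page_views
);
```

A predicate like this helps most when the target table is partitioned or clustered on the date column.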
All other features of the incremental materialization are supported, including incremental_predicates and on_schema_change. The code for the overridden macro can be found here.
Usage
To enable the materialization on a model, provide a unique_key and an upsert_date_key in the model config, and set snowplow_optimize=true in the same config.
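As a minimal sketch, a model config with these settings might look like the following. The model name, upstream model, and the column names event_id and collector_tstamp are assumptions for illustration; substitute the unique key and date column of your own model.

```sql
-- models/my_incremental_model.sql (hypothetical model)
{{
  config(
    materialized='incremental',
    unique_key='event_id',               -- assumed unique key column
    upsert_date_key='collector_tstamp',  -- assumed date/timestamp column
    snowplow_optimize=true               -- opt in to the optimized upsert
  )
}}

select *
from {{ ref('my_upstream_model') }}  -- hypothetical upstream model
```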
In addition, the following must be added to your dbt_project.yml file once.
# dbt_project.yml
...
dispatch:
  - macro_namespace: dbt
    search_order: ['snowplow_utils', 'dbt']
If you wish to disable the buffer we apply to the upsert to account for late-arriving data (defined by snowplow__upsert_lookback_days), set disable_upsert_lookback to true in your model config.
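For example, a model that opts out of the lookback buffer might be configured roughly as follows. As before, the model and column names are hypothetical; only the config keys come from the text above.

```sql
-- models/my_incremental_model.sql (hypothetical model)
{{
  config(
    materialized='incremental',
    unique_key='event_id',               -- assumed unique key column
    upsert_date_key='collector_tstamp',  -- assumed date/timestamp column
    snowplow_optimize=true,
    disable_upsert_lookback=true         -- skip the late-arriving-data buffer
  )
}}

select *
from {{ ref('my_upstream_model') }}  -- hypothetical upstream model
```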