Normalize
Package Configuration Variables
This package utilizes a set of variables that are configured to recommended values for optimal performance of the models. Depending on your use case, you might want to override these values by adding to your dbt_project.yml
file.
All variables in Snowplow packages start with snowplow__
but we have removed these in the below table for brevity.
Warehouse and tracker
Variable Name | Description | Default |
---|---|---|
atomic_schema | The schema (dataset for BigQuery) that contains your atomic events table. | atomic |
database | The database that contains your atomic events table. | target.database |
dev_target_name | The target name of your development environment as defined in your profiles.yml file. See the Manifest Tables section for more details. | dev |
events | This is used internally by the packages to reference your events table based on other variable values and should not be changed. | events |
Operation and logic
Variable Name | Description | Default |
---|---|---|
allow_refresh | Used as the default value to return from the allow_refresh() macro. This macro determines whether the manifest tables can be refreshed or not, depending on your environment. See the Manifest Tables section for more details. | false |
backfill_limit_days | The maximum numbers of days of new data to be processed since the latest event processed. Please refer to the incremental logic section for more details. | 30 |
days_late_allowed | The maximum allowed number of days between the event creation and it being sent to the collector. Exists to reduce lengthy table scans that can occur as a result of late arriving data. If set to -1 disables this filter entirely, which can be useful if you have events with no dvce_sent_tstamp value. | 3 |
lookback_window_hours | The number of hours to look before the latest event processed - to account for late arriving data, which comes out of order. | 6 |
start_date | The date to start processing events from in the package on first run or a full refresh, based on collector_tstamp . | '2020-01-01' |
upsert_lookback_days | Number of days to look back over the incremental derived tables during the upsert. Where performance is not a concern, should be set to as long a value as possible. Having too short a period can result in duplicates. Please see the Snowplow Optimized Materialization section for more details. | 30 |
Contexts, filters, and logs
Variable Name | Description | Default |
---|---|---|
app_id | A list of app_id s to filter the events table on for processing within the package. | [ ] (no filter applied) |
Warehouse Specific
- Databricks
- Bigquery
- Snowflake
Variable Name | Description | Default |
---|---|---|
databricks_catalog | The catalogue your atomic events table is in. Depending on the use case it should either be the catalog (for Unity Catalog users from databricks connector 1.1.1 onwards, defaulted to hive_metastore ) or the same value as your snowplow__atomic_schema (unless changed it should be 'atomic'). |
Variable Name | Description | Default |
---|---|---|
derived_tstamp_partitioned | Boolean to enable filtering the events table on derived_tstamp in addition to collector_tstamp . | true |
Variable Name | Description | Default |
---|---|---|
query_tag | This sets the value of the query_tag for all sql executed against the database. This is used internally for metric gathering in Snowflake and its value should not be changed. | snowplow_dbt |
Output Schemas
By default all scratch/staging tables will be created in the <target.schema>_scratch
schema, the derived tables, will be created in <target.schema>_derived
and all manifest tables in <target.schema>_snowplow_manifest
. Some of these schemas are only used by specific packages, ensure you add the correct configurations for each packages you are using. To change, please add the following to your dbt_project.yml
file:
If you want to use just your connection schema with no suffixes, set the +schema:
values to null
# dbt_project.yml
...
models:
snowplow_normalize:
base:
manifest:
+schema: my_manifest_schema
scratch:
+schema: my_scratch_schema
+tags: my_scratch_schema