Skip to main content

Snowplow Web Models

caution

This page is auto-generated from our dbt packages, some information may be incomplete

Snowplow Web

Snowplow Web Base Events This Run

models/base/scratch/<adaptor>/snowplow_web_base_events_this_run.sql

Description

For any given run, this table contains all required events to be consumed by subsequent nodes in the Snowplow dbt web package. This is a cleaned, deduped dataset, containing all columns from the raw events table as well as having the page_view_id joined in from the page view context.

Note: This table should be used as the input to any custom modules that require event level data, rather than selecting straight from atomic.events

File Paths

models/base/scratch/default/snowplow_web_base_events_this_run.sql

Details

Columns
note

Base event this run table column lists may be incomplete and is missing contexts/unstructs, please check your warehouse for a more accurate column list.

Column NameDescription
app_idApplication ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev.
platformPlatform e.g. ‘web’
etl_tstampTimestamp event began ETL e.g. ‘2017-01-26 00:01:25.292’
collector_tstampTime stamp for the event recorded by the collector e.g. ‘2013-11-26 00:02:05’
dvce_created_tstampTimestamp event was recorded on the client device e.g. ‘2013-11-26 00:03:57.885’
eventThe type of event recorded e.g. ‘page_view’
event_idA UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’
txn_idTransaction ID set client-side, used to de-dupe records e.g. 421828
name_trackerTracker namespace e.g. ‘sp1’
v_trackerTracker version e.g. ‘js-3.0.0’
v_collectorCollector version e.g. ‘ssc-2.1.0-kinesis’
v_etlETL version e.g. ‘snowplow-micro-1.1.0-common-1.4.2’
user_idUnique ID set by business e.g. ‘jon.doe@email.com
user_ipaddressUser IP address e.g. ‘92.231.54.234’
user_fingerprintA user fingerprint generated by looking at the individual browser features e.g. 2161814971
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’
domain_sessionidxA visit / session index e.g. 3
network_useridUser ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’
geo_countryISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’
geo_regionISO-3166-2 code for country region the visitor is in e.g. ‘I9’, ‘TX’
geo_cityCity the visitor is in e.g. ‘New York’, ‘London’
geo_zipcodePostcode the visitor is in e.g. ‘94109’
geo_latitudeVisitor location latitude e.g. 37.443604
geo_longitudeVisitor location longitude e.g. -122.4124
geo_region_nameVisitor region name e.g. ‘Florida’
ip_ispVisitor’s ISP e.g. ‘FDN Communications’
ip_organizationOrganization associated with the visitor’s IP address – defaults to ISP name if none is found e.g. ‘Bouygues Telecom’
ip_domainSecond level domain name associated with the visitor’s IP address e.g. ‘nuvox.net’
ip_netspeedVisitor’s connection type e.g. ‘Cable/DSL’
page_urlThe page URL e.g. ‘http://www.example.com’
page_titleWeb page title e.g. ‘Snowplow Docs – Understanding the structure of Snowplow data’
page_referrerURL of the referrer e.g. ‘http://www.referrer.com’
page_urlschemeScheme aka protocol e.g. ‘https’
page_urlhostHost aka domain e.g. ‘“www.snowplow.io’
page_urlportPort if specified, 80 if not 80
page_urlpathPath to page e.g. ‘/product/index.html’
page_urlqueryQuerystring e.g. ‘id=GTM-DLRG’
page_urlfragmentFragment aka anchor e.g. ‘4-conclusion’
refr_urlschemeReferer scheme e.g. ‘http’
refr_urlhostReferer host e.g. ‘www.bing.com’
refr_urlportReferer port e.g. 80
refr_urlpathReferer page path e.g. ‘/images/search’
refr_urlqueryReferer URL querystring e.g. ‘q=psychic+oracle+cards’
refr_urlfragmentReferer URL fragment
refr_mediumType of referer e.g. ‘search’, ‘internal’
refr_sourceName of referer if recognised e.g. ‘Bing images’
refr_termKeywords if source is a search engine e.g. ‘psychic oracle cards’
mkt_mediumType of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’
mkt_sourceThe company / website where the traffic came from e.g. ‘Google’, ‘Facebook’
mkt_termAny keywords associated with the referrer e.g. ‘new age tarot decks’
mkt_contentThe content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723
mkt_campaignThe campaign ID e.g. ‘diageo-123’
se_categoryCategory of event e.g. ‘ecomm’, ‘video’
se_actionAction performed / event name e.g. ‘add-to-basket’, ‘play-video’
se_labelThe object of the action e.g. the ID of the video played or SKU of the product added-to-basket e.g. ‘pbz00123’
se_propertyA property associated with the object of the action e.g. ‘HD’, ‘large’
se_valueA value associated with the event / action e.g. the value of goods added-to-basket e.g. 9.99
tr_orderidOrder ID e.g. ‘#134’
tr_affiliationTransaction affiliation (e.g. store where sale took place) e.g. ‘web’
tr_totalTotal transaction value e.g. 12.99
tr_taxTotal tax included in transaction value e.g. 3.00
tr_shippingDelivery cost charged e.g. 0.00
tr_cityDelivery address, city e.g. ‘London’
tr_stateDelivery address, state e.g. ‘Washington’
tr_countryDelivery address, country e.g. ‘France’
ti_orderidOrder ID e.g. ‘#134’
ti_skuProduct SKU e.g. ‘pbz00123’
ti_nameProduct name e.g. ‘Cone pendulum’
ti_categoryProduct category e.g. ‘New Age’
ti_priceProduct unit price e.g. 9.99
ti_quantityNumber of product in transaction e.g. 2
pp_xoffset_minMinimum page x offset seen in the last ping period e.g. 0
pp_xoffset_maxMaximum page x offset seen in the last ping period e.g. 100
pp_yoffset_minMinimum page y offset seen in the last ping period e.g. 0
pp_yoffset_maxMaximum page y offset seen in the last ping period e.g. 200
useragentRaw useragent
br_nameBrowser name e.g. ‘Firefox 12’
br_familyBrowser family e.g. ‘Firefox’
br_versionBrowser version e.g. ‘12.0’
br_typeBrowser type e.g. ‘Browser’
br_renderengineBrowser rendering engine e.g. ‘GECKO’
br_langLanguage the browser is set to e.g. ‘en-GB’
br_features_pdfWhether the browser recognizes PDFs e.g. True
br_features_flashWhether Flash is installed e.g. True
br_features_javaWhether Java is installed e.g. True
br_features_directorWhether Adobe Shockwave is installed e.g. True
br_features_quicktimeWhether QuickTime is installed e.g. True
br_features_realplayerWhether RealPlayer is installed e.g. True
br_features_windowsmediaWhether mplayer2 is installed e.g. True
br_features_gearsWhether Google Gears is installed e.g. True
br_features_silverlightWhether Microsoft Silverlight is installed e.g. True
br_cookiesWhether cookies are enabled e.g. True
br_colordepthBit depth of the browser color palette e.g. 24
br_viewwidthViewport width e.g. 1000
br_viewheightViewport height e.g. 1000
os_nameName of operating system e.g. ‘Android’
os_familyOperating system family e.g. ‘Linux’
os_manufacturerCompany responsible for OS e.g. ‘Apple’
os_timezoneClient operating system timezone e.g. ‘Europe/London’
dvce_typeType of device e.g. ‘Computer’
dvce_ismobileIs the device mobile? e.g. True
dvce_screenwidthScreen width in pixels e.g. 1900
dvce_screenheightScreen height in pixels e.g. 1024
doc_charsetThe page’s character encoding e.g. , ‘UTF-8’
doc_widthThe page’s width in pixels e.g. 1024
doc_heightThe page’s height in pixels e.g. 3000
tr_currencyCurrency e.g. ‘USD’
tr_total_baseTotal in base currency e.g. 12.99
tr_tax_baseTotal tax in base currency e.g. 3.00
tr_shipping_basedecimal Delivery cost in base currency e.g. 0.00
ti_currencyCurrency e.g. ‘EUR’
ti_price_basedecimal Price in base currency e.g. 9.99
base_currencyReporting currency e.g. ‘GBP’
geo_timezoneVisitor timezone name e.g. ‘Europe/London’
mkt_clickidThe click ID e.g. ‘ac3d8e459’
mkt_networkThe ad network to which the click ID belongs e.g. ‘DoubleClick’
etl_tagsJSON of tags for this ETL run e.g. “[‘prod’]
dvce_sent_tstampWhen the event was sent by the client device e.g. ‘2013-11-26 00:03:58.032’
refr_domain_useridThe Snowplow domain_userid of the referring website e.g. ‘bc2e92ec6c204a14’
refr_dvce_tstampThe time of attaching the domain_userid to the inbound link e.g. ‘2013-11-26 00:02:05’
domain_sessionidA visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’
derived_tstampTimestamp making allowance for innaccurate device clock e.g. ‘2013-11-26 00:02:04’
event_vendorWho defined the event e.g. ‘com.acme’
event_nameEvent name e.g. ‘link_click’
event_formatFormat for event e.g. ‘jsonschema’
event_versionVersion of event schema e.g. ‘1-0-2’
event_fingerprintHash client-set event fields e.g. AADCE520E20C2899F4CED228A79A3083
true_tstampUser-set “true timestamp” for the event e.g. ‘2013-11-26 00:02:04’
page_view_id
Code
Source
{{
config(
sort='collector_tstamp',
dist='event_id',
tags=["this_run"]
)
}}

{%- set lower_limit, upper_limit = snowplow_utils.return_limits_from_model(ref('snowplow_web_base_sessions_this_run'),
'start_tstamp',
'end_tstamp') %}

/* Dedupe logic: Per dupe event_id keep earliest row ordered by collector_tstamp.
If multiple earliest rows, take arbitrary one using row_number(). */

with events_this_run AS (
select
a.app_id,
a.platform,
a.etl_tstamp,
a.collector_tstamp,
a.dvce_created_tstamp,
a.event,
a.event_id,
a.txn_id,
a.name_tracker,
a.v_tracker,
a.v_collector,
a.v_etl,
a.user_id,
a.user_ipaddress,
a.user_fingerprint,
b.domain_userid, -- take domain_userid from manifest. This ensures only 1 domain_userid per session.
a.domain_sessionidx,
a.network_userid,
a.geo_country,
a.geo_region,
a.geo_city,
a.geo_zipcode,
a.geo_latitude,
a.geo_longitude,
a.geo_region_name,
a.ip_isp,
a.ip_organization,
a.ip_domain,
a.ip_netspeed,
a.page_url,
a.page_title,
a.page_referrer,
a.page_urlscheme,
a.page_urlhost,
a.page_urlport,
a.page_urlpath,
a.page_urlquery,
a.page_urlfragment,
a.refr_urlscheme,
a.refr_urlhost,
a.refr_urlport,
a.refr_urlpath,
a.refr_urlquery,
a.refr_urlfragment,
a.refr_medium,
a.refr_source,
a.refr_term,
a.mkt_medium,
a.mkt_source,
a.mkt_term,
a.mkt_content,
a.mkt_campaign,
a.se_category,
a.se_action,
a.se_label,
a.se_property,
a.se_value,
a.tr_orderid,
a.tr_affiliation,
a.tr_total,
a.tr_tax,
a.tr_shipping,
a.tr_city,
a.tr_state,
a.tr_country,
a.ti_orderid,
a.ti_sku,
a.ti_name,
a.ti_category,
a.ti_price,
a.ti_quantity,
a.pp_xoffset_min,
a.pp_xoffset_max,
a.pp_yoffset_min,
a.pp_yoffset_max,
a.useragent,
a.br_name,
a.br_family,
a.br_version,
a.br_type,
a.br_renderengine,
a.br_lang,
a.br_features_pdf,
a.br_features_flash,
a.br_features_java,
a.br_features_director,
a.br_features_quicktime,
a.br_features_realplayer,
a.br_features_windowsmedia,
a.br_features_gears,
a.br_features_silverlight,
a.br_cookies,
a.br_colordepth,
a.br_viewwidth,
a.br_viewheight,
a.os_name,
a.os_family,
a.os_manufacturer,
a.os_timezone,
a.dvce_type,
a.dvce_ismobile,
a.dvce_screenwidth,
a.dvce_screenheight,
a.doc_charset,
a.doc_width,
a.doc_height,
a.tr_currency,
a.tr_total_base,
a.tr_tax_base,
a.tr_shipping_base,
a.ti_currency,
a.ti_price_base,
a.base_currency,
a.geo_timezone,
a.mkt_clickid,
a.mkt_network,
a.etl_tags,
a.dvce_sent_tstamp,
a.refr_domain_userid,
a.refr_dvce_tstamp,
a.domain_sessionid,
a.derived_tstamp,
a.event_vendor,
a.event_name,
a.event_format,
a.event_version,
a.event_fingerprint,
a.true_tstamp,
{% if var('snowplow__enable_load_tstamp', true) %}
a.load_tstamp,
{% endif %}
row_number() over (partition by a.event_id order by a.collector_tstamp) as event_id_dedupe_index

from {{ var('snowplow__events') }} as a
inner join {{ ref('snowplow_web_base_sessions_this_run') }} as b
on a.domain_sessionid = b.session_id

where a.collector_tstamp <= {{ snowplow_utils.timestamp_add('day', var("snowplow__max_session_days", 3), 'b.start_tstamp') }}
and a.dvce_sent_tstamp <= {{ snowplow_utils.timestamp_add('day', var("snowplow__days_late_allowed", 3), 'a.dvce_created_tstamp') }}
and a.collector_tstamp >= {{ lower_limit }}
and a.collector_tstamp <= {{ upper_limit }}
and {{ snowplow_utils.app_id_filter(var("snowplow__app_id",[])) }}
)

, page_context as (
select
root_id,
root_tstamp,
id as page_view_id,
row_number() over (partition by root_id order by root_tstamp) as page_context_dedupe_index

from {{ var('snowplow__page_view_context') }}
where
root_tstamp >= {{ lower_limit }}
and root_tstamp <= {{ upper_limit }}
)

, page_context_dedupe as (
select
*

from page_context
where page_context_dedupe_index = 1
)

select
e.*,
pc.page_view_id

from events_this_run as e
left join page_context_dedupe as pc
on e.event_id = pc.root_id
and e.collector_tstamp = pc.root_tstamp

where e.event_id_dedupe_index = 1

Depends On

Referenced By

Snowplow Web Base New Event Limits

models/base/scratch/snowplow_web_base_new_event_limits.sql

Description

This table contains the lower and upper timestamp limits for the given run of the web model. These limits are used to select new events from the events table.

Type: Table

Details

Columns
Column NameDescriptionType
lower_limitThe lower collector_tstamp limit for the runtimestamp_ntz
upper_limitThe upper collector_tstamp limit for the runtimestamp_ntz
Code
Source
{{ config(
post_hook=["{{snowplow_utils.print_run_limits(this)}}"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}


{%- set models_in_run = snowplow_utils.get_enabled_snowplow_models('snowplow_web') -%}

{% set min_last_success,
max_last_success,
models_matched_from_manifest,
has_matched_all_models = snowplow_utils.get_incremental_manifest_status(ref('snowplow_web_incremental_manifest'),
models_in_run) -%}


{% set run_limits_query = snowplow_utils.get_run_limits(min_last_success,
max_last_success,
models_matched_from_manifest,
has_matched_all_models,
var("snowplow__start_date","2020-01-01")) -%}


{{ run_limits_query }}

Depends On

Referenced By

Snowplow Web Base Quarantined Sessions

models/base/manifest/snowplow_web_base_quarantined_sessions.sql

Description

This table contains any sessions that have been quarantined. Sessions are quarantined once they exceed the maximum allowed session length, defined by snowplow__max_session_days. Once quarantined, no further events from these sessions will be processed. Events up until the point of quarantine remain in your derived tables. The reason for removing long sessions is to reduce table scans on both the events table and all derived tables. This improves performance greatly.

Type: Table

Details

Columns
Column NameDescriptionType
session_idThe session_id of the quarantined sessiontext
Code
Source
{{
config(
materialized='incremental',
full_refresh=snowplow_web.allow_refresh(),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
}
)
}}

/*
Boilerplate to generate table.
Table updated as part of post-hook on sessions_this_run
Any sessions exceeding max_session_days are quarantined
Once quarantined, any subsequent events from the session will not be processed.
This significantly reduces table scans
*/

with prep as (
select
cast(null as {{ type_string() }}) session_id
)

select *

from prep
where false

Depends On

Referenced By

Snowplow Web Base Sessions Lifecycle Manifest

models/base/manifest/snowplow_web_base_sessions_lifecycle_manifest.sql

Description

This incremental table is a manifest of all sessions that have been processed by the Snowplow dbt web model. For each session, the start and end timestamp is recorded.

By knowing the lifecycle of a session the model is able to able to determine which sessions and thus events to process for a given timeframe, as well as the complete date range required to reprocess all events of each session.

Type: Table

Details

Columns
Column NameDescriptionType
session_idA visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
start_tstampThe collector_tstamp when the session begantimestamp_ntz
end_tstampThe collector_tstamp when the session endedtimestamp_ntz
Code
Source
{{
config(
materialized='incremental',
unique_key='session_id',
upsert_date_key='start_tstamp',
sort='start_tstamp',
dist='session_id',
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val={
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
cluster_by=snowplow_web.web_cluster_by_fields_sessions_lifecycle(),
full_refresh=snowplow_web.allow_refresh(),
tags=["manifest"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
},
snowplow_optimize = true
)
}}

-- Known edge cases:
-- 1: Rare case with multiple domain_userid per session.

{% set lower_limit, upper_limit, _ = snowplow_utils.return_base_new_event_limits(ref('snowplow_web_base_new_event_limits')) %}
{% set session_lookback_limit = snowplow_utils.get_session_lookback_limit(lower_limit) %}
{% set is_run_with_new_events = snowplow_utils.is_run_with_new_events('snowplow_web') %}

with new_events_session_ids as (
select
e.domain_sessionid as session_id,
max(e.domain_userid) as domain_userid, -- Edge case 1: Arbitary selection to avoid window function like first_value.
min(e.collector_tstamp) as start_tstamp,
max(e.collector_tstamp) as end_tstamp

from {{ var('snowplow__events') }} e

where
e.domain_sessionid is not null
and not exists (select 1 from {{ ref('snowplow_web_base_quarantined_sessions') }} as a where a.session_id = e.domain_sessionid) -- don't continue processing v.long sessions
and e.dvce_sent_tstamp <= {{ snowplow_utils.timestamp_add('day', var("snowplow__days_late_allowed", 3), 'dvce_created_tstamp') }} -- don't process data that's too late
and e.collector_tstamp >= {{ lower_limit }}
and e.collector_tstamp <= {{ upper_limit }}
and {{ snowplow_utils.app_id_filter(var("snowplow__app_id",[])) }}
and {{ is_run_with_new_events }} --don't reprocess sessions that have already been processed.
{% if var('snowplow__derived_tstamp_partitioned', true) and target.type == 'bigquery' | as_bool() %} -- BQ only
and e.derived_tstamp >= {{ lower_limit }}
and e.derived_tstamp <= {{ upper_limit }}
{% endif %}

group by 1
)

{% if is_incremental() %}

, previous_sessions as (
select *

from {{ this }}

where start_tstamp >= {{ session_lookback_limit }}
and {{ is_run_with_new_events }} --don't reprocess sessions that have already been processed.
)

, session_lifecycle as (
select
ns.session_id,
coalesce(self.domain_userid, ns.domain_userid) as domain_userid, -- Edge case 1: Take previous value to keep domain_userid consistent. Not deterministic but performant
least(ns.start_tstamp, coalesce(self.start_tstamp, ns.start_tstamp)) as start_tstamp,
greatest(ns.end_tstamp, coalesce(self.end_tstamp, ns.end_tstamp)) as end_tstamp -- BQ 1 NULL will return null hence coalesce

from new_events_session_ids ns
left join previous_sessions as self
on ns.session_id = self.session_id

where
self.session_id is null -- process all new sessions
or self.end_tstamp < {{ snowplow_utils.timestamp_add('day', var("snowplow__max_session_days", 3), 'self.start_tstamp') }} --stop updating sessions exceeding 3 days
)

{% else %}

, session_lifecycle as (

select * from new_events_session_ids

)

{% endif %}

select
sl.session_id,
sl.domain_userid,
sl.start_tstamp,
least({{ snowplow_utils.timestamp_add('day', var("snowplow__max_session_days", 3), 'sl.start_tstamp') }}, sl.end_tstamp) as end_tstamp -- limit session length to max_session_days
{% if target.type in ['databricks', 'spark'] -%}
, DATE(start_tstamp) as start_tstamp_date
{%- endif %}

from session_lifecycle sl

Depends On

Referenced By

Snowplow Web Base Sessions This Run

models/base/scratch/snowplow_web_base_sessions_this_run.sql

Description

For any given run, this table contains all the required sessions.

Type: Table

Details

Columns
Column NameDescriptionType
session_idA visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
start_tstampThe collector_tstamp when the session begantimestamp_ntz
end_tstampThe collector_tstamp when the session endedtimestamp_ntz
Code
Source
{{
config(
tags=["this_run"],
post_hook=[
"{{ snowplow_utils.quarantine_sessions('snowplow_web', var('snowplow__max_session_days')) }}"
],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}

{%- set lower_limit,
upper_limit,
session_start_limit = snowplow_utils.return_base_new_event_limits(ref('snowplow_web_base_new_event_limits')) %}

select
s.session_id,
s.domain_userid,
s.start_tstamp,
-- end_tstamp used in next step to limit events. When backfilling, set end_tstamp to upper_limit if end_tstamp > upper_limit.
-- This ensures we don't accidentally process events after upper_limit
case when s.end_tstamp > {{ upper_limit }} then {{ upper_limit }} else s.end_tstamp end as end_tstamp

from {{ ref('snowplow_web_base_sessions_lifecycle_manifest')}} s

where
-- General window of start_tstamps to limit table scans. Logic complicated by backfills.
-- To be within the run, session start_tstamp must be >= lower_limit - max_session_days as we limit end_tstamp in manifest to start_tstamp + max_session_days
s.start_tstamp >= {{ session_start_limit }}
and s.start_tstamp <= {{ upper_limit }}
-- Select sessions within window that either; start or finish between lower & upper limit, start and finish outside of lower and upper limits
and not (s.start_tstamp > {{ upper_limit }} or s.end_tstamp < {{ lower_limit }})

Depends On

Referenced By

models/optional_modules/consent/snowplow_web_consent_cmp_stats.sql

Description

Used for modeling cmp_visible events and related metrics

Type: Table

Details

Columns
Column NameDescriptionType
event_idA UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_useridThe optional userid of a usertext
page_view_idA UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_sessionidA visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
cmp_load_timeThe time taken for the consent box to be shown to the screenfloat
cmp_tstampThe timestamp of the cmp_visible eventtimestamp_ntz
first_consent_event_tstampThe timestamp of the first consent event after a cmp_visible eventtimestamp_ntz
first_consent_event_typeThe event type of the first consent event after a cmp_visible eventtext
cmp_interaction_timeThe time it takes for the user to make a consent choice after the cmp_visible event is firednumber
Code
Source
{{
config(
materialized='table',
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}

{%- if target.type in ('postgres') -%}

with events as (

select
event_id,
domain_userid,
page_view_id,
domain_sessionid,
derived_tstamp,
event_name,
event_type,
cmp_load_time,
-- postgres does not allow the IGNORE NULL clause within last_value(), below workaround should do the same: removing NULLS using array_remove then using the COUNT window function (which counts the number of non-null items and count is bounded up to the current row) to access the array using that as its index position
(array_remove(array_agg(case when event_name = 'cmp_visible' then event_id else null end) over (partition by domain_userid order by derived_tstamp), null))[count(case when event_name = 'cmp_visible' then event_id else null end) over (partition by domain_userid order by derived_tstamp rows between unbounded preceding and current row)] as cmp_id

from {{ ref('snowplow_web_consent_log') }}

where event_type <> 'pending' or event_type is null

)

{%- elif target.type in ('databricks', 'spark') -%}

with events as (

select
event_id,
domain_userid,
page_view_id,
domain_sessionid,
derived_tstamp,
event_name,
event_type,
cmp_load_time,
last_value(case when event_name = 'cmp_visible' then event_id else null end, TRUE)
over (partition by domain_userid order by derived_tstamp
rows between unbounded preceding and current row) as cmp_id

from {{ ref('snowplow_web_consent_log') }}

where event_type <> 'pending' or event_type is null

)

{%- else -%}

with events as (

select
event_id,
domain_userid,
page_view_id,
domain_sessionid,
derived_tstamp,
event_name,
event_type,
cmp_load_time,
last_value(case when event_name = 'cmp_visible' then event_id else null end ignore nulls)
over (partition by domain_userid order by derived_tstamp
rows between unbounded preceding and current row) as cmp_id

from {{ ref('snowplow_web_consent_log') }}

where event_type <> 'pending' or event_type is null

)

{%- endif -%}

, event_orders as (

select
event_id,
event_type,
cmp_id,
derived_tstamp,
row_number() over(partition by cmp_id order by derived_tstamp) as row_num

from events

)

, first_consent_events as (

select
event_id,
cmp_id,
event_type,
derived_tstamp as first_consent_event_tstamp

from event_orders

where row_num = 2

)

, cmp_events as (

select distinct
event_id,
domain_userid,
page_view_id,
domain_sessionid,
cmp_load_time,
derived_tstamp as cmp_tstamp

from events

where event_name = 'cmp_visible'

)

select
e.event_id,
e.domain_userid,
e.page_view_id,
e.domain_sessionid,
e.cmp_load_time,
e.cmp_tstamp,
f.first_consent_event_tstamp,
f.event_type as first_consent_event_type,
{{ datediff('e.cmp_tstamp', 'f.first_consent_event_tstamp', 'second') }} as cmp_interaction_time

from cmp_events e

left join first_consent_events f
on e.event_id = f.cmp_id

Depends On

models/optional_modules/consent/scratch/<adaptor>/snowplow_web_consent_events_this_run.sql

Description

This model does not currently have a description.

Type: Table

File Paths

models/optional_modules/consent/scratch/default/snowplow_web_consent_events_this_run.sql

Details

Columns
Column NameDescriptionType
event_idtext
domain_useridtext
user_idtext
geo_countrytext
page_view_idtext
domain_sessionidtext
derived_tstamptimestamp_ntz
load_tstamptimestamp_ntz
event_nametext
event_typetext
basis_for_processingtext
consent_urltext
consent_versiontext
consent_scopestext
domains_appliedtext
gdpr_appliesboolean
cmp_load_timefloat
Code
Source
{{
config(
tags=["this_run"]
)
}}

{%- set lower_limit, upper_limit = snowplow_utils.return_limits_from_model(ref('snowplow_web_base_sessions_this_run'),
'start_tstamp',
'end_tstamp') %}

with consent_pref as (

select
root_id,
root_tstamp,
event_type,
basis_for_processing,
consent_url,
consent_version,
consent_scopes,
domains_applied,
gdpr_applies,
row_number() over (partition by root_id order by root_tstamp) dedupe_index

from {{ var('snowplow__consent_preferences') }}

where root_tstamp >= {{ lower_limit }}
and root_tstamp <= {{ upper_limit }}

)

, cmp_visible as (

select
root_id,
root_tstamp,
elapsed_time,
row_number() over (partition by root_id order by root_tstamp) dedupe_index

from {{ var('snowplow__consent_cmp_visible') }}

where root_tstamp >= {{ lower_limit }}
and root_tstamp <= {{ upper_limit }}

)

select
e.event_id,
e.domain_userid,
e.user_id,
e.geo_country,
e.page_view_id,
e.domain_sessionid,
e.derived_tstamp,
e.load_tstamp,
e.event_name,
p.event_type,
p.basis_for_processing,
p.consent_url,
p.consent_version,
replace(translate(p.consent_scopes, '"[]', ''), ',', ', ') as consent_scopes,
replace(translate(p.domains_applied, '"[]', ''), ',', ', ') as domains_applied,
coalesce(p.gdpr_applies, false) as gdpr_applies,
v.elapsed_time as cmp_load_time

from {{ ref("snowplow_web_base_events_this_run") }} as e

left join consent_pref p
on e.event_id = p.root_id
and e.collector_tstamp = p.root_tstamp
and p.dedupe_index = 1

left join cmp_visible v
on e.event_id = v.root_id
and e.collector_tstamp = v.root_tstamp
and v.dedupe_index = 1

where event_name in ('cmp_visible', 'consent_preferences')

and {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.

Depends On

Referenced By

models/optional_modules/consent/snowplow_web_consent_log.sql

Description

Incremental table showing the audit trail of consent and Consent Management Platform (cmp) events

Type: Table

Details

Columns
Column NameDescriptionType
event_idA UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
user_idUnique ID set by business e.g. ‘jon.doe@email.comtext
geo_countryISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’text
page_view_idA UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_sessionidA visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
derived_tstampTimestamp making allowance for innaccurate device clock e.g. ‘2013-11-26 00:02:04’timestamp_ntz
load_tstampThe timestamp of the event landing the data warehouse.timestamp_ntz
event_nameEvent name e.g. ‘link_click’text
event_typeThe action for the consent preferences of a user E.g allow_alltext
basis_for_processingGDPR lawful basis for data collection & processingtext
consent_urlURI of the privacy policy related documenttext
consent_versionVersion of the privacy policy related documenttext
consent_scopesThe scopes allowed after the user finalized his selection of consent preferences Eg ['analytics', 'functional', 'advertisement']text
domains_appliedThe domains for which this consent allows these preferences to persist totext
gdpr_appliesA boolean which determines if GDPR applies based on the user's geo-locationboolean
cmp_load_timeThe time taken for the consent box to be shown to the screenfloat
Code
Source
{{
config(
materialized= 'incremental',
unique_key='event_id',
upsert_date_key='derived_tstamp',
sort='derived_tstamp',
dist='event_id',
tags=["derived"],
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val = {
"field": "derived_tstamp",
"data_type": "timestamp"
}, databricks_val = 'derived_tstamp_date'),
cluster_by=snowplow_web.web_cluster_by_fields_consent(),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
},
snowplow_optimize= true
)
}}

select
*
{% if target.type in ['databricks', 'spark'] -%}
, DATE(derived_tstamp) as derived_tstamp_date
{%- endif %}

from {{ ref('snowplow_web_consent_events_this_run') }}

where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.

Depends On

Referenced By

models/optional_modules/consent/snowplow_web_consent_scope_status.sql

Description

Aggregate of current number of users consented to each consent scope

Type: Table

Details

Columns
Column NameDescriptionType
scopeConsent scopetext
total_consentThe number of consent events corresponding to a scopenumber
Code
Source
{{
config(
materialized='table',
)
}}

with arrays as (

select
u.domain_userid,
{{ snowplow_utils.get_split_to_array('last_consent_scopes', 'u', ', ') }} as scope_array

from {{ ref('snowplow_web_consent_users') }} u

where is_latest_version

)

, unnesting as (

{{ snowplow_utils.unnest('domain_userid', 'scope_array', 'consent_scope', 'arrays') }}

)

select
replace(replace(replace(cast(consent_scope as {{type_string() }}), '"', ''), '[', ''), ']', '') as scope,
count(*) as total_consent

from unnesting

group by 1

Depends On

models/optional_modules/consent/snowplow_web_consent_totals.sql

Description

Summary of the latest consent status as per consent version

Type: Table

Details

Columns
Column NameDescriptionType
consent_versionVersion of the privacy policy related documenttext
version_start_tstampThe first allow_all consent event belonging to a consent versiontimestamp_ntz
consent_scopesThe scopes allowed after the user finalized his selection of consent preferences Eg ['analytics', 'functional', 'advertisement']text
consent_urlURI of the privacy policy related documenttext
domains_appliedThe domains for which this consent allows these preferences to persist totext
is_latest_versionA boolean to filter whether the last consent or cmp visible event is sent after the latest privacy policy version goes liveboolean
last_allow_all_eventThe timestamp of the last allow_all consent event generated by the latest consent versiontimestamp_ntz
total_visitorsThe number of visitors who have visited since the last consent version is livenumber
allow_allTotal number of users whose last consent event sent from the latest consent version has type allow_allnumber
allow_selectedTotal number of users whose last consent event sent from the latest consent version has type allow_selectednumber
allowTotal number of users whose last consent event sent from the latest consent version has type allownumber
pendingTotal number of users whose last consent event sent from the latest consent version has type pendingnumber
deniedTotal number of users whose last consent event sent from the latest consent version has type deniednumber
expiredTotal number of users whose last consent event sent from the latest consent version has type expirednumber
withdrawnTotal number of users whose last consent event sent from the latest consent version has type withdrawnnumber
implicit_consentnumber
expires_in_six_monthsThe total number of users whose consent expires in six months (only the offical version is taken into account)number
Code
Source
{{
config(
materialized='table',
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}

with totals as (

select
last_consent_version,
count(distinct domain_userid) as total_visitors,
count(case when last_consent_event_type ='allow_all' then 1 end) as allow_all,
count(case when last_consent_event_type ='allow_selected' then 1 end) as allow_selected,
count(case when last_consent_event_type IN ('allow_all', 'allow_selected') then 1 end) as allow,
count(case when last_consent_event_type = 'pending' then 1 end) as pending,
count(case when last_consent_event_type = 'deny_all' then 1 end) as denied,
count(case when last_consent_event_type = 'expired' then 1 end) as expired,
count(case when last_consent_event_type = 'withdrawn' then 1 end) as withdrawn,
count(case when last_consent_event_type = 'implicit_consent' then 1 end) as implicit_consent,
count(case when {{ dateadd('year', '1', 'last_consent_event_tstamp') }} <= {{ dateadd('month', '6', 'current_date') }}
and last_consent_event_type <> 'expired'
and {{ dateadd('year', '1', 'last_consent_event_tstamp') }} > current_date then 1 end) as expires_in_six_months

from {{ ref('snowplow_web_consent_users') }}

where last_consent_event_type is not null

group by 1

)

select
v.*,
t.total_visitors,
t.allow_all,
t.allow_selected,
t.allow,
t.pending,
t.denied,
t.expired,
t.withdrawn,
t.implicit_consent,
t.expires_in_six_months

from {{ ref('snowplow_web_consent_versions') }} v

left join totals t
on t.last_consent_version = v.consent_version

order by v.version_start_tstamp desc

Depends On

models/optional_modules/consent/snowplow_web_consent_users.sql

Description

By user consent stats

Type: Table

Details

Columns
Column NameDescriptionType
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
user_idUnique ID set by business e.g. ‘jon.doe@email.comtext
geo_countryISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’text
cmp_eventsThe number of cmp_visible events the user has generatednumber
consent_eventsThe number of cosent events the user has generatednumber
last_cmp_event_tstampThe timestamp of the last cmp_visible eventtimestamp_ntz
last_consent_event_tstampThe timestamp of the last consent event after the cmp_visible event happenedtimestamp_ntz
last_consent_event_typeThe type of the last consent event after the cmp_visible event happenedtext
last_consent_scopesThe list of consent scopes in connection with the last consent eventtext
last_consent_versionThe privacy policy version in connection with the last consent eventtext
last_consent_urlThe privacy policy url in connection with the last consent eventtext
last_domains_appliedThe domains for which the last consent event appliestext
last_processed_eventThe timestamp of the last processed event needed for the incremental logictimestamp_ntz
is_latest_versionA boolean to filter whether the last consent or cmp visible event is sent after the latest privacy policy version goes liveboolean
Code
Source
{{
config(
materialized='incremental',
unique_key='domain_userid',
sort = 'last_consent_event_tstamp',
dist = 'domain_userid',
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}


{% if is_incremental() %}
{%- set lower_limit, upper_limit = snowplow_utils.return_limits_from_model(this,
'last_processed_event',
'last_processed_event') %}
{% endif %}

with base as (

select
domain_userid,
user_id,
geo_country,
max(load_tstamp) as last_processed_event,
count(case when event_name = 'cmp_visible' then 1 end) as cmp_events,
count(case when event_name = 'consent_preferences' then 1 end) as consent_events,
max(case when event_name = 'cmp_visible' then derived_tstamp end) as last_cmp_event_tstamp,
row_number() over(partition by domain_userid order by max(load_tstamp) desc) as latest_event_by_user_rank

from {{ ref('snowplow_web_consent_log') }}

{% if is_incremental() %} -- and it has not been processed yet
where load_tstamp > {{ upper_limit }}
{% endif %}

group by 1,2,3

)

, latest_consents as (

select
domain_userid,
derived_tstamp as last_consent_event_tstamp,
event_type as last_consent_event_type,
consent_scopes as last_consent_scopes,
consent_version as last_consent_version,
consent_url as last_consent_url,
domains_applied as last_domains_applied,
row_number() over(partition by domain_userid order by load_tstamp desc) as latest_consent_event_by_user_rank

from {{ ref('snowplow_web_consent_log') }}

where event_name = 'consent_preferences'

{% if is_incremental() %} -- and it has not been processed yet
and load_tstamp > {{ upper_limit }}
{% endif %}

)

{% if is_incremental() %}

select
b.domain_userid,
b.user_id,
b.geo_country,
coalesce(b.cmp_events, 0) + coalesce(t.cmp_events, 0) as cmp_events,
coalesce(b.consent_events, 0) + coalesce(t.consent_events, 0) as consent_events,
b.last_cmp_event_tstamp,
l.last_consent_event_tstamp,
l.last_consent_event_type,
l.last_consent_scopes,
l.last_consent_version,
l.last_consent_url,
l.last_domains_applied,
b.last_processed_event,
case when v.is_latest_version then True else False end as is_latest_version

from base b

left join latest_consents l
on b.domain_userid = l.domain_userid

left join {{ ref('snowplow_web_consent_versions')}} v
on v.consent_version = l.last_consent_version

left join {{ this }} t
on t.domain_userid = b.domain_userid

where (l.latest_consent_event_by_user_rank = 1 or l.domain_userid is null)
and b.latest_event_by_user_rank = 1

{% else %}

select
b.domain_userid,
b.user_id,
b.geo_country,
b.cmp_events,
b.consent_events,
b.last_cmp_event_tstamp,
l.last_consent_event_tstamp,
l.last_consent_event_type,
l.last_consent_scopes,
l.last_consent_version,
l.last_consent_url,
l.last_domains_applied,
b.last_processed_event,
case when v.is_latest_version then True else False end as is_latest_version

from base b

left join latest_consents l
on b.domain_userid = l.domain_userid

left join {{ ref('snowplow_web_consent_versions') }} v
on v.consent_version = l.last_consent_version

where (l.latest_consent_event_by_user_rank = 1 or l.domain_userid is null)
and b.latest_event_by_user_rank = 1

{% endif %}

Depends On

Referenced By

models/optional_modules/consent/snowplow_web_consent_versions.sql

Description

Used to keep track of each consent version and its validity

Type: Table

Details

Columns
Column NameDescriptionType
consent_versionVersion of the privacy policy related documenttext
version_start_tstampThe time_stamp of the first allow_all event related to a consent versiontimestamp_ntz
consent_scopesThe scopes allowed after the user finalized his selection of consent preferences Eg ['analytics', 'functional', 'advertisement']text
consent_urlURI of the privacy policy related documenttext
domains_appliedThe domains for which this consent allows these preferences to persist totext
is_latest_versionA boolean to filter whether the last consent or cmp visible event is sent after the latest privacy policy version goes liveboolean
last_allow_all_eventThe timestamp of the last allow_all event used for the incremental updatetimestamp_ntz
Code
Source
{{
config(
materialized='incremental',
unique_key='consent_version',
sort = 'version_start_tstamp',
dist = 'consent_version',
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}


{% if is_incremental() %}
{%- set lower_limit, upper_limit = snowplow_utils.return_limits_from_model(this,
'last_allow_all_event',
'last_allow_all_event') %}
{% endif %}

with consent_versions as (

select
consent_version,
consent_scopes,
consent_url,
domains_applied,
min(derived_tstamp) as version_start_tstamp,
max(load_tstamp) as last_allow_all_event

from {{ ref('snowplow_web_consent_log') }}

where event_name <> 'cmp_visible' and event_type = 'allow_all'

{% if is_incremental() %} -- and it has not been processed yet
and load_tstamp > {{ upper_limit }}
{% endif %}

group by 1,2,3,4
)

, latest_version as (

select
consent_version,
version_start_tstamp

from consent_versions

order by 2 desc limit 1
)

{% if is_incremental() %}

select
v.consent_version,
least(v.version_start_tstamp, t.version_start_tstamp) as version_start_tstamp,
v.consent_scopes,
v.consent_url,
v.domains_applied,
case when l.consent_version is not null then True else False end is_latest_version,
v.last_allow_all_event

from consent_versions v

left join latest_version l

on v.consent_version = l.consent_version

left join {{ this }} t
on t.consent_version = v.consent_version

{% else %}

select
v.consent_version,
v.version_start_tstamp,
v.consent_scopes,
v.consent_url,
v.domains_applied,
case when l.consent_version is not null then True else False end is_latest_version,
v.last_allow_all_event

from consent_versions v

left join latest_version l

on v.consent_version = l.consent_version

{% endif %}

Depends On

Referenced By

Snowplow Web Incremental Manifest

models/base/manifest/snowplow_web_incremental_manifest.sql

Description

This incremental table is a manifest of the timestamp of the latest event consumed per model within the snowplow-web package as well as any models leveraging the incremental framework provided by the package. The latest event's timestamp is based off collector_tstamp. This table is used to determine what events should be processed in the next run of the model.

Type: Table

Details

Columns
Column NameDescriptionType
modelThe name of the model.text
last_successThe latest event consumed by the model, based on collector_tstamptimestamp_ntz
Code
Source
{{
config(
materialized='incremental',
full_refresh=snowplow_web.allow_refresh(),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
}
)
}}

-- Boilerplate to generate table.
-- Table updated as part of end-run hook

with prep as (
select
cast(null as {{ snowplow_utils.type_max_string() }}) model,
cast('1970-01-01' as {{ type_timestamp() }}) as last_success
)

select *

from prep
where false

Depends On

Referenced By

Snowplow Web Page View Events

models/page_views/scratch/<adaptor>/snowplow_web_page_view_events.sql

Description

This is a staging table containing all the page view events for a given run of the Web model. It is the first step in the page views module and therefore does not contain metrics such as engaged time and scroll depth which are calculated in subsequent models. It is also where the de-duping of page_view_id's occurs

File Paths

models/page_views/scratch/default/snowplow_web_page_view_events.sql

Details

Columns
Column NameDescription
page_view_idA UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’
Code
Source
{{
config(
sort='start_tstamp',
dist='page_view_id'
)
}}

with page_view_events as (
select
ev.page_view_id,
ev.event_id,

ev.app_id,

-- user fields
ev.user_id,
ev.domain_userid,
ev.network_userid,

-- session fields
ev.domain_sessionid,
ev.domain_sessionidx,

-- timestamp fields
ev.dvce_created_tstamp,
ev.collector_tstamp,
ev.derived_tstamp,
ev.derived_tstamp as start_tstamp,

ev.doc_width,
ev.doc_height,

ev.page_title,
ev.page_url,
ev.page_urlscheme,
ev.page_urlhost,
ev.page_urlpath,
ev.page_urlquery,
ev.page_urlfragment,

ev.mkt_medium,
ev.mkt_source,
ev.mkt_term,
ev.mkt_content,
ev.mkt_campaign,
ev.mkt_clickid,
ev.mkt_network,

ev.page_referrer,
ev.refr_urlscheme ,
ev.refr_urlhost,
ev.refr_urlpath,
ev.refr_urlquery,
ev.refr_urlfragment,
ev.refr_medium,
ev.refr_source,
ev.refr_term,

ev.geo_country,
ev.geo_region,
ev.geo_region_name,
ev.geo_city,
ev.geo_zipcode,
ev.geo_latitude,
ev.geo_longitude,
ev.geo_timezone ,

ev.user_ipaddress,

ev.useragent,

ev.br_lang,
ev.br_viewwidth,
ev.br_viewheight,
ev.br_colordepth,
ev.br_renderengine,
ev.os_timezone,

row_number() over (partition by ev.page_view_id order by ev.derived_tstamp, ev.dvce_created_tstamp) as page_view_id_dedupe_index

from {{ ref('snowplow_web_base_events_this_run') }} as ev

where ev.event_name = 'page_view'
and ev.page_view_id is not null

{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots('ev') }}
{% endif %}
)

select
pv.page_view_id,
pv.event_id,

pv.app_id,

-- user fields
pv.user_id,
pv.domain_userid,
pv.network_userid,

-- session fields
pv.domain_sessionid,
pv.domain_sessionidx,

-- timestamp fields
pv.dvce_created_tstamp,
pv.collector_tstamp,
pv.derived_tstamp,
pv.start_tstamp,

pv.doc_width,
pv.doc_height,

pv.page_title,
pv.page_url,
pv.page_urlscheme,
pv.page_urlhost,
pv.page_urlpath,
pv.page_urlquery,
pv.page_urlfragment,

pv.mkt_medium,
pv.mkt_source,
pv.mkt_term,
pv.mkt_content,
pv.mkt_campaign,
pv.mkt_clickid,
pv.mkt_network,

pv.page_referrer,
pv.refr_urlscheme ,
pv.refr_urlhost,
pv.refr_urlpath,
pv.refr_urlquery,
pv.refr_urlfragment,
pv.refr_medium,
pv.refr_source,
pv.refr_term,

pv.geo_country,
pv.geo_region,
pv.geo_region_name,
pv.geo_city,
pv.geo_zipcode,
pv.geo_latitude,
pv.geo_longitude,
pv.geo_timezone ,

pv.user_ipaddress,

pv.useragent,

pv.br_lang,
pv.br_viewwidth,
pv.br_viewheight,
pv.br_colordepth,
pv.br_renderengine,
pv.os_timezone,

row_number() over (partition by pv.domain_sessionid order by pv.derived_tstamp, pv.dvce_created_tstamp) as page_view_in_session_index --Moved to post dedupe, unlike V1 web model.

from page_view_events as pv

where page_view_id_dedupe_index = 1

Depends On

Snowplow Web Page Views

models/page_views/snowplow_web_page_views.sql

Description

This derived incremental table contains all historic page views and should be the end point for any analysis or BI tools.

Type: Table

Details

Columns
Column NameDescriptionType
page_view_idA UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
event_idA UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
app_idApplication ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev.text
user_idUnique ID set by business e.g. ‘jon.doe@email.comtext
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
network_useridUser ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’text
domain_sessionidA visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_sessionidxA visit / session index e.g. 3number
page_view_in_session_indexA page view index within a single sessionnumber
page_views_in_sessionDistinct count of page_view_id within a sessionnumber
dvce_created_tstampTimestamp event was recorded on the client device e.g. ‘2013-11-26 00:03:57.885’timestamp_ntz
collector_tstampTime stamp for the event recorded by the collector e.g. ‘2013-11-26 00:02:05’timestamp_ntz
derived_tstampTimestamp making allowance for innaccurate device clock e.g. ‘2013-11-26 00:02:04’timestamp_ntz
start_tstampTimestamp for the start of the page view, based on derived_tstamptimestamp_ntz
end_tstampTimestamp for the end of the page view, based on derived_tstamptimestamp_ntz
model_tstampThe current timestamp when the model processed this row.timestamp_ntz
engaged_time_in_sTime spent by the user on the page calculated using page pings.number
absolute_time_in_sThe time in seconds between the start_tstamp and end_tstampnumber
horizontal_pixels_scrolledDistance the user scrolled horizontally in pixelsnumber
vertical_pixels_scrolledDistance the user scrolled vertically in pixelsnumber
horizontal_percentage_scrolledPercentage of page scrolled horizontallyfloat
vertical_percentage_scrolledPercentage of page scrolled verticallyfloat
doc_widthThe page’s width in pixels e.g. 1024number
doc_heightThe page’s height in pixels e.g. 3000number
page_titleWeb page title e.g. ‘Snowplow Docs – Understanding the structure of Snowplow data’text
page_urlThe page URL e.g. ‘http://www.example.com’text
page_urlschemeScheme aka protocol e.g. ‘https’text
page_urlhostHost aka domain e.g. ‘“www.snowplow.io’text
page_urlpathPath to page e.g. ‘/product/index.html’text
page_urlqueryQuerystring e.g. ‘id=GTM-DLRG’text
page_urlfragmentFragment aka anchor e.g. ‘4-conclusion’text
mkt_mediumType of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’text
mkt_sourceThe company / website where the traffic came from e.g. ‘Google’, ‘Facebook’text
mkt_termAny keywords associated with the referrer e.g. ‘new age tarot decks’text
mkt_contentThe content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723text
mkt_campaignThe campaign ID e.g. ‘diageo-123’text
mkt_clickidThe click ID e.g. ‘ac3d8e459’text
mkt_networkThe ad network to which the click ID belongs e.g. ‘DoubleClick’text
page_referrerURL of the referrer e.g. ‘http://www.referrer.com’text
refr_urlschemeReferer scheme e.g. ‘http’text
refr_urlhostReferer host e.g. ‘www.bing.com’text
refr_urlpathReferer page path e.g. ‘/images/search’text
refr_urlqueryReferer URL querystring e.g. ‘q=psychic+oracle+cards’text
refr_urlfragmentReferer URL fragmenttext
refr_mediumType of referer e.g. ‘search’, ‘internal’text
refr_sourceName of referer if recognised e.g. ‘Bing images’text
refr_termKeywords if source is a search engine e.g. ‘psychic oracle cards’text
geo_countryISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’text
geo_regionISO-3166-2 code for country region the visitor is in e.g. ‘I9’, ‘TX’text
geo_region_nameVisitor region name e.g. ‘Florida’text
geo_cityCity the visitor is in e.g. ‘New York’, ‘London’text
geo_zipcodePostcode the visitor is in e.g. ‘94109’text
geo_latitudeVisitor location latitude e.g. 37.443604float
geo_longitudeVisitor location longitude e.g. -122.4124float
geo_timezoneVisitor timezone name e.g. ‘Europe/London’text
user_ipaddressUser IP address e.g. ‘92.231.54.234’text
useragentRaw useragenttext
br_langLanguage the browser is set to e.g. ‘en-GB’text
br_viewwidthViewport width e.g. 1000number
br_viewheightViewport height e.g. 1000number
br_colordepthBit depth of the browser color palette e.g. 24text
br_renderengineBrowser rendering engine e.g. ‘GECKO’text
os_timezoneClient operating system timezone e.g. ‘Europe/London’text
categoryCategory based on activity if the IP/UA is a spider or robot, BROWSER otherwisetext
primary_impactWhether the spider or robot would affect page impression measurement, ad impression measurement, both or nonetext
reasonType of failed check if the IP/UA is a spider or robot, PASSED_ALL otherwisetext
spider_or_robotTrue if the IP address or user agent checked against the list is a spider or robot, false otherwiseboolean
useragent_familyUseragent family (browser) nametext
useragent_majorUseragent major versiontext
useragent_minorUseragent minor versiontext
useragent_patchUseragent patch versiontext
useragent_versionFull version of the useragenttext
os_familyOperating system family e.g. ‘Linux’text
os_majorOperation system major versiontext
os_minorOperation system minor versiontext
os_patchOperation system patch versiontext
os_patch_minorOperation system patch minor versiontext
os_versionOperation system full versiontext
device_familyDevice typetext
device_classClass of device e.g. phonetext
agent_classClass of agent e.g. browsertext
agent_nameName of agent e.g. Chrometext
agent_name_versionName and version of agent e.g. Chrome 53.0.2785.124text
agent_name_version_majorName and major version of agent e.g. Chrome 53text
agent_versionVersion of agent e.g. 53.0.2785.124text
agent_version_majorMajor version of agent e.g. 53text
device_brandBrand of device e.g. Googletext
device_nameName of device e.g. Google Nexus 6text
device_versionVersion of device e.g. 6.0text
layout_engine_classClass of layout engine e.g. Browsertext
layout_engine_nameName of layout engine e.g. Blinktext
layout_engine_name_versionName and version of layout engine e.g. Blink 53.0text
layout_engine_name_version_majorName and major version of layout engine e.g. Blink 53text
layout_engine_versionVersion of layout engine e.g. 53.0text
layout_engine_version_majorMajor version of layout engine e.g. 53text
operating_system_classClass of the OS e.g. Mobiletext
operating_system_nameName of the OS e.g. Androidtext
operating_system_name_versionName and version of the OS e.g. Android 7.0text
operating_system_versionVersion of the OS e.g. 7.0text
Code
Source
{{
config(
materialized='incremental',
unique_key='page_view_id',
upsert_date_key='start_tstamp',
sort='start_tstamp',
dist='page_view_id',
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val = {
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
cluster_by=snowplow_web.web_cluster_by_fields_page_views(),
tags=["derived"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
},
snowplow_optimize = true
)
}}


select *
{% if target.type in ['databricks', 'spark'] -%}
, DATE(start_tstamp) as start_tstamp_date
{%- endif %}
from {{ ref('snowplow_web_page_views_this_run') }}
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.

Depends On

Snowplow Web Page Views This Run

models/page_views/scratch/<adaptor>/snowplow_web_page_views_this_run.sql

Description

This staging table contains all the page views for the given run of the Web model. It possess all the same columns as snowplow_web_page_views. If building a custom module that requires page view events, this is the table you should reference.

Type: Table

File Paths

models/page_views/scratch/default/snowplow_web_page_views_this_run.sql

Details

Columns
Column NameDescriptionType
page_view_idA UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
event_idA UUID for each event e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
app_idApplication ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev.text
user_idUnique ID set by business e.g. ‘jon.doe@email.comtext
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
network_useridUser ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’text
domain_sessionidA visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_sessionidxA visit / session index e.g. 3number
page_view_in_session_indexA page view index within a single sessionnumber
page_views_in_sessionDistinct count of page_view_id within a sessionnumber
dvce_created_tstampTimestamp event was recorded on the client device e.g. ‘2013-11-26 00:03:57.885’timestamp_ntz
collector_tstampTime stamp for the event recorded by the collector e.g. ‘2013-11-26 00:02:05’timestamp_ntz
derived_tstampTimestamp making allowance for innaccurate device clock e.g. ‘2013-11-26 00:02:04’timestamp_ntz
start_tstampTimestamp for the start of the page view, based on derived_tstamptimestamp_ntz
end_tstampTimestamp for the end of the page view, based on derived_tstamptimestamp_ntz
model_tstampThe current timestamp when the model processed this row.timestamp_ntz
engaged_time_in_sTime spent by the user on the page calculated using page pings.number
absolute_time_in_sThe time in seconds between the start_tstamp and end_tstampnumber
horizontal_pixels_scrolledDistance the user scrolled horizontally in pixelsnumber
vertical_pixels_scrolledDistance the user scrolled vertically in pixelsnumber
horizontal_percentage_scrolledPercentage of page scrolled horizontallyfloat
vertical_percentage_scrolledPercentage of page scrolled verticallyfloat
doc_widthThe page’s width in pixels e.g. 1024number
doc_heightThe page’s height in pixels e.g. 3000number
page_titleWeb page title e.g. ‘Snowplow Docs – Understanding the structure of Snowplow data’text
page_urlThe page URL e.g. ‘http://www.example.com’text
page_urlschemeScheme aka protocol e.g. ‘https’text
page_urlhostHost aka domain e.g. ‘“www.snowplow.io’text
page_urlpathPath to page e.g. ‘/product/index.html’text
page_urlqueryQuerystring e.g. ‘id=GTM-DLRG’text
page_urlfragmentFragment aka anchor e.g. ‘4-conclusion’text
mkt_mediumType of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’text
mkt_sourceThe company / website where the traffic came from e.g. ‘Google’, ‘Facebook’text
mkt_termAny keywords associated with the referrer e.g. ‘new age tarot decks’text
mkt_contentThe content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723text
mkt_campaignThe campaign ID e.g. ‘diageo-123’text
mkt_clickidThe click ID e.g. ‘ac3d8e459’text
mkt_networkThe ad network to which the click ID belongs e.g. ‘DoubleClick’text
page_referrerURL of the referrer e.g. ‘http://www.referrer.com’text
refr_urlschemeReferer scheme e.g. ‘http’text
refr_urlhostReferer host e.g. ‘www.bing.com’text
refr_urlpathReferer page path e.g. ‘/images/search’text
refr_urlqueryReferer URL querystring e.g. ‘q=psychic+oracle+cards’text
refr_urlfragmentReferer URL fragmenttext
refr_mediumType of referer e.g. ‘search’, ‘internal’text
refr_sourceName of referer if recognised e.g. ‘Bing images’text
refr_termKeywords if source is a search engine e.g. ‘psychic oracle cards’text
geo_countryISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’text
geo_regionISO-3166-2 code for country region the visitor is in e.g. ‘I9’, ‘TX’text
geo_region_nameVisitor region name e.g. ‘Florida’text
geo_cityCity the visitor is in e.g. ‘New York’, ‘London’text
geo_zipcodePostcode the visitor is in e.g. ‘94109’text
geo_latitudeVisitor location latitude e.g. 37.443604float
geo_longitudeVisitor location longitude e.g. -122.4124float
geo_timezoneVisitor timezone name e.g. ‘Europe/London’text
user_ipaddressUser IP address e.g. ‘92.231.54.234’text
useragentRaw useragenttext
br_langLanguage the browser is set to e.g. ‘en-GB’text
br_viewwidthViewport width e.g. 1000number
br_viewheightViewport height e.g. 1000number
br_colordepthBit depth of the browser color palette e.g. 24text
br_renderengineBrowser rendering engine e.g. ‘GECKO’text
os_timezoneClient operating system timezone e.g. ‘Europe/London’text
categoryCategory based on activity if the IP/UA is a spider or robot, BROWSER otherwisetext
primary_impactWhether the spider or robot would affect page impression measurement, ad impression measurement, both or nonetext
reasonType of failed check if the IP/UA is a spider or robot, PASSED_ALL otherwisetext
spider_or_robotTrue if the IP address or user agent checked against the list is a spider or robot, false otherwiseboolean
useragent_familyUseragent family (browser) nametext
useragent_majorUseragent major versiontext
useragent_minorUseragent minor versiontext
useragent_patchUseragent patch versiontext
useragent_versionFull version of the useragenttext
os_familyOperating system family e.g. ‘Linux’text
os_majorOperation system major versiontext
os_minorOperation system minor versiontext
os_patchOperation system patch versiontext
os_patch_minorOperation system patch minor versiontext
os_versionOperation system full versiontext
device_familyDevice typetext
device_classClass of device e.g. phonetext
agent_classClass of agent e.g. browsertext
agent_nameName of agent e.g. Chrometext
agent_name_versionName and version of agent e.g. Chrome 53.0.2785.124text
agent_name_version_majorName and major version of agent e.g. Chrome 53text
agent_versionVersion of agent e.g. 53.0.2785.124text
agent_version_majorMajor version of agent e.g. 53text
device_brandBrand of device e.g. Googletext
device_nameName of device e.g. Google Nexus 6text
device_versionVersion of device e.g. 6.0text
layout_engine_classClass of layout engine e.g. Browsertext
layout_engine_nameName of layout engine e.g. Blinktext
layout_engine_name_versionName and version of layout engine e.g. Blink 53.0text
layout_engine_name_version_majorName and major version of layout engine e.g. Blink 53text
layout_engine_versionVersion of layout engine e.g. 53.0text
layout_engine_version_majorMajor version of layout engine e.g. 53text
operating_system_classClass of the OS e.g. Mobiletext
operating_system_nameName of the OS e.g. Androidtext
operating_system_name_versionName and version of the OS e.g. Android 7.0text
operating_system_versionVersion of the OS e.g. 7.0text
Code
Source
{{
config(
tags=["this_run"]
)
}}


select
ev.page_view_id,
ev.event_id,

ev.app_id,

-- user fields
ev.user_id,
ev.domain_userid,
ev.network_userid,

-- session fields
ev.domain_sessionid,
ev.domain_sessionidx,

ev.page_view_in_session_index,
max(ev.page_view_in_session_index) over (partition by ev.domain_sessionid) as page_views_in_session,

-- timestamp fields
ev.dvce_created_tstamp,
ev.collector_tstamp,
ev.derived_tstamp,
ev.start_tstamp,
coalesce(t.end_tstamp, ev.derived_tstamp) as end_tstamp, -- only page views with pings will have a row in table t
{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,

coalesce(t.engaged_time_in_s, 0) as engaged_time_in_s, -- where there are no pings, engaged time is 0.
{{ datediff('ev.derived_tstamp', 'coalesce(t.end_tstamp, ev.derived_tstamp)', 'second') }} as absolute_time_in_s,

sd.hmax as horizontal_pixels_scrolled,
sd.vmax as vertical_pixels_scrolled,

sd.relative_hmax as horizontal_percentage_scrolled,
sd.relative_vmax as vertical_percentage_scrolled,

ev.doc_width,
ev.doc_height,

ev.page_title,
ev.page_url,
ev.page_urlscheme,
ev.page_urlhost,
ev.page_urlpath,
ev.page_urlquery,
ev.page_urlfragment,

ev.mkt_medium,
ev.mkt_source,
ev.mkt_term,
ev.mkt_content,
ev.mkt_campaign,
ev.mkt_clickid,
ev.mkt_network,

ev.page_referrer,
ev.refr_urlscheme,
ev.refr_urlhost,
ev.refr_urlpath,
ev.refr_urlquery,
ev.refr_urlfragment,
ev.refr_medium,
ev.refr_source,
ev.refr_term,

ev.geo_country,
ev.geo_region,
ev.geo_region_name,
ev.geo_city,
ev.geo_zipcode,
ev.geo_latitude,
ev.geo_longitude,
ev.geo_timezone,

ev.user_ipaddress,

ev.useragent,

ev.br_lang,
ev.br_viewwidth,
ev.br_viewheight,
ev.br_colordepth,
ev.br_renderengine,

ev.os_timezone,

-- optional fields, only populated if enabled.

-- iab enrichment fields: set iab variable to true to enable
{{snowplow_web.get_iab_context_fields('iab')}},

-- ua parser enrichment fields
{{snowplow_web.get_ua_context_fields('ua')}},

-- yauaa enrichment fields
{{snowplow_web.get_yauaa_context_fields('ya')}}

from {{ ref('snowplow_web_page_view_events') }} ev

left join {{ ref('snowplow_web_pv_engaged_time') }} t
on ev.page_view_id = t.page_view_id {% if var('snowplow__limit_page_views_to_session', true) %} and ev.domain_sessionid = t.domain_sessionid {% endif %}

left join {{ ref('snowplow_web_pv_scroll_depth') }} sd
on ev.page_view_id = sd.page_view_id {% if var('snowplow__limit_page_views_to_session', true) %} and ev.domain_sessionid = sd.domain_sessionid {% endif %}

{% if var('snowplow__enable_iab', false) -%}

left join {{ ref('snowplow_web_pv_iab') }} iab
on ev.page_view_id = iab.page_view_id

{% endif -%}

{% if var('snowplow__enable_ua', false) -%}

left join {{ ref('snowplow_web_pv_ua_parser') }} ua
on ev.page_view_id = ua.page_view_id

{% endif -%}

{% if var('snowplow__enable_yauaa', false) -%}

left join {{ ref('snowplow_web_pv_yauaa') }} ya
on ev.page_view_id = ya.page_view_id

{%- endif -%}

Depends On

Referenced By

Snowplow Web Pv Engaged Time

models/page_views/scratch/snowplow_web_pv_engaged_time.sql

Description

This model calculates the time a visitor spent engaged on a given page view. This is calculated using the number of page ping events received for that page view.

Type: Table

Details

Columns
Column NameDescriptionType
page_view_idA UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_sessionidtext
end_tstamptimestamp_ntz
engaged_time_in_snumber
Code
Source
{{
config(
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}

select
ev.page_view_id,
{% if var('snowplow__limit_page_views_to_session', true) %}
ev.domain_sessionid,
{% endif %}
max(ev.derived_tstamp) as end_tstamp,

-- aggregate pings:
-- divides epoch tstamps by snowplow__heartbeat to get distinct intervals
-- floor rounds to nearest integer - duplicates all evaluate to the same number
-- count(distinct) counts duplicates only once
-- adding snowplow__min_visit_length accounts for the page view event itself.

{{ var("snowplow__heartbeat", 10) }} * (count(distinct(floor({{ snowplow_utils.to_unixtstamp('ev.dvce_created_tstamp') }}/{{ var("snowplow__heartbeat", 10) }}))) - 1) + {{ var("snowplow__min_visit_length", 5) }} as engaged_time_in_s

from {{ ref('snowplow_web_base_events_this_run') }} as ev

where ev.event_name = 'page_ping'
and ev.page_view_id is not null

group by 1 {% if var('snowplow__limit_page_views_to_session', true) %}, 2 {% endif %}

Depends On

Referenced By

Snowplow Web Pv Iab

models/page_views/scratch/<adaptor>/snowplow_web_pv_iab.sql

Description

Redshift and Postgres only. This is a staging table containing context data generated by the IAB enrichment for the events in the given run. The model is disable by default. Refer to the docs to enable.

The IAB Spiders & Robots enrichment uses the IAB/ABC International Spiders and Bots List to determine whether an event was produced by a user or a robot/spider based on its’ IP address and user agent.

File Paths

models/page_views/scratch/default/snowplow_web_pv_iab.sql

Details

Columns
Column NameDescription
page_view_idA UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’
Code
Source
{{
config(
enabled=(var('snowplow__enable_iab', false) and target.type in ['redshift', 'postgres'] | as_bool())
)
}}

with base as (
select
pv.page_view_id,

iab.category,
iab.primary_impact,
iab.reason,
iab.spider_or_robot,
row_number() over (partition by pv.page_view_id order by pv.collector_tstamp) as dedupe_index

from {{ var('snowplow__iab_context') }} iab

inner join {{ ref('snowplow_web_page_view_events') }} pv
on iab.root_id = pv.event_id
and iab.root_tstamp = pv.collector_tstamp

where iab.root_tstamp >= (select lower_limit from {{ ref('snowplow_web_pv_limits') }})
and iab.root_tstamp <= (select upper_limit from {{ ref('snowplow_web_pv_limits') }})

)

select *

from base

where dedupe_index = 1

Snowplow Web Pv Limits

models/page_views/scratch/<adaptor>/snowplow_web_pv_limits.sql

Description

This model calculates the lower and upper limit for the page views events in the given run. This is based taking the min and max collector_tstamp across all page views. It is used to improve performance when selected rows from the various context tables such as the UA parser table.

File Paths

models/page_views/scratch/default/snowplow_web_pv_limits.sql

Details

Code
Source
select
min(collector_tstamp) as lower_limit,
max(collector_tstamp) as upper_limit

from {{ ref('snowplow_web_base_events_this_run') }}

where page_view_id is not null

Snowplow Web Pv Scroll Depth

models/page_views/scratch/snowplow_web_pv_scroll_depth.sql

Description

This model calculates the horizontal and vertical scroll depth of the visitor on a given page view. Such metrics are useful when assessing engagement on a page view.

Type: Table

Details

Columns
Column NameDescriptionType
page_view_idA UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_sessionidtext
doc_widthnumber
doc_heightnumber
br_viewwidthnumber
br_viewheightnumber
hminnumber
hmaxnumber
vminnumber
vmaxnumber
relative_hminfloat
relative_hmaxfloat
relative_vminfloat
relative_vmaxfloat
Code
Source
{{
config(
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}

with prep as (
select
ev.page_view_id,
{% if var('snowplow__limit_page_views_to_session', true) %}
ev.domain_sessionid,
{% endif %}

max(ev.doc_width) as doc_width,
max(ev.doc_height) as doc_height,

max(ev.br_viewwidth) as br_viewwidth,
max(ev.br_viewheight) as br_viewheight,

-- coalesce replaces null with 0 (because the page view event does send an offset)
-- greatest prevents outliers (negative offsets)
-- least also prevents outliers (offsets greater than the docwidth or docheight)

least(greatest(min(coalesce(ev.pp_xoffset_min, 0)), 0), max(ev.doc_width)) as hmin, -- should be zero
least(greatest(max(coalesce(ev.pp_xoffset_max, 0)), 0), max(ev.doc_width)) as hmax,

least(greatest(min(coalesce(ev.pp_yoffset_min, 0)), 0), max(ev.doc_height)) as vmin, -- should be zero (edge case: not zero because the pv event is missing)
least(greatest(max(coalesce(ev.pp_yoffset_max, 0)), 0), max(ev.doc_height)) as vmax

from {{ ref('snowplow_web_base_events_this_run') }} as ev

where ev.event_name in ('page_view', 'page_ping')
and ev.page_view_id is not null
and ev.doc_height > 0 -- exclude problematic (but rare) edge case
and ev.doc_width > 0 -- exclude problematic (but rare) edge case

group by 1 {% if var('snowplow__limit_page_views_to_session', true) %}, 2 {% endif %}
)

select
page_view_id,
{% if var('snowplow__limit_page_views_to_session', true) %}
domain_sessionid,
{% endif %}

doc_width,
doc_height,

br_viewwidth,
br_viewheight,

hmin,
hmax,
vmin,
vmax,

cast(round(100*(greatest(hmin, 0)/cast(doc_width as {{ type_float() }}))) as {{ type_float() }}) as relative_hmin, -- brackets matter: because hmin is of type int, we need to divide before we multiply by 100 or we risk an overflow
cast(round(100*(least(hmax + br_viewwidth, doc_width)/cast(doc_width as {{ type_float() }}))) as {{ type_float() }}) as relative_hmax,
cast(round(100*(greatest(vmin, 0)/cast(doc_height as {{ type_float() }}))) as {{ type_float() }}) as relative_vmin,
cast(round(100*(least(vmax + br_viewheight, doc_height)/cast(doc_height as {{ type_float() }}))) as {{ type_float() }}) as relative_vmax -- not zero when a user hasn't scrolled because it includes the non-zero viewheight

from prep

Depends On

Referenced By

Snowplow Web Pv Ua Parser

models/page_views/scratch/<adaptor>/snowplow_web_pv_ua_parser.sql

Description

Redshift and Postgres only. This is a staging table containing context data generated by the UA parser enrichment for the events in the given run. The model is disable by default. Refer to the docs to enable.

File Paths

models/page_views/scratch/default/snowplow_web_pv_ua_parser.sql

Details

Columns
Column NameDescription
page_view_idA UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’
Code
Source
{{
config(
enabled=(var('snowplow__enable_ua', false) and target.type in ['redshift', 'postgres'] | as_bool())
)
}}

with base as (
select
pv.page_view_id,

ua.useragent_family,
ua.useragent_major,
ua.useragent_minor,
ua.useragent_patch,
ua.useragent_version,
ua.os_family,
ua.os_major,
ua.os_minor,
ua.os_patch,
ua.os_patch_minor,
ua.os_version,
ua.device_family,
row_number() over (partition by pv.page_view_id order by pv.collector_tstamp) as dedupe_index

from {{ var('snowplow__ua_parser_context') }} as ua

inner join {{ ref('snowplow_web_page_view_events') }} pv
on ua.root_id = pv.event_id
and ua.root_tstamp = pv.collector_tstamp

where ua.root_tstamp >= (select lower_limit from {{ ref('snowplow_web_pv_limits') }})
and ua.root_tstamp <= (select upper_limit from {{ ref('snowplow_web_pv_limits') }})

)

select *

from base

where dedupe_index = 1

Snowplow Web Pv Yauaa

models/page_views/scratch/<adaptor>/snowplow_web_pv_yauaa.sql

Description

Redshift and Postgres only. This is a staging table containing context data generated by the YAUAA enrichment. The model is disable by default. Refer to the docs to enable.

File Paths

models/page_views/scratch/default/snowplow_web_pv_yauaa.sql

Details

Columns
Column NameDescription
page_view_idA UUID for each page view e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’
Code
Source
{{
config(
enabled=(var('snowplow__enable_yauaa', false) and target.type in ['redshift', 'postgres'] | as_bool())
)
}}


with base as (
select
pv.page_view_id,

ya.device_class,
ya.agent_class,
ya.agent_name,
ya.agent_name_version,
ya.agent_name_version_major,
ya.agent_version,
ya.agent_version_major,
ya.device_brand,
ya.device_name,
ya.device_version,
ya.layout_engine_class,
ya.layout_engine_name,
ya.layout_engine_name_version,
ya.layout_engine_name_version_major,
ya.layout_engine_version,
ya.layout_engine_version_major,
ya.operating_system_class,
ya.operating_system_name,
ya.operating_system_name_version,
ya.operating_system_version,
row_number() over (partition by pv.page_view_id order by pv.collector_tstamp) as dedupe_index

from {{ var('snowplow__yauaa_context') }} ya

inner join {{ ref('snowplow_web_page_view_events') }} pv
on ya.root_id = pv.event_id
and ya.root_tstamp = pv.collector_tstamp

where ya.root_tstamp >= (select lower_limit from {{ ref('snowplow_web_pv_limits') }})
and ya.root_tstamp <= (select upper_limit from {{ ref('snowplow_web_pv_limits') }})

)

select *

from base

where dedupe_index = 1

Snowplow Web Sessions

models/sessions/snowplow_web_sessions.sql

Description

This derived incremental table contains all historic sessions and should be the end point for any analysis or BI tools.

Type: Table

Details

Columns
Column NameDescriptionType
app_idApplication ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev.text
domain_sessionidA visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_sessionidxA visit / session index e.g. 3number
start_tstampTimestamp for the start of the session, based on derived_tstamptimestamp_ntz
end_tstampTimestamp for the end of the session, based on derived_tstamptimestamp_ntz
model_tstampThe current timestamp when the model processed this row.timestamp_ntz
user_idUnique ID set by business e.g. ‘jon.doe@email.comtext
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
stitched_user_idtext
network_useridUser ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’text
page_viewsThe number of distinct page views within a sessionnumber
engaged_time_in_sThe total time engaged by a user within a sessionnumber
absolute_time_in_sThe time in seconds between the start_tstamp and end_tstampnumber
first_page_titleThe title of the first page visited within the sessiontext
first_page_urlThe url of the first page visited within the sessiontext
first_page_urlschemeThe urlscheme of the first page visited within the sessiontext
first_page_urlhostThe urlhost of the first page visited within the sessiontext
first_page_urlpathThe urlpath of the first page visited within the sessiontext
first_page_urlqueryThe urlquery of the first page visited within the sessiontext
first_page_urlfragmentThe urlfragment of the first page visited within the sessiontext
last_page_titleThe title of the last page visited within the sessiontext
last_page_urlThe url of the last page visited within the sessiontext
last_page_urlschemeThe urlscheme of the last page visited within the sessiontext
last_page_urlhostThe urlhost of the last page visited within the sessiontext
last_page_urlpathThe urlpath of the last page visited within the sessiontext
last_page_urlqueryThe urlquery of the last page visited within the sessiontext
last_page_urlfragmentThe urlfragment of the last page visited within the sessiontext
referrerThe referrer associated with the first page view of the sessiontext
refr_urlschemeReferer scheme e.g. ‘http’text
refr_urlhostReferer host e.g. ‘www.bing.com’text
refr_urlpathReferer page path e.g. ‘/images/search’text
refr_urlqueryReferer URL querystring e.g. ‘q=psychic+oracle+cards’text
refr_urlfragmentReferer URL fragmenttext
refr_mediumType of referer e.g. ‘search’, ‘internal’text
refr_sourceName of referer if recognised e.g. ‘Bing images’text
refr_termKeywords if source is a search engine e.g. ‘psychic oracle cards’text
mkt_mediumType of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’text
mkt_sourceThe company / website where the traffic came from e.g. ‘Google’, ‘Facebook’text
mkt_termAny keywords associated with the referrer e.g. ‘new age tarot decks’text
mkt_contentThe content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723text
mkt_campaignThe campaign ID e.g. ‘diageo-123’text
mkt_clickidThe click ID e.g. ‘ac3d8e459’text
mkt_networkThe ad network to which the click ID belongs e.g. ‘DoubleClick’text
geo_countryISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’text
geo_regionISO-3166-2 code for country region the visitor is in e.g. ‘I9’, ‘TX’text
geo_region_nameVisitor region name e.g. ‘Florida’text
geo_cityCity the visitor is in e.g. ‘New York’, ‘London’text
geo_zipcodePostcode the visitor is in e.g. ‘94109’text
geo_latitudeVisitor location latitude e.g. 37.443604float
geo_longitudeVisitor location longitude e.g. -122.4124float
geo_timezoneVisitor timezone name e.g. ‘Europe/London’text
user_ipaddressUser IP address e.g. ‘92.231.54.234’text
useragentRaw useragenttext
br_renderengineBrowser rendering engine e.g. ‘GECKO’text
br_langLanguage the browser is set to e.g. ‘en-GB’text
os_timezoneClient operating system timezone e.g. ‘Europe/London’text
categoryCategory based on activity if the IP/UA is a spider or robot, BROWSER otherwisetext
primary_impactWhether the spider or robot would affect page impression measurement, ad impression measurement, both or nonetext
reasonType of failed check if the IP/UA is a spider or robot, PASSED_ALL otherwisetext
spider_or_robotTrue if the IP address or user agent checked against the list is a spider or robot, false otherwiseboolean
useragent_familyUseragent family (browser) nametext
useragent_majorUseragent major versiontext
useragent_minorUseragent minor versiontext
useragent_patchUseragent patch versiontext
useragent_versionFull version of the useragenttext
os_familyOperating system family e.g. ‘Linux’text
os_majorOperation system major versiontext
os_minorOperation system minor versiontext
os_patchOperation system patch versiontext
os_patch_minorOperation system patch minor versiontext
os_versionOperation system full versiontext
device_familyDevice typetext
device_classClass of device e.g. phonetext
agent_classClass of agent e.g. browsertext
agent_nameName of agent e.g. Chrometext
agent_name_versionName and version of agent e.g. Chrome 53.0.2785.124text
agent_name_version_majorName and major version of agent e.g. Chrome 53text
agent_versionVersion of agent e.g. 53.0.2785.124text
agent_version_majorMajor version of agent e.g. 53text
device_brandBrand of device e.g. Googletext
device_nameName of device e.g. Google Nexus 6text
device_versionVersion of device e.g. 6.0text
layout_engine_classClass of layout engine e.g. Browsertext
layout_engine_nameName of layout engine e.g. Blinktext
layout_engine_name_versionName and version of layout engine e.g. Blink 53.0text
layout_engine_name_version_majorName and major version of layout engine e.g. Blink 53text
layout_engine_versionVersion of layout engine e.g. 53.0text
layout_engine_version_majorMajor version of layout engine e.g. 53text
operating_system_classClass of the OS e.g. Mobiletext
operating_system_nameName of the OS e.g. Androidtext
operating_system_name_versionName and version of the OS e.g. Android 7.0text
operating_system_versionVersion of the OS e.g. 7.0text
Code
Source
{{
config(
materialized='incremental',
unique_key='domain_sessionid',
upsert_date_key='start_tstamp',
sort='start_tstamp',
dist='domain_sessionid',
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val={
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
cluster_by=snowplow_web.web_cluster_by_fields_sessions(),
tags=["derived"],
post_hook="{{ snowplow_web.stitch_user_identifiers(
enabled=var('snowplow__session_stitching')
) }}",
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
},
snowplow_optimize = true
)
}}


select *
{% if target.type in ['databricks', 'spark'] -%}
, DATE(start_tstamp) as start_tstamp_date
{%- endif %}
from {{ ref('snowplow_web_sessions_this_run') }}
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.

Depends On

Referenced By

Snowplow Web Sessions This Run

models/sessions/scratch/<adaptor>/snowplow_web_sessions_this_run.sql

Description

This staging table contains all the sessions for the given run of the Web model. It possess all the same columns as snowplow_web_sessions. If building a custom module that requires session level data, this is the table you should reference.

Type: Table

File Paths

models/sessions/scratch/default/snowplow_web_sessions_this_run.sql

Details

Columns
Column NameDescriptionType
app_idApplication ID e.g. ‘angry-birds’ is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev.text
domain_sessionidA visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_sessionidxA visit / session index e.g. 3number
start_tstampTimestamp for the start of the session, based on derived_tstamptimestamp_ntz
end_tstampTimestamp for the end of the session, based on derived_tstamptimestamp_ntz
model_tstampThe current timestamp when the model processed this row.timestamp_ntz
user_idUnique ID set by business e.g. ‘jon.doe@email.comtext
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
stitched_user_idtext
network_useridUser ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’text
page_viewsThe number of distinct page views within a sessionnumber
engaged_time_in_sThe total time engaged by a user within a sessionnumber
absolute_time_in_sThe time in seconds between the start_tstamp and end_tstampnumber
first_page_titleThe title of the first page visited within the sessiontext
first_page_urlThe url of the first page visited within the sessiontext
first_page_urlschemeThe urlscheme of the first page visited within the sessiontext
first_page_urlhostThe urlhost of the first page visited within the sessiontext
first_page_urlpathThe urlpath of the first page visited within the sessiontext
first_page_urlqueryThe urlquery of the first page visited within the sessiontext
first_page_urlfragmentThe urlfragment of the first page visited within the sessiontext
last_page_titleThe title of the last page visited within the sessiontext
last_page_urlThe url of the last page visited within the sessiontext
last_page_urlschemeThe urlscheme of the last page visited within the sessiontext
last_page_urlhostThe urlhost of the last page visited within the sessiontext
last_page_urlpathThe urlpath of the last page visited within the sessiontext
last_page_urlqueryThe urlquery of the last page visited within the sessiontext
last_page_urlfragmentThe urlfragment of the last page visited within the sessiontext
referrerThe referrer associated with the first page view of the sessiontext
refr_urlschemeReferer scheme e.g. ‘http’text
refr_urlhostReferer host e.g. ‘www.bing.com’text
refr_urlpathReferer page path e.g. ‘/images/search’text
refr_urlqueryReferer URL querystring e.g. ‘q=psychic+oracle+cards’text
refr_urlfragmentReferer URL fragmenttext
refr_mediumType of referer e.g. ‘search’, ‘internal’text
refr_sourceName of referer if recognised e.g. ‘Bing images’text
refr_termKeywords if source is a search engine e.g. ‘psychic oracle cards’text
mkt_mediumType of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’text
mkt_sourceThe company / website where the traffic came from e.g. ‘Google’, ‘Facebook’text
mkt_termAny keywords associated with the referrer e.g. ‘new age tarot decks’text
mkt_contentThe content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723text
mkt_campaignThe campaign ID e.g. ‘diageo-123’text
mkt_clickidThe click ID e.g. ‘ac3d8e459’text
mkt_networkThe ad network to which the click ID belongs e.g. ‘DoubleClick’text
geo_countryISO 3166-1 code for the country the visitor is located in e.g. ‘GB’, ‘US’text
geo_regionISO-3166-2 code for country region the visitor is in e.g. ‘I9’, ‘TX’text
geo_region_nameVisitor region name e.g. ‘Florida’text
geo_cityCity the visitor is in e.g. ‘New York’, ‘London’text
geo_zipcodePostcode the visitor is in e.g. ‘94109’text
geo_latitudeVisitor location latitude e.g. 37.443604float
geo_longitudeVisitor location longitude e.g. -122.4124float
geo_timezoneVisitor timezone name e.g. ‘Europe/London’text
user_ipaddressUser IP address e.g. ‘92.231.54.234’text
useragentRaw useragenttext
br_renderengineBrowser rendering engine e.g. ‘GECKO’text
br_langLanguage the browser is set to e.g. ‘en-GB’text
os_timezoneClient operating system timezone e.g. ‘Europe/London’text
categoryCategory based on activity if the IP/UA is a spider or robot, BROWSER otherwisetext
primary_impactWhether the spider or robot would affect page impression measurement, ad impression measurement, both or nonetext
reasonType of failed check if the IP/UA is a spider or robot, PASSED_ALL otherwisetext
spider_or_robotTrue if the IP address or user agent checked against the list is a spider or robot, false otherwiseboolean
useragent_familyUseragent family (browser) nametext
useragent_majorUseragent major versiontext
useragent_minorUseragent minor versiontext
useragent_patchUseragent patch versiontext
useragent_versionFull version of the useragenttext
os_familyOperating system family e.g. ‘Linux’text
os_majorOperation system major versiontext
os_minorOperation system minor versiontext
os_patchOperation system patch versiontext
os_patch_minorOperation system patch minor versiontext
os_versionOperation system full versiontext
device_familyDevice typetext
device_classClass of device e.g. phonetext
agent_classClass of agent e.g. browsertext
agent_nameName of agent e.g. Chrometext
agent_name_versionName and version of agent e.g. Chrome 53.0.2785.124text
agent_name_version_majorName and major version of agent e.g. Chrome 53text
agent_versionVersion of agent e.g. 53.0.2785.124text
agent_version_majorMajor version of agent e.g. 53text
device_brandBrand of device e.g. Googletext
device_nameName of device e.g. Google Nexus 6text
device_versionVersion of device e.g. 6.0text
layout_engine_classClass of layout engine e.g. Browsertext
layout_engine_nameName of layout engine e.g. Blinktext
layout_engine_name_versionName and version of layout engine e.g. Blink 53.0text
layout_engine_name_version_majorName and major version of layout engine e.g. Blink 53text
layout_engine_versionVersion of layout engine e.g. 53.0text
layout_engine_version_majorMajor version of layout engine e.g. 53text
operating_system_classClass of the OS e.g. Mobiletext
operating_system_nameName of the OS e.g. Androidtext
operating_system_name_versionName and version of the OS e.g. Android 7.0text
operating_system_versionVersion of the OS e.g. 7.0text
Code
Source
{{
config(
tags=["this_run"]
)
}}

with session_firsts as (
select
-- app id
app_id as app_id,

-- session fields
domain_sessionid,
domain_sessionidx,

{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,

-- user fields
user_id,
domain_userid,
{% if var('snowplow__session_stitching') %}
-- updated with mapping as part of post hook on derived sessions table
cast(domain_userid as {{ type_string() }}) as stitched_user_id,
{% else %}
cast(null as {{ type_string() }}) as stitched_user_id,
{% endif %}
network_userid as network_userid,

-- first page fields
page_title as first_page_title,
page_url as first_page_url,
page_urlscheme as first_page_urlscheme,
page_urlhost as first_page_urlhost,
page_urlpath as first_page_urlpath,
page_urlquery as first_page_urlquery,
page_urlfragment as first_page_urlfragment,

-- referrer fields
page_referrer as referrer,
refr_urlscheme as refr_urlscheme,
refr_urlhost as refr_urlhost,
refr_urlpath as refr_urlpath,
refr_urlquery as refr_urlquery,
refr_urlfragment as refr_urlfragment,
refr_medium as refr_medium,
refr_source as refr_source,
refr_term as refr_term,

-- marketing fields
mkt_medium as mkt_medium,
mkt_source as mkt_source,
mkt_term as mkt_term,
mkt_content as mkt_content,
mkt_campaign as mkt_campaign,
mkt_clickid as mkt_clickid,
mkt_network as mkt_network,

-- Most if not all the following fields should be the same across all events in a session, but this ensures they are
-- geo fields
geo_country as geo_country,
geo_region as geo_region,
geo_region_name as geo_region_name,
geo_city as geo_city,
geo_zipcode as geo_zipcode,
geo_latitude as geo_latitude,
geo_longitude as geo_longitude,
geo_timezone as geo_timezone,

-- ip address
user_ipaddress as user_ipaddress,

-- user agent
useragent as useragent,

br_renderengine as br_renderengine,
br_lang as br_lang,
os_timezone as os_timezone,

-- optional fields, only populated if enabled.
-- iab enrichment fields: set iab variable to true to enable
{{snowplow_web.get_iab_context_fields('iab')}},

-- ua parser enrichment fields
{{snowplow_web.get_ua_context_fields('ua')}},

-- yauaa enrichment fields
{{snowplow_web.get_yauaa_context_fields('ya')}},

-- the joined tables should all be unique on page_view_id anyway, but this has the benefit of de-duping just in case
row_number() over (partition by ev.domain_sessionid order by ev.derived_tstamp, ev.dvce_created_tstamp) AS page_event_in_session_index,
event_name
from {{ ref('snowplow_web_base_events_this_run') }} ev

{% if var('snowplow__enable_iab', false) -%}
left join {{ ref('snowplow_web_pv_iab') }} iab
on ev.page_view_id = iab.page_view_id
{% endif -%}

{% if var('snowplow__enable_ua', false) -%}
left join {{ ref('snowplow_web_pv_ua_parser') }} ua
on ev.page_view_id = ua.page_view_id
{% endif -%}

{% if var('snowplow__enable_yauaa', false) -%}
left join {{ ref('snowplow_web_pv_yauaa') }} ya
on ev.page_view_id = ya.page_view_id
{% endif -%}

where
ev.event_name in ('page_ping', 'page_view')
and ev.page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots('ev') }}
{% endif %}
),

session_lasts as (
select
domain_sessionid,
page_title as last_page_title,
page_url as last_page_url,
page_urlscheme as last_page_urlscheme,
page_urlhost as last_page_urlhost,
page_urlpath as last_page_urlpath,
page_urlquery as last_page_urlquery,
page_urlfragment as last_page_urlfragment,
row_number() over (partition by ev.domain_sessionid order by ev.derived_tstamp desc, ev.dvce_created_tstamp) AS page_event_in_session_index
from {{ ref('snowplow_web_base_events_this_run') }} ev
where
event_name in ('page_view')
and page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
),

session_aggs as (
select
domain_sessionid,
min(derived_tstamp) as start_tstamp,
max(derived_tstamp) as end_tstamp,
-- engagement fields
count(distinct page_view_id) as page_views,
-- (hb * (#page pings - # distinct page view ids ON page pings)) + (# distinct page view ids ON page pings * min visit length)
({{ var("snowplow__heartbeat", 10) }} * (
-- number of (unqiue in heartbeat increment) pages pings following a page ping (gap of heartbeat)
count(distinct case
when event_name = 'page_ping' then
-- need to get a unique list of floored time PER page view, so create a dummy surrogate key...
{{ dbt.concat(['page_view_id', "cast(floor("~snowplow_utils.to_unixtstamp('dvce_created_tstamp')~"/"~var('snowplow__heartbeat', 10)~") as "~dbt.type_string()~")" ]) }}
else
null end) -
count(distinct case when event_name = 'page_ping' then page_view_id else null end)
)) +
-- number of page pings following a page view (or no event) (gap of min visit length)
(count(distinct case when event_name = 'page_ping' then page_view_id else null end) * {{ var("snowplow__min_visit_length", 5) }}) as engaged_time_in_s,
{{ snowplow_utils.timestamp_diff('min(derived_tstamp)', 'max(derived_tstamp)', 'second') }} as absolute_time_in_s
from {{ ref('snowplow_web_base_events_this_run') }}
where
event_name in ('page_ping', 'page_view')
and page_view_id is not null
{% if var("snowplow__ua_bot_filter", true) %}
{{ filter_bots() }}
{% endif %}
group by
domain_sessionid
)

select
-- app id
a.app_id,

-- session fields
a.domain_sessionid,
a.domain_sessionidx,

-- when the session starts with a ping we need to add the min visit length to get when the session actually started
case when a.event_name = 'page_ping' then
{{ snowplow_utils.timestamp_add(datepart="second", interval=-var("snowplow__min_visit_length", 5), tstamp="c.start_tstamp") }}
else c.start_tstamp end as start_tstamp,
c.end_tstamp,
a.model_tstamp,

-- user fields
a.user_id,
a.domain_userid,
a.stitched_user_id,
a.network_userid,

-- engagement fields
c.page_views,
c.engaged_time_in_s,
-- when the session starts with a ping we need to add the min visit length to get when the session actually started
c.absolute_time_in_s + case when a.event_name = 'page_ping' then {{ var("snowplow__min_visit_length", 5) }} else 0 end as absolute_time_in_s,

-- first page fields
a.first_page_title,
a.first_page_url,
a.first_page_urlscheme,
a.first_page_urlhost,
a.first_page_urlpath,
a.first_page_urlquery,
a.first_page_urlfragment,

-- only take the first value when the last is genuinely missing (base on url as has to always be populated)
case when b.last_page_url is null then coalesce(b.last_page_title, a.first_page_title) else b.last_page_title end as last_page_title,
case when b.last_page_url is null then coalesce(b.last_page_url, a.first_page_url) else b.last_page_url end as last_page_url,
case when b.last_page_url is null then coalesce(b.last_page_urlscheme, a.first_page_urlscheme) else b.last_page_urlscheme end as last_page_urlscheme,
case when b.last_page_url is null then coalesce(b.last_page_urlhost, a.first_page_urlhost) else b.last_page_urlhost end as last_page_urlhost,
case when b.last_page_url is null then coalesce(b.last_page_urlpath, a.first_page_urlpath) else b.last_page_urlpath end as last_page_urlpath,
case when b.last_page_url is null then coalesce(b.last_page_urlquery, a.first_page_urlquery) else b.last_page_urlquery end as last_page_urlquery,
case when b.last_page_url is null then coalesce(b.last_page_urlfragment, a.first_page_urlfragment) else b.last_page_urlfragment end as last_page_urlfragment,

-- referrer fields
a.referrer,
a.refr_urlscheme,
a.refr_urlhost,
a.refr_urlpath,
a.refr_urlquery,
a.refr_urlfragment,
a.refr_medium,
a.refr_source,
a.refr_term,

-- marketing fields
a.mkt_medium,
a.mkt_source,
a.mkt_term,
a.mkt_content,
a.mkt_campaign,
a.mkt_clickid,
a.mkt_network,

-- geo fields
a.geo_country,
a.geo_region,
a.geo_region_name,
a.geo_city,
a.geo_zipcode,
a.geo_latitude,
a.geo_longitude,
a.geo_timezone,

-- ip address
a.user_ipaddress,

-- user agent
a.useragent,

a.br_renderengine,
a.br_lang,

a.os_timezone,

-- optional fields, only populated if enabled.
-- iab enrichment fields
a.category,
a.primary_impact,
a.reason,
a.spider_or_robot,

-- ua parser enrichment fields
a.useragent_family,
a.useragent_major,
a.useragent_minor,
a.useragent_patch,
a.useragent_version,
a.os_family,
a.os_major,
a.os_minor,
a.os_patch,
a.os_patch_minor,
a.os_version,
a.device_family,

-- yauaa enrichment fields
a.device_class,
a.agent_class,
a.agent_name,
a.agent_name_version,
a.agent_name_version_major,
a.agent_version,
a.agent_version_major,
a.device_brand,
a.device_name,
a.device_version,
a.layout_engine_class,
a.layout_engine_name,
a.layout_engine_name_version,
a.layout_engine_name_version_major,
a.layout_engine_version,
a.layout_engine_version_major,
a.operating_system_class,
a.operating_system_name,
a.operating_system_name_version,
a.operating_system_version
from
session_firsts a
left join
session_lasts b on a.domain_sessionid = b.domain_sessionid and b.page_event_in_session_index = 1
left join
session_aggs c on a.domain_sessionid = c.domain_sessionid
where
a.page_event_in_session_index = 1

Depends On

Referenced By

Snowplow Web User Mapping

models/user_mapping/snowplow_web_user_mapping.sql

Description

A mapping table between domain_userid and user_id.

Type: Table

Details

Columns
Column NameDescriptionType
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
user_idUnique ID set by business e.g. ‘jon.doe@email.comtext
end_tstampThe collector_tstamp when the user was last activetimestamp_ntz
Code
Source
{{
config(
materialized='incremental',
unique_key='domain_userid',
sort='end_tstamp',
dist='domain_userid',
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val={
"field": "end_tstamp",
"data_type": "timestamp"
}),
tags=["derived"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}


select distinct
domain_userid,
last_value(user_id) over(
partition by domain_userid
order by collector_tstamp
rows between unbounded preceding and unbounded following
) as user_id,
max(collector_tstamp) over (partition by domain_userid) as end_tstamp

from {{ ref('snowplow_web_base_events_this_run') }}

where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.
and user_id is not null
and domain_userid is not null

Depends On

Referenced By

Snowplow Web Users

models/users/snowplow_web_users.sql

Description

This derived incremental table contains all historic users data and should be the end point for any analysis or BI tools.

Type: Table

Details

Columns
Column NameDescriptionType
user_idUnique ID set by business e.g. ‘jon.doe@email.comtext
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
network_useridUser ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’text
start_tstampTimestamp for the start of the users lifecycle, based on derived_tstamptimestamp_ntz
end_tstampTimestamp for the last time the user was seen, based on derived_tstamptimestamp_ntz
model_tstampThe current timestamp when the model processed this row.timestamp_ntz
page_viewsThe total page views by the usernumber
sessionsThe total sessions by the usernumber
engaged_time_in_sThe total engaged time in seconds by the usernumber
first_page_titleThe title of the first page visited by the usertext
first_page_urlThe url of the first page visited by the usertext
first_page_urlschemeThe urlscheme of the first page visited by the usertext
first_page_urlhostThe urlhost of the first page visited by the usertext
first_page_urlpathThe urlpath of the first page visited by the usertext
first_page_urlqueryThe urlquery of the first page visited by the usertext
first_page_urlfragmentThe urlfragment of the first page visited by the usertext
last_page_titleThe title of the last page visited by the usertext
last_page_urlThe url of the last page visited by the usertext
last_page_urlschemeThe urlscheme of the last page visited by the usertext
last_page_urlhostThe urlhost of the last page visited by the usertext
last_page_urlpathThe urlpath of the last page visited by the usertext
last_page_urlqueryThe urlquery of the last page visited by the usertext
last_page_urlfragmentThe urlfragment of the last page visited by the usertext
referrerThe referrer associated with the first page view of the usertext
refr_urlschemeReferer scheme e.g. ‘http’text
refr_urlhostReferer host e.g. ‘www.bing.com’text
refr_urlpathReferer page path e.g. ‘/images/search’text
refr_urlqueryReferer URL querystring e.g. ‘q=psychic+oracle+cards’text
refr_urlfragmentReferer URL fragmenttext
refr_mediumType of referer e.g. ‘search’, ‘internal’text
refr_sourceName of referer if recognised e.g. ‘Bing images’text
refr_termKeywords if source is a search engine e.g. ‘psychic oracle cards’text
mkt_mediumType of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’text
mkt_sourceThe company / website where the traffic came from e.g. ‘Google’, ‘Facebook’text
mkt_termAny keywords associated with the referrer e.g. ‘new age tarot decks’text
mkt_contentThe content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723text
mkt_campaignThe campaign ID e.g. ‘diageo-123’text
mkt_clickidThe click ID e.g. ‘ac3d8e459’text
mkt_networkThe ad network to which the click ID belongs e.g. ‘DoubleClick’text
Code
Source
{{
config(
materialized='incremental',
unique_key='domain_userid',
upsert_date_key='start_tstamp',
disable_upsert_lookback=true,
sort='start_tstamp',
dist='domain_userid',
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val={
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
cluster_by=snowplow_web.web_cluster_by_fields_users(),
tags=["derived"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
'delta.autoOptimize.autoCompact' : 'true'
},
snowplow_optimize = true
)
}}

select *
{% if target.type in ['databricks', 'spark'] -%}
, DATE(start_tstamp) as start_tstamp_date
{%- endif %}
from {{ ref('snowplow_web_users_this_run') }}
where {{ snowplow_utils.is_run_with_new_events('snowplow_web') }} --returns false if run doesn't contain new events.

Depends On

Snowplow Web Users Aggs

models/users/scratch/snowplow_web_users_aggs.sql

Description

This model aggregates various metrics derived from sessions to a users level.

Type: Table

Details

Columns
Column NameDescriptionType
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
start_tstamptimestamp_ntz
end_tstamptimestamp_ntz
first_domain_sessionidtext
last_domain_sessionidtext
page_viewsnumber
sessionsnumber
engaged_time_in_snumber
Code
Source
{{
config(
partition_by = snowplow_utils.get_value_by_target_type(bigquery_val={
"field": "start_tstamp",
"data_type": "timestamp"
}),
cluster_by=snowplow_utils.get_value_by_target_type(bigquery_val=["domain_userid"]),
sort='domain_userid',
dist='domain_userid',
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}

select
domain_userid,
-- time
user_start_tstamp as start_tstamp,
user_end_tstamp as end_tstamp,
-- first/last session. Max to resolve edge case with multiple sessions with the same start/end tstamp
max(case when start_tstamp = user_start_tstamp then domain_sessionid end) as first_domain_sessionid,
max(case when end_tstamp = user_end_tstamp then domain_sessionid end) as last_domain_sessionid,
-- engagement
sum(page_views) as page_views,
count(distinct domain_sessionid) as sessions,
sum(engaged_time_in_s) as engaged_time_in_s

from {{ ref('snowplow_web_users_sessions_this_run') }}

group by 1,2,3

Depends On

Referenced By

Snowplow Web Users Lasts

models/users/scratch/snowplow_web_users_lasts.sql

Description

This model identifies the last page view for a user and returns various dimensions associated with that page view.

Type: Table

Details

Columns
Column NameDescriptionType
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
last_page_titletext
last_page_urltext
last_page_urlschemetext
last_page_urlhosttext
last_page_urlpathtext
last_page_urlquerytext
last_page_urlfragmenttext
Code
Source
{{
config(
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}


select
a.domain_userid,
a.last_page_title,

a.last_page_url,

a.last_page_urlscheme,
a.last_page_urlhost,
a.last_page_urlpath,
a.last_page_urlquery,
a.last_page_urlfragment

from {{ ref('snowplow_web_users_sessions_this_run') }} a

inner join {{ ref('snowplow_web_users_aggs') }} b
on a.domain_sessionid = b.last_domain_sessionid

Depends On

Referenced By

Snowplow Web Users Sessions This Run

models/users/scratch/snowplow_web_users_sessions_this_run.sql

Description

This model contains all sessions data related to users contained in the given run of the Web model

Type: Table

Details

Columns
Column NameDescriptionType
app_idtext
domain_sessionidA visit / session UUID e.g. ‘c6ef3124-b53a-4b13-a233-0088f79dcbcb’text
domain_sessionidxnumber
start_tstamptimestamp_ntz
end_tstamptimestamp_ntz
model_tstamptimestamp_ntz
user_idtext
domain_useridtext
stitched_user_idtext
network_useridtext
page_viewsnumber
engaged_time_in_snumber
absolute_time_in_snumber
first_page_titletext
first_page_urltext
first_page_urlschemetext
first_page_urlhosttext
first_page_urlpathtext
first_page_urlquerytext
first_page_urlfragmenttext
last_page_titletext
last_page_urltext
last_page_urlschemetext
last_page_urlhosttext
last_page_urlpathtext
last_page_urlquerytext
last_page_urlfragmenttext
referrertext
refr_urlschemetext
refr_urlhosttext
refr_urlpathtext
refr_urlquerytext
refr_urlfragmenttext
refr_mediumtext
refr_sourcetext
refr_termtext
mkt_mediumtext
mkt_sourcetext
mkt_termtext
mkt_contenttext
mkt_campaigntext
mkt_clickidtext
mkt_networktext
geo_countrytext
geo_regiontext
geo_region_nametext
geo_citytext
geo_zipcodetext
geo_latitudefloat
geo_longitudefloat
geo_timezonetext
user_ipaddresstext
useragenttext
br_renderenginetext
br_langtext
os_timezonetext
categorytext
primary_impacttext
reasontext
spider_or_robotboolean
useragent_familytext
useragent_majortext
useragent_minortext
useragent_patchtext
useragent_versiontext
os_familytext
os_majortext
os_minortext
os_patchtext
os_patch_minortext
os_versiontext
device_familytext
device_classtext
agent_classtext
agent_nametext
agent_name_versiontext
agent_name_version_majortext
agent_versiontext
agent_version_majortext
device_brandtext
device_nametext
device_versiontext
layout_engine_classtext
layout_engine_nametext
layout_engine_name_versiontext
layout_engine_name_version_majortext
layout_engine_versiontext
layout_engine_version_majortext
operating_system_classtext
operating_system_nametext
operating_system_name_versiontext
operating_system_versiontext
user_start_tstamptimestamp_ntz
user_end_tstamptimestamp_ntz
Code
Source
{{
config(
tags=["this_run"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}

with user_ids_this_run as (
select
distinct domain_userid

from {{ ref('snowplow_web_base_sessions_this_run') }}
where domain_userid is not null
)

select
a.*,
min(a.start_tstamp) over(partition by a.domain_userid) as user_start_tstamp,
max(a.end_tstamp) over(partition by a.domain_userid) as user_end_tstamp

from {{ var('snowplow__sessions_table') }} a
inner join user_ids_this_run b
on a.domain_userid = b.domain_userid

Depends On

Referenced By

Snowplow Web Users This Run

models/users/scratch/snowplow_web_users_this_run.sql

Description

This staging table contains all the users for the given run of the Web model. It possess all the same columns as snowplow_web_users. If building a custom module that requires session level data, this is the table you should reference.

Type: Table

Details

Columns
Column NameDescriptionType
user_idUnique ID set by business e.g. ‘jon.doe@email.comtext
domain_useridUser ID set by Snowplow using 1st party cookie e.g. ‘bc2e92ec6c204a14’text
network_useridUser ID set by Snowplow using 3rd party cookie e.g. ‘ecdff4d0-9175-40ac-a8bb-325c49733607’text
start_tstampTimestamp for the start of the users lifecycle, based on derived_tstamptimestamp_ntz
end_tstampTimestamp for the last time the user was seen, based on derived_tstamptimestamp_ntz
model_tstampThe current timestamp when the model processed this row.timestamp_ntz
page_viewsThe total page views by the usernumber
sessionsThe total sessions by the usernumber
engaged_time_in_sThe total engaged time in seconds by the usernumber
first_page_titleThe title of the first page visited by the usertext
first_page_urlThe url of the first page visited by the usertext
first_page_urlschemeThe urlscheme of the first page visited by the usertext
first_page_urlhostThe urlhost of the first page visited by the usertext
first_page_urlpathThe urlpath of the first page visited by the usertext
first_page_urlqueryThe urlquery of the first page visited by the usertext
first_page_urlfragmentThe urlfragment of the first page visited by the usertext
last_page_titleThe title of the last page visited by the usertext
last_page_urlThe url of the last page visited by the usertext
last_page_urlschemeThe urlscheme of the last page visited by the usertext
last_page_urlhostThe urlhost of the last page visited by the usertext
last_page_urlpathThe urlpath of the last page visited by the usertext
last_page_urlqueryThe urlquery of the last page visited by the usertext
last_page_urlfragmentThe urlfragment of the last page visited by the usertext
referrerThe referrer associated with the first page view of the usertext
refr_urlschemeReferer scheme e.g. ‘http’text
refr_urlhostReferer host e.g. ‘www.bing.com’text
refr_urlpathReferer page path e.g. ‘/images/search’text
refr_urlqueryReferer URL querystring e.g. ‘q=psychic+oracle+cards’text
refr_urlfragmentReferer URL fragmenttext
refr_mediumType of referer e.g. ‘search’, ‘internal’text
refr_sourceName of referer if recognised e.g. ‘Bing images’text
refr_termKeywords if source is a search engine e.g. ‘psychic oracle cards’text
mkt_mediumType of traffic source e.g. ‘cpc’, ‘affiliate’, ‘organic’, ‘social’text
mkt_sourceThe company / website where the traffic came from e.g. ‘Google’, ‘Facebook’text
mkt_termAny keywords associated with the referrer e.g. ‘new age tarot decks’text
mkt_contentThe content of the ad. (Or an ID so that it can be looked up.) e.g. 13894723text
mkt_campaignThe campaign ID e.g. ‘diageo-123’text
mkt_clickidThe click ID e.g. ‘ac3d8e459’text
mkt_networkThe ad network to which the click ID belongs e.g. ‘DoubleClick’text
Code
Source
{{
config(
tags=["this_run"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}

select
-- user fields
a.user_id,
a.domain_userid,
a.network_userid,

b.start_tstamp,
b.end_tstamp,
{{ snowplow_utils.current_timestamp_in_utc() }} as model_tstamp,

-- engagement fields
b.page_views,
b.sessions,

b.engaged_time_in_s,

-- first page fields
a.first_page_title,

a.first_page_url,

a.first_page_urlscheme,
a.first_page_urlhost,
a.first_page_urlpath,
a.first_page_urlquery,
a.first_page_urlfragment,

c.last_page_title,

c.last_page_url,

c.last_page_urlscheme,
c.last_page_urlhost,
c.last_page_urlpath,
c.last_page_urlquery,
c.last_page_urlfragment,

-- referrer fields
a.referrer,

a.refr_urlscheme,
a.refr_urlhost,
a.refr_urlpath,
a.refr_urlquery,
a.refr_urlfragment,

a.refr_medium,
a.refr_source,
a.refr_term,

-- marketing fields
a.mkt_medium,
a.mkt_source,
a.mkt_term,
a.mkt_content,
a.mkt_campaign,
a.mkt_clickid,
a.mkt_network

from {{ ref('snowplow_web_users_aggs') }} as b

inner join {{ ref('snowplow_web_users_sessions_this_run') }} as a
on a.domain_sessionid = b.first_domain_sessionid

inner join {{ ref('snowplow_web_users_lasts') }} c
on b.domain_userid = c.domain_userid

Depends On

Referenced By

Was this page helpful?