Content analytics

Introduction

If the primary function of your site is content consumption, whether that's reading news articles or watching videos, you'll want to understand how that content is performing. While traditional web analytics focuses on page views and sessions, you might be more interested in how long users engage with each piece of content.

This recipe will give you an overview of how Snowplow empowers you to get better insights into how your content is performing.

What you'll be doing

You have already set up Snowplow’s out-of-the-box web tracking by instrumenting the JavaScript Tracker in your application. This includes tracking page_view and page_ping events.

To understand how people are engaging with your content, you’ll want to tie these events to specific pieces of content, not just pages.

For this purpose, you can add a content entity which will be sent every time these events are tracked. Learn more about Snowplow events and entities here. You can then aggregate all of your user behavioral data into one row per content piece to get a better view of how your content is performing.

Design and implement the content entity

Designing the entity

We have already created a custom content entity for you in Iglu Central.

Snowplow uses self-describing JSON schemas to structure events and entities so that they can be validated in the pipeline and loaded into tidy tables in the warehouse. You can learn more about these data structures here, and about why we take this approach here.

While Try Snowplow only ships with a pre-designed set of custom events and entities required for the recipes, Snowplow BDP lets you create an unlimited number of your own via the Data Structures UI (and API) for Enterprise and via the Data Structures Builder for Cloud.

The content entity has the following fields:

Field | Description | Type | Validation | Required?
name | The name of the piece of content | string | maxLength: 255 | ✅
id | The content identifier | string | maxLength: 255 |
category | The category of the piece of content | string | maxLength: 255 |
date_published | The date the piece of content was published | string | maxLength: 255 |
author | The author of the piece of content | string | maxLength: 255 |

Implementing the entity

In the JavaScript Tracker

Add the content entity to your page_view and page_ping events by editing your trackPageView calls to include the entity. Specifically, update

window.snowplow('trackPageView');

to

window.snowplow('trackPageView', {
"context": [{
"schema": "iglu:io.snowplow.foundation/content/jsonschema/1-0-0",
"data": {
"name": "example_name",
"id": "example_id",
"category": "example_category",
"date_published": "01-01-1970",
"author": "example_author"
}
}]
});
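Before attaching the entity, it can help to check it against the field constraints listed earlier (name required, every field a string of at most 255 characters), since events that fail validation do not reach your tables. The following is a minimal sketch, not part of the Snowplow tracker API: buildContentEntity and its constants are hypothetical names introduced here for illustration.

```javascript
// Schema URI copied from the tracking snippet in this recipe.
const CONTENT_SCHEMA = 'iglu:io.snowplow.foundation/content/jsonschema/1-0-0';
const MAX_LENGTH = 255;
const CONTENT_FIELDS = ['name', 'id', 'category', 'date_published', 'author'];

// Hypothetical helper: builds a content entity and rejects values
// that would break the constraints in the field table.
function buildContentEntity(content) {
  // name is the only required field
  if (typeof content.name !== 'string') {
    throw new TypeError('name is required and must be a string');
  }
  const data = {};
  for (const field of CONTENT_FIELDS) {
    const value = content[field];
    if (value === undefined || value === null) continue; // optional field omitted
    if (typeof value !== 'string') {
      throw new TypeError(field + ' must be a string');
    }
    // JSON Schema maxLength counts Unicode code points, so spread the
    // string rather than relying on .length (UTF-16 code units).
    if ([...value].length > MAX_LENGTH) {
      throw new RangeError(field + ' exceeds ' + MAX_LENGTH + ' characters');
    }
    data[field] = value;
  }
  // Properties not listed in CONTENT_FIELDS are dropped.
  return { schema: CONTENT_SCHEMA, data };
}

// Usage in the browser (assumes the tracker is already loaded):
// window.snowplow('trackPageView', {
//   context: [buildContentEntity({ name: 'example_name', author: 'example_author' })]
// });
```

Failing fast in the page makes a malformed entity visible during development instead of surfacing later as a failed event.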

Via Google Tag Manager

If you are using Google Tag Manager, you can add the variables like so:

window.snowplow('trackPageView', {
"context": [{
"schema": "iglu:io.snowplow.foundation/content/jsonschema/1-0-0",
"data": {
"name": "{{example_name_variable}}",
"id": "{{example_id_variable}}",
"category": "{{example_category_variable}}",
"date_published": "{{example_date_variable}}",
"author": "{{example_author_variable}}"
}
}]
});

Modeling the data you've collected

What does the model do?

The tracking above captures which content users are consuming and how they are engaging with it. This allows you to get a better understanding of how your content is performing.

For this recipe we'll create a simple table describing content engagement. Once you have collected some data with your new tracking you can run the following two queries in your tool of choice.

First generate the table:

CREATE TABLE derived.content AS(

WITH content_page_views AS(

-- One row per page view, with the content entity fields attached
SELECT
wp.id AS page_view_id,
c.category AS content_category,
c.name AS content_name,
c.date_published AS date_published,
c.author AS author,
-- Each page_ping counts as 10 seconds of engagement (assumes page pings fire every 10 seconds)
10*SUM(CASE WHEN ev.event_name = 'page_ping' THEN 1 ELSE 0 END) AS time_engaged_in_s,
-- Furthest scroll offset plus viewport height, capped at the document height, as a percentage.
-- Heights are aggregated with MAX so that a resized viewport or a growing page does not split
-- a page view into several rows; NULLIF avoids dividing by zero.
ROUND(100*(LEAST(LEAST(GREATEST(MAX(COALESCE(ev.pp_yoffset_max, 0)), 0), MAX(ev.doc_height)) + MAX(ev.br_viewheight), MAX(ev.doc_height))/NULLIF(MAX(ev.doc_height), 0)::FLOAT)) AS percentage_vertical_scroll_depth

FROM atomic.events AS ev
INNER JOIN atomic.com_snowplowanalytics_snowplow_web_page_1 AS wp
ON ev.event_id = wp.root_id AND ev.collector_tstamp = wp.root_tstamp
INNER JOIN atomic.io_snowplow_foundation_content_1 AS c
ON ev.event_id = c.root_id AND ev.collector_tstamp = c.root_tstamp

GROUP BY 1,2,3,4,5

)

-- One row per content piece, averaged over its page views
SELECT
content_category,
content_name,
date_published,
author,
COUNT(DISTINCT page_view_id) AS page_views,
-- Cast to FLOAT so the average is rounded rather than truncated by integer division
ROUND(SUM(time_engaged_in_s)::FLOAT/COUNT(DISTINCT page_view_id)) AS average_time_engaged_in_s,
ROUND(SUM(percentage_vertical_scroll_depth)/COUNT(DISTINCT page_view_id)) AS average_percentage_vertical_scroll_depth

FROM content_page_views

GROUP BY 1,2,3,4

);

And then view it:

SELECT * FROM derived.content;
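To make the two aggregation steps in the model easier to follow, here is a rough JavaScript sketch of the same logic applied to an in-memory array of events. It is illustrative only: the event field names (pageViewId, contentName, eventName, yOffsetMax, viewHeight, docHeight) are hypothetical stand-ins for the warehouse columns, it keys content by name alone, and it assumes page pings arrive every 10 seconds and takes the largest viewport and document height seen per page view.

```javascript
// Assumption: one page_ping is sent every 10 seconds of engagement.
const HEARTBEAT_S = 10;

// Scroll depth for one page view: furthest offset reached plus the viewport
// height, capped at the document height, expressed as a percentage.
function scrollDepthPercent(maxYOffset, viewHeight, docHeight) {
  if (!docHeight) return 0; // avoid dividing by zero
  const bottomEdge = Math.min(Math.max(maxYOffset, 0) + viewHeight, docHeight);
  return Math.round((100 * bottomEdge) / docHeight);
}

function summarizeContent(events) {
  // Step 1: collapse events into one record per page view.
  const pageViews = new Map();
  for (const ev of events) {
    let pv = pageViews.get(ev.pageViewId);
    if (!pv) {
      pv = { contentName: ev.contentName, pings: 0, maxYOffset: 0, viewHeight: 0, docHeight: 0 };
      pageViews.set(ev.pageViewId, pv);
    }
    if (ev.eventName === 'page_ping') pv.pings += 1;
    pv.maxYOffset = Math.max(pv.maxYOffset, ev.yOffsetMax ?? 0);
    pv.viewHeight = Math.max(pv.viewHeight, ev.viewHeight ?? 0);
    pv.docHeight = Math.max(pv.docHeight, ev.docHeight ?? 0);
  }

  // Step 2: collapse page views into one record per content piece.
  const totals = new Map();
  for (const pv of pageViews.values()) {
    const t = totals.get(pv.contentName) ?? { pageViews: 0, timeS: 0, scroll: 0 };
    t.pageViews += 1;
    t.timeS += HEARTBEAT_S * pv.pings;
    t.scroll += scrollDepthPercent(pv.maxYOffset, pv.viewHeight, pv.docHeight);
    totals.set(pv.contentName, t);
  }

  return [...totals].map(([contentName, t]) => ({
    contentName,
    pageViews: t.pageViews,
    averageTimeEngagedInS: Math.round(t.timeS / t.pageViews),
    averageScrollDepthPercent: Math.round(t.scroll / t.pageViews),
  }));
}
```

For example, a page view with two page pings that reached an offset of 1000 in a 500-pixel viewport on a 2000-pixel page contributes 20 seconds and 75% scroll depth to its content piece.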

Let's break down what you've done

  • You have captured granular data around how your users are engaging with your content, including time engaged and scroll depth.
  • You have modeled this data into a content engagement table that surfaces the user engagement per content piece. This gives you an overview of how your content is performing across your site.

What you might want to do next

Understanding how your users are engaging with your content is just the first step. Next, you might want to:

  • Extend this table to include where the content is being promoted on your site to understand how placement affects performance.
  • Start mapping the relationships between content pieces based on user behavior, working towards compelling content recommendations.
  • Pivot this data to look at users instead: understand which marketing channels users come from, and how that affects their engagement with your content.
  • Etc.

To learn more about Snowplow for media and entertainment, check out our blog series on the topic.

Ready to get started with content recommendations? Check out our step-by-step guide.
