Ah… Marketing attribution — the key to optimizing marketing strategies and improving marketing spend ROI. 😌
Sounds perfect, right?
There’s just one, teensy problem: Current marketing attribution software solutions have marketers feeling unsatisfied at best. In fact, according to a survey by QueryClick, 58% of marketers say their current attribution models are an obstacle to implementing marketing activities. And only 11% of marketers are happy with their attribution solutions. 😬 While these existing software solutions do provide some valuable insights, they fall short of meeting today’s dynamic marketing needs.
In this blog post, we'll explore how to overcome the limitations of current marketing attribution software solutions by modeling marketing data in plain SQL.
The limitations of current marketing attribution software solutions
Like we said: Most marketers are dissatisfied with the current marketing attribution software solutions.
One of their main limitations? They’re often too rigid and inflexible. Existing solutions rely on predefined rules and models to assign credit to each touchpoint, so they (typically) lead to inaccurate and incomplete results. They simply don’t provide enough transparency into how the credit is assigned, making it difficult for marketers to understand and interpret the results.
Another roadblock is that current marketing attribution solutions don’t integrate very well with other marketing tools and platforms. 🚧 Among other things, that disconnect makes it difficult for marketers to bring together data from different sources and gain a comprehensive view of their marketing efforts.
So… if marketing attribution — when done correctly — improves marketing campaigns and ROI, but current solutions fail to meet expectations, where do you turn? 👂 Pssst… You can model your marketing data in plain SQL.
Why use SQL for marketing attribution?
SQL is a powerful tool for uniting data from multiple sources and applying sophisticated algorithms to assign credit to each touchpoint.
And using it for marketing attribution has several advantages over traditional marketing attribution software solutions.👇
- SQL provides more flexibility and control over the attribution process. You can define your own attribution rules and models, and easily modify or adjust them as needed.
- SQL provides greater transparency into the attribution process. Now, you can see how credit is assigned to each touchpoint, so you can understand the underlying logic behind the attribution rules and models.
- SQL can help us integrate data from multiple sources and gain a comprehensive view of our marketing efforts. By joining your data from web analytics, CRMs, and marketing automation, you can analyze the effectiveness of different marketing channels and campaigns.
All about attribution modeling
Before we dive into the details of how to model marketing data in SQL, let's review the different types of attribution modeling:
- First touch: Assigns all the credit to the first touchpoint that led to a conversion.
- Last touch: Assigns all the credit to the last touchpoint that led to a conversion.
- Linear: Assigns equal credit to all touchpoints that led to a conversion.
- U shape: Assigns more credit to the first and last touchpoints, and less credit to the touchpoints in between.
- Time decay: Assigns more credit to the touchpoints that occurred closer in time to the conversion.
Of course, each attribution model has its own strengths and weaknesses, so the "best” model will depend on the specific needs and goals of your organization.
Building an attribution model in SQL
Let’s cut to the chase: To build an attribution model in SQL, you need to bring together data from multiple sources, including web analytics, CRM, and marketing automation. You also need to define your own attribution rules and models in order to apply sophisticated algorithms to assign credit to each touchpoint.
Here’s a basic table representation of the data schema involved:
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-cly1{text-align:left;vertical-align:middle}
.tg .tg-1wig{font-weight:bold;text-align:left;vertical-align:top}
Table
Fields
sessions
session_id, user_id, start_time, end_time, source_channel
touches
touch_id, user_id, timestamp, channel, campaign
conversions
conversion_id, user_id, conversion_time, revenue
In this example, we’ll use the schema in this table 👆 to build a view that assigns credit to each touchpoint based on our set of predefined rules. Then, once we analyze the effectiveness of different marketing channels and campaigns, we can optimize our marketing strategy and improve the ROI of our marketing spend.
Let’s write some SQL
Building a query takes four basic steps:
- Gather your required data sources
- Find all sessions before conversion
- Calculate the total sessions and the session index
- Allocate points
Here’s what each step looks like using actual SQL queries.
1. Gather your required data sources
The following snippet creates three CTEs for our three data sources — session, touch (can be any events), and conversion data — from their respective tables. We’re just prepping the data at this point, but this is what it looks like.
WITH sessions AS (
SELECT
session_id,
user_id,
start_time,
end_time,
source_channel
FROM
sessions_table
),
touches AS (
SELECT
touch_id,
user_id,
timestamp,
channel,
campaign
FROM
touches_table
),
conversions AS (
SELECT
conversion_id,
user_id,
conversion_time,
revenue
FROM
conversions_table
)
2. Find all sessions before conversion
Now, we have our source data, so let’s get crunching. Start by creating a view that combines session and conversion data. The goal is to identify all sessions that occurred before a conversion, then join the session and conversion data together. Here’s how the query looks:
WITH sessions_before_conversion AS (
SELECT
s.session_id,
s.user_id,
s.start_time,
s.end_time,
s.source_channel,
c.conversion_time
FROM
sessions s
JOIN conversions c ON s.user_id = c.user_id AND s.start_time < c.conversion_time
)
3. Calculate the total sessions and the session index
With the session data in hand, we can assign the weight to each of these sessions based on the number of touches using the following query:
WITH touches_with_session AS (
SELECT
t.touch_id,
t.user_id,
t.timestamp,
t.channel,
t.campaign,
s.session_id
FROM
touches t
JOIN sessions_before_conversion s ON t.user_id = s.user_id AND t.timestamp BETWEEN s.start_time AND s.end_time
),
touchpoints AS (
SELECT
user_id,
session_id,
channel,
campaign,
COUNT(DISTINCT touch_id) AS touches
FROM
touches_with_session
GROUP BY
user_id,
session_id,
channel,
campaign
),
touchpoint_weights AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY user_id, session_id
ORDER BY touches DESC
) AS touchpoint_rank
FROM
touchpoints
)
4. Allocate points
Allocate points to each touchpoint based on the model you want. Here, we are assigning 100% of the credit to the first touchpoint and only 50% of the credit to the second touchpoint with the following query.
WITH attribution AS (
SELECT
user_id,
session_id,
channel,
campaign,
touches,
CASE touchpoint_rank
WHEN 1 THEN 1
WHEN 2 THEN 0.5
ELSE 0
END AS attribution_weight
FROM
touchpoint_weights
WHERE
touchpoint_rank <= 2
)
5. Join in revenue and ad spend data
Use the allocated credit to analyze the effectiveness of different marketing channels and campaigns based on the revenue they generated plus the cost of your ads. The final output calculates the ROI for each marketing channel and campaign based on the attributed credit and revenue data.
WITH attribution AS (
...
),
attribution_by_channel_campaign AS (
SELECT
channel,
campaign,
SUM(attribution_weight) AS attribution
FROM
attribution
GROUP BY
channel,
campaign
),
revenue_by_channel_campaign AS (
SELECT
channel,
campaign,
SUM(revenue) AS revenue
FROM
conversions
GROUP BY
channel,
campaign
)
SELECT
abc.channel,
abc.campaign,
abc.attribution,
rbc.revenue,
abc.attribution / rbc.revenue AS roi
FROM
attribution_by_channel_campaign abc
JOIN revenue_by_channel_campaign rbc ON abc.channel = rbc.channel AND abc.campaign = rbc.campaign
And when you combine the above five steps, you get an attribution model in SQL that assigns credit to each touchpoint based on a set of predefined rules. 🙌 Putting it all together, you get the full query.
WITH sessions_before_conversion AS (
SELECT
s.session_id,
s.user_id,
s.start_time,
s.end_time,
s.source_channel,
c.conversion_time
FROM
sessions s
JOIN conversions c ON s.user_id = c.user_id AND s.start_time < c.conversion_time
),
touches_with_session AS (
SELECT
t.touch_id,
t.user_id,
t.timestamp,
t.channel,
t.campaign,
s.session_id
FROM
touches t
JOIN sessions_before_conversion s ON t.user_id = s.user_id AND t.timestamp BETWEEN s.start_time AND s.end_time
),
touchpoints AS (
SELECT
user_id,
session_id,
channel,
campaign,
COUNT(DISTINCT touch_id) AS touches
FROM
touches_with_session
GROUP BY
user_id,
session_id,
channel,
campaign
),
touchpoint_weights AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY user_id, session_id
ORDER BY touches DESC
) AS touchpoint_rank
FROM
touchpoints
),
attribution AS (
SELECT
user_id,
session_id,
channel,
campaign,
touches,
CASE touchpoint_rank
WHEN 1 THEN 1
WHEN 2 THEN 0.5
ELSE 0
END AS attribution_weight
FROM
touchpoint_weights
WHERE
touchpoint_rank <= 2
),
attribution_by_channel_campaign AS (
SELECT
channel,
campaign,
SUM(attribution_weight) AS attribution
FROM
attribution
GROUP BY
channel,
campaign
),
revenue_by_channel_campaign AS (
SELECT
channel,
campaign,
SUM(revenue) AS revenue
FROM
conversions
GROUP BY
channel,
campaign
)
SELECT
abc.channel,
abc.campaign,
abc.attribution,
rbc.revenue,
abc.attribution / rbc.revenue AS roi
FROM
attribution_by_channel_campaign abc
JOIN revenue_by_channel_campaign rbc ON abc.channel = rbc.channel AND abc.campaign = rbc.campaign
Limitations of SQL-based attribution modeling
While modeling marketing data in SQL can help us overcome the limitations of current marketing attribution software solutions, there are still some limitations to be aware of.
- Positional attribution is not a perfect representation of human decision-making. In reality, customers may be influenced by multiple touchpoints, so their decision-making process may be more complex than can be captured by a simple attribution model.
- The quality of the source data is often low. Web traffic is often poorly tagged with UTM parameters, and CRM data may be incomplete or inaccurate. Poor source data quality can make assigning credit to each touchpoint difficult.
Despite these few limitations, modeling marketing data in SQL helps you gain deeper insights into the effectiveness of our marketing campaigns. And by experimenting with different attribution models and analyzing the results, you can continuously improve marketing attribution and drive better business outcomes. 🚀
The future of marketing attribution is SQL
Sure, marketing attribution is a game-changer for marketers. But if (and when) current software solutions fall short of meeting your evolving needs, start modeling your marketing data in plain SQL — and gain deeper insights into the effectiveness of your marketing campaigns along the way.
Remember: The SQL code example we ran through is just one way to model marketing data. There are so many other approaches you can use to gain deeper insights into the effectiveness of your marketing campaigns and improve ROI. It all just comes down to the specific needs of your organization.
💪 Want to learn how Census can empower your marketing attribution mission? Book a demo with a Census product specialist.