Using SQL to create segments based on geo locations

Sean Lynch
30 August 2024

Segmentation in Census Audience Hub is built from the ground up to run directly on your data warehouse. There are a lot of benefits to this (governance, performance, live/complete data sets), but one of the less appreciated ones is that you can also simply use SQL directly as part of your segmentation. Think of it like an Excel formula. You don’t need to use it, but it can unlock a ton of the power of your warehouse if you need it.

We can use SQL Conditions to create segments that take into account geographic location.

Location targeting allows marketers to show ads or send offers based on a customer's location (either realtime if available or assumed based on an address). This approach is powerful because it targets people with an offer that is specifically relevant to them, and much more likely to be used. It turns a person's location into an opportunity for timely, personalized marketing that can drive immediate action.

Most modern data warehouses provide lots of tools to do interesting geospatial analysis. For most marketing uses, the basics are more than enough. Here’s a couple common examples. (We’ll use BigQuery for the example here, see the table below for more details on this support in other warehouses).

 

Use Case #1
"We have a location for every contact. I need to create a segment of everyone within a 10 mile radius of a specific location”

Steps:

  1. Get your warehouse’s representation of a geographic point. In BigQuery, this is ST_GEOGPOINT(longitude, latitude) . You’ll need the latitude and longitude for all of your contacts. If you only have an address, you’ll need to first use a Geocoding service. Google offers one of those as well, though it’s not available as a SQL function at this time.
  2. You’ll also need a longitude/latitude for the center of your search radius. This is usually a specific brick & motor location, or the middle of a city.
  3. Now you can use a second SQL function to see if each contact’s location is within the area. For this, we’ll useST_DWITHIN(point_a, point_b, distance_in_meters) . This takes two points and a distance and returns a simple True/False whether Point A is within distance of Point B. So for us that looks like
ST_DWITHIN(contract_address_point,store_location_point, 10 * 1.60934)

You can take that SQL and directly add it as a condition along side other visual conditions. Now your segment is geo-aware with no additional work.

 

🌐 ‎Nerdy Detail

Technically speaking, all of these distance calculations are an approximation. Turns out the Earth isn’t a perfect sphere, it’s spheroid, and that makes calculating true geographic distance a bit more expensive. Different services take short cuts to this calculation to make it easy to run on large datasets, some more extreme than others (eg Redshift using Euclidian distance is disappointing).

 

Use Case #2
“I need to create a segment of all members within a region, like a zip code or a state”

For this, you still need a geographic point for all of your contacts, but rather than one point and radius, you’re going to need a way to define your region.

In this case, you’ll need a polygon. Turns out Zip codes are actually not really a region (though that’s usually ignored). For this example, lets us a really easy state: Colorado!

SELECT
  ST_CONTAINS(
	  ST_GEOGFROMTEXT('POLYGON((
      -109.05 41.0,
      -102.05 41.0,
      -102.05 37.0,
      -109.05 37.0,
      -109.05 41.0
    )), 
	  contract_address_point
	) AS is_within_colorado

In this case, we use the ST_GEOGFROMTEXT function to define the polygon for the state. The text within it is called Well Known Text which is a common format for describing geographic things. But depending on your region, you might end up with other formats such as JSON. Different warehouses can read some or all representations, and they’re easy to translate as well.

Just like before, I can drop this SQL into a Segment as well. Now I have all my Colorado users! (yes, checking a State column would probably be faster, but not nearly as flexible!)

 

Warehouse Support

  BigQuery Redshift Snowflake Databricks duckdb Postgres MySQL
Geo Point ST_GEOPOINT ST_POINT ST_MAKEGEOMPOINT ST_POINT Requires extensions ST_GeomFromText('POINT(lon1 lat1)')
Is Within ST_DWITHIN Perfect Sphere ST_DISTANCESPHERE you provide the earth radius ST_DWITHIN Perfect Sphere ST_DWITHIN ?   ST_DISTANCE will calculate distance
Geo Polygon ST_GEOGFROM + more ST_GEOGFROMTEXT + more TO_GEOGRAPHY + more ST_GEOMFROMTEXT + More   ST_GEOMFROMTEXT + more
Region contains ST_CONTAINS ST_CONTAINS ST_CONTAINS ST_CONTAINS   ST_CONTAINS

 

Nearly all the major data warehouses are going to give you access to the above four functions, though you may need to get a little creative (Redshift doesn’t provide ST_DWITHIN but you can implement it yourself with ST_DISTANCESPHERE and the radius of the Earth! Databricks surprisingly doesn’t provide this natively in SQL. Other databases may need extensions (such as Postgres). 

Your warehouse not supported? Fallback to the Haversine Formula which lets you calculate the perfect sphere (aka great circle) distance yourself.

The formula uses the latitude and longitude of two points to compute the central angle between them, and then multiplies this by the Earth's radius to get the distance. There’s a lot of math here so you may want to reach for your handy LLM. It’ll return something like this SQL query. It implements the Haversine formula to calculate the great-circle distance between two points on a sphere (in this case, Seattle and New York).

-- 6371 is the radius of the earth at the equator. You could switch to mean radius
-- if you're fancy.
SELECT 2 * 6371 * ASIN(SQRT(
    POWER(SIN((lat2 - lat1) * PI() / 180 / 2), 2) +
    COS(lat1 * PI() / 180) * COS(lat2 * PI() / 180) *
    POWER(SIN((lon2 - lon1) * PI() / 180 / 2), 2)
)) AS distance_km
FROM (
    SELECT 
        47.61 AS lat1, -122.33 AS lon1,  -- Seattle
        40.71 AS lat2, -74.01 AS lon2    -- New York
) AS coords;

To get started with Census Audience Hub, start a free 14-day trial or schedule a call.