Data quality is the foundation of every successful business initiative. Yet, despite significant investments in data infrastructure, many organizations continue to struggle with one of the most fundamental data quality issues: duplicate records.
Until now, entity resolution has existed outside the modern data workflow. To identify and resolve duplicate records, companies have been forced to adopt disconnected point solutions or build complex custom code. The result? Siloed processes, higher costs, and a fragmented approach to data quality.
Today, we're excited to announce the launch of Census Entity Resolution — a powerful new capability built directly into the Census platform that helps you identify, merge, and resolve duplicate records across your business data.
The Problem with Duplicate Data
Duplicate records create chaos across your entire data ecosystem:
- Marketing teams waste budget targeting the same customer multiple times
- Sales teams create redundant outreach efforts and conflicting account ownership
- Customer support lacks a unified view of customer interactions
- Analytics teams produce inaccurate metrics on customer counts and behavior
According to a 2021 report by Gartner, organizations believe poor data quality costs them an average of $12.9 million annually. While data quality encompasses many factors, duplicate records remain one of the most persistent and difficult challenges to solve.
How Census Entity Resolution Works
Census Entity Resolution uses deterministic rules with fuzzy matching capabilities to identify and resolve duplicate records within your datasets. Unlike complex, standalone solutions that require specialized expertise, Census Entity Resolution is built directly into your data activation workflows.
Here's how it works:
- Define Match Rules - Specify how potential duplicates should be identified using a combination of exact and fuzzy matching criteria. Our fuzzy matching uses Jaro-Winkler Distance with configurable confidence thresholds (low, medium, high) to catch variations like:
- "Acme Corporation" vs. "Acme Corp"
- "john.smith@example.com" vs. "johnsmith@example.com"
- "123 Main Street" vs. "123 Main St."
- Create Merge Rules - Determine which record should be the "winner" when duplicates are found, using a waterfall prioritization system.
- Apply Column Overrides - Selectively choose which data points from which records should be preserved in the final, de-duplicated record.
- Choose Your Output Mode - Select between two powerful options:
- Merged Mode - Outputs only clean, de-duplicated records with overrides applied
- Mark-as-Dupe Mode - Outputs all records with metadata showing duplicate relationships
In the above example, three customer records for "Jane Smith" from different systems were automatically identified as duplicates. The earliest-created record was designated the winner, while the most important information (the highest score) was preserved from across all records.
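To make the fuzzy side of a match rule concrete, here is a minimal Python sketch of Jaro-Winkler similarity with a threshold check. It is not Census's implementation. The post does not publish the numeric cut-offs behind the low, medium, and high settings, so the values in `THRESHOLDS` are hypothetical, as are the names `jaro`, `jaro_winkler`, and `is_fuzzy_match`. The prefix scaling factor of 0.1 and the four-character prefix cap are the conventional Winkler defaults.

```python
# Hypothetical confidence cut-offs; Census does not publish its actual values.
THRESHOLDS = {"low": 0.80, "medium": 0.88, "high": 0.94}


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they sit within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are transpositions.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    m = matches
    t = half_transpositions / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings that share a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def is_fuzzy_match(a: str, b: str, confidence: str = "medium") -> bool:
    """Case-insensitive comparison against the chosen (hypothetical) threshold."""
    return jaro_winkler(a.lower(), b.lower()) >= THRESHOLDS[confidence]


for left, right in [
    ("Acme Corporation", "Acme Corp"),
    ("john.smith@example.com", "johnsmith@example.com"),
    ("123 Main Street", "123 Main St."),
]:
    print(f"{left!r} vs {right!r}: {jaro_winkler(left.lower(), right.lower()):.3f}")
```

With these assumed thresholds, "Acme Corporation" vs. "Acme Corp" scores 0.9125: a match at `medium` confidence but not at `high`. Lowercasing before comparison is also an assumption; production match rules typically add further normalization, such as expanding "St." to "Street".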
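The merge side of the Jane Smith scenario can be sketched the same way. The Python below assumes the match rules have already grouped three Jane Smith records into one cluster. It models a waterfall as an ordered list of rules in which each later rule only breaks ties left by the earlier ones, applies a `max` column override to `score`, and produces both output modes. The records, the source-priority tie-breaker, and the function names (`pick_winner`, `merge_cluster`, `mark_as_dupe`) are hypothetical; Census's actual configuration model may differ.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Record:
    id: str
    source: str
    name: str
    email: str
    created_at: date
    score: int


# Hypothetical cluster already produced by the match rules.
jane = [
    Record("crm-17", "crm", "Jane Smith", "jane.smith@example.com", date(2021, 3, 4), 92),
    Record("bill-88", "billing", "Jane Smith", "jsmith@example.com", date(2020, 11, 2), 40),
    Record("sup-5", "support", "J. Smith", "jane.smith@example.com", date(2022, 6, 9), 65),
]

# Waterfall merge rules: lower key wins; later rules only break ties.
SOURCE_PRIORITY = {"crm": 0, "billing": 1, "support": 2}
RULES = [
    lambda r: r.created_at,               # 1. earliest-created record wins
    lambda r: SOURCE_PRIORITY[r.source],  # 2. tie-breaker: preferred system
]

# Column overrides: pull these values from across the whole cluster.
OVERRIDES = {"score": max}


def pick_winner(cluster, rules):
    candidates = list(cluster)
    for rule in rules:
        best = min(rule(r) for r in candidates)
        candidates = [r for r in candidates if rule(r) == best]
        if len(candidates) == 1:
            break
    return candidates[0]


def merge_cluster(cluster, rules, overrides):
    """Merged mode: one record per cluster, with overrides applied."""
    winner = pick_winner(cluster, rules)
    patched = {col: agg(getattr(r, col) for r in cluster) for col, agg in overrides.items()}
    return replace(winner, **patched)


def mark_as_dupe(cluster, rules):
    """Mark-as-dupe mode: every record, tagged with its cluster's winner."""
    winner = pick_winner(cluster, rules)
    return [
        {"id": r.id, "is_duplicate": r.id != winner.id, "winner_id": winner.id}
        for r in cluster
    ]


print(merge_cluster(jane, RULES, OVERRIDES))
for row in mark_as_dupe(jane, RULES):
    print(row)
```

Here `bill-88` wins on creation date, and the merged record carries the cluster's highest `score` (92) instead of its own 40, while mark-as-dupe mode keeps all three rows and flags the other two as duplicates.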
Census Entity Resolution vs AWS Entity Resolution
When building our Entity Resolution capabilities, we rigorously evaluated existing solutions using industry-standard benchmark datasets specifically designed to test fuzzy matching accuracy. These datasets contain a range of variations - including misspellings, formatting differences, abbreviations, and structural changes - that help assess matching performance under controlled conditions.
Our benchmarks showed that Census Entity Resolution significantly outperforms AWS Entity Resolution on detection rate, processing time, and cost:
| Metric | Census ER | AWS ER | Census Advantage |
|---|---|---|---|
| % of Duplicates Detected | 97.5% | 47.2% | 2x more likely to catch tricky duplicates |
| Processing Time | 1.87 minutes | 20.1 minutes | More than 10x faster |
| Cost per Million Records | $50 | $250 | 80% cheaper |
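The post does not define "% of Duplicates Detected" precisely. A natural reading is recall over labeled duplicate pairs, which the hypothetical `detection_rate` helper below computes; treat that definition as an assumption. The rest of the snippet checks the Census Advantage column against the raw figures in the benchmark table.

```python
def _as_pairs(pairs):
    """Treat (a, b) and (b, a) as the same unordered pair."""
    return {frozenset(p) for p in pairs}


def detection_rate(predicted_pairs, true_pairs):
    """Hypothetical recall: share of known duplicate pairs the matcher flagged."""
    truth = _as_pairs(true_pairs)
    return len(_as_pairs(predicted_pairs) & truth) / len(truth)


# Raw figures from the benchmark table.
census = {"detected": 0.975, "minutes": 1.87, "usd_per_million": 50}
aws = {"detected": 0.472, "minutes": 20.1, "usd_per_million": 250}

detection_ratio = census["detected"] / aws["detected"]                # ~2.07
speedup = aws["minutes"] / census["minutes"]                          # ~10.7
cost_saving = 1 - census["usd_per_million"] / aws["usd_per_million"]  # 0.8

print(f"{detection_ratio:.2f}x detection, {speedup:.1f}x faster, {cost_saving:.0%} cheaper")
```

The ratios come out to roughly 2.07x, 10.7x, and 80%, consistent with the "2x", "more than 10x", and "80% cheaper" claims.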
Beyond the raw numbers, Census Entity Resolution provides a dramatically simpler setup experience, more intuitive controls, and seamless integration with your existing data workflows.
Seamless Integration with Your Data Workflows
Census Entity Resolution isn't a standalone tool — it's fully integrated into your existing Census workflows. This powerful interoperability enables sophisticated data pipelines that were previously impossible or required complex custom development.
You can apply Entity Resolution to any dataset in your ecosystem — warehouse tables, SaaS application data, streaming sources (like Kafka), or even CSV uploads. Once your data is deduplicated, you can send it to any destination and apply it to any entity type: customers, accounts, leads, products, or custom objects.
What makes Census Entity Resolution truly powerful is how it works seamlessly with all other Census capabilities:
- Apply AI columns only to deduplicated data, ensuring your AI-driven insights aren't skewed by duplicates
- Enrich your data from third-party sources, use those enriched fields for more accurate deduplication, then sync the clean data
- Use resolved entities as inputs to segments, models, or other transformations
- Orchestrate multistep workflows that clean, transform, and activate your data in any order
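Why ordering matters can be illustrated with a generic pipeline in Python. Each step is a function from rows to rows, so enrichment, deduplication, and a derived column can be chained in whatever order a workflow needs. Everything here is hypothetical and is not a Census API: `enrich_domain` stands in for a third-party enrichment, `add_segment` stands in for an AI column, and the first-row-wins `dedupe_on` is far cruder than real match and merge rules.

```python
from typing import Callable

Row = dict
Step = Callable[[list[Row]], list[Row]]


def run_pipeline(rows: list[Row], steps: list[Step]) -> list[Row]:
    """Apply each step in order; reordering the list reorders the workflow."""
    for step in steps:
        rows = step(rows)
    return rows


def enrich_domain(rows: list[Row]) -> list[Row]:
    # Hypothetical enrichment: derive a company domain from the email address.
    return [{**r, "domain": r["email"].split("@")[-1].lower()} for r in rows]


def dedupe_on(key: str) -> Step:
    # Deliberately crude stand-in for match/merge rules: first row per key wins.
    def step(rows: list[Row]) -> list[Row]:
        winners: dict = {}
        for r in rows:
            winners.setdefault(r[key], r)
        return list(winners.values())
    return step


def add_segment(rows: list[Row]) -> list[Row]:
    # Stand-in for an AI column, computed only on already-deduplicated rows.
    return [{**r, "segment": "enterprise" if r["employees"] >= 1000 else "smb"} for r in rows]


accounts = [
    {"company": "Acme Corporation", "email": "ops@acme.com", "employees": 1200},
    {"company": "Acme Corp", "email": "billing@ACME.com", "employees": 1200},
    {"company": "Globex", "email": "hi@globex.io", "employees": 40},
]

clean = run_pipeline(accounts, [enrich_domain, dedupe_on("domain"), add_segment])
for row in clean:
    print(row["company"], row["domain"], row["segment"])
```

Because the domain is added before deduplication, "Acme Corporation" and "Acme Corp" collapse into one account, and the segment is computed once per real account rather than once per duplicate.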
Unlike traditional entity resolution solutions that force you to move your data outside your existing stack, Census brings powerful Entity Resolution capabilities directly to where your data already lives.
Our Vision: True Golden Records Without the Heavy Lifting
Today's announcement is just the beginning. While Census Entity Resolution already delivers powerful de-duplication within datasets, our vision extends beyond single-source matching.
Soon, you'll be able to resolve entities across multiple data sources simultaneously — creating true golden records without first consolidating all your data into a single location. Imagine automatically reconciling customer profiles across your data warehouse, CRM, marketing platform, and support system without complex ETL processes or data migrations.
This cross-source resolution capability will eliminate the traditional prerequisite of centralizing all your data before you can begin building unified customer profiles. Instead, Census will handle the complexity of identifying and linking related records wherever they reside, transforming how organizations approach data unification.
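Cross-source resolution has not shipped yet, and the post does not say how it will work. One common technique for turning pairwise matches into golden-record clusters is union-find (disjoint sets), which follows chains of matches transitively: if a warehouse row matches a CRM contact and that contact matches a support ticket, all three land in one cluster. The Python below sketches that general technique under this assumption; the source names, record IDs, and `link_across_sources` function are hypothetical.

```python
class UnionFind:
    """Disjoint sets keyed by (source, record_id) tuples."""

    def __init__(self) -> None:
        self.parent: dict = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving: point nodes at their grandparent while walking up.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def link_across_sources(matched_pairs):
    """Group pairwise matches into clusters, following chains transitively."""
    uf = UnionFind()
    for a, b in matched_pairs:
        uf.union(a, b)
    clusters: dict = {}
    for node in list(uf.parent):
        clusters.setdefault(uf.find(node), set()).add(node)
    return list(clusters.values())


# Hypothetical pairwise matches between records living in different systems.
pairs = [
    (("warehouse", "u1"), ("crm", "C-9")),
    (("crm", "C-9"), ("support", "T-42")),
    (("marketing", "m7"), ("warehouse", "u3")),
]

for cluster in link_across_sources(pairs):
    print(sorted(cluster))
```

Each resulting cluster would still need merge rules, as in the single-dataset case, to collapse it into one golden record.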
What's Next?
In the coming weeks, we'll be sharing more about Entity Resolution, including:
- Technical deep-dives into our matching algorithms
- Best practices for setting up effective match and merge rules
- Customer case studies showing the impact of clean, de-duplicated data
We believe that data activation is only as good as the data being activated. With Census Entity Resolution, we're taking another step toward ensuring that the data flowing through your business applications is clean, accurate, and trustworthy.
Have questions about Entity Resolution? Check out our documentation to learn more or schedule a demo.