Every HubSpot portal has duplicates. Even after running HubSpot's built-in dedup tool.
Why? Because the native tool catches exact email matches and that's about it.
The person who signed up with their Gmail and later used their work email for a demo? Two records. The company your sales rep entered as "Acme Inc" while marketing automation created "Acme, Incorporated"? Two records with split engagement history.
Here's how to actually find — and safely merge — the duplicates that cost you money.
Why Duplicates Cost More Than You Think
Duplicates aren't just messy. They have direct financial impact:
- You're paying twice: HubSpot's contact-based pricing means every duplicate inflates your tier. At scale, that's thousands per year in unnecessary fees.
- Split engagement: Half the activity sits on Record A, half on Record B, and neither shows the full picture. The result is wrong lead scores and missed buying signals.
- Double outreach: Sales emails the same person twice. Best case: embarrassing. Worst case: spam complaint.
- Broken attribution: Marketing can't prove ROI when engagement is scattered across duplicate records.
The 4-Pass Deduplication Method
We use a layered approach that catches duplicates at different confidence levels. Each pass finds what the previous one missed.
| Pass | Method | Confidence | Catch Rate |
|---|---|---|---|
| 1 | Exact email match | High | ~40% of dupes |
| 2 | Name + company match | Medium-High | ~25% of remaining |
| 3 | Phone number match | Medium | ~15% of remaining |
| 4 | Domain + fuzzy name | Lower | ~10% of remaining |
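Because each rate after the first applies only to what earlier passes left behind, the percentages in the table do not simply add up. A minimal sketch of the arithmetic, taking the approximate rates above at face value:

```python
# Approximate per-pass catch rates from the table above.
# Each rate applies to the duplicates the earlier passes did not catch.
PASS_RATES = [0.40, 0.25, 0.15, 0.10]


def cumulative_catch_rate(rates):
    """Return the share of all duplicates caught after running every pass."""
    remaining = 1.0
    for rate in rates:
        remaining *= 1.0 - rate  # what is still uncaught after this pass
    return 1.0 - remaining


if __name__ == "__main__":
    print(f"Caught after 4 passes: {cumulative_catch_rate(PASS_RATES):.0%}")
```

With these figures the four passes together catch roughly two thirds of all duplicates, which is why the lower-confidence passes are still worth running.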
Pass 1: Exact Email Match
This is what HubSpot's native tool does. Group contacts by email, flag groups with more than one record. Safe to merge with minimal review.
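As a rough illustration, the grouping step can be sketched as below. It assumes contacts have been exported as a list of dictionaries with `id` and `email` keys; that shape and the function name are assumptions for this example, not part of any HubSpot API.

```python
from collections import defaultdict


def find_exact_email_duplicates(contacts):
    """Group contact ids by normalized email; keep groups with 2+ records."""
    groups = defaultdict(list)
    for contact in contacts:
        # Lowercase and trim so "Jane@Acme.com " and "jane@acme.com" match.
        email = (contact.get("email") or "").strip().lower()
        if email:  # skip records with no email at all
            groups[email].append(contact["id"])
    return {email: ids for email, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "jane@acme.com"},
        {"id": 2, "email": " Jane@Acme.com"},
        {"id": 3, "email": "sam@acme.com"},
    ]
    print(find_exact_email_duplicates(sample))
```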
Pass 2: Name + Company Match
Normalize first name, last name, and company (lowercase, remove punctuation, trim whitespace), then group contacts on the combined key. "John Smith at Acme Inc" and "john smith at Acme, Inc." should match.
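A minimal sketch of that normalization, assuming each contact is a dictionary with `firstname`, `lastname`, and `company` keys (the helper names are made up for this example):

```python
import re


def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = (text or "").lower()
    text = re.sub(r"[^\w\s]", "", text)       # drop punctuation such as "," and "."
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


def name_company_key(contact):
    """Return a comparable (first, last, company) key, or None if any part is blank."""
    key = tuple(normalize(contact.get(field)) for field in ("firstname", "lastname", "company"))
    # Blank fields would make unrelated records look alike, so skip them.
    return key if all(key) else None


if __name__ == "__main__":
    a = {"firstname": "John", "lastname": "Smith", "company": "Acme Inc"}
    b = {"firstname": " john", "lastname": "smith ", "company": "Acme, Inc."}
    print(name_company_key(a) == name_company_key(b))
```

This only removes punctuation and case differences. Variants such as "Inc" versus "Incorporated" would need an extra suffix-mapping step, which is left out here.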
Pass 3: Phone Number Match
Strip all non-numeric characters, normalize country codes so that "+1 415-555-0100" and "(415) 555-0100" compare as equal, and match on the result. This catches the Gmail-vs-work-email scenario when both records share a phone number.
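One possible way to normalize numbers before comparing them is sketched below. The rule that a bare 10-digit number belongs to country code 1 is an assumption made for this example; portals with international data would likely want a dedicated phone-parsing library instead of this heuristic.

```python
import re


def normalize_phone(raw, default_country_code="1"):
    """Return digits only, with a country code prepended where it seems missing."""
    raw = (raw or "").strip()
    digits = re.sub(r"\D", "", raw)  # strip everything except 0-9
    if not digits:
        return None
    if raw.startswith("+"):
        return digits                # "+" means the country code is already there
    if digits.startswith("00"):
        return digits[2:]            # "00" international dialing prefix
    if len(digits) == 10:
        # Assumption: a bare 10-digit number is a national number
        # in the default country (here, country code 1).
        return default_country_code + digits
    return digits


if __name__ == "__main__":
    for value in ["(415) 555-0100", "+1 415-555-0100", "1-415-555-0100"]:
        print(value, "->", normalize_phone(value))
```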
Pass 4: Domain + Fuzzy Name Match
Contacts at the same email domain with similar (but not identical) names. "Bob Johnson" and "Robert Johnson" at @acme.com? Likely the same person.
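Use Levenshtein distance or similar fuzzy matching on first names, and require an exact last name and domain match. Set the similarity threshold at 80% or higher and always flag matches for manual review. Edit distance catches spelling variants such as "Katherine" and "Catherine" (about 89% similar), but nickname pairs like "Bob" and "Robert" score far below 80%, so map nicknames to a canonical first name before comparing.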
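A rough sketch of this pass, using a plain dynamic-programming Levenshtein distance scaled to a 0-1 similarity. The tiny `NICKNAMES` table is a hypothetical addition for this example, since edit distance alone would not pair "Bob" with "Robert". The contact dictionary shape is also assumed.

```python
# Hypothetical alias table for illustration; a real one would be far larger.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}


def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Scale edit distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def canonical_first_name(name):
    name = (name or "").strip().lower()
    return NICKNAMES.get(name, name)


def email_domain(email):
    _, sep, domain = (email or "").strip().lower().rpartition("@")
    return domain if sep else ""


def is_fuzzy_candidate(a, b, threshold=0.8):
    """True if two contacts share domain and last name and have similar first names."""
    domain_a = email_domain(a.get("email"))
    if not domain_a or domain_a != email_domain(b.get("email")):
        return False
    last_a = (a.get("lastname") or "").strip().lower()
    last_b = (b.get("lastname") or "").strip().lower()
    if not last_a or last_a != last_b:
        return False
    first_a = canonical_first_name(a.get("firstname"))
    first_b = canonical_first_name(b.get("firstname"))
    return similarity(first_a, first_b) >= threshold


if __name__ == "__main__":
    bob = {"firstname": "Bob", "lastname": "Johnson", "email": "bob@acme.com"}
    robert = {"firstname": "Robert", "lastname": "Johnson", "email": "rjohnson@acme.com"}
    print(is_fuzzy_candidate(bob, robert))  # candidate for manual review, not auto-merge
```

Anything this function returns as a candidate should go to a review queue rather than straight to a merge.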
Merging: What to Keep, What to Discard
Identifying duplicates is half the battle. Merging them wrong means losing data.
Merge Rules
- Primary record: Keep the one with the most engagement (more activities, form submissions, interactions)
- Emails: Keep all — HubSpot stores primary + additional emails
- Conflicting properties: Keep the most recently updated value
- Associations: Merge preserves all company, deal, and ticket links from both records
- Timeline: All activities from both records appear on the merged timeline
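To preview what a merge would produce before committing to it, the first and third rules above can be sketched as follows. The record shape (an `activity_count` plus properties stored as `(value, updated_at)` pairs) is hypothetical and chosen for this example. It is not how HubSpot represents records, and it does not perform the merge itself.

```python
from datetime import datetime


def choose_primary(records):
    """Pick the record with the most engagement as the one to keep."""
    return max(records, key=lambda record: record["activity_count"])


def preview_merged_properties(records):
    """For each property, keep the non-empty value that was updated most recently."""
    latest = {}
    for record in records:
        for name, (value, updated_at) in record["properties"].items():
            if value in (None, ""):
                continue  # an empty value should never overwrite real data
            if name not in latest or updated_at > latest[name][1]:
                latest[name] = (value, updated_at)
    return {name: value for name, (value, _) in latest.items()}


if __name__ == "__main__":
    record_a = {
        "id": "A",
        "activity_count": 12,
        "properties": {
            "jobtitle": ("Manager", datetime(2023, 1, 10)),
            "phone": ("", datetime(2024, 3, 1)),
        },
    }
    record_b = {
        "id": "B",
        "activity_count": 3,
        "properties": {
            "jobtitle": ("Director", datetime(2024, 2, 5)),
            "phone": ("14155550100", datetime(2022, 6, 1)),
        },
    }
    print(choose_primary([record_a, record_b])["id"])
    print(preview_merged_properties([record_a, record_b]))
```

Comparing this preview against the backup export is a cheap way to spot a property that would be overwritten by a newer but worse value.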
Safety Checklist Before Merging
- Export a backup of both records before merging
- Check workflow enrollments — merging can trigger re-enrollment
- Verify deal associations: make sure you are not combining deals that belong to two genuinely different people
- For company merges — confirm it's the same entity, not subsidiary vs parent
Preventing Future Duplicates
Finding and merging cleans up the existing mess. You also need to stop new dupes from entering.
- Form settings: Set forms to "update existing contacts" by email instead of "always create." Huge reduction in form-sourced duplicates.
- Import rules: Always use "Update existing records" when importing CSVs. Never "Create new only" unless you're 100% sure.
- Integration validation: Ensure integrations (Salesforce, Intercom, etc.) deduplicate before creating records in HubSpot.
- Workflow guard: Build a workflow that checks for existing contacts by email before creating new records via API.
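The last point, checking for an existing contact before creating one, amounts to an "upsert by email" pattern. The sketch below shows the control flow only. The `client` object and its `find_id_by_email`, `create`, and `update` methods are hypothetical stand-ins for whatever API wrapper an integration already uses, and the in-memory class exists purely so the example runs.

```python
def upsert_contact(client, properties):
    """Update the contact with this email if it exists; otherwise create it.

    Returns (contact_id, created), where created is True only for new records.
    """
    email = (properties.get("email") or "").strip().lower()
    if not email:
        raise ValueError("email is required for the duplicate check")
    normalized = {**properties, "email": email}
    existing_id = client.find_id_by_email(email)
    if existing_id is not None:
        client.update(existing_id, normalized)
        return existing_id, False
    return client.create(normalized), True


class InMemoryClient:
    """Hypothetical stand-in for a real CRM client, used only for this demo."""

    def __init__(self):
        self.contacts = {}
        self._next_id = 1

    def find_id_by_email(self, email):
        for contact_id, props in self.contacts.items():
            if props.get("email") == email:
                return contact_id
        return None

    def create(self, properties):
        contact_id = self._next_id
        self._next_id += 1
        self.contacts[contact_id] = dict(properties)
        return contact_id

    def update(self, contact_id, properties):
        self.contacts[contact_id].update(properties)


if __name__ == "__main__":
    client = InMemoryClient()
    print(upsert_contact(client, {"email": "Jane@Acme.com", "firstname": "Jane"}))
    print(upsert_contact(client, {"email": "jane@acme.com ", "jobtitle": "CTO"}))
    print(len(client.contacts))
```

The second call updates the first record instead of creating a new one, because the email is normalized before the lookup.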
Automate the Detection
Running manual dedup quarterly is better than nothing. But duplicates accumulate between runs.
The HubSpot Audit Tool runs all 4 passes automatically, shows duplicates with confidence scores, and lets you merge high-confidence matches with one click while flagging lower-confidence ones for manual review.