
How Vaulted Fixed HubSpot Duplicates for a SaaS with No Websites


When Matthew Deal and his team at Vaulted took on Coworks as a client, they ran into a deduplication problem that most RevOps playbooks don't cover.

Coworks makes software for coworking spaces. Their target audience is often in the earliest stages of building a coworking business. No website yet. No job titles. Sometimes no company domain at all. Just a physical space and a Google Maps listing.

HubSpot's default deduplication logic relies heavily on company domain. No domain means no reliable primary key. Without that anchor, standard deduplication rules don't hold up, and any data you import risks becoming noise the moment you import more.

The question wasn't just "how do we clean the data." It was: how do you deduplicate records that don't have the field you'd normally deduplicate on?

🎥 Watch the full episode below:

 

 

 

The problem: scraping data that doesn't exist in traditional databases

Vaulted's starting point for Coworks was unusually creative. There's no off-the-shelf list of coworking spaces, no report you can buy, no Apollo export that gives you this audience. So they scraped Google Maps via API to find operators in major metro areas across the US.
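As a rough sketch of what that kind of scrape can look like, the snippet below queries the Google Places Text Search endpoint for coworking spaces in a handful of metro areas. The API key, metro list, and query wording are placeholders, and this is an illustration of the approach rather than Vaulted's actual pipeline.

    # Illustrative only: pull coworking operators per metro area from the
    # Google Places Text Search endpoint. GOOGLE_API_KEY and METROS are placeholders.
    import requests

    GOOGLE_API_KEY = "YOUR_KEY_HERE"
    METROS = ["Raleigh, NC", "Charlotte, NC", "Austin, TX"]

    def find_coworking_spaces(metro):
        """Return name, address, and place_id for coworking spaces in one metro."""
        resp = requests.get(
            "https://maps.googleapis.com/maps/api/place/textsearch/json",
            params={"query": f"coworking space in {metro}", "key": GOOGLE_API_KEY},
            timeout=30,
        )
        resp.raise_for_status()
        return [
            {
                "name": r.get("name"),
                "address": r.get("formatted_address"),
                "place_id": r.get("place_id"),
                "metro": metro,
            }
            for r in resp.json().get("results", [])
        ]

    records = [row for metro in METROS for row in find_coworking_spaces(metro)]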

That got them a working dataset. But it created a secondary problem immediately.

When you're pulling records from Google Maps by geography, and your target audience sometimes operates multiple locations under one brand, you end up with duplicate entries. Some coworking groups had one domain for all their locations. Others had separate subdomains per city. A few had entirely different domains for each site, same operator, different URLs.

HubSpot would have no way of knowing those were the same company. Without custom matching logic, you'd end up with Raleigh and Charlotte as entirely separate records, even though they belong to the single business you were trying to build a relationship with.

 

The fix: custom deduplication rules and a human review layer

Matthew's approach was to lean on Koalify's custom matching rules to compensate for the missing domain data.

Rather than relying on domain matching alone, Vaulted built rules around the identifiers they did have: location data, space names, and other custom properties they'd populated during the scrape. Where records were close enough to be flagged as probable duplicates but too ambiguous to merge automatically, they routed them to manual review.

That two-layer approach, automated flagging followed by human confirmation, was important. When you're working with scraped data in a live client account, a false merge is hard to undo and easy to make. Matthew was deliberate about not over-automating early in the process.
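As a rough illustration of that two-layer idea (not Koalify's internal logic or Vaulted's actual rules), the sketch below scores candidate pairs on name and address similarity, auto-merges only high-confidence matches, and queues the ambiguous middle ground for manual review. Property names, weights, and thresholds are all hypothetical.

    # Hypothetical two-layer dedup pass: blend name and address similarity,
    # auto-merge only high-confidence matches, send borderline pairs to review.
    from difflib import SequenceMatcher

    def normalize(value):
        return " ".join((value or "").lower().split())

    def similarity(a, b):
        """Crude string similarity in [0, 1]."""
        return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

    def classify_pair(rec_a, rec_b, auto_threshold=0.92, review_threshold=0.75):
        name_score = similarity(rec_a["name"], rec_b["name"])
        addr_score = similarity(rec_a["address"], rec_b["address"])
        score = 0.6 * name_score + 0.4 * addr_score  # weights are arbitrary
        if score >= auto_threshold:
            return "auto_merge", score
        if score >= review_threshold:
            return "manual_review", score
        return "keep_separate", score

    a = {"name": "Acme Cowork Raleigh", "address": "101 E Main St, Raleigh"}
    b = {"name": "Acme Cowork - Raleigh", "address": "101 East Main Street, Raleigh, NC"}
    print(classify_pair(a, b))  # prints a label plus the blended similarity score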

They also staged the rollout. Rather than scraping every metro area and importing everything at once, they ran the first month as a validation exercise. They scraped a small geography, reviewed the results with Coworks, and confirmed the data was accurate before expanding. Only once both teams trusted the output did they scale up.

That confidence-building phase slowed things down at the start. It also meant the data they eventually relied on for marketing was actually reliable.

Screenshot: custom deduplication rules and a human review layer based on Address

 

The downstream impact: 35% of deals sourced from organic

The data work unlocked a content strategy that wouldn't have been possible otherwise.

With clean, trusted records in HubSpot (coworking operators segmented by location, matched and deduped, and enriched over time with attributes like number of locations), Vaulted ran a sustained AEO and SEO programme for Coworks. The conversations they'd started with proto-coworking operators during the data validation phase directly informed the content. They knew what questions the audience was asking and what problems they were facing before they were even thinking about software.

The results: a 37 to 40% increase in organic traffic and, more meaningfully, 35% of deals sourced from organic by the time the programme matured.

Matthew is clear about the sequencing: "Projects like these don't become a reality if the data is in a state where adding more only makes things worse. It's like throwing fuel into a fire."

The deduplication work wasn't the headline deliverable. It was the prerequisite for everything else.

Screenshot: 37% increase in organic traffic

 

What makes this case unusual, and what it illustrates more generally

Most HubSpot deduplication problems look like: too many imports, too many integrations, duplicate contacts accumulating over time. The fix is usually some combination of custom rules, bulk merging, and workflow automation to catch new ones as they arrive.

The Coworks case is different. The data wasn't dirty because of neglect or uncontrolled integrations. It was scraped intentionally, from an unusual source, to reach an audience that traditional databases couldn't find. The duplication risk was baked into the method.

But the underlying principle is the same one that comes up in almost every client engagement Matthew described: you can't do the interesting marketing work until the CRM fundamentals are in place. Lead scoring, personalised sequences, multi-location targeting, none of it works if the records underneath are unreliable.

The specific lesson from Coworks is worth taking seriously if you're ever deduplicating against an unusual identifier. HubSpot's defaults are built around domain matching because domain is usually the most reliable company key. When it isn't, because your audience doesn't have domains yet or has inconsistent domain structures, you need to define your own primary key and build your matching rules around that instead. Koalify's custom property matching made that possible here. Without it, every new data import would have compounded the problem rather than built on a clean foundation.

 

Lessons from Matthew's approach

A few things stood out from how Vaulted handled this project that apply beyond the Coworks context.

Test before you scale. Running a small geography first and validating the data with the client before expanding saved them from a much larger deduplication problem later. It also built the internal confidence needed to keep going.

Define your primary key before you import. If you're working with data from non-standard sources (scraped lists, custom integrations, manual imports), decide upfront what property you'll use to identify uniqueness. Building your deduplication rules around that property from day one is much easier than retrofitting them later; the short sketch after these lessons shows the idea.

Automate what you're confident in, manually review what you're not. Not everything can or should be merged automatically. When the data is ambiguous, especially early in a project, a human review step is worth the friction. A bad merge in a live client account is harder to fix than it looks.

Data work is marketing work. The 35% organic deal sourcing didn't come from a content strategy in isolation. It came from the conversations that started during data validation, which shaped the content, which built a real audience. The data and the marketing were the same project.
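To make the primary-key point above concrete, here's a small, hypothetical sketch that builds a normalised dedup key from space name and city instead of domain, then groups incoming rows on that key. The property names are illustrative, not Coworks' actual schema.

    # Hypothetical: decide the dedup key (normalised space name + city) before
    # importing anything, then group incoming rows on that key to spot duplicates.
    import re

    def dedup_key(record):
        """Build the identity key used for matching; field names are illustrative."""
        name = re.sub(r"[^a-z0-9 ]", "", (record.get("space_name") or "").lower())
        city = (record.get("city") or "").strip().lower()
        return f"{' '.join(name.split())}|{city}"

    rows = [
        {"space_name": "Acme Cowork", "city": "Raleigh"},
        {"space_name": "ACME Cowork ", "city": "raleigh"},  # same space, messier data
    ]

    grouped = {}
    for rec in rows:
        grouped.setdefault(dedup_key(rec), []).append(rec)

    duplicates = {key: recs for key, recs in grouped.items() if len(recs) > 1}
    print(duplicates)  # both rows collapse onto the same key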

 

FAQ

Can Koalify deduplicate HubSpot records without a company domain?
Yes. Koalify's custom matching rules let you define which properties to use as your deduplication identifiers. It doesn't have to be domain. You can match on any HubSpot property or combination of properties, including custom fields you've created yourself. This is particularly useful for audiences like early-stage businesses, nonprofits, or any segment where domain matching is unreliable.

What happens when records are close but not certain duplicates?
Koalify flags probable duplicates based on your rules and surfaces them for review, either in bulk or via the CRM card on the individual record. You can set confidence thresholds and choose which matches to merge automatically versus send to manual review. The Vaulted approach of routing ambiguous matches to human confirmation before merging is a sensible pattern for high-stakes or novel data sources.

Is this approach only relevant for scraped data?
No. The same challenge, unreliable domain matching, comes up regularly with data enrichment tools like Clay or Apollo, which create new company records that may or may not have the same domain as existing ones. It also comes up with certain integration sources like Aircall, Dialpad, or Gong, and with companies that operate under multiple brands or hold multiple domains. If your deduplication is failing and domain-based matching is your only rule, it's worth reviewing whether domain is actually the right key for your specific data.


Vaulted is a growth and creative agency based in Raleigh. If you're a HubSpot Solution Partner dealing with recurring deduplication challenges across client portals, the Koalify partner programme covers all object types and includes a free partner account.