
The GTM Engineer's Guide to Identity Resolution

Guest Writer at Octave

Apr 02, 2026


Our writers at Octave write about a variety of topics, spanning everything from go-to-market engineering guides to thought leadership on the future of AI and GTM.

Overview

A single prospect interacts with your company across a dozen touchpoints — they click an ad on their phone, visit your website on their laptop, open an email at work, attend a webinar with a personal email, and download a whitepaper using a third address entirely. Each interaction creates a data point in a different system with a different identifier. Without identity resolution, that is five separate anonymous or semi-known records. With identity resolution, it is one unified profile with a complete picture of intent, engagement, and fit.

Identity resolution is the process of linking these fragmented records into a single, unified view of a person or an account. For GTM Engineers, it is the foundation that makes cross-system analytics, accurate scoring, and personalized outreach possible. Without it, every system in your stack has a partial view — and decisions based on partial views produce partial results. This guide covers how identity resolution works, how to build an identity graph for your GTM stack, and the practical tradeoffs between different resolution approaches.

Why Identity Resolution Matters for GTM

The identity problem is not theoretical. It has direct, measurable impact on every GTM workflow.

The Fragmentation Problem

Consider a typical B2B prospect journey:

Visits your website anonymously three times (tracked by cookie in your web analytics)
Downloads a guide using their personal Gmail address (creates a record in your marketing automation)
Gets prospected by your SDR team and added to the CRM with their work email
Responds to a cold email from a different work email alias
Attends a webinar and registers with a third email address
Has a phone call with an AE (logged in the CRM under a phone number that appears on none of the other records)

Without identity resolution, your marketing team sees a lukewarm content lead. Your SDR sees a cold outbound prospect. Your AE sees a new contact. None of them see the highly engaged person who has interacted with your company six times across four channels. Your engagement scoring is wrong because the score is spread across multiple records. Your personalization is wrong because each system has an incomplete view of their interests and intent.

GTM Function	Without Identity Resolution	With Identity Resolution
Lead scoring	Scores fragmented across records; hot leads look lukewarm	Unified score reflects total engagement across all touchpoints
Personalization	Each channel operates on partial context	Every touchpoint has the full picture of interests and behavior
Attribution	First-touch or last-touch only; multi-touch impossible	Complete journey mapped from anonymous visit to closed deal
Account-based motions	Cannot aggregate individual engagement to the account level reliably	True account-level engagement scoring across all contacts
Sales context	Reps see only CRM history, miss marketing and product interactions	Reps see every interaction, every signal, in one view

Building an Identity Graph

An identity graph is a data structure that maps every known identifier for a person or account to a single canonical identity. It is the core data asset that powers identity resolution.

Identifier Types

Every identifier has different properties in terms of persistence, uniqueness, and availability:

Identifier	Persistence	Uniqueness	Availability
Work email	Medium (changes with jobs)	High	High in B2B contexts
Personal email	High	High	Medium (gated content, webinars)
Phone number	Medium	High	Low (often missing)
LinkedIn URL	High	Very high	Medium (enrichment required)
CRM record ID	High (within system)	Unique per system	Only in CRM
Browser cookie	Low (clears, expires)	Medium (shared devices)	High (web only)
IP address	Low	Low (shared, dynamic)	High
Device fingerprint	Medium	Medium	High (web/mobile)

Graph Construction

The identity graph connects identifiers through observed relationships. When a user logs in with their work email on the same browser that previously had only an anonymous cookie, the graph links the cookie to the email. When an enrichment provider returns a LinkedIn URL for that email, the graph adds another link. Over time, the graph accumulates connections:

Direct links: Two identifiers observed in the same session, form submission, or system record. High confidence.
Inferred links: Two identifiers connected through one intermediate identifier. A cookie links to email A, and email A links to LinkedIn URL B, so the cookie and LinkedIn URL B are indirectly linked (two hops apart). Lower confidence, but often valid.
Transitive links: Connections that span more than two hops in the graph. These require careful handling because errors compound: a single wrong link can merge two different people's identities. Set a maximum hop distance for automatic resolution; 2 is a common choice.
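A minimal sketch of the hop limit, assuming the graph is held in memory as an adjacency map from identifier strings (prefixed here with a hypothetical type tag such as `email:`) to the identifiers they are directly linked to. A breadth-first search records each identifier's hop distance from a seed and stops expanding at `max_hops`, so anything further away is left out of automatic resolution.

```python
from collections import deque


def add_link(graph: dict[str, set[str]], a: str, b: str) -> None:
    """Record a direct link; links are undirected, so store both directions."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def reachable_within(graph: dict[str, set[str]], seed: str, max_hops: int = 2) -> dict[str, int]:
    """Breadth-first search from seed, returning {identifier: hop distance} up to max_hops."""
    distances = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # at the hop limit: keep this node but do not expand past it
        for neighbor in graph.get(node, set()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


graph: dict[str, set[str]] = {}
add_link(graph, "cookie:abc123", "email:ana@acme.com")             # direct: same form submission
add_link(graph, "email:ana@acme.com", "linkedin:in/ana")            # direct: enrichment result
add_link(graph, "linkedin:in/ana", "email:ana.personal@gmail.com")  # three hops from the cookie

print(reachable_within(graph, "cookie:abc123"))
```

With the default limit of 2, the cookie resolves to the work email and the LinkedIn URL, while the personal email (three hops away) is not auto-linked.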

The Over-Merge Problem

The biggest risk in identity resolution is over-merging — linking records that belong to different people. This happens when shared identifiers (shared devices, role-based email addresses like info@company.com, or shared IP addresses) incorrectly bridge two distinct identities. Always validate merge decisions against multiple signals. A shared cookie + same company domain is plausible. A shared IP address alone is not. Build confidence scoring into your resolution logic and set thresholds that favor precision over recall — a missed link creates a gap in your data, but a wrong merge corrupts two records.
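One way to encode "validate against multiple signals" is sketched below. The per-signal weights and the list of role-based mailboxes are hypothetical placeholders, and the combination rule is a swapped-in technique (noisy-OR, which treats signals as independent and is therefore optimistic), not a standard. It shows the shape of the logic: a shared IP alone scores low, a shared cookie plus the same company domain scores higher, and role-based addresses never count as a shared email.

```python
import math

# Hypothetical weights: rough probability that the signal alone means "same person".
# Placeholders only; calibrate them against merges you have verified by hand.
SIGNAL_WEIGHTS = {
    "same_email": 0.99,
    "same_phone": 0.9,
    "same_linkedin": 0.95,
    "shared_cookie": 0.5,
    "same_company_domain": 0.4,
    "shared_ip": 0.1,
}
FIELD_SIGNALS = {
    "email": "same_email",
    "phone": "same_phone",
    "linkedin": "same_linkedin",
    "cookie": "shared_cookie",
    "domain": "same_company_domain",
    "ip": "shared_ip",
}
# Illustrative, not exhaustive: mailboxes shared by a team rather than owned by one person.
ROLE_MAILBOXES = {"info", "sales", "support", "admin", "hello", "marketing"}


def is_role_based(email: str) -> bool:
    return email.split("@", 1)[0].strip().lower() in ROLE_MAILBOXES


def shared_signals(a: dict, b: dict) -> set[str]:
    """Signals two records have in common, ignoring role-based email addresses."""
    signals = set()
    for field, signal in FIELD_SIGNALS.items():
        value = a.get(field)
        if not value or value != b.get(field):
            continue
        if field == "email" and is_role_based(value):
            continue  # info@company.com identifies a mailbox, not a person
        signals.add(signal)
    return signals


def merge_confidence(signals: set[str]) -> float:
    """Noisy-OR: the merge is wrong only if every signal is independently wrong."""
    return 1 - math.prod(1 - SIGNAL_WEIGHTS[s] for s in signals)
```

Under these placeholder weights, a shared IP alone yields 0.1, a shared cookie plus the same domain yields 0.7, and both stay below a 0.9 auto-merge threshold, which is the precision-over-recall bias the paragraph argues for.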

Resolution Approaches

There are two fundamental approaches to identity resolution, and your choice depends on your data, your risk tolerance, and your technical infrastructure.

Deterministic Resolution

Deterministic resolution links records only when they share a known, high-confidence identifier — typically an email address or phone number. If two records share the same email, they are the same person. No scoring, no probability, no ambiguity.

This approach is safe and auditable but has significant gaps. It cannot resolve anonymous website visitors to known contacts (until they identify themselves), cannot link personal and work email addresses, and cannot match records that have no shared identifier. For most GTM teams, deterministic resolution alone resolves 50-60% of identity links.
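Deterministic resolution maps naturally onto a union-find (disjoint-set) structure: every record starts in its own cluster, and any two records that share an exact, normalized identifier are merged. A minimal sketch, assuming records arrive as dicts with an `id` plus optional `email`, `phone`, and `linkedin` fields (hypothetical field names); the normalization rules are simple assumptions, not a complete spec.

```python
import re


class UnionFind:
    """Disjoint-set structure: each record starts alone; union() merges clusters."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving keeps trees shallow
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def normalize(key: str, value: str) -> str:
    if key == "phone":
        return re.sub(r"\D", "", value)  # digits only: "+1 (555) 0100" -> "15550100"
    return value.strip().lower()


def deterministic_clusters(records: list[dict],
                           keys: tuple[str, ...] = ("email", "phone", "linkedin")) -> list[list[str]]:
    """Link records only when they share an exact (normalized) identifier."""
    uf = UnionFind()
    owner: dict[tuple[str, str], str] = {}  # first record seen with each identifier
    for record in records:
        uf.find(record["id"])  # register records that share nothing
        for key in keys:
            value = record.get(key)
            normalized = normalize(key, value) if value else ""
            if normalized:  # skip placeholders like "N/A" that normalize to nothing
                token = (key, normalized)
                uf.union(owner.setdefault(token, record["id"]), record["id"])
    clusters: dict[str, list[str]] = {}
    for record in records:
        clusters.setdefault(uf.find(record["id"]), []).append(record["id"])
    return sorted(sorted(ids) for ids in clusters.values())


records = [
    {"id": "crm-1", "email": "Ana@Acme.com"},
    {"id": "map-7", "email": "ana@acme.com", "phone": "+1 555 0100"},
    {"id": "prod-3", "phone": "+1 (555) 0100"},
    {"id": "web-9", "email": "ana.personal@gmail.com"},
]
print(deterministic_clusters(records))
```

The CRM, marketing, and product records chain together through email and phone, but the personal Gmail record stays on its own: exactly the kind of gap deterministic matching cannot close.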

Probabilistic Resolution

Probabilistic resolution uses multiple signals — behavioral patterns, device characteristics, temporal proximity, and fuzzy matching on name and company — to infer identity links with a confidence score. Two records that share the same company domain, have similar first names, and show overlapping browsing patterns are probably the same person, even without a shared email.

This approach catches links that deterministic methods miss but introduces false positives. The key is calibrating confidence thresholds:

Auto-resolve (confidence >90%): Deterministic matches plus high-confidence probabilistic matches (e.g., same phone number + same company + similar name)
Suggest for review (60-90%): Moderate-confidence probabilistic matches flagged for human verification
Reject (<60%): Low-confidence matches that are more likely wrong than right
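The bands above can be sketched as a weighted score plus a router. The feature set (company domain, fuzzy name match, overlap in pages visited), the weights, and the use of Python's `difflib.SequenceMatcher` as the fuzzy matcher are all assumptions for illustration; production systems typically use purpose-built string metrics and learned weights.

```python
from difflib import SequenceMatcher

# Hypothetical weights for three signal families; they sum to 1.0 so scores stay in [0, 1].
WEIGHTS = {"domain": 0.4, "name": 0.35, "behavior": 0.25}


def name_similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1], using difflib's ratio as a stand-in."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two behavior sets, e.g. pages visited."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_score(r1: dict, r2: dict) -> float:
    domain = 1.0 if r1["domain"] == r2["domain"] else 0.0
    return (
        WEIGHTS["domain"] * domain
        + WEIGHTS["name"] * name_similarity(r1["name"], r2["name"])
        + WEIGHTS["behavior"] * jaccard(r1["pages"], r2["pages"])
    )


def route(score: float) -> str:
    """Auto-resolve above 0.9, review from 0.6 to 0.9, reject below 0.6."""
    if score > 0.9:
        return "auto_resolve"
    if score >= 0.6:
        return "review"
    return "reject"


lead = {"domain": "acme.com", "name": "Ana Lopez", "pages": {"/pricing", "/docs", "/blog/x"}}
visitor = {"domain": "acme.com", "name": "Ana López", "pages": {"/pricing", "/docs"}}
print(round(match_score(lead, visitor), 3), route(match_score(lead, visitor)))
```

Here the same domain, a near-identical name, and two of three shared pages land the pair in the review band rather than auto-resolving it, which is the conservative outcome the thresholds are meant to produce.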

Hybrid Approach

The practical approach is to layer deterministic and probabilistic resolution:

Start deterministic. Link all records that share exact email, phone, or LinkedIn URL. This establishes the high-confidence backbone of your identity graph.

Layer probabilistic. For unresolved records, apply probabilistic matching using name + company + behavioral signals. Set thresholds conservatively and review matches before committing.

Continuous refinement. As new data arrives (a previously anonymous visitor fills out a form), update the graph. Each new data point can confirm or invalidate existing probabilistic links. Design your graph to absorb new evidence and self-correct.
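Here is one way the self-correcting step could look, sketched under the simplifying assumption that a cookie belongs to one person unless the data says otherwise. When a form fill directly observes a cookie and a contact together, earlier probabilistic guesses for that cookie are either confirmed (replaced by a deterministic link) or invalidated. `Link` and `apply_form_fill` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Link:
    cookie: str
    contact: str
    kind: str          # "deterministic" or "probabilistic"
    confidence: float


def apply_form_fill(links: list[Link], cookie: str, contact: str) -> tuple[list[Link], list[Link]]:
    """Fold in a form fill that observed `cookie` and `contact` in the same session.

    Returns (links to keep, probabilistic links the new evidence invalidated).
    """
    kept, invalidated = [], []
    for link in links:
        if link.cookie != cookie:
            kept.append(link)         # unrelated to this evidence
        elif link.contact == contact:
            continue                  # confirmed; replaced by the deterministic link below
        elif link.kind == "probabilistic":
            invalidated.append(link)  # guessed a different person; the form fill contradicts it
        else:
            kept.append(link)         # conflicting deterministic link (e.g. shared device): keep for review
    kept.append(Link(cookie, contact, "deterministic", 1.0))
    return kept, invalidated


links = [
    Link("ck-1", "contact-ana", "probabilistic", 0.72),
    Link("ck-1", "contact-bob", "probabilistic", 0.64),
    Link("ck-2", "contact-bob", "deterministic", 1.0),
]
kept, invalidated = apply_form_fill(links, "ck-1", "contact-ana")
```

Returning the invalidated links separately, rather than silently dropping them, leaves an audit trail for measuring how often the probabilistic layer guessed wrong.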

Cross-Channel and Cross-Device Identity

The hardest identity resolution challenge is linking the same person across different channels and devices. A prospect who visits your website on mobile, opens your email on desktop, and clicks a LinkedIn ad on their work laptop generates three separate device fingerprints with no obvious connection.

Channel-Specific Identity Challenges

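Web to email: Anonymous web visitors become known contacts when they fill out a form, click an email link with tracking parameters, or log into your product. Click-through tracking is the primary bridge: tracked email links typically carry a per-recipient token, while UTM parameters tie the visit to a campaign rather than to a person.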
Email to CRM: Marketing automation contacts and CRM contacts are often separate records. Configure your MAP-CRM sync to use email as the identity key and enforce matching before creating new CRM records.
Social to web: LinkedIn ad clicks and social engagements are difficult to link to specific contacts. Use UTM parameters, landing page forms, and multi-channel tracking to bridge social interactions to known identities.
Product to CRM: Product usage data is tied to user accounts, which may use different email addresses than the CRM contact. Link product user IDs to CRM contacts through a shared identifier (typically email) during onboarding or through enrichment.
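"Enforce matching before creating" amounts to an upsert keyed on normalized email. The sketch below uses a plain dict as a stand-in for the CRM (no real CRM API is shown) and records each system's own ID (a MAP lead ID, a product user ID) on the matched contact. Lowercasing the whole address is an assumption: the local part is technically case-sensitive, though most mail providers treat it as case-insensitive.

```python
def normalize_email(email: str) -> str:
    return email.strip().lower()


def upsert_contact(crm: dict[str, dict], incoming: dict) -> str:
    """Match on normalized email before creating; return "created" or "updated"."""
    key = normalize_email(incoming["email"])
    contact = crm.get(key)
    if contact is None:
        contact = crm[key] = {"email": key, "source_ids": set()}
        status = "created"
    else:
        status = "updated"
    contact["source_ids"].add(incoming["source_id"])  # e.g. "map:4411" or "product:u_88"
    for field, value in incoming.items():
        if field not in ("email", "source_id") and value is not None:
            contact[field] = value  # naive last-write-wins for the remaining fields
    return status


crm: dict[str, dict] = {}
upsert_contact(crm, {"email": "Ana@Acme.com", "source_id": "map:4411", "title": None})
upsert_contact(crm, {"email": " ana@acme.com", "source_id": "product:u_88", "title": "VP Sales"})
```

Both writes land on one contact, which now carries the marketing and product IDs that link it back to each source system.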

Account-Level Identity

In B2B, identity resolution extends beyond individuals to accounts. Multiple contacts from the same company need to be grouped under a unified account identity. This requires:

Domain-to-account mapping: Link email domains to account records. Handle edge cases like gmail.com, outlook.com, and shared-domain companies (subsidiaries using parent company domain).
IP-to-account resolution: Map corporate IP ranges to account records for de-anonymizing website traffic. Tools like Clearbit Reveal and 6sense provide this capability.
Contact-to-account linking: Ensure every contact is properly associated with their account, even when they use personal emails or external domains. Enrichment providers can help map contacts to companies using firmographic data.
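A minimal sketch of domain-to-account mapping that handles the edge cases named above. It assumes a hypothetical `domain_index` mapping each email domain to one or more account IDs; the free-mail list is illustrative, not exhaustive. Rather than guessing, it returns a reason code so personal addresses and shared domains can be routed to enrichment.

```python
from typing import Optional

# Illustrative, not exhaustive: consumer mail domains that say nothing about the employer.
FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "hotmail.com", "yahoo.com", "icloud.com"}


def resolve_account(email: str, domain_index: dict[str, list[str]]) -> tuple[Optional[str], str]:
    """Map an email to an account ID, returning (account_id or None, reason)."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    if domain in FREE_MAIL_DOMAINS:
        return None, "free_mail"  # personal address: resolve the company through enrichment
    candidates = domain_index.get(domain, [])
    if len(candidates) == 1:
        return candidates[0], "matched"
    if not candidates:
        return None, "unknown_domain"
    return None, "ambiguous_shared_domain"  # e.g. subsidiaries on the parent's domain


domain_index = {
    "acme.com": ["acct-acme"],
    "globex.com": ["acct-globex-us", "acct-globex-eu"],
}
print(resolve_account("Ana@ACME.com", domain_index))
```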

Building Unified Profiles

Once your identity graph links fragmented records together, the next step is building unified profiles that your GTM systems can actually use.

Profile Assembly

A unified profile aggregates data from every linked record into a single, coherent view:

Attribute selection: For conflicting attributes (different job titles from different sources), apply source-priority rules. The most recently enriched value from your most trusted source wins.
Activity aggregation: Combine engagement data from all linked records — emails, calls, website visits, product events, ad interactions — into a single chronological timeline.
Score consolidation: Recalculate engagement and fit scores using the full, unified activity history and attribute set. A profile that appeared lukewarm across three fragmented records may score as highly engaged when consolidated.
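The first two assembly rules can be sketched in a few lines. The source ranking in `SOURCE_PRIORITY` is a hypothetical example; the selection rule follows the text: the most trusted source wins, and within that source the most recently observed value wins. Activity from each system is merged into one list sorted by timestamp.

```python
from datetime import date, datetime
from typing import NamedTuple

# Hypothetical trust ranking: lower number = more trusted source.
SOURCE_PRIORITY = {"enrichment": 0, "crm": 1, "map": 2, "web_form": 3}


class Observation(NamedTuple):
    source: str
    observed_on: date
    value: str


def pick_attribute(observations: list[Observation]) -> str:
    """Most trusted source wins; within that source, the most recent value wins."""
    best = min(observations, key=lambda o: (SOURCE_PRIORITY[o.source], -o.observed_on.toordinal()))
    return best.value


def unified_timeline(*activity_streams: list[dict]) -> list[dict]:
    """Merge per-system activity lists into one chronological timeline."""
    return sorted((event for stream in activity_streams for event in stream), key=lambda e: e["at"])


titles = [
    Observation("map", date(2026, 3, 1), "Director of Sales"),
    Observation("enrichment", date(2025, 11, 2), "VP Sales"),
    Observation("enrichment", date(2026, 1, 15), "SVP Sales"),
]
timeline = unified_timeline(
    [{"at": datetime(2026, 1, 5, 9, 30), "type": "email_open"}],
    [{"at": datetime(2026, 1, 3, 14, 0), "type": "page_view"},
     {"at": datetime(2026, 1, 7, 11, 0), "type": "webinar"}],
)
```

Note that the newer marketing-automation title loses to the older but more trusted enrichment source; if recency should dominate instead, swap the order of the two sort keys.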

Profile Distribution

Unified profiles are only valuable if they are accessible in the systems where your team works. Push unified profile data to:

Your CRM as enriched contact and account records
Your sequencer as context for personalized messaging
Your marketing automation platform for targeted campaigns
Your analytics warehouse for reporting and attribution

Privacy Considerations

Identity resolution aggregates data about individuals across systems, which raises privacy concerns. Ensure your resolution process complies with GDPR, CCPA, and other applicable regulations. Respect opt-out signals across all linked records — if a person unsubscribes from one email address, that unsubscribe must propagate to all linked identities. Build consent tracking into your identity graph so you can answer "what data do we have about this person and where did it come from?" for any individual.
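A minimal sketch of opt-out propagation, assuming the identity graph has already been flattened into a hypothetical `identity_of` map from each identifier to a canonical person ID. An unsubscribe on any linked identifier suppresses all of them, and opt-outs that never resolved to a person are still honored on their own.

```python
def suppression_list(identity_of: dict[str, str], opt_outs: set[str]) -> set[str]:
    """Expand opt-outs to every identifier linked to the same person.

    identity_of maps each identifier (email, phone, ...) to a canonical person ID.
    """
    opted_out_people = {identity_of[i] for i in opt_outs if i in identity_of}
    linked = {i for i, person in identity_of.items() if person in opted_out_people}
    return linked | opt_outs  # unresolved opt-outs still count


identity_of = {
    "ana@acme.com": "person-1",
    "ana.personal@gmail.com": "person-1",
    "bob@acme.com": "person-2",
}
print(sorted(suppression_list(identity_of, {"ana.personal@gmail.com"})))
```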

FAQ

How is identity resolution different from deduplication?

Deduplication finds and merges duplicate records within a single system. Identity resolution links records across multiple systems and identifier types to build a unified view. Dedup answers "are these two CRM records the same person?" Identity resolution answers "are this CRM contact, this marketing lead, this product user, and this anonymous website visitor all the same person?" Identity resolution is the broader problem; deduplication is one component of it.

Do we need a dedicated identity resolution platform, or can we build this ourselves?

You can build basic identity resolution (deterministic matching on email across 2-3 systems) with custom code or a tool like Make. Probabilistic resolution, graph management, and cross-device identity at scale require dedicated infrastructure. If your stack has fewer than 5 systems, build it yourself. Beyond that, evaluate platforms like Segment, Amperity, or a unified context platform that handles identity as a core function.

How do we handle identity resolution when a contact changes companies?

A job change creates a fork in the identity graph. The person's identity persists (same LinkedIn, possibly same phone, same personal email), but their account association changes. Detect job changes through enrichment signals (new company domain on LinkedIn, email bounce from old domain) and update the graph: keep the person node, change the account edge. This preserves their engagement history while correctly associating them with their new company for job-change outreach.
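"Keep the person node, change the account edge" can be sketched with a small data structure. `PersonNode` and its fields are hypothetical. The old work email stays linked because it still explains past engagement, but it is marked stale so it is no longer used as a contact channel.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PersonNode:
    person_id: str
    identifiers: set[str]  # every identifier ever linked to this person
    account_id: str        # current account edge
    stale_identifiers: set[str] = field(default_factory=set)  # kept for history, not outreach
    past_accounts: list[tuple[str, date]] = field(default_factory=list)


def apply_job_change(person: PersonNode, new_account_id: str, detected_on: date,
                     old_work_emails: Iterable[str] = ()) -> None:
    """Keep the person node and its history; only re-point the account edge."""
    person.past_accounts.append((person.account_id, detected_on))
    person.account_id = new_account_id
    person.stale_identifiers |= set(old_work_emails) & person.identifiers


ana = PersonNode(
    "person-1",
    {"linkedin:in/ana", "email:ana@acme.com", "email:ana.personal@gmail.com"},
    "acct-acme",
)
apply_job_change(ana, "acct-globex", date(2026, 3, 1), {"email:ana@acme.com"})
```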

What accuracy rate should we target for identity resolution?

Aim for 95%+ precision (links that are correct) even if it means lower recall (links that are found). In GTM, the cost of a wrong merge (corrupting two records, sending incorrect messaging) is far higher than the cost of a missed link (having an incomplete profile). Measure precision by sampling resolved identities and manually verifying them. If your precision drops below 90%, tighten your matching thresholds.
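The measurement itself is simple arithmetic over a manually verified sample: precision is the share of sampled links a reviewer confirmed. A minimal sketch using the 95% target and 90% floor from the answer above; the function names and the "monitor" band between them are illustrative.

```python
def precision(verified: list[bool]) -> float:
    """Share of sampled resolved links that a reviewer confirmed as correct."""
    if not verified:
        raise ValueError("need at least one manually verified sample")
    return sum(verified) / len(verified)


def threshold_action(verified: list[bool], target: float = 0.95, floor: float = 0.90) -> str:
    p = precision(verified)
    if p < floor:
        return "tighten"  # below 90%: raise matching thresholds
    if p < target:
        return "monitor"  # between the 90% floor and the 95% target
    return "ok"


sample = [True] * 93 + [False] * 7  # 100 links reviewed by hand, 93 correct
print(precision(sample), threshold_action(sample))
```

Small samples are noisy, so a result close to either boundary is worth re-sampling before changing thresholds.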

What Changes at Scale

Identity resolution at small scale is straightforward — a few thousand contacts across three systems, matched on email. At enterprise scale — millions of identifiers across dozens of touchpoints, with real-time resolution requirements — it becomes one of the hardest infrastructure problems in GTM.

The graph grows in complexity, not just size. Every new system you connect adds a new identifier type and new linking rules. Every marketing channel creates new anonymous touchpoints that need to be resolved. Every acquisition or territory expansion adds contacts that may overlap with existing records. Maintaining the graph — resolving new identifiers, propagating updates, handling splits (one identity turns out to be two people), and enforcing privacy rules — requires dedicated infrastructure and continuous attention.

This is where Octave adds value. Octave is an AI platform that automates and optimizes your outbound playbook by connecting to your existing GTM stack. Its Enrich Agent provides company and person data with product fit scores, helping resolve and consolidate prospect identities through consistent enrichment. The Library centralizes your ICP context, personas, and reference customers, so every outreach decision draws from a single source of truth about who your prospects are and what matters to them. Octave's Runtime Context maintains prospect-specific data per person, ensuring that every agent interaction, from qualification to sequence generation to call prep, operates on a complete, unified view of each contact.

Conclusion

Identity resolution is the missing layer between data collection and data activation. Without it, your systems see fragments. With it, they see people and accounts in full context. Build your resolution strategy with deterministic matching as the foundation, probabilistic matching for coverage, and continuous refinement as new data arrives. Invest in a proper identity graph that maps every identifier to a canonical identity, and push unified profiles to every system in your stack. The teams that solve identity resolution unlock the full value of their GTM data. The ones that do not are making decisions with incomplete pictures and wondering why their metrics do not match reality.
