Overview
A single prospect interacts with your company across a dozen touchpoints — they click an ad on their phone, visit your website on their laptop, open an email at work, attend a webinar with a personal email, and download a whitepaper using a third address entirely. Each interaction creates a data point in a different system with a different identifier. Without identity resolution, that is five separate anonymous or semi-known records. With identity resolution, it is one unified profile with a complete picture of intent, engagement, and fit.
Identity resolution is the process of linking these fragmented records into a single, unified view of a person or an account. For GTM Engineers, it is the foundation that makes cross-system analytics, accurate scoring, and personalized outreach possible. Without it, every system in your stack has a partial view — and decisions based on partial views produce partial results. This guide covers how identity resolution works, how to build an identity graph for your GTM stack, and the practical tradeoffs between different resolution approaches.
Why Identity Resolution Matters for GTM
The identity problem is not theoretical. It has direct, measurable impact on every GTM workflow.
The Fragmentation Problem
Consider a typical B2B prospect journey:
- Visits your website anonymously three times (tracked by cookie in your web analytics)
- Downloads a guide using their personal Gmail address (creates a record in your marketing automation)
- Gets prospected by your SDR team and added to the CRM with their work email
- Responds to a cold email from a different work email alias
- Attends a webinar and registers with a third email address
- Has a phone call with an AE (logged in the CRM against a phone number that is not linked to any of the records above)
Without identity resolution, your marketing team sees a lukewarm content lead. Your SDR sees a cold outbound prospect. Your AE sees a new contact. None of them see one highly engaged person whose interactions span four channels: web, email, webinar, and phone. Your engagement scoring is wrong because the engagement is spread across multiple records, so no single record scores as high as the person should. Your personalization is wrong because each system has an incomplete view of their interests and intent.
| GTM Function | Without Identity Resolution | With Identity Resolution |
|---|---|---|
| Lead scoring | Scores fragmented across records; hot leads look lukewarm | Unified score reflects total engagement across all touchpoints |
| Personalization | Each channel operates on partial context | Every touchpoint has the full picture of interests and behavior |
| Attribution | First-touch or last-touch only; multi-touch impossible | Complete journey mapped from anonymous visit to closed deal |
| Account-based motions | Cannot aggregate individual engagement to the account level reliably | True account-level engagement scoring across all contacts |
| Sales context | Reps see only CRM history, miss marketing and product interactions | Reps see every interaction, every signal, in one view |
Building an Identity Graph
An identity graph is a data structure that maps every known identifier for a person or account to a single canonical identity. It is the core data asset that powers identity resolution.
Identifier Types
Every identifier has different properties in terms of persistence, uniqueness, and availability:
| Identifier | Persistence | Uniqueness | Availability |
|---|---|---|---|
| Work email | Medium (changes with jobs) | High | High in B2B contexts |
| Personal email | High | High | Medium (gated content, webinars) |
| Phone number | Medium | High | Low (often missing) |
| LinkedIn URL | High | Very high | Medium (enrichment required) |
| CRM record ID | High (within system) | Unique per system | Only in CRM |
| Browser cookie | Low (clears, expires) | Medium (shared devices) | High (web only) |
| IP address | Low | Low (shared, dynamic) | High |
| Device fingerprint | Medium | Medium | High (web/mobile) |
Graph Construction
The identity graph connects identifiers through observed relationships. When a user logs in with their work email on the same browser that previously had only an anonymous cookie, the graph links the cookie to the email. When an enrichment provider returns a LinkedIn URL for that email, the graph adds another link. Over time, the graph accumulates connections:
- Direct links: Two identifiers observed in the same session, form submission, or system record. High confidence.
- Inferred links: Two identifiers connected through one intermediate identifier. A cookie links to email A, and email A links to LinkedIn URL B, so the cookie and LinkedIn URL are indirectly linked. Lower confidence, but often valid.
- Transitive links: Connections that span more than two hops in the graph. These require careful handling because errors compound: a single wrong link can merge two different people's identities. Set a maximum hop distance (commonly 2) for automatic resolution.
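A breadth-first search can implement the hop-limited traversal. This is a minimal Python sketch, not a production graph store. The `IdentityGraph` class, the `type:value` identifier strings, and the example identifiers are all hypothetical. Hop 1 corresponds to a direct link and hop 2 to an inferred link. Anything beyond `max_hops` is excluded from automatic resolution.

```python
from collections import defaultdict, deque
from typing import Dict, Set


class IdentityGraph:
    """Undirected graph whose nodes are identifiers like 'email:ana@acme.com' or 'cookie:abc123'."""

    def __init__(self) -> None:
        self.edges: Dict[str, Set[str]] = defaultdict(set)

    def add_direct_link(self, a: str, b: str) -> None:
        # A direct link: both identifiers were observed in the same session, form fill, or record.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def resolve(self, seed: str, max_hops: int = 2) -> Dict[str, int]:
        """Breadth-first search from seed; returns each reachable identifier and its hop distance."""
        hops = {seed: 0}
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            if hops[current] >= max_hops:
                continue  # never expand past the hop limit
            for neighbor in self.edges[current]:
                if neighbor not in hops:
                    hops[neighbor] = hops[current] + 1
                    queue.append(neighbor)
        return hops


graph = IdentityGraph()
graph.add_direct_link("cookie:abc123", "email:ana@acme.com")     # form fill on that browser
graph.add_direct_link("email:ana@acme.com", "linkedin:in/ana")   # enrichment result
graph.add_direct_link("linkedin:in/ana", "email:ana@gmail.com")  # three hops from the cookie
print(graph.resolve("cookie:abc123"))
# {'cookie:abc123': 0, 'email:ana@acme.com': 1, 'linkedin:in/ana': 2}
```

The personal Gmail address sits three hops from the cookie, so it is left out. It would need a direct observation or human review before joining the profile.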
The biggest risk in identity resolution is over-merging — linking records that belong to different people. This happens when shared identifiers (shared devices, role-based email addresses like info@company.com, or shared IP addresses) incorrectly bridge two distinct identities. Always validate merge decisions against multiple signals. A shared cookie + same company domain is plausible. A shared IP address alone is not. Build confidence scoring into your resolution logic and set thresholds that favor precision over recall — a missed link creates a gap in your data, but a wrong merge corrupts two records.
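One way to build confidence scoring into the merge decision is a noisy-OR combination of signals. This approach assumes the signals are independent, which is often only approximately true. The weights and the role-based prefix list below are hypothetical placeholders. They are chosen so that a shared IP alone stays far below any merge threshold, while a shared cookie plus the same company domain lands in a moderate band.

```python
from typing import Iterable, Optional

# Hypothetical per-signal weights: placeholders to calibrate against manually verified pairs.
SIGNAL_WEIGHTS = {
    "same_email": 0.95,
    "same_phone": 0.60,
    "shared_cookie": 0.50,
    "same_company_domain": 0.30,
    "shared_ip": 0.05,
}
# Hypothetical list of role-based local parts that many people share.
ROLE_LOCAL_PARTS = {"info", "sales", "support", "admin", "hello", "marketing"}


def is_role_based(email: str) -> bool:
    return email.split("@", 1)[0].strip().lower() in ROLE_LOCAL_PARTS


def merge_confidence(signals: Iterable[str], shared_email: Optional[str] = None) -> float:
    """Noisy-OR: 1 - product of (1 - weight) over the active signals."""
    active = set(signals)
    if shared_email is not None and is_role_based(shared_email):
        # A role-based address bridges many people, so it is not evidence of one identity.
        active.discard("same_email")
    miss_probability = 1.0
    for signal in active:
        miss_probability *= 1.0 - SIGNAL_WEIGHTS.get(signal, 0.0)
    return 1.0 - miss_probability


print(merge_confidence({"shared_ip"}))                             # about 0.05: reject
print(merge_confidence({"shared_cookie", "same_company_domain"}))  # about 0.65: plausible
print(merge_confidence({"same_email"}, "info@acme.com"))           # 0.0: role address ignored
```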
Resolution Approaches
There are two fundamental approaches to identity resolution, and your choice depends on your data, your risk tolerance, and your technical infrastructure.
Deterministic Resolution
Deterministic resolution links records only when they share a known, high-confidence identifier — typically an email address or phone number. If two records share the same email, they are the same person. No scoring, no probability, no ambiguity.
This approach is safe and auditable but has significant gaps. It cannot resolve anonymous website visitors to known contacts (until they identify themselves), cannot link personal and work email addresses, and cannot match records that have no shared identifier. For most GTM teams, deterministic resolution alone resolves 50-60% of identity links.
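Deterministic resolution maps naturally onto a union-find (disjoint-set) structure, where every record that shares a normalized key ends up in the same cluster. The sketch below is minimal. It assumes records are dicts with `id`, `email`, and `phone` fields. The role-based exclusion list is hypothetical. The digits-only phone normalization assumes numbers are stored with a consistent country code; a production system would normalize to E.164 instead.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical: role-based local parts excluded from matching because many people share them.
SHARED_LOCAL_PARTS = {"info", "sales", "support", "admin"}


class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def match_keys(record: Dict[str, str]) -> List[str]:
    """Normalized deterministic keys for one record: exact email and digits-only phone."""
    keys = []
    email = record.get("email", "").strip().lower()
    if email and email.split("@", 1)[0] not in SHARED_LOCAL_PARTS:
        keys.append("email:" + email)
    phone = re.sub(r"\D", "", record.get("phone", ""))
    if phone:
        keys.append("phone:" + phone)
    return keys


def deterministic_clusters(records: List[Dict[str, str]]) -> List[List[str]]:
    uf = UnionFind()
    owner: Dict[str, str] = {}  # normalized key -> first record id that carried it
    for record in records:
        uf.find(record["id"])  # register records that match nothing as singletons
        for key in match_keys(record):
            if key in owner:
                uf.union(owner[key], record["id"])
            else:
                owner[key] = record["id"]
    clusters: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        clusters[uf.find(record["id"])].append(record["id"])
    return list(clusters.values())


records = [
    {"id": "crm-1", "email": "Ana@Acme.com"},
    {"id": "map-7", "email": "ana@acme.com ", "phone": "+1 415-555-0100"},
    {"id": "crm-9", "phone": "+1 (415) 555-0100"},
    {"id": "map-8", "email": "ana@gmail.com"},
]
print(deterministic_clusters(records))  # [['crm-1', 'map-7', 'crm-9'], ['map-8']]
```

`crm-1` and `crm-9` share no identifier, but they still land in the same cluster through `map-7`, because union-find takes the full transitive closure of the matching keys. The personal Gmail record stays separate, which is the work-versus-personal-email gap described above.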
Probabilistic Resolution
Probabilistic resolution uses multiple signals — behavioral patterns, device characteristics, temporal proximity, and fuzzy matching on name and company — to infer identity links with a confidence score. Two records that share the same company domain, have similar first names, and show overlapping browsing patterns are probably the same person, even without a shared email.
This approach catches links that deterministic methods miss but introduces false positives. The key is calibrating confidence thresholds:
- Auto-resolve (confidence >90%): Deterministic matches plus high-confidence probabilistic matches (e.g., same phone number + same company + similar name)
- Suggest for review (60-90%): Moderate-confidence probabilistic matches flagged for human verification
- Reject (<60%): Low-confidence matches that are more likely wrong than right
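The sketch below shows probabilistic scoring plus the three-band routing. It uses Python's standard-library `difflib.SequenceMatcher` as a stand-in for whatever fuzzy name matcher you use. The field names and weights are hypothetical and uncalibrated. Only the band boundaries come from the thresholds above.

```python
from difflib import SequenceMatcher
from typing import Dict


def name_similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1]; a missing name contributes nothing."""
    if not a.strip() or not b.strip():
        return 0.0
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_confidence(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Hypothetical weighted score; the weights are placeholders, not calibrated values."""
    score = 0.0
    if a.get("phone") and a.get("phone") == b.get("phone"):
        score += 0.45
    if a.get("domain") and a.get("domain") == b.get("domain"):
        score += 0.35
    score += 0.20 * name_similarity(a.get("name", ""), b.get("name", ""))
    return min(score, 1.0)


def route(confidence: float) -> str:
    """Bands: above 0.90 auto-resolve, 0.60 to 0.90 review, below 0.60 reject."""
    if confidence > 0.90:
        return "auto_resolve"
    if confidence >= 0.60:
        return "review"
    return "reject"


ana_crm = {"name": "Ana Silva", "domain": "acme.com", "phone": "14155550100"}
ana_map = {"name": "Ana M. Silva", "domain": "acme.com", "phone": "14155550100"}
ana_web = {"name": "Ana Silva", "domain": "acme.com"}
print(route(match_confidence(ana_crm, ana_map)))  # auto_resolve (about 0.97)
print(route(match_confidence(ana_crm, ana_web)))  # reject (0.55: domain and name alone fall short)
```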
Hybrid Approach
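The practical approach is to layer the two. Deterministic matching is the foundation. Probabilistic matching extends coverage to the links deterministic matching cannot make. Both are refined continuously as new data arrives.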
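The layering can be sketched as a two-pass check over candidate pairs. A shared normalized email settles a pair deterministically, and only unsettled pairs reach the probabilistic scorer. This is a minimal sketch that assumes records are dicts with `id` and `email` keys. `score_pair` is any scorer that returns a confidence in [0, 1], and the toy `same_domain` scorer in the usage is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Link = Tuple[str, str, float]


def _email(record: Record) -> str:
    return record.get("email", "").strip().lower()


def hybrid_resolve(
    records: List[Record],
    score_pair: Callable[[Record, Record], float],
    auto_threshold: float = 0.90,
    review_threshold: float = 0.60,
) -> Tuple[List[Link], List[Link]]:
    auto_links: List[Link] = []
    review_queue: List[Link] = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            # Layer 1 (foundation): a shared normalized email is a deterministic link.
            if _email(a) and _email(a) == _email(b):
                auto_links.append((a["id"], b["id"], 1.0))
                continue
            # Layer 2 (coverage): probabilistic scoring only for pairs layer 1 could not settle.
            confidence = score_pair(a, b)
            if confidence > auto_threshold:
                auto_links.append((a["id"], b["id"], confidence))
            elif confidence >= review_threshold:
                review_queue.append((a["id"], b["id"], confidence))
            # Pairs below review_threshold are rejected and not recorded.
    return auto_links, review_queue


def same_domain(a: Record, b: Record) -> float:
    return 0.7 if a["domain"] == b["domain"] else 0.0  # toy scorer for illustration


records = [
    {"id": "a", "email": "ana@acme.com", "domain": "acme.com"},
    {"id": "b", "email": "ANA@acme.com", "domain": "acme.com"},
    {"id": "c", "email": "ana.s@acme.com", "domain": "acme.com"},
    {"id": "d", "email": "bo@other.io", "domain": "other.io"},
]
auto, review = hybrid_resolve(records, same_domain)
print(auto)    # [('a', 'b', 1.0)]
print(review)  # [('a', 'c', 0.7), ('b', 'c', 0.7)]
```

Comparing every pair is quadratic. At any real volume, you would first restrict candidates with blocking, for example by only comparing records within the same company domain. Rerunning the pass as new records arrive provides the continuous-refinement layer.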
Cross-Channel and Cross-Device Identity
The hardest identity resolution challenge is linking the same person across different channels and devices. A prospect who visits your website on mobile, opens your email on desktop, and clicks a LinkedIn ad on their work laptop generates three separate device fingerprints with no obvious connection.
Channel-Specific Identity Challenges
- Web to email: Anonymous web visitors become known contacts when they fill out a form, click an email link with tracking parameters, or log into your product. UTM parameters and click-through tracking are the primary bridges.
- Email to CRM: Marketing automation contacts and CRM contacts are often separate records. Configure your MAP-CRM sync to use email as the identity key and enforce matching before creating new CRM records.
- Social to web: LinkedIn ad clicks and social engagements are difficult to link to specific contacts. Use UTM parameters, landing page forms, and multi-channel tracking to bridge social interactions to known identities.
- Product to CRM: Product usage data is tied to user accounts, which may use different email addresses than the CRM contact. Link product user IDs to CRM contacts through a shared identifier (typically email) during onboarding or through enrichment.
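For the web-to-email bridge, the core operation is backfilling an identity onto a cookie's earlier anonymous events once the visitor identifies themselves. The sketch below is minimal and uses a hypothetical `WebEvent` shape. It flags cookies seen with more than one email instead of guessing, because shared devices are a known source of over-merging.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class WebEvent:
    cookie_id: str
    page: str
    email: Optional[str] = None  # present only when the visitor identified themselves


def stitch_cookies(events: List[WebEvent]) -> Tuple[List[Tuple[str, str, Optional[str]]], Set[str]]:
    """Backfill a cookie's known email onto all of its events, including earlier anonymous ones."""
    emails_by_cookie: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        if event.email:
            emails_by_cookie[event.cookie_id].add(event.email.strip().lower())
    # More than one email on a cookie suggests a shared device, so flag it instead of guessing.
    conflicts = {c for c, emails in emails_by_cookie.items() if len(emails) > 1}
    resolved = {c: next(iter(emails)) for c, emails in emails_by_cookie.items() if len(emails) == 1}
    stitched = [(e.cookie_id, e.page, resolved.get(e.cookie_id)) for e in events]
    return stitched, conflicts


events = [
    WebEvent("ck1", "/pricing"),
    WebEvent("ck1", "/guide-download", email="Ana@Acme.com"),
    WebEvent("ck2", "/blog"),
    WebEvent("ck3", "/demo", email="a@x.com"),
    WebEvent("ck3", "/demo", email="b@y.com"),
]
stitched, conflicts = stitch_cookies(events)
print(stitched[0])  # ('ck1', '/pricing', 'ana@acme.com')
print(conflicts)    # {'ck3'}
```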
Account-Level Identity
In B2B, identity resolution extends beyond individuals to accounts. Multiple contacts from the same company need to be grouped under a unified account identity. This requires:
- Domain-to-account mapping: Link email domains to account records. Handle edge cases like free-mail domains (gmail.com, outlook.com) and shared-domain companies (subsidiaries that use their parent company's domain).
- IP-to-account resolution: Map corporate IP ranges to account records for de-anonymizing website traffic. Tools like Clearbit Reveal and 6sense provide this capability.
- Contact-to-account linking: Ensure every contact is properly associated with their account, even when they use personal emails or external domains. Enrichment providers can help map contacts to companies using firmographic data.
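Domain-to-account mapping reduces to a lookup with explicit exits for the edge cases. In the sketch below, the free-mail list is a small illustrative subset, and the `DOMAIN_TO_ACCOUNTS` table and account IDs are hypothetical. The function returns a reason code alongside the result. This lets unresolved contacts be routed to IP-based or enrichment-based matching instead of being dropped.

```python
from typing import Dict, List, Optional, Tuple

# Common free-mail domains (not exhaustive).
FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "hotmail.com", "yahoo.com", "icloud.com"}

# Hypothetical mapping; in practice this would come from your CRM or an enrichment provider.
DOMAIN_TO_ACCOUNTS: Dict[str, List[str]] = {
    "acme.com": ["acct-acme"],
    "globex.com": ["acct-globex-us", "acct-globex-emea"],  # subsidiaries share the parent domain
}


def account_for_email(email: str) -> Tuple[Optional[str], str]:
    """Return (account_id, reason); account_id is None whenever the domain cannot decide."""
    domain = email.strip().lower().rsplit("@", 1)[-1]
    if domain in FREE_MAIL_DOMAINS:
        return None, "free_mail"  # personal address: needs enrichment, not a domain lookup
    candidates = DOMAIN_TO_ACCOUNTS.get(domain, [])
    if len(candidates) == 1:
        return candidates[0], "domain_match"
    if candidates:
        return None, "ambiguous_shared_domain"  # needs another signal to pick the subsidiary
    return None, "unknown_domain"


print(account_for_email("Ana@Acme.com"))   # ('acct-acme', 'domain_match')
print(account_for_email("bo@globex.com"))  # (None, 'ambiguous_shared_domain')
```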
Building Unified Profiles
Once your identity graph links fragmented records together, the next step is building unified profiles that your GTM systems can actually use.
Profile Assembly
A unified profile aggregates data from every linked record into a single, coherent view:
- Attribute selection: For conflicting attributes (different job titles from different sources), apply source-priority rules. The most recently enriched value from your most trusted source wins.
- Activity aggregation: Combine engagement data from all linked records — emails, calls, website visits, product events, ad interactions — into a single chronological timeline.
- Score consolidation: Recalculate engagement and fit scores using the full, unified activity history and attribute set. A profile that appeared lukewarm across three fragmented records may score as highly engaged when consolidated.
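The sketch below covers attribute selection and activity aggregation. The source ranking in `SOURCE_PRIORITY` and the field names are hypothetical examples. The selection rule itself follows the text: the most trusted source wins, and within that source the most recently observed value wins.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

# Hypothetical trust ranking: lower number means more trusted.
SOURCE_PRIORITY = {"crm": 0, "enrichment": 1, "map": 2, "web_form": 3}


@dataclass
class AttributeValue:
    value: str
    source: str
    observed_at: datetime


def select_attribute(candidates: List[AttributeValue]) -> str:
    """Most trusted source wins; within that source, the most recent value wins."""
    best = max(candidates, key=lambda c: (-SOURCE_PRIORITY.get(c.source, 99), c.observed_at))
    return best.value


Activity = Tuple[datetime, str, str]  # (timestamp, channel, detail)


def build_timeline(*streams: List[Activity]) -> List[Activity]:
    """Merge activity lists from every linked record into one chronological timeline."""
    return sorted((event for stream in streams for event in stream), key=lambda e: e[0])


titles = [
    AttributeValue("VP Sales", "map", datetime(2024, 5, 1)),
    AttributeValue("Director", "crm", datetime(2023, 1, 1)),
    AttributeValue("Senior Director", "crm", datetime(2024, 2, 1)),
]
print(select_attribute(titles))  # Senior Director

timeline = build_timeline(
    [(datetime(2024, 3, 2), "email", "opened nurture email")],
    [(datetime(2024, 3, 1), "web", "visited /pricing")],
)
print([channel for _, channel, _ in timeline])  # ['web', 'email']
```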
Profile Distribution
Unified profiles are only valuable if they are accessible in the systems where your team works. Push unified profile data to:
- Your CRM as enriched contact and account records
- Your sequencer as context for personalized messaging
- Your marketing automation platform for targeted campaigns
- Your analytics warehouse for reporting and attribution
Identity resolution aggregates data about individuals across systems, which raises privacy concerns. Ensure your resolution process complies with GDPR, CCPA, and other applicable regulations. Respect opt-out signals across all linked records — if a person unsubscribes from one email address, that unsubscribe must propagate to all linked identities. Build consent tracking into your identity graph so you can answer "what data do we have about this person and where did it come from?" for any individual.
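Opt-out propagation is a reachability question on the identity graph: find every identifier linked to the one that unsubscribed, and suppress all of its email addresses. The sketch below assumes `links` contains only links your resolver has accepted and that identifiers use a hypothetical `type:value` form. Unlike automatic merging, the traversal walks the whole connected component with no hop limit, because over-suppressing is the safe direction for consent. Each suppression records which opt-out caused it, which is a small piece of the provenance tracking the text calls for.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple


def linked_identities(links: List[Tuple[str, str]], start: str) -> Set[str]:
    """Every identifier connected to start through accepted identity-graph links."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def propagate_opt_out(
    links: List[Tuple[str, str]], opted_out: str, suppression: Dict[str, str]
) -> Dict[str, str]:
    """Suppress every linked email, recording which opt-out caused each suppression."""
    for identifier in linked_identities(links, opted_out):
        if identifier.startswith("email:"):
            suppression.setdefault(identifier, opted_out)  # keep the first recorded cause
    return suppression


links = [
    ("email:ana@acme.com", "person:42"),
    ("email:ana@gmail.com", "person:42"),
    ("email:bo@acme.com", "person:77"),
]
print(propagate_opt_out(links, "email:ana@gmail.com", {}))
```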
FAQ
What is the difference between identity resolution and deduplication?
Deduplication finds and merges duplicate records within a single system. Identity resolution links records across multiple systems and identifier types to build a unified view. Dedup answers "are these two CRM records the same person?" Identity resolution answers "are this CRM contact, this marketing lead, this product user, and this anonymous website visitor all the same person?" Identity resolution is the broader problem; deduplication is one component of it.
Should you build identity resolution yourself or use a platform?
You can build basic identity resolution (deterministic matching on email across 2-3 systems) with custom code or a tool like Make. Probabilistic resolution, graph management, and cross-device identity at scale require dedicated infrastructure. If your stack has fewer than five systems, build it yourself. Beyond that, evaluate platforms like Segment, Amperity, or a unified context platform that handles identity as a core function.
How do you handle job changes in the identity graph?
A job change creates a fork in the identity graph. The person's identity persists (same LinkedIn, possibly same phone, same personal email), but their account association changes. Detect job changes through enrichment signals (new company domain on LinkedIn, email bounce from old domain) and update the graph: keep the person node, change the account edge. This preserves their engagement history while correctly associating them with their new company for job-change outreach.
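The "keep the person node, change the account edge" rule can be modeled by giving each person a list of dated account associations. The sketch below uses a hypothetical data model. It assumes the job change itself has already been detected upstream from enrichment signals or bounces.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class Employment:
    account_id: str
    start: date
    end: Optional[date] = None  # None means this is the current account edge


@dataclass
class Person:
    person_id: str
    identifiers: Set[str]  # persistent IDs (LinkedIn URL, personal email, phone) stay attached
    employment: List[Employment] = field(default_factory=list)

    def current_account(self) -> Optional[str]:
        open_edges = [e for e in self.employment if e.end is None]
        return open_edges[-1].account_id if open_edges else None

    def record_job_change(self, new_account_id: str, detected_on: date) -> None:
        """Keep the person node; close the old account edge and open a new one."""
        for edge in self.employment:
            if edge.end is None:
                edge.end = detected_on  # old association kept as history, not deleted
        self.employment.append(Employment(new_account_id, detected_on))


ana = Person(
    "p-1",
    {"linkedin:in/ana", "email:ana@gmail.com", "email:ana@acme.com"},
    [Employment("acct-acme", date(2021, 3, 1))],
)
ana.record_job_change("acct-globex", date(2024, 6, 1))
print(ana.current_account())  # acct-globex
```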
What precision and recall should you target?
Aim for 95%+ precision (the share of links that are correct) even if it means lower recall (the share of true links that are found). In GTM, the cost of a wrong merge (corrupting two records, sending incorrect messaging) is far higher than the cost of a missed link (having an incomplete profile). Measure precision by sampling resolved identities and manually verifying them. If your precision drops below 90%, tighten your matching thresholds.
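A few functions are enough to measure precision by sampling. The 95% target and 90% floor come from the text. The function names and the seeded sampler are illustrative. A sample of a few dozen links gives only a rough estimate, so a reading just under the target may be noise.

```python
import random
from typing import List, Sequence, Tuple

Link = Tuple[str, str]


def sample_for_review(links: Sequence[Link], k: int, seed: int = 0) -> List[Link]:
    """Uniform random sample of resolved links to hand to a human reviewer."""
    return random.Random(seed).sample(list(links), min(k, len(links)))


def precision(verdicts: Sequence[bool]) -> float:
    """Share of sampled links that a reviewer confirmed as correct."""
    if not verdicts:
        raise ValueError("need at least one reviewed link")
    return sum(verdicts) / len(verdicts)


def threshold_action(p: float, target: float = 0.95, floor: float = 0.90) -> str:
    if p < floor:
        return "tighten matching thresholds"
    if p < target:
        return "below target: inspect the false merges"
    return "on target"


verdicts = [True] * 47 + [False] * 3  # reviewer results for 50 sampled links
p = precision(verdicts)
print(p, threshold_action(p))  # 0.94 below target: inspect the false merges
```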
What Changes at Scale
Identity resolution at small scale is straightforward — a few thousand contacts across three systems, matched on email. At enterprise scale — millions of identifiers across dozens of touchpoints, with real-time resolution requirements — it becomes one of the hardest infrastructure problems in GTM.
The graph grows in complexity, not just size. Every new system you connect adds a new identifier type and new linking rules. Every marketing channel creates new anonymous touchpoints that need to be resolved. Every acquisition or territory expansion adds contacts that may overlap with existing records. Maintaining the graph — resolving new identifiers, propagating updates, handling splits (one identity turns out to be two people), and enforcing privacy rules — requires dedicated infrastructure and continuous attention.
This is where Octave adds value. Octave is an AI platform that automates and optimizes your outbound playbook by connecting to your existing GTM stack. Its Enrich Agent provides company and person data with product fit scores, helping resolve and consolidate prospect identities through consistent enrichment. The Library centralizes your ICP context, personas, and reference customers, so every outreach decision draws from a single source of truth about who your prospects are and what matters to them. Octave's Runtime Context maintains prospect-specific data per person, ensuring that every agent interaction, from qualification to sequence generation to call prep, operates on a complete, unified view of each contact.
Conclusion
Identity resolution is the missing layer between data collection and data activation. Without it, your systems see fragments. With it, they see people and accounts in full context. Build your resolution strategy with deterministic matching as the foundation, probabilistic matching for coverage, and continuous refinement as new data arrives. Invest in a proper identity graph that maps every identifier to a canonical identity, and push unified profiles to every system in your stack. The teams that solve identity resolution unlock the full value of their GTM data. The ones that do not are making decisions with incomplete pictures and wondering why their metrics do not match reality.
