The GTM Engineer's Guide to Waterfall Enrichment

Published on March 16, 2026

Overview

Every GTM team runs into the same problem eventually: no single data provider gives you everything you need. ZoomInfo has strong firmographic coverage but misses direct dials for SMBs. Apollo covers emails well but has gaps in technographic data. Clearbit nails company data but struggles with contact-level signals. The solution is not to pick the "best" provider. The solution is to stack them in a waterfall, where each provider fills the gaps the previous one left behind.

Waterfall enrichment is the practice of routing a contact or account record through multiple data providers in a prioritized sequence, stopping at each stage only when the required fields are populated. It is the single most reliable way to maximize data coverage while controlling cost. This guide covers how to design your enrichment cascade, build fallback logic that actually works, implement quality scoring per provider, and keep the whole system from bleeding credits. If you are a GTM Engineer responsible for the data quality that feeds your team's outbound automation, this is foundational infrastructure.
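The cascade described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the provider lookups are stubbed with lambdas, and the field names and sample data are hypothetical.

```python
# Minimal waterfall sketch: try providers in priority order and stop
# once every required field is filled. Each provider lookup is stubbed;
# in production each would be an API call.

REQUIRED_FIELDS = ["email", "title"]

def waterfall_enrich(record, providers):
    """Route a record through providers until required fields are filled."""
    enriched = dict(record)
    for lookup in providers:
        missing = [f for f in REQUIRED_FIELDS if not enriched.get(f)]
        if not missing:
            break  # stop early: no credits spent on complete records
        result = lookup(enriched)  # one provider call per stage
        for field in missing:
            if result.get(field):
                enriched[field] = result[field]
    return enriched

# Stubbed providers standing in for real APIs (hypothetical data)
provider_a = lambda r: {"email": "jane@acme.com"}            # strong on email
provider_b = lambda r: {"email": None, "title": "VP Sales"}  # fills the title gap

record = {"name": "Jane Doe", "email": None, "title": None}
print(waterfall_enrich(record, [provider_a, provider_b]))
```

Note the early `break`: records that are already complete never trigger a provider call, which is the cost-control behavior the rest of this guide builds on.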

Why a Single Provider Always Falls Short

The data enrichment market is fragmented for a reason. Different providers source data differently: some crawl the web, some license data from HR systems, some use crowdsourced contributor networks, and some rely on proprietary algorithms to infer attributes. This means each provider has structural strengths and structural blind spots. They are not interchangeable, and no amount of vendor salesmanship changes that.

Here is what a typical single-provider gap analysis looks like for a B2B SaaS team targeting mid-market accounts:

Data Field      | Provider A Coverage | Provider B Coverage | Provider C Coverage | Waterfall Coverage
Work Email      | 85%                 | 72%                 | 68%                 | 95%
Direct Dial     | 35%                 | 52%                 | 41%                 | 72%
Job Title       | 90%                 | 88%                 | 79%                 | 97%
Company Revenue | 78%                 | 65%                 | 82%                 | 93%
Technographics  | 45%                 | 30%                 | 71%                 | 82%
Funding Data    | 60%                 | 55%                 | 73%                 | 88%

The waterfall column tells the story. By cascading through three providers in sequence, you close the coverage gap on nearly every field. For fields like direct dials, where even the best provider only covers a third of your records, the waterfall approach can double your fill rate. That directly translates into more reachable prospects, better prospect research, and higher sequence enrollment rates.

The Cost of Incomplete Data

Missing data is not just an inconvenience. It cascades through your entire GTM operation. Without accurate job titles, your ICP scoring breaks. Without valid emails, your sequences bounce and your domain reputation suffers. Without company revenue data, your segmentation model guesses wrong and your reps waste time on accounts that will never close. Waterfall enrichment is not a nice-to-have optimization. It is a prerequisite for every downstream GTM workflow that depends on data quality.

Designing Your Enrichment Cascade

A waterfall is only as good as its design. The order of providers matters, the fields you check at each stage matter, and the logic that determines whether to continue or stop matters. Get these wrong and you either burn credits on redundant lookups or stop too early and miss data you could have captured.

Provider Ordering Strategy

The first provider in your waterfall should be the one with the best combination of coverage and accuracy for your primary target segment. Not the cheapest provider. Not the one with the biggest logo. The one that gives you the most correct data on the first try for the accounts your team actually sells into.

1. Run a coverage audit. Take a sample of 500-1,000 records from your CRM. Run them through each provider independently and measure fill rates per field. This gives you empirical data on which provider covers your specific market best, not generic marketing claims.
2. Measure accuracy, not just coverage. A provider that fills 90% of email fields but 20% of those emails bounce is worse than one that fills 70% with a 2% bounce rate. Validate a random sample from each provider against known-good data. Track accuracy by field, not just overall.
3. Factor in cost per credit. Once you know coverage and accuracy, calculate the effective cost per accurate data point for each provider. This is your ordering metric: the provider with the lowest cost-per-accurate-fill goes first, the second-lowest goes second, and so on.
4. Consider field-level specialization. Some providers are best-in-class for specific fields. You might use Provider A first for emails and titles, then Provider B specifically for phone numbers and technographics. Your waterfall can branch by field, not just cascade linearly.
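The ordering metric from step 3 is easy to compute once the audit numbers are in. The coverage, accuracy, and credit-cost figures below are illustrative placeholders, not real vendor data:

```python
# Order providers by effective cost per accurate fill (step 3 above).
# All figures are illustrative sample numbers from a hypothetical audit.

providers = {
    "provider_a": {"coverage": 0.85, "accuracy": 0.92, "cost_per_credit": 0.40},
    "provider_b": {"coverage": 0.72, "accuracy": 0.95, "cost_per_credit": 0.25},
    "provider_c": {"coverage": 0.68, "accuracy": 0.80, "cost_per_credit": 0.15},
}

def cost_per_accurate_fill(p):
    # Every lookup costs a credit, but only coverage * accuracy of
    # lookups yield a usable data point.
    return p["cost_per_credit"] / (p["coverage"] * p["accuracy"])

order = sorted(providers, key=lambda name: cost_per_accurate_fill(providers[name]))
for name in order:
    print(name, round(cost_per_accurate_fill(providers[name]), 3))
```

With these sample numbers the cheapest provider happens to win despite its lower accuracy; with your own audit data the ordering may flip, which is exactly why the metric matters more than the sticker price.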

Fallback Logic Architecture

The fallback logic is the conditional engine that decides whether to advance to the next provider. The simplest version checks whether the target fields are null after each provider call. But simple null checks miss important cases.

Your fallback logic should evaluate three conditions at each stage:

  • Field presence -- Is the field populated at all? Null, empty string, and "N/A" should all be treated as missing.
  • Field validity -- Does the value pass basic format checks? An email address should match an email regex. A phone number should have the right digit count. Revenue should be a number, not "undisclosed."
  • Field confidence -- If the provider returns a confidence score, apply a threshold. A 40% confidence email is functionally the same as no email. Set minimum confidence thresholds per field and treat anything below them as missing.
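The three conditions above compose into a single predicate your fallback logic can call per field. This is a sketch: the regex, digit-count bounds, and confidence thresholds are illustrative defaults you would tune per provider.

```python
import re

# Treat a field as "filled" only if it passes all three checks:
# presence, validity, and confidence. Thresholds and regexes below
# are illustrative, not canonical.

MISSING_SENTINELS = {None, "", "n/a", "N/A", "undisclosed"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
MIN_CONFIDENCE = {"email": 0.85, "phone": 0.70}

def is_filled(field, value, confidence=None):
    # 1. Presence: null-ish sentinels count as missing
    if value in MISSING_SENTINELS:
        return False
    # 2. Validity: basic format checks per field type
    if field == "email" and not EMAIL_RE.match(value):
        return False
    if field == "phone" and not (7 <= sum(c.isdigit() for c in value) <= 15):
        return False
    # 3. Confidence: below-threshold values are treated as missing
    threshold = MIN_CONFIDENCE.get(field, 0.0)
    if confidence is not None and confidence < threshold:
        return False
    return True

print(is_filled("email", "jane@acme.com", confidence=0.9))  # True
print(is_filled("email", "not-an-email"))                   # False
print(is_filled("email", "jane@acme.com", confidence=0.4))  # False
```

The cascade advances to the next provider whenever `is_filled` returns False for a required field, which is what turns a naive null check into real fallback logic.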

Field-Level vs. Record-Level Waterfalls

Most teams start with record-level waterfalls: send the whole record to Provider A, then the whole record to Provider B if anything is missing. The more efficient approach is field-level: only request the specific missing fields from Provider B. This cuts credit usage dramatically and is the approach teams like Clay power users adopt for enrichment at scale.

Quality Scoring Per Provider

Once your waterfall is running, you need a feedback loop that measures how each provider is actually performing over time. Provider data quality is not static. Coverage rates shift as providers gain or lose data sources, update their algorithms, or change their scraping infrastructure. A provider that was best-in-class six months ago may have degraded without you noticing.

Building a Provider Scorecard

Track these metrics per provider on a monthly cadence:

Metric                 | What It Measures                                                                | Target
Fill Rate              | Percentage of requested fields returned with a value                            | 70%+ for primary provider
Accuracy Rate          | Percentage of returned values that are correct (validated)                      | 90%+ for emails, 85%+ for others
Freshness              | Average age of the data (how recently it was verified)                          | Under 90 days
Bounce Rate            | Percentage of enriched emails that bounce on first send                         | Under 3%
Cost Per Accurate Fill | Credit cost divided by number of accurate, usable data points returned          | Varies by provider tier
Incremental Lift       | Additional coverage this provider adds beyond previous providers in the cascade | 15%+ to justify the cost

The incremental lift metric is particularly important. If Provider C in your waterfall only adds 3% coverage beyond what Providers A and B already captured, the cost of maintaining that integration and paying for those credits may not be justified. Review this quarterly and remove or reorder providers that are not pulling their weight.
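Incremental lift falls out of simple set arithmetic over which records each provider covered. The record IDs and sample size below are illustrative:

```python
# Incremental lift: coverage a provider adds beyond earlier cascade
# stages. Each set holds the IDs of records the provider returned
# data for, out of an illustrative sample of 10 records.

covered = {
    "provider_a": {1, 2, 3, 4, 5, 6, 7},
    "provider_b": {2, 3, 4, 8, 9},
    "provider_c": {1, 2, 10},
}
total_records = 10

already = set()
for name in ["provider_a", "provider_b", "provider_c"]:
    new = covered[name] - already      # records no earlier stage covered
    lift = len(new) / total_records
    print(f"{name}: +{lift:.0%} incremental lift")
    already |= covered[name]
```

In this sample, the third provider adds only 10% lift despite covering three records outright, because two of them were already captured upstream. That overlap is exactly what raw coverage numbers hide.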

Automated Quality Checks

Do not wait for reps to tell you the data is bad. Build automated quality checks that run on every enriched batch:

  • Email validation -- Run every enriched email through a verification service before it enters your CRM or sequencer. Catch-all domains get flagged, not treated as verified.
  • Phone number formatting -- Validate country code, digit count, and carrier type. VoIP numbers may be less reliable for cold calls.
  • Title standardization -- Normalize job titles to your internal taxonomy. "VP Sales," "Vice President of Sales," and "VP, Revenue" should all map to the same persona category.
  • Revenue range validation -- Cross-reference reported revenue against employee count and industry benchmarks. A 10-person company reporting $500M in revenue is likely a data error.
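The revenue-range check in the last bullet can be sketched as a batch filter. The $2M-per-employee ceiling is an illustrative threshold, not an industry standard, and would be tuned per segment:

```python
# Flag records whose reported revenue is implausible for their
# headcount. The revenue-per-employee ceiling is an illustrative
# benchmark you would calibrate against your own market.

MAX_REVENUE_PER_EMPLOYEE = 2_000_000  # $2M/employee as a generous ceiling

def flag_revenue_outliers(batch):
    flagged = []
    for rec in batch:
        revenue, employees = rec.get("revenue"), rec.get("employees")
        if revenue and employees and revenue / employees > MAX_REVENUE_PER_EMPLOYEE:
            flagged.append(rec["id"])  # route to manual review, don't auto-delete
    return flagged

batch = [
    {"id": "a1", "revenue": 500_000_000, "employees": 10},  # likely data error
    {"id": "a2", "revenue": 12_000_000, "employees": 80},   # plausible
]
print(flag_revenue_outliers(batch))  # → ["a1"]
```

Flagging rather than deleting matters: a handful of 10-person companies really do report nine-figure revenue, so the check routes to review instead of silently discarding data.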

Cost Control and Credit Management

Waterfall enrichment without cost controls is a fast way to blow your data budget. Every additional provider lookup costs credits, and those credits add up fast when you are enriching thousands of records per month. The goal is maximum coverage at minimum cost, not maximum coverage at any cost.

Credit Optimization Strategies

  • Pre-filter before enriching. Run basic ICP checks on whatever data you already have before spending credits on enrichment. If you can disqualify an account based on industry or employee count from your existing data, do not waste credits enriching contacts there.
  • Cache enriched data. Store enrichment results and check your cache before making a new provider call. If you enriched a contact 30 days ago and the data is still within your freshness threshold, skip the lookup. This alone can reduce credit usage by 20-30%.
  • Set field-level stop conditions. If you only need email and title for a specific workflow, do not pay to enrich phone, revenue, and technographics. Configure your waterfall to only request and cascade on the fields you actually need for the downstream use case.
  • Implement daily and monthly credit caps. Hard limits prevent runaway costs from misconfigured workflows or unexpected volume spikes. When you hit a cap, queue the remaining records for the next day rather than stopping the pipeline entirely.
  • Negotiate volume-based pricing. Most data providers offer significant discounts at higher commit levels. Once you have 3-6 months of usage data, use it to negotiate annual contracts with volume tiers that match your actual consumption patterns.
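The cache-before-call strategy from the second bullet is a few lines of logic. This sketch uses an in-memory dict and a 30-day window purely for illustration; production would back it with a real store keyed off your CRM:

```python
import time

# Cache-before-call sketch: skip the provider lookup when a record was
# enriched within the freshness window. The in-memory dict and 30-day
# window are illustrative stand-ins for a real datastore and policy.

FRESHNESS_SECONDS = 30 * 24 * 3600
_cache = {}  # record_id -> (enriched_data, enriched_at)

def enrich_with_cache(record_id, lookup):
    cached = _cache.get(record_id)
    if cached and time.time() - cached[1] < FRESHNESS_SECONDS:
        return cached[0]          # cache hit: zero credits spent
    data = lookup(record_id)      # cache miss: pay for the provider call
    _cache[record_id] = (data, time.time())
    return data

calls = []
fake_lookup = lambda rid: calls.append(rid) or {"email": f"{rid}@acme.com"}
enrich_with_cache("r1", fake_lookup)
enrich_with_cache("r1", fake_lookup)  # served from cache
print(len(calls))  # one provider call, not two → 1
```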

Budget Allocation Rule of Thumb

Allocate roughly 60% of your enrichment budget to your primary provider, 25% to your secondary, and 15% to your tertiary. If your tertiary provider consistently adds less than 10% incremental lift, consider dropping it and reallocating those credits to validation services instead. Track your outbound budget holistically, not just enrichment in isolation.

Implementation Patterns That Work

The architecture of your waterfall depends on your tooling. Here are the most common patterns GTM Engineers use in production.

Clay-Native Waterfall

Clay's built-in waterfall enrichment is the fastest path to a working cascade. You configure multiple enrichment providers as columns, define the priority order, and Clay handles the fallback logic automatically. The advantage is simplicity: no code, no external orchestration, and the fallback logic is visual and auditable. The limitation is that your waterfall lives inside Clay and cannot easily be triggered from external systems without using Clay's API.

Orchestration-Layer Waterfall

For teams that need their waterfall to be triggered from multiple entry points (CRM updates, form submissions, webhook events), building the cascade in an orchestration tool like Make or n8n gives you more flexibility. Each provider is an API call node, conditional branches handle the fallback logic, and the results are written back to your CRM and sequencer via the same automation. This approach requires more setup but scales to more complex workflows.

Custom API Wrapper

Enterprise teams with engineering resources often build a thin API layer that wraps all their data providers behind a single endpoint. The caller sends a record and a list of desired fields, and the wrapper handles the cascade internally, returning a unified enriched record. This decouples your waterfall logic from any specific tool and makes it accessible to every system in your stack. It is more work upfront but pays off when you have dozens of workflows that all need enriched data.

Start Simple, Then Evolve

If you are building your first waterfall, start with Clay's native approach. You can stand up a working cascade in under an hour and start measuring provider performance immediately. Only move to an orchestration-layer or custom wrapper when you hit genuine limitations with the simpler approach. Over-engineering the infrastructure before you understand your data patterns is a common and costly mistake.

FAQ

How many providers should I include in my waterfall?

Two to three is the sweet spot for most teams. Beyond three providers, the incremental coverage gains typically drop below 5% per additional provider, while complexity and cost scale linearly. Run the numbers on incremental lift before adding a fourth provider. The exception is phone numbers, where coverage is structurally low across all providers and a fourth source may be justified.

Should I run the waterfall on every record or just the ones with missing data?

Only enrich records with missing fields. Running your full database through the waterfall every time wastes credits on records that are already complete. Use a pre-check that identifies which fields are missing per record and only routes those fields through the cascade. This is the field-level waterfall approach described above, and it is dramatically more efficient than record-level waterfalls.

How often should I re-enrich existing records?

It depends on the field. Job titles and company associations change frequently; re-enrich these every 90 days for active accounts and every 180 days for dormant ones. Firmographic data like revenue and employee count shifts more slowly; every 6-12 months is usually sufficient. Email addresses should be re-validated (not re-enriched) every 60-90 days to catch bounces before they hurt your sender reputation. See our detailed guide on re-enrichment cadence.
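Those cadences are straightforward to encode as a policy table your enrichment workflow can consult. The day counts mirror the guidance above (using the upper bound of the 6-12 month firmographic range); the key scheme is a hypothetical convention:

```python
# Re-enrichment policy sketch: freshness windows per (field, segment),
# with "any" as a segment-agnostic fallback. Day counts follow the
# cadences described above; the key scheme is illustrative.

REENRICH_DAYS = {
    ("title", "active"): 90,
    ("title", "dormant"): 180,
    ("revenue", "any"): 365,            # 6-12 months; upper bound used here
    ("email_validation", "any"): 60,
}

def is_stale(field, segment, age_days):
    window = REENRICH_DAYS.get((field, segment)) or REENRICH_DAYS.get((field, "any"))
    return age_days >= window

print(is_stale("title", "active", 120))         # → True
print(is_stale("email_validation", "any", 30))  # → False
```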

What happens when two providers return conflicting data for the same field?

Default to the provider with the higher historical accuracy rate for that specific field. If both providers have similar accuracy, prefer the provider that returns a higher confidence score for that specific record. Never average or merge conflicting values; pick one source of truth per field based on empirical performance data, not assumptions.
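The pick-one-source-of-truth rule can be expressed directly: resolve each conflict by the field-level accuracy scorecard. The accuracy figures below are illustrative, and in practice they would come from the monthly scorecard described earlier:

```python
# Conflict resolution sketch: when providers disagree on a field, keep
# the value from the provider with the higher historical accuracy for
# that specific field. Accuracy figures are illustrative.

FIELD_ACCURACY = {
    ("provider_a", "email"): 0.94,
    ("provider_b", "email"): 0.89,
    ("provider_a", "phone"): 0.61,
    ("provider_b", "phone"): 0.78,
}

def resolve(field, candidates):
    """candidates: list of (provider, value); pick one source of truth."""
    provider, value = max(
        candidates, key=lambda c: FIELD_ACCURACY.get((c[0], field), 0.0)
    )
    return value

# Same two providers, opposite winners depending on the field
print(resolve("email", [("provider_a", "jane@acme.com"), ("provider_b", "j.doe@acme.com")]))
print(resolve("phone", [("provider_a", "+1 555 0100"), ("provider_b", "+1 555 0199")]))
```

Note that the winner flips by field: Provider A wins email conflicts, Provider B wins phone conflicts, which is exactly the per-field (not per-provider) resolution the answer above prescribes.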

What Changes at Scale

Running a waterfall for 200 records a week is straightforward. Running one for 20,000 records a week across multiple segments, territories, and use cases is a different challenge entirely. The cascade logic stays the same, but everything around it gets harder: provider rate limits start throttling your throughput, credit budgets blow out if you are not careful, data conflicts multiply, and the feedback loop that tells you which providers are performing breaks when no one has time to review the scorecards.

What teams need at that point is not a better spreadsheet for tracking enrichment. They need a context layer that sits above their data providers and manages the entire enrichment lifecycle: triggering enrichment based on pipeline events, routing records through the optimal cascade based on segment-specific provider performance, resolving data conflicts automatically using learned accuracy scores, and writing the enriched data back to every system that needs it without custom integrations per tool.

Octave is an AI platform designed to automate and optimize your outbound playbook, and enrichment is a core part of its workflow. Octave's Enrich Company and Enrich Person Agents pull data with product fit scores, and its Clay integration (via API key and Agent ID) enables at-scale orchestration of enrichment workflows. The enriched data flows into Octave's Library, which stores your ICP context, personas, and qualifying questions, and directly feeds the Sequence Agent and Qualify Agent. For teams running enrichment at volume, Octave connects the enrichment output to immediate outbound action rather than leaving it stranded in a database.

Conclusion

Waterfall enrichment is not optional for serious GTM operations. Single-provider reliance guarantees data gaps, and data gaps guarantee broken workflows downstream. The GTM Engineer's job is to build the cascade that maximizes coverage, validate the quality of what comes back, and keep costs under control as volume grows.

Start with a coverage audit across your existing providers. Measure fill rates and accuracy rates per field for your specific target market. Order your providers by cost-per-accurate-fill and build the fallback logic with field-level granularity. Implement automated quality checks on every enriched batch. Track provider performance monthly and reorder or replace providers that are not delivering incremental value. The enrichment waterfall is living infrastructure. Build it, measure it, and iterate on it relentlessly.
