A Claude Skill (lead-enrichment) that takes a Clay table row keyed on domain and returns three derived fields in one structured pass: a 2-sentence company summary, an ICP fit score 1-10 with a one-line reason, and a sub-50-word cold-email opener that cites a specific signal from the company’s public footprint. The opener is always a draft for rep review — the skill refuses to be wired into an auto-send step.
The artifact bundle ships at apps/web/public/artifacts/lead-enrichment-clay-claude/ and contains the skill body (SKILL.md) plus three reference templates the operator fills in once and the skill loads on every run.
## When to use

- You have a Clay table (or a CSV of leads exportable to Clay) with at least a populated domain column, and you need to push enriched records into HubSpot, Salesforce, Attio, or a sequencer.
- Volume is in the hundreds-to-tens-of-thousands per batch. Below that, hand-writing openers is faster. Above that, the cost discipline in this skill (ICP-gated opener generation, single-call extraction, per-host fetch caps) is what keeps the per-row spend sane.
- Reps are willing to read the draft opener before it sends. The whole design assumes a human eyeball between draft and send, and degrades badly without one.
- You want a single skill the team can call from both Clay AI columns and from a Claude Code session running the Anthropic Batch API over an exported list — the skill body is identical in both surfaces.
## When NOT to use

- Auto-send sequencers. If the plan is “wire opener_draft straight into the first sequence step”, stop. The opener carries an opener_confidence field for a reason: roughly one in five drafts needs a rewrite. Auto-sending them is the failure mode the skill is designed against.
- Lists without a documented lawful basis to process. Scraped EU or UK contacts without prior consent, Quebec leads under Law 25, California consumer data without a CCPA pathway — the skill will enrich anything you point it at, and that is exactly why the operator must check first. SKILL.md ships a “Do NOT invoke” block covering this; don’t comment it out.
- Account-level discovery briefs. Use the account-research skill instead. That one does deep persona mapping and pain hypotheses; this one optimizes for batch volume and cost-per-row.
- Decisions that require licensed financial data (S&P, Pitchbook). This skill reads public web only and will not fabricate revenue figures from homepage copy.
- Lists where the source data is junk. Garbage domains in, garbage drafts out. Run Clay’s domain validation column upstream; parked domains and 404s are skipped by the skill, but you are still paying credits to find that out.
## Setup

1. Drop the skill into your environment. For Claude.ai, import the SKILL.md from apps/web/public/artifacts/lead-enrichment-clay-claude/SKILL.md and upload the three files in references/. For Claude Code, copy the directory to ~/.claude/skills/lead-enrichment/.
2. Replace every {...} placeholder in references/1-icp-rubric.md with your team’s actual ICP. The skill detects unsubstituted placeholders and refuses to score against a template — score: null, reason: "rubric not configured" instead of guessing. This is intentional; a wrong rubric is worse than no rubric.
3. Edit references/2-opener-style-guide.md with your team’s voice and banned phrases. The defaults ban the obvious LLM tells (“I noticed”, “love what you’re doing”, any superlative); add company-specific bans as your reps flag them.
4. Edit references/3-source-quality-matrix.md to declare the preference order between whatever enrichment vendors are wired into your Clay table upstream of this skill. Without a declared order, the snapshot step flip-flops between Apollo and Clearbit run to run, and ICP scores drift.
5. In Clay, create three AI columns referencing the skill: summary, icp_fit_score, opener_draft. Map the inputs per the “Inputs” section of SKILL.md. Set the destination push to HubSpot (or your CRM) and route opener_draft into a sequence variable on a step that requires manual approval, not auto-send.
6. Run on a 20-row sample. Spot-check: do the snapshot facts trace to real homepage copy? Do the ICP scores land in a sane distribution? Do the openers pass the style guide? Tune the rubric and style guide before scaling up. The first 100 rows are calibration data; treat them that way.
## What the skill actually does

Per row, four sub-tasks in order:

1. Resolve and fetch. Hit https://{domain} with a 10-second timeout and one redirect hop, then best-effort /about, /company, /customers. Parked / 404 / empty-body domains return status: unreachable and skip the rest. Per-host concurrency cap is 2, with a 250ms minimum delay and a single back-off retry on 429.
2. Extract a structured snapshot in one Claude call: industry, size_signal, value_prop, optional recent_signal with URL. Single call rather than three because round-trip count is the dominant cost-per-row driver at scale and the extraction prompt stays reliable in one pass.
3. Score against the ICP rubric. Loads references/1-icp-rubric.md, returns 1-10 with a one-line reason that names the rubric dimension that drove the score.
4. Generate opener — only if the score clears the threshold (default 6/10). Hard rules in the prompt: 50-word cap, references exactly one fact from the snapshot, no superlatives, no invented company claims, no fake-pain question close. Returns opener_confidence 0.0-1.0; under 0.5 is flagged for rewrite.

The output of one row is a JSON block embedded in Markdown — Clay’s AI column parses it, and a human reading the run log can scan it. Full schema and a worked example are in the “Output format” section of apps/web/public/artifacts/lead-enrichment-clay-claude/SKILL.md.
## Cost reality
There are two cost lines: Anthropic tokens and Clay credits.
Anthropic tokens at Sonnet pricing (input $3/MTok, output $15/MTok as of authoring date):

- Steps 2-3 (snapshot extraction and ICP scoring, run on every reachable row): roughly $0.018/row combined.
- Step 4 (opener, only when score >= threshold): ~1.2K input + ~120 output. Roughly $0.005/row.

So a row that clears the threshold lands around $0.023; a row that does not lands around $0.018. Run via the Anthropic Batch API for ~50% off when the workload is non-urgent (overnight enrichment of inbound MQLs is the textbook fit) — that knocks rows into the $0.01-0.012 range.

At scale: ~100K rows a month with ~40% threshold-clearing is ~$1,500-2,000/month on Anthropic before the batch discount, ~$800-1,000 after.
Clay credits depend on which vendor columns are wired upstream. Apollo costs about 1 credit per resolved domain; Clearbit Reveal is 2-3; ZoomInfo (paid passthrough) is more. Stack three vendors and a single row can hit 8-12 Clay credits before the skill itself runs. The Starter plan ships 5K credits/month; the Pro plan 25K. A 100K-row batch under that vendor stack needs the Enterprise tier or a tighter matrix in references/3-source-quality-matrix.md. The matrix exists specifically to shed the lowest-ranked vendor when the per-row cost ceiling fires.
If the math feels rough, the leverage point is the ICP threshold. Raising it from 6 to 7 typically suppresses opener generation on 25-35% more rows; raising it to 8 sheds another 20%. The skill logs the score distribution at the end of each batch so the operator can tune empirically rather than by hunch.
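A back-of-envelope model makes the threshold's cost impact concrete before a batch runs. A minimal sketch in Python, using the per-row figures quoted above; the constants should be replaced with your observed token costs, and the function name is illustrative:

```python
# Back-of-envelope Anthropic cost model for one enrichment batch.
def batch_cost(rows: int, clear_rate: float, batch_api: bool = False) -> float:
    """Estimated Anthropic spend in USD for a batch.

    rows       -- rows in the batch
    clear_rate -- fraction of rows whose ICP score clears the opener threshold
    batch_api  -- True when run via the Anthropic Batch API (~50% discount)
    """
    COST_BELOW = 0.018   # steps 2-3 only: snapshot extraction + ICP scoring
    COST_ABOVE = 0.023   # steps 2-3 plus the opener step
    per_row = clear_rate * COST_ABOVE + (1 - clear_rate) * COST_BELOW
    total = rows * per_row
    return total * 0.5 if batch_api else total


if __name__ == "__main__":
    # ~100K rows a month, ~40% clearing the default 6/10 threshold
    print(f"online:  ${batch_cost(100_000, 0.40):,.0f}")        # ~ $2,000
    print(f"batched: ${batch_cost(100_000, 0.40, True):,.0f}")  # ~ $1,000
```

Raising the threshold shows up here as a lower clear_rate, which is why the per-batch score-distribution log is the cheapest tuning signal available.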
## Success metric
Reply rate on the rep-reviewed openers, segmented by opener_confidence bucket. The point of the skill is not “more openers per hour” — it is “openers good enough that reps stop rewriting them from scratch”. Two sub-metrics worth instrumenting:
- Rewrite rate — what fraction of opener_draft values the rep sends unchanged vs. rewrites materially. Target: under 35% on confidence 0.7+ rows after the first 500-row calibration. Higher means the style guide is wrong, the rubric is wrong, or the snapshot step is hallucinating.
- Reply rate by confidence bucket. Reply rate on opener_confidence >= 0.7 should be at least 1.5x the reply rate on the under-0.5 bucket. If they are similar, the confidence score is not signal — investigate before trusting it as a routing input.
## vs. alternatives

- vs. Apollo’s native sequence personalization. Apollo will generate openers from its own enrichment data. It is faster to stand up, but the openers are visibly templated, scored on Apollo’s ICP heuristic (not yours), and have no audit trail of which fact drove the draft. This skill takes longer to set up and costs more per row, but the openers reference dated URLs you can verify in one click and the rubric is a file you control.
- vs. Clearbit + Outreach Smart Variables. Clearbit-fed Smart Variables produce factual mail-merge ("their industry is ${X}"), not openers — they need a human to write the actual sentence around the variable. Cheaper than this skill if your reps are already writing the sentences; more expensive overall if they aren’t, because rep time dominates token cost.
- vs. manual opener writing. A senior SDR writes a high-quality cold opener in 4-7 minutes per account at ~$60/hour fully loaded — call it $5 of rep time per opener. The skill is at most ~$0.025 per opener. The catch: the rep also did some account-level thinking while writing. The skill does not. The right shape for most teams is the skill on top-of-funnel volume (everything below the tier-1 account list) and rep-written openers on the named-account list.
- vs. status quo (no enrichment, generic openers). Reply rates on generic openers run somewhere in the 1-2% range; lightly personalized openers tied to a recent signal run 4-8% in published benchmarks. The skill targets the latter. Worth doing only if the team is willing to stand up the rubric and style guide; without those, the skill outputs are not materially better than the status quo.
## Watch-outs

- **Source-quality drift across vendors.** When Apollo, Clearbit, and ZoomInfo all enrich the same row and disagree on headcount or industry, the snapshot step flip-flops between them run to run. Guard: references/3-source-quality-matrix.md declares preference order; the snapshot cites which vendor (or homepage value) it used per field, so drift is auditable in the per-row conflicts log.
- **Opener inventing claims that aren’t in the data.** Without strict prompting, openers fabricate confident-sounding facts (“congrats on the Series C” with no Series C). Guard: the opener prompt receives the snapshot inline with an explicit “facts not in the snapshot are forbidden” rule; recent_signal carries a URL for one-click verification; openers under opener_confidence 0.5 are flagged for rewrite, never auto-sent.
- **Cost-per-row escalation when the ICP filter is loose.** A rubric that scores most rows 7+ defeats the threshold gate; opener generation runs on every row and per-row cost rises 3-4x. Guard: the skill emits a score_distribution summary per batch; if more than 60% of a 1K-row sample lands 7+, the skill prints a warning and recommends tightening the rubric before the next batch.
- **Stale recent_signal.** A signal extracted 90 days ago becomes a liability — reps writing “saw your March launch” in August read as asleep at the wheel. Guard: every record carries enriched_at; the Clay column is configured to re-run when older than 30 days; the opener step refuses to use a recent_signal whose URL date is more than 60 days behind enriched_at.
- **Consent and lawful basis.** The skill enriches whatever you point it at. The “Do NOT invoke” block in SKILL.md exists to remind the operator to check the source list’s lawful basis before running. Don’t delete it.
## Stack

- Clay — table substrate, upstream enrichment vendor stack, destination CRM push. The Starter plan supports the AI column primitive the skill plugs into; Pro is required for the credit volume of any serious batch.
- Claude (Sonnet by default) — the inference layer for snapshot extraction, ICP scoring, and opener generation. Run via Clay’s native AI column, or via the Anthropic Batch API from a Claude Code session for non-urgent batches at half the price (a minimal driver sketch follows this list).
- HubSpot, Salesforce, or Attio — destination CRM. Map summary → custom property, icp_fit_score → custom property + view filter, opener_draft → first-touch sequence variable on a manual-approval step.
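For the Batch API path mentioned above, the driving loop from a Claude Code session is small. A minimal sketch, assuming the `anthropic` Python SDK's Message Batches interface and a CSV exported from Clay with a `domain` column; the model id, file paths, and system-prompt framing are placeholders rather than the skill's canonical wiring:

```python
# Drive the skill over an exported lead list via the Anthropic Message Batches API.
# Results arrive asynchronously at roughly half of online pricing.
import csv
from pathlib import Path
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
skill_body = Path("~/.claude/skills/lead-enrichment/SKILL.md").expanduser().read_text()

with open("exported_leads.csv", newline="") as f:
    rows = [r for r in csv.DictReader(f) if r.get("domain")]

batch = client.messages.batches.create(
    requests=[
        {
            # custom_id is restricted to URL-safe characters, so dots are swapped out
            "custom_id": row["domain"].replace(".", "-"),
            "params": {
                "model": "claude-sonnet-4-5",   # placeholder model id
                "max_tokens": 1024,
                "system": skill_body,           # simplified: skill body as the system prompt
                "messages": [{"role": "user", "content": f"Enrich this lead row: {row}"}],
            },
        }
        for row in rows
    ],
)
print(batch.id)  # poll client.messages.batches.retrieve(batch.id) until processing ends
```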
---
name: lead-enrichment
description: Enrich a Clay table row with company summary, ICP fit score, and a personalized cold-email opener. Use this skill from Clay AI columns or from a Claude Code session driving the Anthropic Batch API over an exported lead list. Returns structured fields plus a draft opener for rep review — never auto-sends.
---
# Lead enrichment
## When to invoke
Whenever a Clay table or exported CSV of leads needs three derived fields in one pass: a 2-sentence company summary, an ICP fit score with one-line reasoning, and a sub-50-word cold-email opener that cites a real signal from the company's public footprint. Take a domain (and optionally a LinkedIn URL or a free-text ICP rubric) as input and produce a structured JSON-shaped Markdown record per row.
Do NOT invoke this skill for:
- Auto-sending email. The opener is a draft for rep review. Wiring the output directly into a sequence-send step bypasses the guardrail this skill exists to enforce.
- Enriching leads that have not consented to processing. If the source list was scraped from a region with prior-consent rules (EU/UK GDPR, Quebec Law 25, etc.) without a documented lawful basis, stop and surface the concern to the operator. This skill will not silently process unconsented data.
- Account-level research briefs for discovery calls — use the `account-research` skill instead. This one optimizes for batch volume and cost-per-row, not depth.
- Public-company financial scoring or any decision that needs licensed data (S&P, Pitchbook). This skill reads public web only.
## Inputs
- Required: `domain` — the company's primary domain (e.g. `acme.com`). Must resolve and serve HTML; parked or dead domains are returned with `status: unreachable` and skipped.
- Required at config time: a Clay table reference (table ID + column map) OR a CSV path if running outside Clay. The skill assumes the table has at least `domain`; `first_name`, `title`, and `linkedin_url` improve opener quality if present.
- Required at config time: `references/icp-rubric.md` populated with your real rubric. Without it, `icp_fit_score` is omitted (not guessed).
- Optional: `references/opener-style-guide.md` — your team's voice, banned phrases, max length override.
- Optional: `references/source-quality-matrix.md` — vendor preference order when multiple enrichment columns upstream of this skill disagree.
- Optional: `recent_news` — pre-fetched by an upstream Clay column. When present, the opener step uses it instead of re-fetching.
## Reference files
Always read the following from `references/` before generating any row. They encode the operator's positioning and quality bar — without them output is generic.
- `references/icp-rubric.md` — the rubric the score uses. Replace template content with your actual ICP definition before first run.
- `references/opener-style-guide.md` — voice, length cap, banned phrases (e.g. "I noticed", "love what you're doing", any superlative).
- `references/source-quality-matrix.md` — preference order for resolving conflicts between upstream enrichment vendors (Apollo vs. Clearbit vs. ZoomInfo vs. Clay's native People/Company API).
## Method
Run these four sub-tasks in order per row. Steps 1-3 produce the structured fields; step 4 only runs when the ICP score clears the configured threshold.
### 1. Resolve and fetch
Hit `https://{domain}` with a 10-second timeout and one redirect hop. On non-2xx, parked-domain heuristics, or empty body, mark `status: unreachable` and skip steps 2-4. Do not retry indefinitely — parked domains are a meaningful share of any scraped list and the skill should fail fast on them rather than burn API spend.
If the homepage resolves, additionally fetch `/about`, `/company`, and `/customers` (best-effort, ignore 404s). Concatenate cleaned text up to ~8K tokens; truncate from the end of the longest section, not the front, to preserve the homepage hero copy.
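A minimal sketch of this step in Python, using `requests`; the parked-domain heuristics are reduced to an empty-body check, and the per-host concurrency cap of 2 is omitted because the loop here is sequential:

```python
# Resolve-and-fetch: 10s timeout, one manual redirect hop, best-effort sub-pages,
# 250ms pacing between requests to the same host, one back-off retry on 429.
import time
import requests

SUB_PAGES = ["/about", "/company", "/customers"]

def fetch_page(url: str) -> str | None:
    for attempt in range(2):
        try:
            resp = requests.get(url, timeout=10, allow_redirects=False)
            if resp.status_code in (301, 302, 307, 308) and "Location" in resp.headers:
                # a single redirect hop, followed manually
                resp = requests.get(resp.headers["Location"], timeout=10, allow_redirects=False)
        except requests.RequestException:
            return None                      # DNS failure, timeout, TLS error
        if resp.status_code == 429 and attempt == 0:
            time.sleep(2.0)                  # single back-off retry, then give up
            continue
        return resp.text if resp.ok and resp.text.strip() else None
    return None

def fetch_company(domain: str) -> dict:
    homepage = fetch_page(f"https://{domain}/")
    if homepage is None:
        return {"domain": domain, "status": "unreachable"}   # skip steps 2-4
    pages = {"/": homepage}
    for path in SUB_PAGES:
        time.sleep(0.25)                     # per-host minimum delay
        if (body := fetch_page(f"https://{domain}{path}")) is not None:
            pages[path] = body               # best-effort; 404s are simply skipped
    return {"domain": domain, "status": "ok", "pages": pages}
```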
### 2. Extract structured snapshot
In one Claude call, extract:
- `industry` — single noun phrase, derived from copy not guessed
- `size_signal` — one of `solo | smb | mid | ent`, justified by a cited copy fragment (employee count claim, customer-logo density, funding mention) or `unknown`
- `value_prop` — one sentence in the company's own framing
- `recent_signal` — at most one verifiable fact (a launch, hire, funding round, customer win) with a URL anchor. If none found in the fetched pages, return `null` rather than inventing one.
Engineering choice: extraction is a single structured Claude call, not three. Each additional round-trip multiplies cost-per-row at 100K-row scale; the extraction prompt is small enough to stay reliable in one pass.
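A minimal sketch of the single-call shape, assuming the `anthropic` SDK. The prompt is abbreviated relative to the one the skill actually carries, the model id is a placeholder, and a real run should guard the `json.loads` against a model that wraps the object in prose:

```python
# One structured extraction call per row: four fields, no second round-trip.
import json
from anthropic import Anthropic

client = Anthropic()

EXTRACTION_PROMPT = (
    "From the page text below, return a JSON object with exactly these keys:\n"
    "industry (noun phrase taken from the copy), size_signal (solo|smb|mid|ent|unknown),\n"
    "size_signal_evidence (quoted copy fragment or null), value_prop (one sentence in the\n"
    "company's own framing), recent_signal ({summary, url} or null; never invented).\n"
    "Return only the JSON object."
)

def extract_snapshot(page_text: str) -> dict:
    msg = client.messages.create(
        model="claude-sonnet-4-5",           # placeholder model id
        max_tokens=500,
        messages=[{"role": "user",
                   "content": f"{EXTRACTION_PROMPT}\n\n<pages>\n{page_text[:32_000]}\n</pages>"}],
    )
    return json.loads(msg.content[0].text)   # single round-trip, one structured object back
```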
### 3. Score against ICP rubric
Load `references/icp-rubric.md`. Score the snapshot 1-10 with a one-line justification that names which rubric dimension drove the score. If the rubric file still contains template placeholders (`{...}` literals), return `score: null` and `reason: "rubric not configured"` rather than scoring against nothing.
Engineering choice: ICP scoring runs before opener generation, not after. Rationale: opener generation is the most expensive step (longer output, more careful prompt). Skipping it for sub-threshold rows is where the cost savings live. A common misconfiguration is to generate openers for every row "in case the rep wants to override" — that defeats the filter.
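The unconfigured-rubric guard is worth showing, since it is the difference between a null score and a confidently wrong one. A sketch; the `{...}` literal check mirrors the rule described above, and the function name is illustrative:

```python
# Refuse to score against a rubric that still contains template placeholders.
import re
from pathlib import Path

PLACEHOLDER = re.compile(r"\{[^}]*\}")   # any surviving {...} literal from the template

def load_rubric(path: str = "references/icp-rubric.md") -> str | None:
    text = Path(path).read_text()
    if PLACEHOLDER.search(text):
        return None   # caller emits score: null, reason: "rubric not configured"
    return text
```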
### 4. Generate opener (only if score >= threshold)
Default threshold: 6/10. Configurable per run.
Inputs to the opener step: `value_prop`, `recent_signal` (if non-null), the lead's `first_name` and `title` if present, and the style guide.
Hard rules baked into the opener prompt:
- Max 50 words.
- Must reference exactly one specific thing from `recent_signal` or `value_prop`. If both are null, return `opener: null` rather than writing flattery.
- No superlatives ("amazing", "incredible", "love").
- No invented company claims. The opener may only reference facts present in the snapshot. The prompt includes the snapshot inline and explicitly forbids facts not in it.
- No question close that pretends to know an internal pain ("how are you handling X?" without evidence X is happening).
The opener is always returned as a draft with a `confidence` field (0-1) reflecting how strong the cited signal was. Reps see the confidence; low-confidence openers are surfaced for rewrite, not auto-sent.
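The hard rules live in the prompt, but a cheap post-generation check catches the occasional miss before a draft reaches Clay. A minimal sketch of such a validator; the banned-phrase list is the default set from the style guide, and the function is illustrative rather than part of the skill's formal interface:

```python
# Post-generation checks on a draft opener: word cap, banned phrases, greeting line.
# A non-empty result flags the row for rep rewrite rather than raising.
import re

BANNED_PHRASES = [
    "i hope this email finds you well", "i noticed", "i came across your",
    "love what you're doing", "quick question", "circling back",
    "amazing", "incredible", "best-in-class",
]

def validate_opener(opener: str) -> list[str]:
    problems = []
    if len(opener.split()) > 50:
        problems.append("over the 50-word cap")
    lowered = opener.lower()
    problems += [f"banned phrase: {p!r}" for p in BANNED_PHRASES if p in lowered]
    if re.match(r"^\s*(hi|hey|hello)\b", lowered):
        problems.append("greeting line (the sequence step adds the greeting)")
    return problems   # empty list means the draft passes the mechanical checks
```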
## Output format
One record per row, emitted as a fenced JSON block inside Markdown so Clay's AI column can parse it and a human reading the log can scan it.
````markdown
### acme.com
```json
{
  "domain": "acme.com",
  "status": "ok",
  "industry": "Workforce management software",
  "size_signal": "mid",
  "size_signal_evidence": "About page cites '350+ employees across 4 offices'",
  "value_prop": "Schedules and pays hourly workforces for restaurants and retail.",
  "recent_signal": {
    "summary": "Launched a payroll module in March 2026.",
    "url": "https://acme.com/blog/payroll-launch"
  },
  "icp_fit_score": 8,
  "icp_fit_reason": "Mid-market, hourly-workforce vertical — direct match on rubric line 2.",
  "opener_draft": "Saw the March payroll launch — folding pay into the same surface as scheduling is the move most ops leaders ask us about. Worth a 20-min on whether the multi-state tax piece is hitting the same wall others have?",
  "opener_confidence": 0.78,
  "sources": [
    "https://acme.com/",
    "https://acme.com/about",
    "https://acme.com/blog/payroll-launch"
  ]
}
```
````
For unreachable rows the record collapses to `{ "domain": "...", "status": "unreachable" }` with no other fields — no placeholder strings.
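Clay's AI column does this parsing for you; outside Clay (a Batch API run, or a human spot-checking a log) the same record is easy to pull apart. A minimal sketch, assuming the fenced-JSON-in-Markdown shape above:

```python
# Extract the JSON payload from one per-row Markdown record.
import json
import re

FENCE = "`" * 3   # built this way to avoid a literal triple backtick inside this example
FENCED_JSON = re.compile(FENCE + r"json\s*(\{.*?\})\s*" + FENCE, re.DOTALL)

def parse_record(markdown_block: str) -> dict | None:
    """Return the record dict, or None if the block carries no JSON fence."""
    match = FENCED_JSON.search(markdown_block)
    return json.loads(match.group(1)) if match else None
```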
## Watch-outs
- **Source-quality drift across vendors.** When a Clay table has both Apollo and Clearbit enrichment columns upstream and they disagree on headcount or industry, this skill's snapshot can flip-flop run to run. Guard: `references/source-quality-matrix.md` declares a preference order; the snapshot step cites which vendor (or homepage-derived value) it used per field, so drift is auditable.
- **Opener inventing claims that aren't in the data.** Without strict prompting, openers drift toward confident-sounding fabrications ("congrats on your Series C" when there was no Series C). Guard: the opener prompt receives the snapshot inline and has an explicit "facts not in the snapshot are forbidden" rule; `recent_signal` carries a URL so a reviewer can verify in one click; openers with `opener_confidence` under 0.5 are flagged for rewrite, not sent.
- **Cost-per-row escalation if the ICP filter is loose.** A rubric that scores most rows 7+ defeats the threshold gate; opener generation runs on every row and per-row cost rises 3-4x. Guard: the skill logs a `score_distribution` summary at the end of each batch (`{ "1-3": N, "4-6": N, "7-10": N }`); if more than 60% of rows land 7+ across a 1K-row sample, the skill prints a warning to tighten the rubric before the next batch.
- **Rate limits on homepage fetches.** Hitting one root domain per row is fine; hitting `/about` and `/customers` triples the request count and some hosts throttle. Guard: per-host concurrency capped at 2, per-host minimum delay 250ms, and 429s are honored with a single back-off retry before the row is marked `status: unreachable`.
- **Stale enrichment.** A row enriched 90 days ago is not the same signal as one enriched today; reps treating old `recent_signal` values as fresh write embarrassing openers. Guard: every record carries an `enriched_at` timestamp, and the Clay column is configured to re-run when `enriched_at` is older than 30 days.
# ICP rubric — TEMPLATE
> Replace every `{...}` placeholder with your team's real values before
> the first run. The lead-enrichment skill checks for unsubstituted
> placeholders and refuses to score against a template — it returns
> `score: null, reason: "rubric not configured"` instead of guessing.
## Firmographics
| Dimension | In-ICP | Stretch | Out |
|---|---|---|---|
| Industry | {list, e.g. "B2B SaaS, fintech, vertical SaaS"} | {adjacent industries} | {hard out, e.g. "consumer, agencies, edu"} |
| Headcount | {range, e.g. "200-2000"} | {range, e.g. "100-200, 2000-5000"} | {below/above, e.g. "<100, >5000"} |
| Revenue | {range, e.g. "$25M-$500M ARR"} | {range} | {below/above} |
| Geo | {regions, e.g. "US, Canada, UK"} | {regions, e.g. "EU, ANZ"} | {regions, e.g. "China, Russia, sanctioned"} |
| Funding stage | {stages, e.g. "Series B-D, public"} | {stages} | {stages, e.g. "pre-seed, bootstrapped <$5M"} |
## Technographics
Tools whose presence on the prospect's stack signals fit (we win when they have these because the integration story is short or the pain is acute):
- {Tool 1 — e.g. "Salesforce + Outreach"}
- {Tool 2 — e.g. "dbt + Snowflake"}
- {Tool 3 — e.g. "Segment"}
Tools whose presence signals misfit (we lose to them, or their presence implies a build-not-buy culture):
- {Tool A — e.g. "Hightouch (we lose to)"}
- {Tool B — e.g. "in-house reverse-ETL (build-not-buy)"}
## Pain ranking
When the skill produces an ICP score, it weights pain signals against this priority order. Higher weight = more important to surface first.
1. {Pain category — e.g. "outbound data quality blocking pipeline"} — weight 5
2. {Pain category — e.g. "rep ramp time eating quota"} — weight 4
3. {Pain category — e.g. "RevOps reporting backlog"} — weight 3
4. {Pain category — e.g. "tooling sprawl / consolidation push"} — weight 2
5. {Pain category — e.g. "compliance / data-residency"} — weight 1
## Score interpretation
The skill returns `1-10`. Suggested interpretation (override per team):
- **9-10** — direct ICP, multiple positive signals, no disqualifiers. Send.
- **7-8** — strong fit on firmographics, signal is plausible. Send after rep glance.
- **5-6** — adjacent. Default threshold cuts here. Opener may not generate.
- **3-4** — stretch fit, weak signal. Skip unless rep has specific reason.
- **1-2** — out of ICP. Suppress.
## Disqualifiers
Single signals that drop a row out regardless of other fit. The skill flags these with `disqualifier: "..."` in the output and forces score to 1.
- {Disqualifier 1 — e.g. "uses {Competitor} as system of record"}
- {Disqualifier 2 — e.g. "regulated industry without our SOC 2 + HIPAA"}
- {Disqualifier 3 — e.g. "headcount under 50 — wrong motion"}
## Last edited
{YYYY-MM-DD} — bump on every material change. The skill prepends a "rubric may be stale" note when this date is more than 90 days old.
# Opener style guide — TEMPLATE
> Replace the placeholders with your team's actual voice rules. The
> lead-enrichment skill loads this file on every opener generation.
> Treat it as the single source of truth — do not duplicate rules into
> the prompt itself, where they will drift.
## Voice
One sentence on register: {e.g. "direct, plain, no jargon, no exclamation marks, no questions in the close unless the question is specific"}.
One sentence on stance: {e.g. "we're peers solving the same problem, not vendors selling at"}.
## Length
- Hard cap: 50 words. The skill enforces this and truncates with an ellipsis warning if the model overshoots.
- Soft target: 30-40 words. Three sentences max.
- No greeting line ("Hi {name},"). The sequence step adds the greeting; baking one into the opener double-greets.
## What every opener must contain
1. A specific reference to one thing from the snapshot — preferably `recent_signal`. If there is no recent signal, the opener references the `value_prop` and labels itself low-confidence.
2. A reason the reference matters to the operator's persona. Not "I noticed X" — "X is the move ops leaders ask us about".
3. A single, specific close. Either a 20-min ask or a question that would be answerable only by someone inside the company.
## Banned phrases
The prompt rejects any opener containing these. Add team-specific additions over time.
- "I hope this email finds you well"
- "I noticed"
- "I came across your"
- "love what you're doing"
- "quick question"
- "circling back" (in a first touch)
- Any superlative — "amazing", "incredible", "best-in-class", "leading"
- Any em-dash trio that reads as an LLM tic — "X — Y — Z"
- Compliments on growth/funding without a cited source
## Banned structures
- The "I was just on your website" opener. Always reads false.
- The "we work with companies like {logo}" name-drop in sentence one.
- The "are you the right person?" close. Wastes the reply.
## Confidence signal
The skill returns `opener_confidence: 0.0-1.0`:
- **0.8+** — opener references a dated `recent_signal` with a URL and the lead's persona is in the buying committee defined in `references/1-icp-rubric.md`.
- **0.5-0.8** — opener references `value_prop` only, or `recent_signal` is older than 60 days.
- **<0.5** — opener is generic; rep should rewrite or skip the row.
Threshold for auto-queueing into a sequence: configure per-team. Default recommendation: **never auto-queue**. The opener is a draft.
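If the buckets above need to be reproducible rather than judged by feel, the heuristic is small enough to write down. A simplified sketch; the real skill also weighs rubric context, and the exact cut-offs here are illustrative:

```python
# Rough confidence heuristic matching the buckets above (simplified).
from datetime import date

def opener_confidence(recent_signal: dict | None, signal_date: date | None,
                      persona_in_buying_committee: bool) -> float:
    if recent_signal and recent_signal.get("url"):
        fresh = signal_date is not None and (date.today() - signal_date).days <= 60
        if fresh and persona_in_buying_committee:
            return 0.8   # dated, verifiable signal plus the right persona
        return 0.6       # signal exists but is stale, or the persona is off
    return 0.4           # value_prop-only opener: generic, rep should rewrite or skip
```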
## Worked example
Snapshot: industry "scheduling for restaurants", recent_signal "March payroll launch", value_prop "schedules and pays hourly workforces".
Opener (35 words, confidence 0.78):
> Saw the March payroll launch — folding pay into the same surface as
> scheduling is the move most ops leaders ask us about. Worth a 20-min
> on whether the multi-state tax piece is hitting the same wall others
> have?
Why it scores 0.78 not 1.0: the close assumes a pain ("multi-state tax piece is hitting the same wall") that the snapshot does not directly evidence. A 1.0 opener would tie the close to a fact in the snapshot.
## Last edited
{YYYY-MM-DD}
# Source quality matrix — TEMPLATE
> Replace placeholders with the vendors actually wired into your Clay
> table. The lead-enrichment skill consults this matrix when two
> upstream sources disagree on the same field, and cites which source
> won in the per-row output for audit.
## Why this exists
Clay tables routinely stack 2-3 enrichment vendors per field — Apollo on industry, Clearbit on headcount, ZoomInfo on funding. They disagree often: Apollo will say a 200-person company is 50, Clearbit will assign the wrong industry, ZoomInfo will lag a recent acquisition by a quarter.
Without a declared preference order, the skill's snapshot step flip-flops between vendors run to run, ICP scores oscillate, and operators cannot reproduce the same row twice.
## Field-by-field preference order
Per field, list vendors top-to-bottom in trust order. The skill takes the highest-ranked vendor that returned a non-empty value. The homepage-derived value (extracted by the skill itself in step 1) is always the tiebreaker if all vendors disagree with each other. A sketch of this resolution rule follows the lists below.
### Industry / sector
1. {e.g. "Homepage About-page extraction (skill step 1)"}
2. {e.g. "Clearbit Reveal"}
3. {e.g. "Apollo Company"}
4. {e.g. "ZoomInfo Industry"}
### Headcount
1. {e.g. "LinkedIn employee count via Clay's LinkedIn Profile column"}
2. {e.g. "Apollo employee_count"}
3. {e.g. "Clearbit metrics.employees"}
4. Skill homepage extraction (last resort — copy claims often inflated)
### Revenue / ARR
1. {e.g. "ZoomInfo revenue (paid only)"}
2. {e.g. "Clearbit metrics.annualRevenue (estimated, low confidence)"}
3. Omit (do not extract from homepage; vanity metrics are not revenue)
### Funding stage / last round
1. {e.g. "Crunchbase via Clay's native column"}
2. {e.g. "Press-release fetch in skill step 1, only if dated within 90d"}
3. Omit
### Recent signal (launches, hires, news)
1. Skill homepage / blog fetch in step 1 — must carry a dated URL
2. {e.g. "Clay's Built-With column for tech-stack changes"}
3. {e.g. "Google News column, capped to last 60 days"}
Important: a vendor returning "we don't know" is treated as missing, not as a contradicting value. The next vendor in the list is consulted.
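A sketch of the resolution rule the lists above encode: highest-ranked non-empty value wins, a "don't know" counts as missing, and the skill's own homepage extraction breaks ties when every vendor disagrees with every other. One reasonable reading in Python (data shapes are illustrative; the real control surface is this Markdown file):

```python
# Resolve one field across vendors using the declared preference order.
MISSING = {None, "", "unknown", "we don't know"}

def resolve_field(vendor_values: dict[str, object], preference: list[str],
                  homepage_value: object = None) -> tuple[object, str | None]:
    """vendor_values maps vendor name -> value; preference is this file's trust order."""
    present = [(v, vendor_values[v]) for v in preference if vendor_values.get(v) not in MISSING]
    if not present:
        return (homepage_value, "homepage") if homepage_value not in MISSING else (None, None)
    # if every vendor disagrees with every other, fall back to the skill's own extraction
    if len({val for _, val in present}) == len(present) > 1 and homepage_value not in MISSING:
        return homepage_value, "homepage"
    return present[0][1], present[0][0]   # highest-ranked vendor with a real value wins

# e.g. headcount, LinkedIn and Apollo disagreeing but LinkedIn ranked first:
value, source = resolve_field({"linkedin": 380, "apollo": 120}, ["linkedin", "apollo"])
# -> (380, "linkedin"); the disagreement is still logged per the section below
```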
## Conflict logging
When two vendors return different values for the same field, the skill adds a `conflicts` block to the per-row output:
```json
"conflicts": [
{ "field": "headcount", "values": { "linkedin": 380, "apollo": 120 }, "chose": "linkedin" }
]
```
Operators should sample 10-20 conflict rows weekly. If one vendor is consistently wrong, demote it in this matrix. The matrix is the control surface, not the prompt.
## Vendor cost ceiling
Each vendor has a per-row cost (Clay credits or external API spend). Set a soft cap here — if a row's combined enrichment cost exceeds the cap, skip the lowest-ranked vendor and accept lower confidence.
- Soft cap per row: {e.g. "12 Clay credits"}
- Hard cap per row: {e.g. "20 Clay credits — abort row"}
The skill respects these caps and emits `cost_ceiling_hit: true` when the hard cap fires, so the batch summary surfaces the count.
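One reasonable reading of how the caps could be applied, assuming the skill knows each upstream vendor's per-row credit cost: the soft cap sheds the lowest-ranked vendors until the row fits, and the hard cap aborts the row outright. The values and function below are illustrative, not defaults:

```python
# Apply the per-row credit caps to a vendor call plan, highest trust first.
SOFT_CAP = 12   # example value from above: shed lowest-ranked vendors past this
HARD_CAP = 20   # example value from above: abort the row entirely

def plan_vendor_calls(vendors_in_order: list[tuple[str, int]]) -> dict:
    """vendors_in_order: (vendor, credit cost) pairs in preference order."""
    vendors = list(vendors_in_order)
    total = sum(cost for _, cost in vendors)
    while total > SOFT_CAP and len(vendors) > 1:
        _, cost = vendors.pop()      # shed the lowest-ranked vendor, accept lower confidence
        total -= cost
    if total > HARD_CAP:             # e.g. a single must-run vendor alone busts the cap
        return {"vendors": [], "cost_ceiling_hit": True}   # surfaced in the batch summary
    return {"vendors": [name for name, _ in vendors], "cost_ceiling_hit": False}
```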
## Last edited
{YYYY-MM-DD}