A Claude Skill (lead-enrichment) that takes a Clay table row keyed on domain and returns three derived fields in a single structured pass: a two-sentence company summary, a 1-10 ICP fit score with a one-line reason, and a sub-50-word cold-email opener that cites a specific signal from the company's public footprint. The opener is always a draft for rep review; the skill refuses to be wired into an auto-send step.
The artifact bundle ships at apps/web/public/artifacts/lead-enrichment-clay-claude/ and contains the skill body (SKILL.md) plus three reference templates the operator fills in once and the skill loads on every run.
## When to use it
- You have a Clay table (or a lead CSV exportable to Clay) with at least a populated domain column, and you need to push enriched records to HubSpot, Salesforce, Attio, or a sequencer.
- Volume is in the hundreds to tens of thousands of rows per batch. Below that, writing openers by hand is faster. Above it, the cost discipline in this skill (opener generation gated on ICP, single-call extraction, per-host fetch caps) is what keeps per-row spend reasonable.
- Reps are willing to read the opener draft before it goes out. The whole design assumes a human eye between draft and send, and it degrades badly without one.
- You want one skill the team can call both from Clay AI columns and from a Claude Code session driving the Anthropic Batch API over an exported list; the skill body is identical on both surfaces.
## When NOT to use it
- Auto-send sequencers. If the plan is to wire opener_draft straight into the first sequence step, stop. The opener carries an opener_confidence field for a reason: roughly one in five drafts needs a rewrite. Auto-sending them is the failure mode this skill is designed against.
- Lists without a documented lawful basis for processing. Contacts scraped from the EU or UK without prior consent, Quebec leads under Law 25, California consumer data without a CCPA pathway: the skill will enrich whatever you point it at, which is exactly why the operator must check first. The MDX page ships a "Do NOT invoke" block in SKILL.md covering this; do not comment it out.
- Account-level discovery briefs. Use the account-research skill instead. That one does deep persona mapping and pain hypotheses; this one optimizes for batch volume and cost-per-row.
- Decisions that need licensed financial data (S&P, Pitchbook). This skill reads the public web only and will not fabricate revenue figures from homepage copy.
- Lists where the data source is garbage. Garbage in, garbage out. Run Clay's domain-validation column upstream; the skill skips parked domains and 404s, but you are still paying credits to find that out.
## Setup
1. Get the skill into your environment. For Claude.ai, import SKILL.md from apps/web/public/artifacts/lead-enrichment-clay-claude/SKILL.md and upload the three files in references/. For Claude Code, copy the directory to ~/.claude/skills/lead-enrichment/.
2. Replace every `{...}` placeholder in references/1-icp-rubric.md with your team's real ICP. The skill detects unsubstituted placeholders and refuses to score against a template, returning `score: null, reason: "rubric not configured"` instead of guessing. This is intentional; a wrong rubric is worse than no rubric.
3. Edit references/2-opener-style-guide.md with your team's voice and banned phrases. The defaults ban the obvious LLM tells ("I noticed", "love what you're doing", any superlative); add company-specific bans as your reps flag them.
4. Edit references/3-source-quality-matrix.md to declare the preference order among the enrichment vendors wired upstream of this skill in your Clay table. Without a declared order, the snapshot step flip-flops between Apollo and Clearbit run after run, and ICP scores drift.
5. In Clay, create three AI columns referencing the skill: summary, icp_fit_score, opener_draft. Map the inputs per the "Inputs" section of SKILL.md. Configure the destination push to HubSpot (or your CRM) and route opener_draft to a sequence variable on a step that requires manual approval, not auto-send.
6. Run a 20-row sample. Spot-check: do the snapshot facts trace to real homepage copy, do the ICP scores land in a sensible distribution, do the openers pass the style guide? Tune the rubric and style guide before scaling. The first 100 rows are calibration data; treat them as such.
## What the skill actually does
Per row, four sub-tasks in order (an orchestration sketch follows below):
1. Resolve and fetch. https://{domain} with a 10-second timeout and one redirect hop, then best-effort /about, /company, /customers. Parked, 404, or empty-body domains return status: unreachable and skip the rest. Per-host concurrency is capped at 2, with a 250ms minimum delay and a single back-off retry on 429.
2. Extract a structured snapshot in one Claude call: industry, size_signal, value_prop, and an optional recent_signal with a URL. One call instead of three because round-trip count is the dominant driver of cost-per-row at scale, and the extraction prompt stays reliable in a single pass.
3. Score against the ICP rubric. Loads references/1-icp-rubric.md and returns 1-10 with a one-line reason naming the rubric dimension that drove the score.
4. Generate the opener, only if the score clears the threshold (default 6/10). Hard rules in the prompt: 50-word cap, reference exactly one fact from the snapshot, no superlatives, no invented company claims, no close built on a fake pain question. Returns opener_confidence 0.0-1.0; below 0.5 the draft is flagged for rewrite.
A row's output is a JSON block embedded in Markdown: Clay's AI column parses it, and a human reading the run log can scan it. The full schema and a worked example live in the "Output format" section of apps/web/public/artifacts/lead-enrichment-clay-claude/SKILL.md.
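A minimal orchestration sketch of those four sub-tasks, assuming a Python harness; `fetch_site`, `extract_snapshot`, `score_icp`, and `generate_opener` are hypothetical helpers, not functions shipped in the bundle:
```python
# Per-row orchestration sketch, NOT the skill itself. fetch_site,
# extract_snapshot, score_icp, and generate_opener are hypothetical helpers
# standing in for the four sub-tasks above.
from datetime import datetime, timezone

def enrich_row(domain: str, threshold: int = 6) -> dict:
    pages = fetch_site(domain)                # step 1: resolve + fetch
    if pages is None:
        return {"domain": domain, "status": "unreachable"}
    snapshot = extract_snapshot(pages)        # step 2: one Claude call
    score, reason = score_icp(snapshot)       # step 3: rubric scoring
    record = {
        "domain": domain,
        "status": "ok",
        **snapshot,
        "icp_fit_score": score,
        "icp_fit_reason": reason,
        "enriched_at": datetime.now(timezone.utc).isoformat(),
    }
    # Step 4 only runs above threshold; this gate is where the cost savings live.
    if score is not None and score >= threshold:
        record["opener_draft"], record["opener_confidence"] = generate_opener(snapshot)
    return record
```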
## Cost reality
There are two cost lines: Anthropic tokens and Clay credits.
Anthropic tokens at Sonnet pricing (input $3/MTok, output $15/MTok as of writing):
- Steps 1+2 (fetch + extract): ~3.5K input tokens (homepage + about, truncated to ~8K characters) + ~250 output tokens. Roughly $0.014/row.
- Step 4 (opener, only when score >= threshold): ~1.2K input + ~120 output. Roughly $0.005/row.
So a row that clears the threshold lands around $0.023, and a row that does not around $0.018 (the gap above $0.014 is the ICP scoring call). Run via the Anthropic Batch API for ~50% off when the load is not urgent (overnight enrichment of inbound MQLs is the textbook fit); that brings rows down to the $0.01-0.012 range.
At scale: 100K rows/month with ~40% clearing the threshold is ~$1,500-2,000/month in Anthropic spend before the batch discount, ~$800-1,000 after.
Clay credits depend on which vendor columns are wired upstream. Apollo runs about 1 credit per resolved domain; Clearbit Reveal is 2-3; ZoomInfo (paid passthrough) is more. Stack three vendors and a single row can hit 8-12 Clay credits before the skill itself runs. The Starter plan ships 5K credits/month; the Pro plan, 25K. A 100K-row batch under that vendor stack needs the Enterprise tier or a tighter matrix in references/3-source-quality-matrix.md. The matrix exists specifically to drop the worst-ranked vendor when the per-row cost ceiling fires.
If the math feels tight, the lever is the ICP threshold. Raising it from 6 to 7 typically suppresses opener generation on another 25-35% of rows; raising it to 8 drops another 20%. The skill logs the score distribution at the end of each batch so the operator can tune empirically rather than by gut feel.
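A back-of-envelope model of the numbers above, in Python; the ~$0.004 scoring-call cost is inferred from the per-row figures, and all values are estimates, not measurements:
```python
# Cost model for the figures in this section (assumptions: Sonnet pricing
# $3/$15 per MTok, token counts from the bullets above).
IN_RATE, OUT_RATE = 3 / 1e6, 15 / 1e6   # dollars per token

def per_row_cost(passes_threshold: bool) -> float:
    cost = 3500 * IN_RATE + 250 * OUT_RATE        # steps 1+2: fetch + extract
    cost += 0.004                                  # step 3: ICP scoring call (approx.)
    if passes_threshold:
        cost += 1200 * IN_RATE + 120 * OUT_RATE   # step 4: opener generation
    return cost

def batch_cost(rows: int, pass_rate: float, batch_api: bool = False) -> float:
    blended = pass_rate * per_row_cost(True) + (1 - pass_rate) * per_row_cost(False)
    return rows * blended * (0.5 if batch_api else 1.0)

print(f"${batch_cost(100_000, 0.40):,.0f}")                  # ~$2,000 pre-discount
print(f"${batch_cost(100_000, 0.40, batch_api=True):,.0f}")  # ~$1,000 via Batch API
```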
## Success metric
Reply rate on rep-reviewed openers, segmented by opener_confidence bucket. The point of the skill is not "more openers per hour"; it is "openers good enough that reps stop rewriting them from scratch". Two sub-metrics worth instrumenting (a bucketing sketch follows the list):
- Rewrite rate: the fraction of opener_draft values the rep sends unchanged vs. materially rewrites. Target: under 35% on rows with confidence 0.7+ after the first 500 calibration rows. Higher means the style guide is wrong, the rubric is wrong, or the snapshot step is hallucinating.
- Reply rate by confidence bucket. Reply rate at opener_confidence >= 0.7 should be at least 1.5x the reply rate of the sub-0.5 bucket. If they are similar, the confidence score is not signal; investigate before trusting it as a routing input.
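A minimal sketch of that bucket comparison, assuming rows exported from the CRM as dicts with an `opener_confidence` value and a `replied` flag; the field names and `sent_rows` are illustrative:
```python
# Reply-rate-by-confidence-bucket sketch (field names are illustrative).
def reply_rate_by_bucket(rows: list[dict]) -> dict[str, float]:
    buckets: dict[str, list[bool]] = {"<0.5": [], "0.5-0.7": [], ">=0.7": []}
    for row in rows:
        c = row["opener_confidence"]
        key = "<0.5" if c < 0.5 else "0.5-0.7" if c < 0.7 else ">=0.7"
        buckets[key].append(row["replied"])
    return {k: sum(v) / len(v) for k, v in buckets.items() if v}

rates = reply_rate_by_bucket(sent_rows)  # sent_rows: your CRM export (illustrative)
if rates.get(">=0.7", 0) < 1.5 * rates.get("<0.5", 0):
    print("confidence is not separating replies; investigate before routing on it")
```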
## vs. alternatives
- vs. Apollo's native sequence personalization. Apollo will generate openers from its own enrichment data. It is faster to stand up, but the openers are visibly templated, scored with Apollo's ICP heuristic (not yours), and carry no audit trail of which fact drove the draft. This skill takes longer to set up and costs more per row, but the openers reference dated URLs you can verify in one click, and the rubric is a file you control.
- vs. Clearbit + Outreach Smart Variables. Clearbit-fed Smart Variables produce factual mail-merge ("their industry is ${X}"), not openers; they need a human to write the actual sentence around the variable. Cheaper than this skill if your reps are already writing those sentences; more expensive in total if they are not, because rep time dominates token cost.
- vs. writing openers by hand. A senior SDR writes a high-quality cold opener in 4-7 minutes per account at ~$60/hour fully loaded: call it $5 of rep time per opener. The skill is at most ~$0.025 per opener. The catch: the rep also did some account-level thinking while writing, and the skill did not. The right split for most teams is the skill on top-of-funnel volume (everything below the tier-1 account list) and rep-written openers on the named-account list.
- vs. the status quo (no enrichment, generic openers). Reply rates on generic openers run somewhere in the 1-2% range; lightly personalized openers tied to a recent signal run 4-8% in published benchmarks. The skill targets the latter. It is only worth it if the team is willing to stand up the rubric and the style guide; without those, the skill's output is not materially better than the status quo.
## Watch-outs
- Source-quality drift across vendors. When Apollo, Clearbit, and ZoomInfo enrich the same row and disagree on headcount or industry, the snapshot step flip-flops between them run after run. Guard: references/3-source-quality-matrix.md declares the preference order; the snapshot cites which vendor (or homepage-derived value) it used per field, so drift is auditable in the per-row conflicts log.
- The opener inventing claims that are not in the data. Without strict prompting, openers fabricate confident-sounding facts ("congrats on the Series C" with no Series C). Guard: the opener prompt receives the snapshot inline with an explicit "facts not in the snapshot are forbidden" rule; recent_signal carries a URL for one-click verification; openers under opener_confidence 0.5 are flagged for rewrite, never auto-sent.
- Cost-per-row escalation when the ICP filter is loose. A rubric that scores most rows 7+ defeats the threshold gate; opener generation runs on every row and per-row cost rises 3-4x. Guard: the skill emits a score_distribution summary per batch; if more than 60% of a 1K-row sample lands 7+, the skill prints a warning and recommends tightening the rubric before the next batch.
- Stale recent_signal. A signal extracted 90 days ago becomes a liability; reps writing "saw your March launch" in August read as asleep at the wheel. Guard: every record carries enriched_at; the Clay column is configured to re-run when it is older than 30 days; the opener step refuses to use a recent_signal whose URL date is more than 60 days behind enriched_at.
- Consent and lawful basis. The skill enriches whatever you point it at. The "Do NOT invoke" block in SKILL.md exists to remind the operator to check the source list's lawful basis before running. Do not delete it.
## Stack
- Clay: table substrate, upstream enrichment vendor stack, push to the destination CRM. The Starter plan supports the AI-column primitive the skill plugs into; Pro is required for the credit volume of any serious batch.
- Claude (Sonnet by default): the inference layer for snapshot extraction, ICP scoring, and opener generation. Runs via Clay's native AI column, or via the Anthropic Batch API from a Claude Code session for non-urgent batches at half price (a submission sketch follows the list).
- HubSpot, Salesforce, or Attio: destination CRM. Map summary → custom property, icp_fit_score → custom property + view filter, opener_draft → first-touch sequence variable on a manual-approval step.
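For the Batch API path, a minimal submission sketch using the `anthropic` Python SDK; the model id, `build_extract_prompt`, and `exported_domains` are illustrative assumptions, not part of the bundle:
```python
# Message Batches submission sketch for the non-urgent path (assumptions:
# anthropic SDK installed, ANTHROPIC_API_KEY set; build_extract_prompt and
# exported_domains are illustrative stand-ins).
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # pin to whatever Sonnet id you run in production

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": domain,  # lets you join results back to the Clay row
            "params": {
                "model": MODEL,
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": build_extract_prompt(domain)}],
            },
        }
        for domain in exported_domains  # e.g. the domain column of the CSV export
    ]
)
print(batch.id, batch.processing_status)  # poll until it ends, then fetch results
```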
---
name: lead-enrichment
description: Enrich a Clay table row with company summary, ICP fit score, and a personalized cold-email opener. Use this skill from Clay AI columns or from a Claude Code session driving the Anthropic Batch API over an exported lead list. Returns structured fields plus a draft opener for rep review — never auto-sends.
---
# Lead enrichment
## When to invoke
Whenever a Clay table or exported CSV of leads needs three derived fields in one pass: a 2-sentence company summary, an ICP fit score with one-line reasoning, and a sub-50-word cold-email opener that cites a real signal from the company's public footprint. Take a domain (and optionally a LinkedIn URL or a free-text ICP rubric) as input and produce a structured JSON-shaped Markdown record per row.
Do NOT invoke this skill for:
- Auto-sending email. The opener is a draft for rep review. Wiring the output directly into a sequence-send step bypasses the guardrail this skill exists to enforce.
- Enriching leads that have not consented to processing. If the source list was scraped from a region with prior-consent rules (EU/UK GDPR, Quebec Law 25, etc.) without a documented lawful basis, stop and surface the concern to the operator. This skill will not silently process unconsented data.
- Account-level research briefs for discovery calls — use the `account-research` skill instead. This one optimizes for batch volume and cost-per-row, not depth.
- Public-company financial scoring or any decision that needs licensed data (S&P, Pitchbook). This skill reads public web only.
## Inputs
- Required: `domain` — the company's primary domain (e.g. `acme.com`). Must resolve and serve HTML; parked or dead domains are returned with `status: unreachable` and skipped.
- Required at config time: a Clay table reference (table ID + column map) OR a CSV path if running outside Clay. The skill assumes the table has at least `domain`; `first_name`, `title`, and `linkedin_url` improve opener quality if present.
- Required at config time: `references/1-icp-rubric.md` populated with your real rubric. Without it, `icp_fit_score` is omitted (not guessed).
- Optional: `references/2-opener-style-guide.md` — your team's voice, banned phrases, max length override.
- Optional: `references/3-source-quality-matrix.md` — vendor preference order when multiple enrichment columns upstream of this skill disagree.
- Optional: `recent_news` — pre-fetched by an upstream Clay column. When present, the opener step uses it instead of re-fetching.
## Reference files
Always read the following from `references/` before generating any row. They encode the operator's positioning and quality bar — without them output is generic.
- `references/1-icp-rubric.md` — the rubric the score uses. Replace template content with your actual ICP definition before first run.
- `references/2-opener-style-guide.md` — voice, length cap, banned phrases (e.g. "I noticed", "love what you're doing", any superlative).
- `references/3-source-quality-matrix.md` — preference order for resolving conflicts between upstream enrichment vendors (Apollo vs. Clearbit vs. ZoomInfo vs. Clay's native People/Company API).
## Method
Run these four sub-tasks in order per row. Steps 1-3 produce the structured fields; step 4 only runs when the ICP score clears the configured threshold.
### 1. Resolve and fetch
Hit `https://{domain}` with a 10-second timeout and one redirect hop. On non-2xx, parked-domain heuristics, or empty body, mark `status: unreachable` and skip steps 2-4. Do not retry indefinitely — parked domains are a meaningful share of any scraped list and the skill should fail fast on them rather than burn API spend.
If the homepage resolves, additionally fetch `/about`, `/company`, and `/customers` (best-effort, ignore 404s). Concatenate cleaned text up to ~8K characters (consistent with the ~3.5K-input-token figure in the cost section); truncate from the end of the longest section, not the front, to preserve the homepage hero copy.
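A minimal sketch of this step, assuming Python with `requests`; parked-domain heuristics are reduced to an empty-body check, and the per-host concurrency and 250ms pacing described in Watch-outs are left to the calling batch runner:
```python
# Fetch sketch for step 1 (simplified: empty body stands in for parked-domain
# heuristics; per-host concurrency/delay is enforced by the caller).
import time
import requests

def fetch_site(domain: str) -> dict[str, str] | None:
    session = requests.Session()
    session.max_redirects = 1  # one redirect hop, per the spec above

    def get(path: str) -> str | None:
        url = f"https://{domain}{path}"
        for attempt in range(2):  # at most one back-off retry, on 429 only
            try:
                resp = session.get(url, timeout=10)
            except requests.RequestException:
                return None  # DNS failure, timeout, redirect loop, ...
            if resp.status_code == 429 and attempt == 0:
                time.sleep(2)  # single back-off before the retry
                continue
            if resp.ok and resp.text.strip():
                return resp.text
            return None
        return None

    home = get("/")
    if home is None:
        return None  # caller marks status: unreachable and skips steps 2-4
    pages = {"/": home}
    for path in ("/about", "/company", "/customers"):  # best-effort extras
        if (body := get(path)) is not None:
            pages[path] = body
    return pages
```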
### 2. Extract structured snapshot
In one Claude call, extract:
- `industry` — single noun phrase, derived from copy not guessed
- `size_signal` — one of `solo | smb | mid | ent`, justified by a cited copy fragment (employee count claim, customer-logo density, funding mention) or `unknown`
- `value_prop` — one sentence in the company's own framing
- `recent_signal` — at most one verifiable fact (a launch, hire, funding round, customer win) with a URL anchor. If none found in the fetched pages, return `null` rather than inventing one.
Engineering choice: extraction is a single structured Claude call, not three. Each additional round-trip multiplies cost-per-row at 100K-row scale; the extraction prompt is small enough to stay reliable in one pass.
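A sketch of the single-call shape, assuming the `anthropic` Python SDK; the prompt text and model id are illustrative, not the skill's actual prompt:
```python
# Single-call extraction sketch (EXTRACT_PROMPT and the model id are
# illustrative; the real prompt lives in the skill body).
import json
import anthropic

client = anthropic.Anthropic()

EXTRACT_PROMPT = """Extract from the page text below, as JSON with keys
industry, size_signal, value_prop, recent_signal (or null). Cite a copy
fragment for size_signal. Do not invent facts.

{page_text}"""

def extract_snapshot(page_text: str) -> dict:
    resp = client.messages.create(
        model="claude-sonnet-4-5",  # pin your production Sonnet id
        max_tokens=500,
        messages=[{"role": "user", "content": EXTRACT_PROMPT.format(page_text=page_text)}],
    )
    return json.loads(resp.content[0].text)  # one round-trip, all four fields
```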
### 3. Score against ICP rubric
Load `references/1-icp-rubric.md`. Score the snapshot 1-10 with a one-line justification that names which rubric dimension drove the score. If the rubric file still contains template placeholders (`{...}` literals), return `score: null` and `reason: "rubric not configured"` rather than scoring against nothing.
Engineering choice: ICP scoring runs before opener generation, not after. Rationale: opener generation is the most expensive step (longer output, more careful prompt). Skipping it for sub-threshold rows is where the cost savings live. A common misconfiguration is to generate openers for every row "in case the rep wants to override" — that defeats the filter.
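A sketch of the placeholder check described above, assuming the rubric is read as plain text; the brace-literal scan is a simplification of whatever detection the skill actually performs:
```python
# Rubric-configured check sketch: refuse to score while template placeholders
# ({...} literals) survive in references/1-icp-rubric.md.
import re
from pathlib import Path

def rubric_is_configured(path: str = "references/1-icp-rubric.md") -> bool:
    text = Path(path).read_text(encoding="utf-8")
    return re.search(r"\{[^}]*\}", text) is None  # any surviving brace literal

if not rubric_is_configured():
    result = {"score": None, "reason": "rubric not configured"}
```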
### 4. Generate opener (only if score >= threshold)
Default threshold: 6/10. Configurable per run.
Inputs to the opener step: `value_prop`, `recent_signal` (if non-null), the lead's `first_name` and `title` if present, and the style guide.
Hard rules baked into the opener prompt:
- Max 50 words.
- Must reference exactly one specific thing from `recent_signal` or `value_prop`. If both are null, return `opener: null` rather than writing flattery.
- No superlatives ("amazing", "incredible", "love").
- No invented company claims. The opener may only reference facts present in the snapshot. The prompt includes the snapshot inline and explicitly forbids facts not in it.
- No question close that pretends to know an internal pain ("how are you handling X?" without evidence X is happening).
The opener is always returned as a draft with a `confidence` field (0-1) reflecting how strong the cited signal was. Reps see the confidence; low-confidence openers are surfaced for rewrite, not auto-sent.
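A post-generation validation sketch for the hard rules, assuming the banned-phrase list is loaded from `references/2-opener-style-guide.md`; this runs as a backstop after the model call, complementing the in-prompt rules:
```python
# Hard-rule validation sketch (BANNED would be loaded from the style guide;
# the short list here is illustrative).
BANNED = ["i hope this email finds you well", "i noticed", "i came across your",
          "love what you're doing", "quick question", "amazing", "incredible"]

def validate_opener(opener: str | None, confidence: float) -> tuple[str | None, list[str]]:
    flags: list[str] = []
    if opener is None:
        return None, flags          # both snapshot fields were null: no opener
    if len(opener.split()) > 50:
        flags.append("over 50-word cap")
    lowered = opener.lower()
    flags += [f"banned phrase: {p!r}" for p in BANNED if p in lowered]
    if confidence < 0.5:
        flags.append("low confidence: surface for rewrite, do not queue")
    return opener, flags
```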
## Output format
One record per row, emitted as a fenced JSON block inside Markdown so Clay's AI column can parse it and a human reading the log can scan it.
````markdown
### acme.com
```json
{
  "domain": "acme.com",
  "status": "ok",
  "industry": "Workforce management software",
  "size_signal": "mid",
  "size_signal_evidence": "About page cites '350+ employees across 4 offices'",
  "value_prop": "Schedules and pays hourly workforces for restaurants and retail.",
  "recent_signal": {
    "summary": "Launched a payroll module in March 2026.",
    "url": "https://acme.com/blog/payroll-launch"
  },
  "icp_fit_score": 8,
  "icp_fit_reason": "Mid-market, hourly-workforce vertical — direct match on rubric line 2.",
  "opener_draft": "Saw the March payroll launch — folding pay into the same surface as scheduling is the move most ops leaders ask us about. Worth a 20-min on whether the multi-state tax piece is hitting the same wall others have?",
  "opener_confidence": 0.78,
  "sources": [
    "https://acme.com/",
    "https://acme.com/about",
    "https://acme.com/blog/payroll-launch"
  ]
}
```
````
For unreachable rows the record collapses to `{ "domain": "...", "status": "unreachable" }` with no other fields — no placeholder strings.
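A sketch of how a downstream consumer might parse the record, assuming the fenced-JSON-in-Markdown shape above; `row_output` is an illustrative name for the AI column's raw value:
```python
# Record-parsing sketch for downstream consumers (assumption: one fenced JSON
# block per row record, as in the example above).
import json
import re

def parse_record(markdown: str) -> dict:
    match = re.search(r"```json\s*(\{.*?\})\s*```", markdown, re.DOTALL)
    if match is None:
        raise ValueError("no JSON block found in record")
    return json.loads(match.group(1))

record = parse_record(row_output)  # row_output: the AI column's raw value
if record["status"] == "unreachable":
    pass  # skip: no other fields are present
```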
## Watch-outs
- **Source-quality drift across vendors.** When a Clay table has both Apollo and Clearbit enrichment columns upstream and they disagree on headcount or industry, this skill's snapshot can flip-flop run to run. Guard: `references/3-source-quality-matrix.md` declares a preference order; the snapshot step cites which vendor (or homepage-derived value) it used per field, so drift is auditable.
- **Opener inventing claims that aren't in the data.** Without strict prompting, openers drift toward confident-sounding fabrications ("congrats on your Series C" when there was no Series C). Guard: the opener prompt receives the snapshot inline and has an explicit "facts not in the snapshot are forbidden" rule; `recent_signal` carries a URL so a reviewer can verify in one click; openers with `opener_confidence` under 0.5 are flagged for rewrite, not sent.
- **Cost-per-row escalation if the ICP filter is loose.** A rubric that scores most rows 7+ defeats the threshold gate; opener generation runs on every row and per-row cost rises 3-4x. Guard: the skill logs a `score_distribution` summary at the end of each batch (`{ "1-3": N, "4-6": N, "7-10": N }`); if more than 60% of rows land 7+ across a 1K-row sample, the skill prints a warning to tighten the rubric before the next batch.
- **Rate limits on homepage fetches.** Hitting one root domain per row is fine; hitting `/about` and `/customers` triples the request count and some hosts throttle. Guard: per-host concurrency capped at 2, per-host minimum delay 250ms, and 429s are honored with a single back-off retry before the row is marked `status: unreachable`.
- **Stale enrichment.** A row enriched 90 days ago is not the same signal as one enriched today; reps treating old `recent_signal` values as fresh write embarrassing openers. Guard: every record carries an `enriched_at` timestamp, and the Clay column is configured to re-run when `enriched_at` is older than 30 days (a freshness-guard sketch follows the list).
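A freshness-guard sketch, assuming ISO-8601 timestamps; the 60-day signal rule comes from the opener step and the 30-day re-run from the Clay column configuration:
```python
# Staleness guard sketch (assumptions: enriched_at and the signal's publish
# date are ISO-8601 strings with consistent timezone handling).
from datetime import datetime, timedelta

def signal_is_usable(signal_date: str, enriched_at: str) -> bool:
    lag = datetime.fromisoformat(enriched_at) - datetime.fromisoformat(signal_date)
    return lag <= timedelta(days=60)  # opener refuses signals >60d behind enrichment

def row_needs_rerun(enriched_at: str, now: datetime) -> bool:
    return now - datetime.fromisoformat(enriched_at) > timedelta(days=30)
```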
# ICP rubric — TEMPLATE
> Replace every `{...}` placeholder with your team's real values before
> the first run. The lead-enrichment skill checks for unsubstituted
> placeholders and refuses to score against a template — it returns
> `score: null, reason: "rubric not configured"` instead of guessing.
## Firmographics
| Dimension | In-ICP | Stretch | Out |
|---|---|---|---|
| Industry | {list, e.g. "B2B SaaS, fintech, vertical SaaS"} | {adjacent industries} | {hard out, e.g. "consumer, agencies, edu"} |
| Headcount | {range, e.g. "200-2000"} | {range, e.g. "100-200, 2000-5000"} | {below/above, e.g. "<100, >5000"} |
| Revenue | {range, e.g. "$25M-$500M ARR"} | {range} | {below/above} |
| Geo | {regions, e.g. "US, Canada, UK"} | {regions, e.g. "EU, ANZ"} | {regions, e.g. "China, Russia, sanctioned"} |
| Funding stage | {stages, e.g. "Series B-D, public"} | {stages} | {stages, e.g. "pre-seed, bootstrapped <$5M"} |
## Technographics
Tools whose presence on the prospect's stack signals fit (we win when they have these because the integration story is short or the pain is acute):
- {Tool 1 — e.g. "Salesforce + Outreach"}
- {Tool 2 — e.g. "dbt + Snowflake"}
- {Tool 3 — e.g. "Segment"}
Tools whose presence signals misfit (we lose to them, or their presence implies a build-not-buy culture):
- {Tool A — e.g. "Hightouch (we lose to)"}
- {Tool B — e.g. "in-house reverse-ETL (build-not-buy)"}
## Pain ranking
When the skill produces an ICP score, it weights against this priority order. Higher = more important to surface first.
1. {Pain category — e.g. "outbound data quality blocking pipeline"} — weight 5
2. {Pain category — e.g. "rep ramp time eating quota"} — weight 4
3. {Pain category — e.g. "RevOps reporting backlog"} — weight 3
4. {Pain category — e.g. "tooling sprawl / consolidation push"} — weight 2
5. {Pain category — e.g. "compliance / data-residency"} — weight 1
## Score interpretation
The skill returns `1-10`. Suggested interpretation (override per team):
- **9-10** — direct ICP, multiple positive signals, no disqualifiers. Send.
- **7-8** — strong fit on firmographics, signal is plausible. Send after rep glance.
- **5-6** — adjacent. Default threshold cuts here. Opener may not generate.
- **3-4** — stretch fit, weak signal. Skip unless rep has specific reason.
- **1-2** — out of ICP. Suppress.
## Disqualifiers
Single signals that drop a row out regardless of other fit. The skill flags these with `disqualifier: "..."` in the output and forces score to 1.
- {Disqualifier 1 — e.g. "uses {Competitor} as system of record"}
- {Disqualifier 2 — e.g. "regulated industry without our SOC 2 + HIPAA"}
- {Disqualifier 3 — e.g. "headcount under 50 — wrong motion"}
## Last edited
{YYYY-MM-DD} — bump on every material change. The skill prepends a "rubric may be stale" note when this date is more than 90 days old.
# Opener style guide — TEMPLATE
> Replace the placeholders with your team's actual voice rules. The
> lead-enrichment skill loads this file on every opener generation.
> Treat it as the single source of truth — do not duplicate rules into
> the prompt itself, where they will drift.
## Voice
One sentence on register: {e.g. "direct, plain, no jargon, no exclamation marks, no questions in the close unless the question is specific"}.
One sentence on stance: {e.g. "we're peers solving the same problem, not vendors selling at"}.
## Length
- Hard cap: 50 words. The skill enforces this and truncates with an ellipsis warning if the model overshoots.
- Soft target: 30-40 words. Three sentences max.
- No greeting line ("Hi {name},"). The sequence step adds the greeting; baking one into the opener double-greets.
## What every opener must contain
1. A specific reference to one thing from the snapshot — preferably `recent_signal`. If there is no recent signal, the opener references the `value_prop` and labels itself low-confidence.
2. A reason the reference matters to the operator's persona. Not "I noticed X" — "X is the move ops leaders ask us about".
3. A single, specific close. Either a 20-min ask or a question that would be answerable only by someone inside the company.
## Banned phrases
The prompt rejects any opener containing these. Add team-specific additions over time.
- "I hope this email finds you well"
- "I noticed"
- "I came across your"
- "love what you're doing"
- "quick question"
- "circling back" (in a first touch)
- Any superlative — "amazing", "incredible", "best-in-class", "leading"
- Any em-dash trio that reads as an LLM tic — "X — Y — Z"
- Compliments on growth/funding without a cited source
## Banned structures
- The "I was just on your website" opener. Always reads false.
- The "we work with companies like {logo}" name-drop in sentence one.
- The "are you the right person?" close. Wastes the reply.
## Confidence signal
The skill returns `opener_confidence: 0.0-1.0`:
- **0.8+** — opener references a dated `recent_signal` with a URL and the lead's persona is in the buying committee defined in `references/1-icp-rubric.md`.
- **0.5-0.8** — opener references `value_prop` only, or `recent_signal` is older than 60 days.
- **<0.5** — opener is generic; rep should rewrite or skip the row.
Threshold for auto-queueing into a sequence: configure per-team. Default recommendation: **never auto-queue**. The opener is a draft.
## Worked example
Snapshot: industry "scheduling for restaurants", recent_signal "March payroll launch", value_prop "schedules and pays hourly workforces".
Opener (35 words, confidence 0.78):
> Saw the March payroll launch — folding pay into the same surface as
> scheduling is the move most ops leaders ask us about. Worth a 20-min
> on whether the multi-state tax piece is hitting the same wall others
> have?
Why it scores 0.78 not 1.0: the close assumes a pain ("multi-state tax piece is hitting the same wall") that the snapshot does not directly evidence. A 1.0 opener would tie the close to a fact in the snapshot.
## Last edited
{YYYY-MM-DD}
# Source quality matrix — TEMPLATE
> Replace placeholders with the vendors actually wired into your Clay
> table. The lead-enrichment skill consults this matrix when two
> upstream sources disagree on the same field, and cites which source
> won in the per-row output for audit.
## Why this exists
Clay tables routinely stack 2-3 enrichment vendors per field — Apollo on industry, Clearbit on headcount, ZoomInfo on funding. They disagree often: Apollo will say a 200-person company is 50, Clearbit will assign the wrong industry, ZoomInfo will lag a recent acquisition by a quarter.
Without a declared preference order, the skill's snapshot step flip-flops between vendors run to run, ICP scores oscillate, and operators cannot reproduce the same row twice.
## Field-by-field preference order
Per field, list vendors top-to-bottom in trust order. The skill takes the highest-ranked vendor that returned a non-empty value. The homepage-derived value (extracted by the skill itself in step 1) is always the tiebreaker if all vendors disagree with each other.
### Industry / sector
1. {e.g. "Homepage About-page extraction (skill step 1)"}
2. {e.g. "Clearbit Reveal"}
3. {e.g. "Apollo Company"}
4. {e.g. "ZoomInfo Industry"}
### Headcount
1. {e.g. "LinkedIn employee count via Clay's Linked-In Profile column"}
2. {e.g. "Apollo employee_count"}
3. {e.g. "Clearbit metrics.employees"}
4. Skill homepage extraction (last resort — copy claims often inflated)
### Revenue / ARR
1. {e.g. "ZoomInfo revenue (paid only)"}
2. {e.g. "Clearbit metrics.annualRevenue (estimated, low confidence)"}
3. Omit (do not extract from homepage; vanity metrics are not revenue)
### Funding stage / last round
1. {e.g. "Crunchbase via Clay's native column"}
2. {e.g. "Press-release fetch in skill step 1, only if dated within 90d"}
3. Omit
### Recent signal (launches, hires, news)
1. Skill homepage / blog fetch in step 1 — must carry a dated URL
2. {e.g. "Clay's Built-With column for tech-stack changes"}
3. {e.g. "Google News column, capped to last 60 days"}
Important: a vendor returning "we don't know" is treated as missing, not as a contradicting value. The next vendor in the list is consulted.
## Conflict logging
When two vendors return different values for the same field, the skill adds a `conflicts` block to the per-row output:
```json
"conflicts": [
{ "field": "headcount", "values": { "linkedin": 380, "apollo": 120 }, "chose": "linkedin" }
]
```
Operators should sample 10-20 conflict rows weekly. If one vendor is consistently wrong, demote it in this matrix. The matrix is the control surface, not the prompt.
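A resolution sketch, assuming per-field vendor values arrive as a dict and the preference order as a list; the emitted conflicts entries mirror the shape above:
```python
# Field-resolution sketch: take the highest-ranked vendor with a real value,
# log a conflict when vendors disagree (names and shapes are illustrative).
def resolve_field(field: str, values: dict[str, object], order: list[str],
                  conflicts: list[dict]) -> object | None:
    present = {v: values[v] for v in order if values.get(v) not in (None, "", "unknown")}
    if not present:
        return None                  # "we don't know" everywhere: field omitted
    chosen = next(iter(present))     # highest-ranked vendor with a real value
    if len(set(map(str, present.values()))) > 1:
        conflicts.append({"field": field, "values": present, "chose": chosen})
    return present[chosen]

conflicts: list[dict] = []
headcount = resolve_field(
    "headcount",
    {"linkedin": 380, "apollo": 120, "clearbit": None},
    ["linkedin", "apollo", "clearbit"],
    conflicts,
)
# conflicts -> [{"field": "headcount", "values": {"linkedin": 380, "apollo": 120},
#                "chose": "linkedin"}]
```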
## Vendor cost ceiling
Each vendor has a per-row cost (Clay credits or external API spend). Set a soft cap here — if a row's combined enrichment cost exceeds the cap, skip the lowest-ranked vendor and accept lower confidence.
- Soft cap per row: {e.g. "12 Clay credits"}
- Hard cap per row: {e.g. "20 Clay credits — abort row"}
The skill respects these caps and emits `cost_ceiling_hit: true` when the hard cap fires, so the batch summary surfaces the count.
## Last edited
{YYYY-MM-DD}