Score leads against an ICP rubric using Claude

Type: Claude Skill
Difficulty: intermediate
Setup time: 30 min
For: RevOps

A Claude Skill that takes any lead row, runs it against your team’s ICP rubric, and returns a 0-10 score, a per-criterion rationale citing the rubric, a recommended next action by tier, and an escalation flag for borderline cases. Designed to plug into a Clay AI column, a HubSpot custom-code action, or a standalone CLI run over a CSV. Replaces the spreadsheet scoring matrix nobody has updated since last year — without pretending it can also do intent or behavioral scoring, which it cannot.

The bundle ships at apps/web/public/artifacts/lead-scoring-icp-rubric-skill/ and contains SKILL.md plus three reference templates the user adapts before first run.

When to use

Use this skill when you have inbound MQLs piling up faster than your SDR team can triage them, and the existing scoring is either nonexistent (“everything is a lead”) or stale (“HubSpot scoring matrix last calibrated in 2023, nobody trusts it”). It is also useful for outbound: score an enriched cold list before assigning it, and you stop burning SDR time on out-of-ICP companies that look superficially fine.

The skill is fit scoring, not intent scoring. It answers “is this the right kind of company for us” — not “are they in-market this week.” That distinction matters: if you only ever score for fit, you will sequence great-fit accounts that have no current need and ignore poor-fit accounts that are actively buying. Pair this skill with whatever signals in-market behavior — Bombora, 6sense, your own product-usage events, pricing-page hits — to route correctly.

Concretely, invoke it from:

  • A Clay AI column that fires on every new row in a lead table, writing the score and rationale back to two columns.
  • A HubSpot custom-code action in a workflow triggered by Lifecycle stage = MQL, which calls the skill and writes both the score and the rationale to lead properties.
  • A standalone CLI over a CSV export — useful for one-off list scoring before a campaign launch.
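
A minimal sketch of the standalone CLI path, assuming SKILL.md and the rubric are concatenated into the system prompt and each CSV row is posted as the lead. File names, column names, and the model id are illustrative, not part of the shipped skill; parsing the reply into separate score and rationale columns is left out to keep it short.

```python
# CLI sketch: score a CSV export row by row against the rubric.
# Paths, column names, and the model id are illustrative assumptions.
import csv
import pathlib
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
system = "\n\n".join(
    pathlib.Path(p).read_text()
    for p in (
        ".claude/skills/lead-scoring/SKILL.md",
        ".claude/skills/lead-scoring/references/1-icp-rubric-template.md",
    )
)

rows = list(csv.DictReader(open("leads.csv")))
for row in rows:
    reply = client.messages.create(
        model="claude-sonnet-4-5",   # use whatever Sonnet-class model you run
        max_tokens=800,
        system=system,
        messages=[{"role": "user", "content": f"lead: {row}\nsource_of_lead: csv_export"}],
    )
    row["icp_output"] = reply.content[0].text  # score + rationale markdown

with open("leads_scored.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```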

When NOT to use

Skip this skill when:

  • You want to auto-reject leads with no human in the loop. The output is a recommendation. The skill explicitly tags borderline cases with escalate: needs_human_review, but if you wire it to delete leads scored C or below, you will silently destroy pipeline whenever the rubric drifts out of date. Always keep an SDR review path for at least the C tier.
  • Your “rubric” is vibes. The skill refuses to score against a rubric that has no explicit weights and tier values. If your team has not had the argument about what an A-tier industry actually is, have that argument first. The skill cannot make the rubric defensible if the source is not.
  • You need behavioral or intent scoring. This is fit scoring only. Trying to encode “engagement score” or “last website visit” into the rubric forces you to update it constantly; use a dedicated intent tool for the time-varying signals and keep this skill for the static fit ones.
  • You operate in a regulated domain that requires explainability beyond per-criterion rationale. Per-criterion outputs are auditable but they are not the same as a regulator-defensible model card. If you need that, invest in a proper scoring service, not a Claude Skill.

Setup

Setup takes about 30 minutes once you have the rubric drafted. The rubric itself takes longer — usually a 60-minute working session with the SDR manager, an AE, and someone from RevOps to argue about weights.

  1. Install the Skill. Drop apps/web/public/artifacts/lead-scoring-icp-rubric-skill/SKILL.md and the references/ folder into your .claude/skills/lead-scoring/ directory (or upload as a Skill in claude.ai). The frontmatter name and description are what triggers the Skill on relevant prompts.
  2. Replace the rubric template. Open references/1-icp-rubric-template.md and replace the placeholder rows in “Criteria” with your actual criteria, weights (1-5), and tier values (A / B / C). Fill the “Hard disqualifiers” section — these run as deterministic checks before any LLM call. Update “Last edited” so the footer the skill prints with every output (the rubric’s SHA-256 and last-edited date) reflects who owns the current version.
  3. Replace the tier-to-action matrix. Open references/2-tier-to-action-matrix.md and replace the example rows with what your team actually does on each (tier, source_of_lead) combination. The defaults are reasonable but not yours.
  4. Wire the input source. In Clay, point an AI column at the Skill, pass the enriched lead row as lead, the rubric file as rubric, and the source column as source_of_lead. In HubSpot, wrap the Skill in a custom-code action that reads the contact + company properties into a lead object and posts the structured output back. In a script, glob the CSV, post each row, write the score and rationale to two new columns.
  5. Configure the destination. Both score and rationale go to the lead. Score in a number property (for routing logic), rationale in a long-text property (for the SDR who will read it before the call). Wire the escalate field to a separate boolean or enum property so the SDR manager can filter for review.
  6. Calibrate. Before turning it on, run the skill over 20 closed-won leads and 20 closed-lost leads from the last 6 months. The score distribution should clearly separate the two cohorts. If it does not, the rubric is the problem, not the skill — go back to step 2 and re-argue weights.
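
A minimal calibration sketch for step 6, assuming the scores have already been written back to a CSV alongside a cohort label; column names are illustrative.

```python
# Calibration sketch: the closed-won and closed-lost cohorts should separate.
# Assumes illustrative columns "icp_score" and "cohort" ("closed_won"/"closed_lost").
import csv
from statistics import mean

rows = list(csv.DictReader(open("calibration_sample.csv")))
won = [float(r["icp_score"]) for r in rows if r["cohort"] == "closed_won"]
lost = [float(r["icp_score"]) for r in rows if r["cohort"] == "closed_lost"]

print(f"closed-won:  n={len(won)}  mean={mean(won):.1f}  min={min(won):.1f}")
print(f"closed-lost: n={len(lost)}  mean={mean(lost):.1f}  max={max(lost):.1f}")

# Rough separation check: closed-lost leads scoring at or above the lowest
# closed-won lead suggest the rubric is not separating the cohorts.
overlap = sum(1 for s in lost if s >= min(won))
print(f"closed-lost leads at or above the lowest closed-won score: {overlap}")
```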

What the skill actually does

The skill runs four steps in a fixed order. Earlier steps gate later steps; do not parallelize.

Step 1 — deterministic firmographic checks. Before any LLM call, plain code runs the rubric’s hard disqualifiers (sanctioned country, disqualified industry, headcount under your floor, free-mail domain) and the required-field check (email and company_domain must be present). Hits return immediately — disqualified with the citation, or escalate: insufficient_data with the missing fields. Why deterministic first: it is free, fast, and never hallucinates. Burning tokens to confirm a 3-person hairdresser is not in your enterprise-SaaS ICP is wasteful.
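
A sketch of that deterministic gate. Thresholds and field names are illustrative; the authoritative list is the rubric's "Hard disqualifiers" section.

```python
# Deterministic gate sketch (step 1). Thresholds and field names are
# illustrative assumptions, not the shipped defaults.
REQUIRED_FIELDS = ("email", "company_domain")
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}
DISQUALIFIED_INDUSTRIES = {"gambling", "tobacco"}   # examples only
HEADCOUNT_FLOOR = 50                                # example only

def precheck(lead: dict) -> dict | None:
    """Return an early result if a hard check fires, else None (continue to LLM scoring)."""
    missing = [f for f in REQUIRED_FIELDS if not lead.get(f)]
    if missing:
        return {"escalate": "insufficient_data", "missing_fields": missing}
    if lead["email"].split("@")[-1].lower() in FREE_MAIL_DOMAINS:
        return {"disqualified": True, "reason": "free-mail domain"}
    if lead.get("industry", "").lower() in DISQUALIFIED_INDUSTRIES:
        return {"disqualified": True, "reason": f"disqualified industry: {lead['industry']}"}
    if lead.get("headcount") and int(lead["headcount"]) < HEADCOUNT_FLOOR:
        return {"disqualified": True, "reason": f"headcount under {HEADCOUNT_FLOOR}"}
    return None
```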

Step 2 — per-criterion LLM scoring with explicit weighting. For each remaining criterion, the model emits a tier (A / B / C) and a one-sentence rationale citing the rubric row. The skill multiplies tier (A=3, B=2, C=1) by the criterion’s weight and sums. Why per-criterion rather than a holistic prompt: holistic outputs blend criteria silently and you lose the ability to debug why a lead got an 8 instead of a 5. Why explicit weighting rather than letting the model balance: stated weights are the only way the rubric stays the source of truth. If the model decides its own balance, rubric reviews become theatre.
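
The weighting as a sketch. Tier values and per-criterion weights are as described above; normalizing the weighted sum onto 0-10 is one plausible reading, since the headline score is reported on a 0-10 scale.

```python
# Weighted-sum sketch (step 2). Tier values and weights follow the text;
# the 0-10 normalization is an assumption.
TIER_VALUE = {"A": 3, "B": 2, "C": 1}

def weighted_score(criteria: list[dict]) -> float:
    """criteria: [{"name": ..., "weight": 1-5, "tier": "A"|"B"|"C", "reason": ...}, ...]"""
    raw = sum(TIER_VALUE[c["tier"]] * c["weight"] for c in criteria)
    best_possible = sum(TIER_VALUE["A"] * c["weight"] for c in criteria)
    return round(10 * raw / best_possible, 1)
```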

Step 3 — borderline fallback to human review. If the final score is within 0.5 of a tier boundary, or if more than 3 criteria were scored on missing or inferred data, the skill sets escalate: needs_human_review and names the missing fields. The most expensive scoring failure is not a wrong tier on a confident lead — it is a wrong tier on a lead that was always borderline.
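
The borderline check as a sketch. The tier cut points and the per-criterion data-quality flag are illustrative assumptions; the real boundaries come from your rubric and tier-to-action matrix.

```python
# Borderline fallback sketch (step 3). Cut points and the "data_quality"
# field name are illustrative assumptions.
TIER_BOUNDARIES = (7.0, 4.0)   # e.g. A >= 7.0, B >= 4.0, else C

def needs_human_review(score: float, criteria: list[dict]) -> bool:
    near_boundary = any(abs(score - b) <= 0.5 for b in TIER_BOUNDARIES)
    thin_data = sum(
        1 for c in criteria if c.get("data_quality") in ("missing", "inferred")
    ) > 3
    return near_boundary or thin_data
```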

Step 4 — output assembly. The skill emits the markdown described in references/3-sample-output.md: headline score and tier, recommended next action joined from the tier-to-action matrix, per-criterion table with reasons, disqualifier check, data-gaps list, and a footer with the rubric’s SHA-256 and last-edited date.

Cost reality

Per-lead token cost depends on rubric size, but for a typical 6-criterion rubric with structured per-criterion output, expect roughly 1,500-2,500 input tokens and 400-700 output tokens per lead. At Claude Sonnet 4.x pricing (approximately $3 per million input tokens and $15 per million output tokens as of late 2026), that is around $0.01-0.02 per scored lead.

A team running 5,000 inbound MQLs per month spends roughly $50-100/month in Claude tokens. A team running 50,000 enriched outbound leads per month spends $500-1,000/month — at which point batching, prompt caching of the rubric, and pre-filtering with the deterministic step matter a lot. The skill defaults to a single structured prompt per lead (rather than 6-10 small prompts) precisely to keep token usage bounded.
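
The arithmetic behind those figures, using the mid-range token counts and the prices quoted above; adjust for your own rubric size and current pricing.

```python
# Back-of-envelope cost check using the mid-range figures above.
input_tokens, output_tokens = 2_000, 550                  # per scored lead
usd_per_input_token, usd_per_output_token = 3e-6, 15e-6   # Sonnet-class pricing

per_lead = input_tokens * usd_per_input_token + output_tokens * usd_per_output_token
print(f"per lead:            ${per_lead:.3f}")             # ~$0.014
print(f"5,000 MQLs/month:    ${5_000 * per_lead:,.0f}")    # ~$71
print(f"50,000 leads/month:  ${50_000 * per_lead:,.0f}")   # ~$712
```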

The non-token costs are bigger. Building the rubric is a 60-minute working session you do once and re-do quarterly. Calibrating against 20 closed-won + 20 closed-lost leads is another hour. Wiring the Clay or HubSpot integration is half a day. After that the skill is hands-off until the rubric drifts.

Success metric

The metric to watch is score-to-conversion correlation: of the leads scored A in the last 90 days, what fraction converted to opportunities? Of those scored B? C? If the curve is monotonic — A converts at a higher rate than B, B at a higher rate than C — the rubric is doing work. If C converts at a similar rate to B, the rubric does not separate fit from non-fit and needs to be re-argued.
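
A sketch of that check, assuming a 90-day export with a tier column and an opportunity flag; column names are illustrative.

```python
# Score-to-conversion sketch. Assumes illustrative columns "icp_tier" (A/B/C)
# and "converted_to_opportunity" ("true"/"false") in a 90-day export.
import csv
from collections import Counter

rows = list(csv.DictReader(open("scored_last_90_days.csv")))
scored = Counter(r["icp_tier"] for r in rows)
converted = Counter(
    r["icp_tier"] for r in rows if r["converted_to_opportunity"] == "true"
)

for tier in ("A", "B", "C"):
    n = scored.get(tier, 0)
    rate = converted.get(tier, 0) / n if n else 0.0
    print(f"tier {tier}: {n} scored, {rate:.0%} converted")
# Healthy: monotonic, A > B > C. If B and C land close together, re-argue the rubric.
```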

Secondary metric: SDR time-to-first-touch on A-tier leads. A working scoring system collapses this to under 1 hour for inbound. If A-tier leads still sit in a queue for 24h, the routing — not the scoring — is the bottleneck.

vs alternatives

vs HubSpot Predictive Lead Scoring. HubSpot’s built-in predictive score is a black box trained on your historical conversion data. It works once you have enough closed-won volume (HubSpot recommends about 500 closed deals as a minimum). For teams under that bar, the model has nothing to learn from and the score is noise. This skill works from day one because the rubric is hand-authored, not learned. The trade-off: HubSpot’s model picks up patterns a rubric author would miss; this skill only knows what you wrote down. Run both if you have the volume — use HubSpot’s score for “what surprises me” and this skill’s per-criterion rationale for “why is this one ranked here.”

vs Marketo behavioral scoring. Marketo (or HubSpot’s behavioral scoring) tracks engagement signals — email opens, page views, form submissions — and adds points. That is intent scoring, not fit scoring, and the two answer different questions. A great-fit account that has not opened an email is still a great-fit account. A poor-fit account that binge-read your blog is still a poor-fit account. Use behavioral scoring in addition to this skill, not instead of it; route on the combined signal (high fit + high intent → AE direct; high fit + low intent → nurture; low fit + high intent → SDR fit-call before AE).
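
The routing parenthetical as a sketch. What counts as high fit or high intent is whatever your tiers and intent tool define; the low-fit, low-intent branch is an added assumption.

```python
# Routing sketch for the combined fit + intent signal. Thresholds and the
# low-fit/low-intent branch are illustrative assumptions.
def route(fit_tier: str, intent_is_high: bool) -> str:
    if fit_tier == "A" and intent_is_high:
        return "AE direct"
    if fit_tier == "A":
        return "nurture"                        # great fit, not in-market yet
    if intent_is_high:
        return "SDR fit-call before AE"         # in-market, fit unproven
    return "no active sequence"
```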

vs manual SDR review. For under 50 inbound leads per week, manual review by an SDR manager is genuinely competitive — humans pick up nuance (“this company just acquired our customer, prioritize”) that the skill will miss. Above ~200 leads per week, manual review becomes the bottleneck and consistency drops. The skill scales linearly with token budget; humans do not.

Watch-outs

  • Rubric drift. Someone edits the markdown rubric, ships the change, and SDRs reading the new scores never see a diff. Six weeks later, the team realizes the headcount weight got bumped from 4 to 2 by accident and 200 stretch-tier accounts were silently downgraded to C. Guard: the skill records the rubric’s SHA-256 in every output footer and prepends a “Rubric updated YYYY-MM-DD” banner whenever the hash changes between runs. A quarterly calendar reminder forces a review even if no edits happen. A minimal hash-guard sketch follows this list.
  • Source-bias amplification. A rubric built from your closed-won set encodes who you have already sold to. Scoring against it makes you blind to adjacent ICP and your pipeline narrows over time to lookalikes of last year’s customers. Guard: every quarter, sample 20 leads the skill scored as C-tier and have an AE manually review whether any are actually fit. If more than 3 are misclassified, add a “stretch ICP” row to the rubric and recalibrate.
  • False confidence on thin data. When enrichment is missing 4 of 6 criteria fields, a 7.4 score is mostly noise but reads as authoritative. SDRs will treat it as a confident A-tier and skip the call prep. Guard: the skill sets escalate: needs_human_review whenever more than 3 criteria are scored on missing or inferred data, and adds a “Data gaps” section listing the absent fields. SDRs are trained to read the gaps section before the headline number.
  • Protected-class proxies. Even with good intent, a rubric that weights “geography” can collapse into nationality, and “industry” can collapse into proxies for company demographics in ways your legal team will not love. Guard: the skill refuses fields it recognizes as protected-class proxies (name-derived gender, photo, age signals). Review the rubric annually with someone who can spot the less obvious proxies.
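
For the rubric-drift watch-out above, a minimal hash-guard sketch, assuming the rubric path from the setup step and a local file holding the last-seen hash; both are illustrative. The skill itself already prints the SHA-256 in every output footer; this just shows the comparison.

```python
# Rubric-drift guard sketch: hash the rubric and flag any change since the
# last run. Paths are illustrative assumptions.
import hashlib
import pathlib

RUBRIC = pathlib.Path(".claude/skills/lead-scoring/references/1-icp-rubric-template.md")
STATE = pathlib.Path(".last_rubric_sha256")

current = hashlib.sha256(RUBRIC.read_bytes()).hexdigest()
previous = STATE.read_text().strip() if STATE.exists() else ""

if current != previous:
    print(f"Rubric updated: new SHA-256 {current[:12]}... review the diff before trusting new scores")
    STATE.write_text(current)
```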

Stack

  • Claude — scoring engine and rationale generator. Sonnet 4.x is the sweet spot for cost vs reasoning quality on this task; Haiku works for the deterministic-only path but loses rationale quality on the LLM step.
  • Clay — preferred lead-source and enrichment layer for outbound and cold-list scoring. The AI column is a clean integration point.
  • HubSpot — CRM destination for score, rationale, escalate flag, and source. Custom-code actions are the integration point for inbound MQL scoring.
  • A markdown editor and a calendar — the unglamorous pieces. The rubric lives in markdown, the quarterly review lives in someone’s calendar, and both matter more than the model choice.

Files in this artifact

  • SKILL.md
  • references/1-icp-rubric-template.md
  • references/2-tier-to-action-matrix.md
  • references/3-sample-output.md