claude-skill

Catch hallucinated claims, generic personalization, and compliance breaks in AI SDR drafts before they send

Difficulty

intermédiaire

Setup time

60-90 min

For

revops · sdr-leader · gtm-engineer

RevOps

Stack

Un Claude Skill qui se positionne entre un AI SDR (Alice chez 11x, Ava chez Artisan, l’agent intégré à aisdr ou Unify) et l’action d’envoi, notant chaque draft contre quatre rubriques — exactitude des claims, ancrage de la personnalisation, conformité juridictionnelle, et hygiène de deliverability — et renvoyant un verdict block / edit / send avec l’axe spécifique défaillant cité. Le bundle dans apps/web/public/artifacts/ai-sdr-draft-qa-skill/ livre SKILL.md, quatre fichiers de rubrique dans references/, et un fichier littéral de sample output pour le wiring du parser.

Quand l’utiliser

Faites tourner ce skill comme gate pré-envoi sur tout déploiement d’AI SDR qui envoie sans revue humaine message par message. Les deux patterns en production : un webhook devant l’action d’envoi de l’AI SDR qui poste le draft plus le pack d’évidence du prospect au skill et ne libère l’envoi que sur une réponse verdict: send, ou une passe batch pré-envoi sur les 24 prochaines heures de drafts en file qui met en pause toute étape de séquence avec verdict: block.

Le skill est aussi un outil de calibration pendant le pilote. Passez un échantillon de 500 drafts de votre premier mois avec 11x, Artisan ou aisdr à travers le skill, puis faites étiqueter les mêmes 500 à la main par un analyste RevOps. L’ensemble des désaccords vous dit si l’AI SDR sur ou sous-personnalise pour votre ICP, où se concentre le taux de claims hallucinés, et si votre profil juridictionnel a besoin d’un ajustement avant de scaler le volume d’envoi au-delà de 5 000 par semaine.

Le skill exige le draft plus un pack prospect_evidence — le même payload d’enrichment que l’AI SDR a utilisé pour écrire le draft. Si l’AI SDR upstream n’expose pas le pack d’évidence (certaines suites fermées le cachent), le skill ne peut pas vérifier les claims et renvoie insufficient_evidence plutôt que de deviner. C’est une feature, pas un bug : un gate de QA qui note les drafts contre la connaissance générale du modèle hallucinera ses propres validations.

Quand NE PAS l’utiliser

N’utilisez pas ce skill quand un SDR ou AE humain revoit chaque draft avant l’envoi. Le reviewer est un gate plus fort que le skill — il possède le contexte business que le skill n’a pas — et placer un modèle devant un reviewer humain gaspille des tokens et ajoute de la latence sans relever la précision. Utilisez-le pour les flux totalement ou partiellement autonomes.

Ne l’utilisez pas comme unique contrôle de deliverability. Le skill scanne les phrasings déclencheurs de spam, les subjects en majuscules, les bodies tout-image et les patterns de cloaking de liens à l’intérieur du draft. Il ne surveille pas DMARC, le complaint rate, ni le statut de blocklist sur vos domaines — c’est le job du workflow email-deliverability-monitor-n8n. Faites tourner les deux.

Ne le faites pas tourner sur des drafts de réponse tiède ni sur des threads déjà engagés. Les rubriques sont conçues pour de l’outbound froid ; un draft de réponse à un prospect qui a déjà réservé un meeting échouera à la rubrique de personnalisation par design (la personnalisation doit maintenant être context-aware, pas tirée d’évidence froide). Routez les drafts de tier tiède vers un autre prompt.

Setup

Le setup prend 60-90 minutes pour le skill lui-même, plus le temps de wiring upstream, qui dépend de si votre AI SDR expose un webhook pré-envoi.

Installez le Skill. Déposez apps/web/public/artifacts/ai-sdr-draft-qa-skill/SKILL.md et le dossier references/ dans votre répertoire .claude/skills/ai-sdr-draft-qa/, ou uploadez-le comme Skill sur claude.ai. Les champs name et description du frontmatter sont ce qui déclenche le Skill depuis un agent appelant.
Calibrez la rubrique de claims. Ouvrez references/1-claim-rubric.md et fixez claim_block_threshold — le nombre de claims non vérifiés qui déclenche un verdict block (default : 1). La plupart des AI SDRs sur-confabulent les rounds de funding et le headcount ; le default de 1 fait remonter chaque claim halluciné. Montez à 2 seulement si vous acceptez un risque d’hallucination en échange de moins de blocks.
Calibrez la rubrique de personnalisation. Ouvrez references/2-personalization-rubric.md. Le scoring default utilise une échelle 0-5 ; le personalization_block_below default est 2. Un score de 2 signifie au moins une spécificité ancrée liée au pack d’évidence. Les drafts qui notent 0 ou 1 sont des templates du type « Bonjour [first_name], j’ai remarqué que [Company] est dans le domaine [industry] » — bloquez.
Choisissez les profils juridictionnels. Ouvrez references/3-compliance-rubric.md et activez les profils qui correspondent à votre envoi. US CAN-SPAM + RFC 8058 one-click unsubscribe est le plancher ; la documentation de la base d’intérêt légitime RGPD UE est la couche pour tout recipient UE ; la France ajoute la Loi Hamon pour le B2B ; la Californie ajoute un opt-out aligné CCPA. La rubrique de conformité lit le pays du prospect depuis le pack d’évidence et applique le profil correspondant ou renvoie insufficient_compliance_context.
Câblez le webhook pré-envoi. Pour 11x et Artisan, configurez le webhook pré-envoi dans les settings de la plateforme avec l’URL de votre endpoint (ou utilisez le mode « approval queue » de la plateforme et faites conduire les approvals par le skill). Pour Unify et aisdr, utilisez l’API ouverte de la plateforme pour récupérer le prochain draft en file, appeler le skill et écrire le verdict en retour. Pour un agent maison, placez le skill directement devant l’appel SMTP d’envoi.
Décidez de la policy de block. Un verdict block peut router le draft vers un reviewer humain, le retenir pour que l’AI SDR le régénère, ou faire un hard-fail de l’envoi. Le default est « retenir pour régénération avec l’axe défaillant en feedback » — la plupart des AI SDRs améliorent le draft au second pass quand on leur donne la défaillance spécifique.

Ce que le skill fait vraiment

Étape 1 — validation d’input. Le skill rejette les appels auxquels manque le body du draft, le subject, le sender domain, le pays du recipient ou le pack prospect_evidence. L’absence de l’un d’eux renvoie insufficient_input avec le champ spécifique. Aucun scoring ne tourne sur un record incomplet.

Étape 2 — extraction et vérification des claims. Chaque claim factuel sur le prospect, l’entreprise du prospect, ou un événement public (« j’ai vu votre annonce de Série B mardi dernier », « le spike d’embauches dans votre équipe data ») est extrait, puis confronté au pack d’évidence. Un claim est ancré si une citation dans le pack le soutient. Les claims non ancrés sont marqués. Default claim_block_threshold: 1 — un claim non ancré déclenche un block.

Étape 3 — scoring de personnalisation. Le skill note 0-5 sur les spécificités ancrées. Une spécificité ancrée est un détail lié à une citation dans le pack d’évidence — un tool nommé que le prospect utilise, un job posting spécifique qu’il a publié, un podcast dans lequel il est apparu. Une spécificité non ancrée — « votre industrie », « votre rôle », « votre équipe » — ne compte pas. Les drafts qui notent en dessous de personalization_block_below: 2 sont bloqués. La séparation à deux pôles (ancrée vs non ancrée) est ce qui empêche l’AI SDR de gamefier le score en bourrant des tokens.

Étape 4 — scan de conformité. Le skill vérifie : un pattern de header List-Unsubscribe et une ligne List-Unsubscribe-Post: List-Unsubscribe=One-Click selon RFC 8058 (l’exigence de bulk-sender Google et Yahoo depuis février 2024), une adresse physique d’expéditeur en footer selon CAN-SPAM, un lien d’unsubscribe dans le body visible, une identité d’expéditeur qui matche la ligne From, et les ajouts par juridiction des profils activés. L’absence d’un élément requis est un block.

Étape 5 — scan de deliverability et de voix. Le skill marque le langage déclencheur de spam (« guaranteed », « free money », « act now »), les subject lines au-dessus de 70 caractères ou en majuscules, les bodies sous 40 mots ou au-dessus de 250 mots, les bodies tout-image, plus de 3 liens, et les tells AI stock (« I hope this email finds you well », « I wanted to reach out »). Une marque déclenche un verdict edit, pas un block, à moins qu’elle ne s’empile avec une autre marque.

Étape 6 — assemblage du verdict. Le skill renvoie l’un de trois verdicts : send (pas de blocks, pas d’edits), edit (une ou plusieurs marques tier-edit avec les rewrites suggérés inline), ou block (un ou plusieurs problèmes bloquants avec l’axe défaillant nommé). Le format de sortie est dans references/4-sample-output.md.

Réalité de coût

Chaque passe de QA consomme 1 500-3 500 tokens d’input (le draft, le pack d’évidence et les quatre fichiers de rubrique quand non cachés) et 400-800 tokens d’output. Au pricing de Claude Sonnet 4.x (environ 3$ par million d’input et 15$ par million d’output, list de mid-2026), chaque passe coûte 0,01-0,03$.

À volume d’AI SDR — un agent autonome unique faisant 5 000-15 000 envois par mois — la couche de QA coûte 50-450$ par mois en tokens Claude. À un déploiement de 50 000 envois par mois (plusieurs agents, envoi multi-domaines), 500-1 500$. Comparez à l’alternative : un domaine d’envoi supprimé suite à un spike de 0,3% de complaint rate coûte de 5 à 10 jours ouvrés de pipeline. Le coût de QA est une erreur d’arrondi contre une mauvaise semaine.

Le prompt caching des fichiers de rubrique coupe le coût des tokens d’input de 30-50% en volume de production. Le SKILL.md du bundle documente la convention de cache-key ; les quatre fichiers de rubrique sont stables entre appels au sein d’un déploiement.

Métrique de succès

La métrique à tracker est le taux de capture des claims hallucinés : échantillonnez 100 drafts par semaine, faites étiqueter chacun par un analyste RevOps sur les claims non ancrés, et mesurez le recall du skill contre les labels de l’analyste. Un recall au-dessus de 95% signifie que la rubrique fonctionne ; en dessous de 90% signifie que la rubrique de claims a besoin d’un serrage (baissez le threshold, ou élargissez ce qui compte comme « claim »).

Métrique secondaire : taux de block faux. Parmi les drafts que le skill a bloqués, comptez la part qu’un analyste aurait approuvée. Un taux de block faux au-dessus de 8% est le signal pour desserrer le threshold de personnalisation de 2 à 1 ou élargir la définition de spécificité ancrée. En dessous de 3% signifie que le skill sous-bloque — poussez le threshold dans l’autre sens.

Les deux métriques se déplacent en sens opposé ; choisissez le point d’opération qui correspond à votre tolérance. Une équipe B2B enterprise vendant à Fortune 500 devrait tourner serré — recall élevé, accepter plus de block faux. Une équipe SMB à fort volume vendant 10 000+ par semaine devrait tourner lâche — moins de block faux, accepter quelques claims hallucinés si le calcul de volume tient.

vs alternatives

vs pas de QA. Le statu quo pour les déploiements d’AI SDR totalement autonomes jusqu’en 2026 est l’absence de gate pré-envoi au-delà des guardrails légers du vendor lui-même. Les taux de réponse sur les envois autonomes se situent à 1-3% contre 8-15% sur les pods hybrides AI-plus-humain (estimations de déploiements rapportés par des buyers jusqu’à mi-2026, pas un benchmark publié unique). Les patterns de claim halluciné et de personnalisation générique sont une part matérielle de l’écart. Ajouter un gate de QA monte le taux, mais le mouvement est borné — de meilleurs drafts ne transforment pas une liste froide en liste tiède.

vs les guardrails internes de l’AI SDR. 11x et Artisan livrent des vérifications de qualité internes qui signalent les défaillances évidentes, mais la surface de défaillance n’est pas transparente — vous ne pouvez pas inspecter ce que la vérification a attrapé ou non, et vous ne pouvez pas tuner le threshold. Ce skill rend la rubrique inspectable. Le trade-off : c’est un appel modèle séparé avec son propre coût de latence.

vs un reviewer SDR humain. Un reviewer humain attrape les défaillances de contexte business que le skill rate (« ce prospect vient d’avoir un gros outage, n’envoyez pas un email guilleret aujourd’hui »). Le skill attrape les défaillances de cohérence que le reviewer humain rate sur le draft 200 de la journée. Faites tourner les deux à haute valeur de deal ; le skill seul à haut volume.

vs un prompt structuré qui contraint l’AI SDR upstream. Des prompts upstream plus serrés réduisent l’hallucination à la source. Ils n’attrapent pas le taux résiduel et ne signalent pas les ruptures de conformité juridictionnelle (la juridiction dépend du recipient, que le prompt d’écriture ne connaît pas). Utilisez les deux : un prompt upstream structuré pour l’AI SDR, plus ce skill comme gate.

Watch-outs

Faux blocks sur les spécificités légitimes tirées par l’AI. Si l’AI SDR upstream a récupéré un press release récent que le pack d’évidence n’inclut pas, le skill marque le claim comme non ancré et bloque. Guard : le skill vérifie uniquement contre le pack d’évidence fourni, jamais contre la connaissance du modèle. Le contrat est que l’AI SDR inclut dans le pack tout ce qu’il a utilisé pour écrire le draft ; s’il ne peut pas, le skill ne peut pas vérifier. Le fix est upstream — faire que le vendor de l’AI SDR expose le contexte de retrieval — pas un desserrage de la rubrique.
Gaming du score de personnalisation. Un skill qui récompense la spécificité apprend au modèle upstream à bourrer des tokens d’apparence spécifique. « Votre travail chez Snowflake sur la plateforme data » se lit comme personnalisé même si le prospect a quitté l’entreprise depuis 18 mois. Guard : la rubrique note les spécificités ancrées et non ancrées séparément. Une entité nommée ne compte que si une citation du pack d’évidence la soutient ; une spécificité périmée sans citation d’emploi actuel se lit comme non ancrée.
Compliance creep entre juridictions. CAN-SPAM, RFC 8058, RGPD, Loi Hamon française, opt-out aligné CCPA en Californie, awareness NYC LL144 pour tout outreach adjacent à l’embauche — règles différentes par recipient. Guard : la rubrique de conformité est par juridiction ; le pack prospect_evidence doit inclure le pays du recipient (et l’État américain quand pertinent), et le skill applique le profil correspondant ou renvoie insufficient_compliance_context. Le repli silencieux sur un profil « global » générique est interdit dans la rubrique.
Le skill devient le goulot. À 50 000 envois par mois et un p95 de 3 secondes par draft, le gate de QA ajoute environ 42 heures de wall-clock par mois de traitement sériel — bien en parallèle, mauvais en thread unique. Guard : le bundle documente le pattern de parallélisation (un appel Claude par draft, batches de 20-50 en vol) et la convention de cache-key pour les quatre fichiers de rubrique. Visez un p95 sous 3 secondes par draft ; alertez quand le p95 dépasse 5 secondes.

Bundle de référence

apps/web/public/artifacts/ai-sdr-draft-qa-skill/SKILL.md — définition complète du skill, inputs, méthode, format de sortie et watch-outs.
apps/web/public/artifacts/ai-sdr-draft-qa-skill/references/1-claim-rubric.md — ce qui compte comme claim, contrat du pack d’évidence, thresholds pass/block par axe.
apps/web/public/artifacts/ai-sdr-draft-qa-skill/references/2-personalization-rubric.md — spécificités ancrées vs non ancrées, scoring 0-5 avec outputs d’exemple à chaque score.
apps/web/public/artifacts/ai-sdr-draft-qa-skill/references/3-compliance-rubric.md — profils par juridiction (US CAN-SPAM, RFC 8058 one-click unsubscribe, RGPD UE intérêt légitime, NYC LL144 awareness, Loi Hamon française, opt-out aligné CCPA en Californie).
apps/web/public/artifacts/ai-sdr-draft-qa-skill/references/4-sample-output.md — outputs littéraux send, edit et block plus contrat de champs structurés pour parsers.

Modifier cette page sur GitHub

Files in this artifact

Download all (.zip)

---
name: ai-sdr-draft-qa
description: Pre-send QA gate for AI SDR drafts (11x Alice, Artisan Ava, aisdr, Unify, homegrown agents). Scores each draft on claim accuracy, personalization grounding, jurisdictional compliance, and deliverability hygiene, then returns a block / edit / send verdict with the specific failing axis cited and an optional rewritten draft. Use as a webhook in front of the AI SDR's send action — not as a substitute for a human reviewer on warm or already-engaged threads.
---

# AI SDR draft QA

## When to invoke

Invoke before any AI-SDR-generated outbound email is released to the send queue. Production patterns:

- A pre-send webhook in 11x, Artisan, aisdr, or Unify that posts `{ draft, prospect_evidence, sender_domain }` to this skill and only releases the send on `verdict: send`.
- A batch pre-send pass over the next 24 hours of queued drafts that pauses any sequence step with `verdict: block`.
- A calibration pass during AI SDR pilot — run 500 drafts through the skill, have a RevOps analyst label the same 500 by hand, use the disagreement set to tune the rubric thresholds before scaling.

Do NOT invoke this skill for:

- **Warm or already-engaged threads.** Replies to a prospect who already booked a meeting will fail the personalization rubric by design — the personalization should be context-aware, not pulled from cold evidence. Route these to a different prompt.
- **Drafts a human SDR or AE will review before send.** The human is a stronger gate than the skill; running the skill in front of the human wastes tokens and adds latency without raising precision.
- **Drafts without a `prospect_evidence` pack.** Without the evidence the upstream model used, the skill cannot verify claims. It returns `insufficient_evidence` rather than guessing. Fix upstream — get the AI SDR to expose its retrieval context — not by loosening the rubric.

## Inputs

Required:

- `draft.subject` — string. The proposed subject line.
- `draft.body` — string. The proposed plain-text body. HTML drafts are rejected; convert upstream.
- `draft.from` — string. The literal `From:` line that will appear in the sent email.
- `sender_domain` — string. The sending domain (used for the deliverability rubric's identity check).
- `recipient.country` — ISO 3166-1 alpha-2 country code. Drives jurisdictional profile selection in the compliance rubric.
- `prospect_evidence` — object. The exact enrichment payload the upstream AI SDR used. Required shape: an array of `{ source, retrieved_at, claim_text, citation_url? }` entries. Every claim the AI SDR made in the draft must trace to an entry here.

Optional:

- `recipient.us_state` — ISO 3166-2 subdivision code. Required for the US profile when CCPA-aligned opt-out applies.
- `brand_guide` — string. Path to or inline contents of a brand voice file with banned phrasings beyond the defaults. Loaded alongside the deliverability rubric.
- `cache_key_prefix` — string. Optional prompt-cache prefix for batch runs; see the cache-key convention below.
- `request_rewrite` — boolean. Default `false`. When `true`, the skill returns a rewritten draft alongside the verdict on `edit` or `block`.

## Reference files

Load these from `references/` before first run. The four rubric files are stable across calls within a deployment — cache them.

- `references/1-claim-rubric.md` — what counts as a claim, the evidence-pack contract, per-axis pass/block thresholds. `claim_block_threshold` is set here.
- `references/2-personalization-rubric.md` — grounded vs ungrounded specifics, the 0-5 scoring scale with example outputs at each score. `personalization_block_below` is set here.
- `references/3-compliance-rubric.md` — per-jurisdiction profiles (US CAN-SPAM, RFC 8058 one-click unsubscribe, EU GDPR legitimate interest, NYC LL144 awareness, French Loi Hamon, California CCPA-aligned opt-out).
- `references/4-sample-output.md` — literal `send`, `edit`, and `block` outputs plus the structured-field contract for parsers.

## Method

Run these steps in order. Earlier steps gate later steps.

### 1. Input validation

Reject the call if any required field is missing or malformed. Return `result: insufficient_input` with the specific field name. Do not score on a partial record. A malformed `prospect_evidence` pack (missing the array, entries missing `source` or `claim_text`) is a hard rejection — the verifier cannot run without the contract.

### 2. Claim extraction and verification

Extract every factual claim about the prospect, the prospect's company, or a public event the draft references. Examples: "I saw your Series B announcement", "your hiring spike on the data team", "your podcast appearance with Lenny last month", "since you moved to [Company] in March".

For each claim:

- Match against the `prospect_evidence` pack. A claim is **grounded** if at least one entry in the pack supports it (same entity, consistent date, consistent fact).
- If no entry supports the claim, mark it **ungrounded**.
- A grounded claim with a stale `retrieved_at` (older than 90 days for company facts, older than 30 days for hiring or product-launch facts) is downgraded to **stale_grounded** and flagged as an edit-tier finding.

Apply the threshold from `references/1-claim-rubric.md`: `claim_block_threshold` ungrounded claims (default 1) trips a block.

### 3. Personalization scoring

Score the draft on the 0-5 scale defined in `references/2-personalization-rubric.md`:

- **Grounded specifics** — entities, events, or properties tied to a citation in the evidence pack. Each counts toward the score.
- **Ungrounded specifics** — references to "your industry", "your role", "your team", "your company" without a tied citation. These count zero.

Apply `personalization_block_below` (default 2). Drafts under the threshold are blocked.

The grounded/ungrounded separation is the guard against score gaming — if the rubric rewarded specificity alone, the upstream AI SDR would learn to stuff specific-looking tokens. A "Snowflake" mention without a current-employment citation reads as ungrounded.

### 4. Compliance scan

Read `recipient.country` (and `recipient.us_state` if present). Load the matching jurisdictional profile from `references/3-compliance-rubric.md`. If no profile matches, return `result: insufficient_compliance_context` — do not fall back to a generic profile.

For the matched profile, check every required element:

- US CAN-SPAM floor: physical sender address in the footer, visible unsubscribe link, sender identity matching the `From:` line.
- RFC 8058 (Google + Yahoo bulk-sender requirement since February 2024): the `List-Unsubscribe` header must include both `mailto:` and `https://` options, and the `List-Unsubscribe-Post: List-Unsubscribe=One-Click` header must be present. The skill cannot inspect headers directly; it requires the calling agent to pass `email_headers` or to confirm `headers_compliant: true`.
- EU GDPR profile: legitimate interest basis documented, opt-out language present, no third-country transfers without standard contractual clauses noted in the evidence pack.
- France Loi Hamon: B2B opt-out language present.
- California: CCPA-aligned "Do Not Sell or Share" link or its B2B equivalent.
- NYC LL144 awareness: if the draft references a hiring or recruiting action and the recipient is in NYC, flag for human review.

Missing any required element for the matched profile is a block.

### 5. Deliverability and voice scan

Run the bundled checks:

- Spam-trigger phrasings — "guaranteed", "free money", "act now", "click here now", "100% free", "no obligation", excessive currency symbols.
- Subject line over 70 characters or in all caps.
- Body under 40 words or over 250 words.
- Image-only body (no plain text content).
- More than 3 outbound links.
- Link-cloaking patterns (link text that does not match the destination domain).
- Stock AI tells — "I hope this email finds you well", "I wanted to reach out", "I came across your profile" (these read as AI-generated to trained recipients and lower reply rate).
- Banned phrasings from `brand_guide` if supplied.

A single flag triggers an `edit` verdict. Two or more flags stacked trigger a `block`.

### 6. Verdict assembly

Return one verdict:

- `send` — no blocks, no edit-tier flags. The draft is releasable.
- `edit` — one or more edit-tier flags. The draft is releasable after applying the suggested rewrites (returned inline when `request_rewrite: true`).
- `block` — one or more blocking issues. The draft must not send. The blocking axis is named; the suggested fix is included.

The output format is in `references/4-sample-output.md`.

## Output format

Literal JSON the skill emits for a `block` verdict:

```json
{
"verdict": "block",
"result": "ok",
"blocking_issues": [
{
"axis": "claim_accuracy",
"finding": "Ungrounded claim: 'I saw your Series B announcement last week'. No entry in prospect_evidence supports a recent Series B.",
"fix": "Remove the claim or attach a citation to prospect_evidence and re-run."
}
],
"edit_flags": [
{
"axis": "voice",
"finding": "Stock opener detected: 'I hope this email finds you well'",
"fix": "Replace with a grounded opener tied to a specific entry in prospect_evidence."
}
],
"personalization_score": 3,
"rewritten_draft": null,
"qa_metadata": {
"model": "claude-sonnet-4-6",
"input_tokens": 2840,
"output_tokens": 420,
"rubric_version": "1.0.0"
}
}
```

A `send` verdict has empty `blocking_issues` and empty `edit_flags`. An `edit` verdict has empty `blocking_issues` and a populated `edit_flags` (plus `rewritten_draft` when `request_rewrite: true`).

## Cache-key convention

The four rubric files are stable across calls within a deployment. To use Claude prompt caching:

- Cache prefix: the concatenation of `references/1-claim-rubric.md` + `references/2-personalization-rubric.md` + `references/3-compliance-rubric.md` + `references/4-sample-output.md` is the cacheable prefix. Mark it with `cache_control: { type: "ephemeral" }` in the Anthropic SDK call.
- The variable suffix is the draft, the prospect evidence pack, and the recipient context.
- Expected cost reduction at production volume: 30-50% on input tokens. At 50,000 calls per month and an average 2,500 input tokens, that is roughly $1,500/month in savings against Sonnet 4.x list pricing.

## Watch-outs

- **False blocks on legitimate AI-pulled specifics.** If the upstream AI SDR retrieved a recent press release the evidence pack does not include, the skill flags the claim as ungrounded. **Guard:** the skill verifies against the supplied evidence pack only, never against model knowledge. The contract is that the AI SDR includes everything it used to write the draft in the pack. The fix is upstream, not loosening the rubric.
- **Personalization score gaming.** A skill that rewards specificity teaches the upstream model to stuff specific-looking tokens. **Guard:** grounded and ungrounded specifics score separately. A named entity counts only if a citation in the pack supports it; a stale specific without a current-employment citation is ungrounded.
- **Compliance creep across jurisdictions.** Different rules per recipient. **Guard:** per-jurisdiction profiles; missing context returns `insufficient_compliance_context` rather than falling back to a generic profile.
- **The skill becomes the bottleneck.** At 50,000 sends per month and a 3-second p95 per draft, serial QA adds roughly 42 hours of wall-clock. **Guard:** parallelize per-draft (20-50 in flight), cache the rubrics, alert when p95 climbs above 5 seconds.
- **Hallucinated compliance.** The skill could claim a header is present when it is not. **Guard:** the skill requires the calling agent to pass `email_headers` or set `headers_compliant: true` — it does not infer header state from the body.

# Claim rubric — TEMPLATE

> Replace this file's contents with your team's calibrated thresholds.
> The ai-sdr-draft-qa skill reads this file before every run. A blank or
> default version is usable, but the defaults below are conservative and
> will likely over-block on a high-volume SMB deployment.

## What counts as a claim

A claim is any factual assertion the draft makes about the prospect, the prospect's company, or a public event referenced as context. Examples:

- "I saw your Series B announcement last Tuesday." → claim about a funding event.
- "Your team just hired three data engineers." → claim about a hiring event.
- "Since you moved to Snowflake in March." → claim about the prospect's current employment.
- "Your CEO mentioned the migration on the Lenny podcast." → claim about a public statement.

Not a claim (do not extract):

- Generic industry observation ("RevOps teams are spending more on signal tools").
- A question to the prospect ("Are you still running the manual scoring on weekly leads?") — this is a question, not an assertion.
- A statement about the sender ("We worked with three companies in your space last quarter").

## The evidence-pack contract

`prospect_evidence` is an array of entries shaped:

```json
{
"source": "linkedin_profile|crunchbase|company_blog|news_api|gong_call|crm_note|press_release",
"retrieved_at": "ISO 8601 timestamp",
"claim_text": "the literal evidence supporting the claim",
"citation_url": "https://... (optional but recommended)"
}
```

The upstream AI SDR is responsible for emitting this pack alongside the draft. If a claim in the draft cannot be matched to any entry, the claim is ungrounded.

## Matching rules

A claim is **grounded** if at least one evidence entry meets all three:

1. **Entity match.** Same person, company, product, or event named in the claim and the evidence.
2. **Fact match.** Consistent fact (a "Series B" claim matched against a Series B entry, not a Series A entry).
3. **Freshness.** `retrieved_at` is within the per-fact-type freshness window:
- Company-level facts (HQ, employee band, public funding stage) — 90 days.
- Hiring or product-launch facts — 30 days.
- Prospect employment or role — 60 days.

A grounded claim outside the freshness window is downgraded to `stale_grounded` and surfaced as an edit-tier finding (suggested fix: refresh the evidence pack and re-run, or remove the time-sensitive specific).

## Thresholds

```yaml
claim_block_threshold: 1 # number of ungrounded claims that trips a block verdict
stale_grounded_block_threshold: 3 # number of stale_grounded findings that escalate from edit to block
```

The conservative default of 1 ungrounded claim → block surfaces every hallucinated claim. Raise to 2 only if you are tolerant of some hallucinated rate in exchange for fewer blocks (high-volume SMB deployments selling at low ACV may justify this).

## What the skill does NOT do

The claim rubric is a verifier, not a fact-checker. It does not call out to the live web, hit news APIs, or query LinkedIn. It only verifies the draft against the supplied evidence pack. If the upstream AI SDR's enrichment was wrong (the pack itself contains a hallucinated Series B), the skill will treat the claim as grounded. The fix lives upstream — pick an enrichment vendor whose retrieval the skill can trust.

## Last edited

{YYYY-MM-DD} — by {RevOps team member name}

# Personalization rubric — TEMPLATE

> Replace this file's contents with your team's calibrated rubric.
> The defaults work as a starting point but the score-to-block threshold
> matters more than the rubric itself.

## The two-pole scoring rule

Personalization is scored on a 0-5 scale. The scale separates **grounded specifics** from **ungrounded specifics** so the upstream AI SDR cannot game the score by stuffing tokens.

- **Grounded specific** — a named entity, event, or property tied to a citation in `prospect_evidence`. Examples: a podcast episode the prospect appeared on, a tool the prospect's team adopted, a specific job posting on the prospect's careers page, a thread the prospect wrote on LinkedIn last week.
- **Ungrounded specific** — a reference to "your industry", "your role", "your team", "your company" without a tied citation. Also: stale references to a prior employer presented as current ("your work at Snowflake" when the prospect moved 18 months ago and no current-employment citation is present).

Only grounded specifics count toward the score. Ungrounded specifics count zero — they read as personalized to a casual reader but add no real signal.

## Score scale

| Score | Description | Example draft excerpt |
|---|---|---|
| 0 | No specifics, only template placeholders. | "Hi {first_name}, I help companies like yours scale outbound." |
| 1 | One ungrounded specific only. | "Hi Maria, I noticed Acme is in the fintech space." |
| 2 | One grounded specific. | "Hi Maria, I read your post on outbound attribution from last Tuesday." |
| 3 | Two grounded specifics. | "Hi Maria, your post on outbound attribution last Tuesday plus the SDR job posting on Acme's careers page suggest you're scaling the team." |
| 4 | Two grounded specifics + one used as the connective tissue of the ask. | "Hi Maria — the SDR job posting on Acme's careers page reads like the same gap your attribution post described. Worth a 15-min walkthrough of how Northwind solved this?" |
| 5 | Three or more grounded specifics, tied together into a single coherent ask, with the ask landing on the prospect's named priority. | (See sample-output.md for a literal example.) |

## Threshold

```yaml
personalization_block_below: 2
```

Drafts that score 0 or 1 are blocked. A score of 2 (one grounded specific) is the floor for a releasable cold draft. Below that, the draft reads as a template — generic openers, ungrounded "your industry" references, no concrete tie to the prospect.

## When to raise the threshold

Raise `personalization_block_below` to 3 for:

- Enterprise outbound where ACV > $50K and deal velocity is slow.
- Re-engagement of warm-but-quiet prospects (the second-touch context is already there; a single grounded specific reads thin).
- Outbound to known personas with high inbox volume (CTOs, CFOs) where reply rates depend on visibly higher effort.

Keep at 2 for high-volume SMB outbound where the volume math justifies some thinner drafts.

## Score-gaming patterns to refuse

The upstream AI SDR will try to inflate the score. Watch for:

- **Stale specifics presented as current.** "Your work at Snowflake" when the prospect moved. **Rule:** an employment-specific is grounded only if a current-employment citation is present in the pack.
- **Public-figure-style references that anyone could write.** "Your work in the SaaS space" with the prospect's company swapped in. **Rule:** the specific must be unique to this prospect, not a generic fact about their industry.
- **Citation-shaped phrasings without a real citation.** "Per your LinkedIn post on Wednesday" with no Wednesday LinkedIn post in the evidence pack. **Rule:** every citation-shaped phrasing must match an entry in the pack.

## Last edited

{YYYY-MM-DD} — by {RevOps team member name}

# Compliance rubric — TEMPLATE

> Replace this file's contents with profiles tuned to your sending footprint.
> The defaults below cover the common jurisdictions for B2B outbound in 2026.
> Confirm with legal before relying on them for production sends.
>
> The ai-sdr-draft-qa skill reads `recipient.country` from the input and
> applies the matching profile. If no profile matches, the skill returns
> `result: insufficient_compliance_context`. The skill does not fall back
> to a generic profile silently — that is a banned behavior in this rubric.

## Required elements (US floor — CAN-SPAM)

Applied to all US recipients regardless of state. Every send must include:

| Element | Where it lives | What the skill checks |
|---|---|---|
| Visible unsubscribe link | Email body footer | A clickable URL whose link text contains "unsubscribe" or equivalent. |
| Physical sender address | Email body footer | A street address line in the footer block. |
| Truthful sender identity | The `From:` line | `draft.from` must match `sender_domain` (no spoofing). |
| Subject line not deceptive | The `draft.subject` field | No subject line that promises a relationship that does not exist ("Re: your reply", "Per our call yesterday") unless those events actually occurred. |

## RFC 8058 — one-click unsubscribe (Google + Yahoo bulk-sender requirement)

Effective February 2024 for any sender exceeding 5,000 messages per day to Gmail or Yahoo addresses. The skill cannot inspect raw email headers; it requires the calling agent to pass either `email_headers` (the literal header block) or set `headers_compliant: true` after the agent's own verification.

Required headers:

- `List-Unsubscribe: <mailto:unsubscribe@yourdomain.com>, <https://yourdomain.com/unsub?id=XYZ>`
- `List-Unsubscribe-Post: List-Unsubscribe=One-Click`

Missing either is a block when sending to Gmail or Yahoo. The skill checks the recipient TLD/domain to determine applicability — if the recipient is on Google Workspace or Yahoo Mail, the requirement applies.

## EU GDPR profile

Applied when `recipient.country` is in the EU/EEA. Required elements:

- **Legitimate interest basis documented in the evidence pack.** A `legitimate_interest_basis` field in any `prospect_evidence` entry, with a non-empty string explaining the basis (e.g., "B2B contact from publicly listed business email, role-aligned to product use case").
- **Visible opt-out language in the body.** Not just an unsubscribe link — an explicit sentence the prospect can read inline: "Reply STOP or click below to opt out of future emails."
- **No personal-data claims beyond what the legitimate interest basis covers.** Hiring intent inferred from "your company is hiring" without a published job posting in the pack is a block — the inference is personal data processing without basis.

Missing any required element → block.

## France Loi Hamon (B2B addition to GDPR)

Applied when `recipient.country` is France. On top of the EU profile:

- Explicit B2B opt-out language stating the recipient can refuse further commercial solicitation.

## California profile (US + state-specific)

Applied when `recipient.us_state` is `US-CA`. On top of the US floor:

- A CCPA-aligned opt-out reference. For B2B, this is the "Do Not Sell or Share My Personal Information" link, the equivalent under CPRA, or an explicit B2B opt-out sentence.

## NYC LL144 awareness (hiring-adjacent outreach only)

Applied when `recipient.us_state` is `US-NY` AND the draft references a hiring, sourcing, or recruiting action by the sender. NYC LL144 governs Automated Employment Decision Tools used in hiring decisions; outbound that references the sender's hiring workflow needs human review for LL144 alignment.

The skill does not block — it flags `human_review: ll144_hiring_outreach` and routes the draft to a reviewer queue. This is a routing decision, not a compliance verdict.

## Profile selection logic

```
function selectProfile(recipient):
  if recipient.country in EU_EEA:
    profile = "eu_gdpr"
    if recipient.country == "FR":
      profile += "+france_loi_hamon"
  elif recipient.country == "US":
    profile = "us_can_spam + rfc_8058"
    if recipient.us_state == "US-CA":
      profile += "+california_ccpa"
    if recipient.us_state == "US-NY" and draft_mentions_hiring:
      profile += "+nyc_ll144_awareness"
  elif recipient.country == "CA":
    profile = "canada_casl"     # not detailed here; CASL has its own consent rules
  elif recipient.country in ["GB", "CH", "NO"]:
    profile = "eu_gdpr_equivalent"
  else:
    return insufficient_compliance_context
```

## Profiles not covered by defaults

Brazil LGPD, India DPDP, Australia Spam Act, Singapore PDPA, Japan APPI — add these as separate profiles if your sending footprint covers those countries. Each needs its own required-elements table. Do not collapse them into a "global" fallback; the variance between regimes is too large.

## Last edited

{YYYY-MM-DD} — by {legal-ops team member name}

# Sample output — for parser wiring and integration tests

> Literal examples of the three verdicts the skill emits. Use these
> when wiring the pre-send webhook return-path, the parser that pushes
> the verdict back into 11x / Artisan / aisdr / Unify, or the integration
> tests that exercise the QA gate.

## verdict: send

A clean draft. No blocking issues, no edit flags. The calling agent releases the send.

```json
{
  "verdict": "send",
  "result": "ok",
  "blocking_issues": [],
  "edit_flags": [],
  "personalization_score": 3,
  "claim_findings": {
    "grounded": 2,
    "ungrounded": 0,
    "stale_grounded": 0
  },
  "compliance_profile_applied": "us_can_spam + rfc_8058",
  "rewritten_draft": null,
  "qa_metadata": {
    "model": "claude-sonnet-4-6",
    "input_tokens": 2410,
    "output_tokens": 280,
    "rubric_version": "1.0.0",
    "ran_at": "2026-05-27T15:42:11Z"
  }
}
```

## verdict: edit

Releasable after the edit flags are applied. The calling agent either applies the suggested fixes automatically (when `request_rewrite: true` returns a `rewritten_draft`) or routes to a reviewer to apply by hand.

```json
{
  "verdict": "edit",
  "result": "ok",
  "blocking_issues": [],
  "edit_flags": [
    {
      "axis": "voice",
      "finding": "Stock AI opener: 'I hope this email finds you well'",
      "fix": "Replace with a grounded opener tied to a specific entry in prospect_evidence (e.g., a recent LinkedIn post by the prospect)."
    },
    {
      "axis": "deliverability",
      "finding": "Subject line is 78 characters (threshold: 70).",
      "fix": "Trim to under 70 characters. Suggested: 'Acme's hiring spike — quick question on attribution'"
    }
  ],
  "personalization_score": 2,
  "claim_findings": {
    "grounded": 1,
    "ungrounded": 0,
    "stale_grounded": 0
  },
  "compliance_profile_applied": "us_can_spam + rfc_8058",
  "rewritten_draft": {
    "subject": "Acme's hiring spike — quick question on attribution",
    "body": "Hi Maria — your post on outbound attribution last Tuesday lined up with the SDR job posting on Acme's careers page. Worth a 15-min walkthrough of how Northwind solved the same gap?\n\nReply STOP to opt out.\n\nOoligo, Inc. · 100 Market St, San Francisco, CA 94105"
  },
  "qa_metadata": {
    "model": "claude-sonnet-4-6",
    "input_tokens": 2620,
    "output_tokens": 540,
    "rubric_version": "1.0.0",
    "ran_at": "2026-05-27T15:43:02Z"
  }
}
```

## verdict: block

Not releasable. The blocking axis is named; the calling agent must regenerate, route to a human, or hard-fail the send.

```json
{
  "verdict": "block",
  "result": "ok",
  "blocking_issues": [
    {
      "axis": "claim_accuracy",
      "finding": "Ungrounded claim: 'I saw your Series B announcement last week'. No entry in prospect_evidence supports a recent Series B.",
      "fix": "Remove the claim, or attach a Series B citation to prospect_evidence and re-run."
    },
    {
      "axis": "personalization",
      "finding": "Score 1 — single ungrounded specific ('your industry') only. Threshold for releasable: 2.",
      "fix": "Add at least one grounded specific tied to a citation in prospect_evidence."
    }
  ],
  "edit_flags": [
    {
      "axis": "voice",
      "finding": "Stock AI opener: 'I wanted to reach out'",
      "fix": "Replace with a grounded opener."
    }
  ],
  "personalization_score": 1,
  "claim_findings": {
    "grounded": 0,
    "ungrounded": 2,
    "stale_grounded": 0
  },
  "compliance_profile_applied": "us_can_spam + rfc_8058",
  "rewritten_draft": null,
  "qa_metadata": {
    "model": "claude-sonnet-4-6",
    "input_tokens": 2480,
    "output_tokens": 460,
    "rubric_version": "1.0.0",
    "ran_at": "2026-05-27T15:44:18Z"
  }
}
```

## result: insufficient_input

Returned when a required input field is missing. The skill does not score; the calling agent must fix the call.

```json
{
  "verdict": null,
  "result": "insufficient_input",
  "missing_field": "prospect_evidence",
  "message": "prospect_evidence pack is required. The skill cannot verify claims against general model knowledge.",
  "qa_metadata": {
    "model": "claude-sonnet-4-6",
    "input_tokens": 320,
    "output_tokens": 80,
    "rubric_version": "1.0.0",
    "ran_at": "2026-05-27T15:45:00Z"
  }
}
```

## result: insufficient_compliance_context

Returned when `recipient.country` (or required state) maps to no jurisdictional profile. The skill refuses to score rather than falling back to a generic profile.

```json
{
  "verdict": null,
  "result": "insufficient_compliance_context",
  "missing_field": "recipient.country profile",
  "message": "No jurisdictional profile matched recipient.country='SG'. Add a Singapore PDPA profile to references/3-compliance-rubric.md.",
  "qa_metadata": {
    "model": "claude-sonnet-4-6",
    "input_tokens": 2380,
    "output_tokens": 110,
    "rubric_version": "1.0.0",
    "ran_at": "2026-05-27T15:45:42Z"
  }
}
```

## Field contract for parsers

If the calling agent consumes the JSON directly:

- `verdict` — enum: `send` / `edit` / `block` / `null` (null when `result` is non-ok).
- `result` — enum: `ok` / `insufficient_input` / `insufficient_compliance_context` / `insufficient_evidence`.
- `blocking_issues[]` — array of `{ axis, finding, fix }`. Axes: `claim_accuracy`, `personalization`, `compliance`, `deliverability`.
- `edit_flags[]` — same shape. Axes: `voice`, `deliverability`, `claim_accuracy` (for stale_grounded).
- `personalization_score` — integer 0-5.
- `claim_findings` — object: `{ grounded, ungrounded, stale_grounded }` counts.
- `compliance_profile_applied` — string identifying the matched profile.
- `rewritten_draft` — object `{ subject, body }` or null. Populated only when `request_rewrite: true`.
- `qa_metadata` — `{ model, input_tokens, output_tokens, rubric_version, ran_at }` for cost accounting and audit.