claude-skill

Catch hallucinated claims, generic personalization, and compliance breaks in AI SDR drafts before they send

Difficulty

中級

Setup time

60-90 min

For

revops · sdr-leader · gtm-engineer

RevOps

Stack

AI SDR (11x の Alice、Artisan の Ava、aisdr や Unify の内部エージェント) と送信アクションのあいだに入り、各ドラフトを 4 つのルーブリック (主張の正確性、パーソナライズの根拠、管轄ごとのコンプライアンス、配信到達性の衛生) で採点し、失敗した軸を明示した block / edit / send の判定を返す Claude Skill です。バンドル apps/web/public/artifacts/ai-sdr-draft-qa-skill/ には、SKILL.md、references/ 配下の 4 つのルーブリックファイル、パーサー配線用のリテラルサンプル出力ファイルが含まれます。

いつ使うか

メッセージごとの人手レビューを介さずに送信する AI SDR のデプロイすべてで、送信前のゲートとしてこの Skill を実行します。本番運用の 2 つのパターンは、AI SDR の送信アクションの手前に置く Webhook がドラフトとプロスペクトのエビデンスパックを Skill へ POST し、verdict: send のレスポンスを受け取ったときだけ送信を解放するもの、もしくは次の 24 時間分のキュー内ドラフトに対する送信前バッチパスで、verdict: block が出たシーケンスステップをすべて一時停止するものです。

この Skill はパイロット期のキャリブレーションツールとしても機能します。11x、Artisan、aisdr 導入の最初の 1 か月分から 500 件のドラフトを Skill に通し、同じ 500 件を RevOps アナリストが手作業でラベリングします。不一致集合を見れば、AI SDR があなたの ICP に対して過剰または過小にパーソナライズしているか、ハルシネーション主張の発生がどこに集中しているか、そして送信量を週 5,000 件以上にスケールする前に管轄プロファイルの調整が必要かどうかが分かります。

この Skill には、ドラフトに加えて prospect_evidence パックが必要です。これは AI SDR がそのドラフトの作成に用いたのと同じエンリッチメントペイロードです。上流の AI SDR がエビデンスパックを開示しない場合 (一部のクローズドスイートは隠します)、Skill は主張を検証できず、推測ではなく insufficient_evidence を返します。これはバグではなく仕様です。モデルの一般知識に対してドラフトを採点する QA ゲートは、自分自身の検証をハルシネーションさせるからです。

使わないとき

人間の SDR や AE が送信前に各ドラフトをレビューしている場合は、この Skill を使わないでください。レビュアーは Skill より強いゲートです。Skill にはないビジネス文脈を持っており、人間のレビュアーの前にモデルを置くとトークンを浪費しレイテンシだけが増え、Precision は上がりません。完全自律もしくは部分自律のフロー用です。

唯一の配信到達性コントロールとしては使わないでください。Skill はドラフト内のスパムトリガー表現、大文字の件名、画像のみの本文、リンククローキングのパターンをスキャンします。ドメイン全体の DMARC、苦情率、ブロックリスト状態は監視しません。それは email-deliverability-monitor-n8n フローの仕事です。両方を併走させてください。

ウォームリプライのドラフトや、すでに会話中のスレッドには使わないでください。ルーブリックはコールドアウトバウンド用に設計されています。すでにミーティングを予約したプロスペクトへの返信ドラフトは、仕様上、パーソナライズルーブリックで失敗します (この段階のパーソナライズは文脈を踏まえたものであるべきで、コールドエビデンスから引いたものではありません)。ウォームティアのドラフトは別のプロンプトへルーティングしてください。

セットアップ

Skill 本体のセットアップは 60-90 分です。これに加えて上流の配線時間がかかります。AI SDR が送信前 Webhook を公開しているかどうかで時間は変動します。

Skill をインストールする。 apps/web/public/artifacts/ai-sdr-draft-qa-skill/SKILL.md と references/ フォルダを .claude/skills/ai-sdr-draft-qa/ ディレクトリに配置するか、claude.ai に Skill としてアップロードします。frontmatter の name と description が、呼び出し側エージェントから Skill を起動するキーになります。
主張ルーブリックをキャリブレーションする。 references/1-claim-rubric.md を開き、claim_block_threshold を設定します。これは block 判定を発火させる未検証主張の数 (デフォルト 1) です。多くの AI SDR は資金調達ラウンドや人員規模の捏造を頻発させます。デフォルトの 1 はハルシネーション主張を 1 件単位で表に出します。2 に上げるのは、ブロックを減らす代わりにハルシネーションリスクを許容する場合に限ります。
パーソナライズルーブリックをキャリブレーションする。 references/2-personalization-rubric.md を開きます。デフォルトのスコアリングは 0-5 のスケールで、デフォルトの personalization_block_below は 2 です。スコア 2 はエビデンスパックに紐付いた根拠ある具体性が少なくとも 1 つあることを意味します。0 や 1 を取るドラフトは「Hi [first_name], I noticed [Company] is in the [industry] space」型のテンプレートです。ブロックしてください。
管轄プロファイルを選択する。 references/3-compliance-rubric.md を開き、送信実態に合うプロファイルを有効化します。US CAN-SPAM + RFC 8058 のワンクリック解除は床です。EU GDPR の正当な利益の根拠文書は EU 受信者向けのレイヤーです。フランスは B2B 向けに Loi Hamon を追加します。カリフォルニアは CCPA 整合の opt-out を追加します。コンプライアンスルーブリックはエビデンスパックからプロスペクトの国を読み取り、合致するプロファイルを適用するか、insufficient_compliance_context を返します。
送信前 Webhook を配線する。 11x と Artisan はプラットフォーム設定で送信前 Webhook をエンドポイント URL に向けます (もしくはプラットフォームの「承認キュー」モードを使って Skill が承認を駆動します)。Unify と aisdr はプラットフォームのオープン API でキュー上の次のドラフトを取得し、Skill を呼び、判定を書き戻します。自前のエージェントの場合は、SMTP 送信呼び出しの直前に Skill を置きます。
ブロックポリシーを決める。 block 判定は、ドラフトを人間のレビュアーへルーティングするか、AI SDR に再生成させるために保留するか、送信をハードフェイルさせるかのいずれかです。デフォルトは「失敗した軸をフィードバックとして添えて再生成のために保留」です。多くの AI SDR は具体的な失敗を渡すと 2 回目のパスでドラフトを改善します。

Skill の実際の動作

ステップ 1 — 入力検証。 Skill は、ドラフトの本文、件名、送信ドメイン、受信者の国、prospect_evidence パックが欠けている呼び出しを拒否します。いずれかが欠けると、該当フィールドを示して insufficient_input を返します。不完全なレコードに対するスコアリングは走りません。

ステップ 2 — 主張の抽出と検証。 プロスペクト、プロスペクトの会社、参照されている公開イベントに関するすべての事実的主張 (「先週シリーズ B を発表したのを見ました」「データチームの採用急増」など) を抽出し、エビデンスパックと照合します。主張は、パック内の引用が裏付けるとき 根拠付き です。根拠のない主張はフラグが立ちます。デフォルト claim_block_threshold: 1 で、根拠のない主張が 1 つでも出ればブロックが発火します。

ステップ 3 — パーソナライズスコアリング。 Skill は根拠ある具体性を 0-5 で採点します。根拠ある具体性 とは、エビデンスパック内の引用に紐付いた詳細で、プロスペクトが使っている特定のツール名、彼らが公開した特定の求人、出演したポッドキャストなどです。根拠のない具体性 (「あなたの業界」「あなたの役職」「あなたのチーム」) はカウントされません。personalization_block_below: 2 を下回るドラフトはブロックされます。2 極の分離 (根拠ありか根拠なしか) が、AI SDR がトークン詰め込みでスコアをゲームすることを防ぐガードです。

ステップ 4 — コンプライアンススキャン。 Skill は次を確認します。List-Unsubscribe ヘッダパターンと、RFC 8058 (2024 年 2 月以降の Google・Yahoo バルクセンダー要件) に従う List-Unsubscribe-Post: List-Unsubscribe=One-Click 行、CAN-SPAM に従うフッターの物理的送信者住所、見える本文内の解除リンク、From 行と一致する送信者アイデンティティ、そして有効化された各プロファイルの管轄追加要件。必須要素のいずれかが欠ければブロックです。

ステップ 5 — 配信到達性とボイスのスキャン。 Skill は次をマークします。スパムトリガー表現 (「guaranteed」「free money」「act now」)、70 文字を超える件名または大文字の件名、40 語未満または 250 語超の本文、画像のみの本文、3 リンク超、ストック的な AI らしさのテル (「I hope this email finds you well」「I wanted to reach out」)。1 件のマークは edit 判定を発火させ、ブロックではありません。別のマークと積み重なった場合はブロックです。

ステップ 6 — 判定の組み立て。 Skill は 3 つのうち 1 つの判定を返します。send (ブロックなし、編集なし)、edit (1 件以上の edit ティアフラグと、提案された書き換えをインラインで)、block (1 件以上のブロック要因と、失敗した軸を明示)。出力フォーマットは references/4-sample-output.md にあります。

コストの実態

QA 1 パスで入力トークン 1,500-3,500 (ドラフト、エビデンスパック、未キャッシュ時の 4 つのルーブリックファイル)、出力トークン 400-800 を消費します。Claude Sonnet 4.x の価格 (2026 年中頃の参考価格でおおむね $3 / 100 万入力、$15 / 100 万出力) では、1 パスあたり $0.01-0.03 です。

AI SDR 規模 (自律エージェント 1 体で月 5,000-15,000 送信) では、QA レイヤーは月額 $50-450 です。月 50,000 送信のデプロイ (複数エージェント、マルチドメイン送信) では $500-1,500 です。代替案と比較してください。0.3% の苦情率スパイクで送信ドメインが 1 つ抑制されると、おおむね営業日換算で 5-10 日分のパイプラインを失います。QA コストはひどい 1 週間に対する丸め誤差です。

ルーブリックファイルのプロンプトキャッシュにより、本番ボリュームでは入力トークンコストが 30-50% 削減されます。バンドルの SKILL.md にキャッシュキー規約が記載されています。4 つのルーブリックファイルは、ある 1 つのデプロイ内の呼び出し全体で安定です。

成功指標

追跡すべき指標は ハルシネーション主張の捕捉率 です。週 100 件のドラフトをサンプリングし、RevOps アナリストが各々を根拠のない主張についてラベリングし、Skill のリコールをアナリストのラベルに対して測定します。リコール 95% 超ならルーブリックは機能しています。90% 未満なら主張ルーブリックを引き締める必要があります (しきい値を下げるか、「主張」とみなす範囲を広げる)。

副次指標は 誤ブロック率 です。Skill がブロックしたドラフトのうち、アナリストなら承認したであろう割合を数えます。誤ブロック率 8% 超は、パーソナライズしきい値を 2 から 1 に緩めるか、根拠ある具体性の定義を広げるシグナルです。3% 未満は Skill のブロック不足を意味します。しきい値を逆方向に押します。

2 つの指標は反対方向に動きます。あなたの許容度に合う運用点を選んでください。Fortune 500 へ売る B2B エンタープライズチームは厳しめに運用すべきです。リコール高め、誤ブロック高めを許容します。週 10,000 件以上を捌く高ボリュームの SMB チームは緩めに運用すべきです。誤ブロック低め、ボリューム計算が成り立つならハルシネーション主張をある程度許容します。

代替案との比較

vs QA なし。 2026 年時点で完全自律な AI SDR デプロイの現状は、ベンダー側の軽いガードレールを超える送信前ゲートが無いことです。自律送信のリプライレートは 1-3%、ハイブリッド (AI + 人間) ポッドでは 8-15% です (2026 年中頃までのバイヤー報告デプロイからの推定であり、単一の公表ベンチマークではありません)。ハルシネーション主張と汎用パーソナライズのパターンは、このギャップの実質的な比率を占めます。QA ゲートを足すとレートは上がりますが、上げ幅には上限があります。ドラフトの改善はコールドリストをウォームに変えません。

vs AI SDR のビルトインガードレール。 11x と Artisan は内部品質チェックを同梱しており、明白な失敗をフラグします。ただし失敗面は不透明で、何がチェックされたか / 漏れたかを検査できず、しきい値を調整することもできません。この Skill はルーブリックを検査可能にします。トレードオフ: 独立したモデル呼び出しなので、独自のレイテンシコストが乗ります。

vs 人間の SDR レビュアー。 人間のレビュアーは、Skill が取り逃すビジネス文脈の失敗 (「このプロスペクトは大規模障害が起きたばかりなので、明るい調子のメールは送らない」) を拾います。Skill は、人間のレビュアーがその日 200 件目で取り逃す一貫性の失敗を拾います。ディールバリューが高ければ両方を走らせ、ボリュームが高ければ Skill のみを走らせます。

vs 上流の AI SDR を縛る構造化プロンプト。 上流プロンプトを厳しくすれば、ソースでのハルシネーションは減ります。それでも残るレートは取り逃しますし、管轄ごとのコンプライアンス違反はフラグできません (管轄は受信者依存であり、ライティングプロンプトは受信者を知りません)。両方を使ってください。AI SDR には構造化された上流プロンプト、その上でこの Skill をゲートとして配置します。

注意点

AI が正当に引いた具体性に対する誤ブロック。 上流の AI SDR が、エビデンスパックに含まれない最近のプレスリリースを引いてきた場合、Skill はその主張を根拠なしとマークしてブロックします。ガード: Skill は提供されたエビデンスパックに対してのみ検証し、モデル知識に対しては検証しません。AI SDR がドラフト作成に使ったものはすべてパックに含める、というのが契約です。AI SDR がそれを果たせない場合、Skill は検証できません。修正は上流側です。AI SDR ベンダーに取得コンテキストを開示させることであって、ルーブリックを緩めることではありません。
パーソナライズスコアのゲーミング。 具体性を報酬する Skill は、上流モデルに具体的に見えるトークンを詰め込むことを学習させます。「Snowflake でのデータプラットフォームに関するあなたの仕事」は、プロスペクトが 18 か月前に Snowflake を離れていてもパーソナライズされたように読めます。ガード: ルーブリックは根拠ありと根拠なしの具体性を別々に採点します。エビデンスパック内の引用が裏付ける場合に限り、固有名がカウントされます。現職の引用を伴わない古い具体性は根拠なしとして読まれます。
管轄をまたぐコンプライアンスのクリープ。 CAN-SPAM、RFC 8058、GDPR、フランスの Loi Hamon、カリフォルニアの CCPA 整合 opt-out、採用関連アウトリーチに対する NYC LL144 の認識 — 受信者ごとにルールが異なります。ガード: コンプライアンスルーブリックは管轄ごとです。prospect_evidence パックは受信者の国 (関連する場合は米国の州) を含める必要があり、Skill は合致するプロファイルを適用するか insufficient_compliance_context を返します。汎用の「グローバル」プロファイルへ黙ってフォールバックすることは、ルーブリック上で禁止されています。
Skill がボトルネックになる。 月 50,000 送信、ドラフトあたり p95 3 秒では、QA ゲートは月あたりおよそ 42 時間の壁時計シリアル処理を追加します。並列なら問題なく、シングルスレッドなら不可です。ガード: バンドルは並列化パターン (ドラフトあたり 1 つの Claude 呼び出し、20-50 件をインフライト) と、4 つのルーブリックファイルに対するキャッシュキー規約を文書化しています。ドラフトあたり p95 3 秒未満を狙い、p95 が 5 秒を超えたらアラートを出してください。

参照バンドル

apps/web/public/artifacts/ai-sdr-draft-qa-skill/SKILL.md — Skill の完全な定義、入力、メソッド、出力フォーマット、注意点。
apps/web/public/artifacts/ai-sdr-draft-qa-skill/references/1-claim-rubric.md — 何を主張とみなすか、エビデンスパック契約、軸ごとの pass / block しきい値。
apps/web/public/artifacts/ai-sdr-draft-qa-skill/references/2-personalization-rubric.md — 根拠あり vs 根拠なしの具体性、各スコアの例示出力を伴う 0-5 採点。
apps/web/public/artifacts/ai-sdr-draft-qa-skill/references/3-compliance-rubric.md — 管轄プロファイル (US CAN-SPAM、RFC 8058 ワンクリック解除、EU GDPR 正当な利益、NYC LL144 認識、フランス Loi Hamon、カリフォルニア CCPA 整合 opt-out)。
apps/web/public/artifacts/ai-sdr-draft-qa-skill/references/4-sample-output.md — リテラルな send / edit / block 出力と、パーサー向け構造化フィールド契約。

GitHubでこのページを編集

Files in this artifact

Download all (.zip)

---
name: ai-sdr-draft-qa
description: Pre-send QA gate for AI SDR drafts (11x Alice, Artisan Ava, aisdr, Unify, homegrown agents). Scores each draft on claim accuracy, personalization grounding, jurisdictional compliance, and deliverability hygiene, then returns a block / edit / send verdict with the specific failing axis cited and an optional rewritten draft. Use as a webhook in front of the AI SDR's send action — not as a substitute for a human reviewer on warm or already-engaged threads.
---

# AI SDR draft QA

## When to invoke

Invoke before any AI-SDR-generated outbound email is released to the send queue. Production patterns:

- A pre-send webhook in 11x, Artisan, aisdr, or Unify that posts `{ draft, prospect_evidence, sender_domain }` to this skill and only releases the send on `verdict: send`.
- A batch pre-send pass over the next 24 hours of queued drafts that pauses any sequence step with `verdict: block`.
- A calibration pass during AI SDR pilot — run 500 drafts through the skill, have a RevOps analyst label the same 500 by hand, use the disagreement set to tune the rubric thresholds before scaling.

Do NOT invoke this skill for:

- **Warm or already-engaged threads.** Replies to a prospect who already booked a meeting will fail the personalization rubric by design — the personalization should be context-aware, not pulled from cold evidence. Route these to a different prompt.
- **Drafts a human SDR or AE will review before send.** The human is a stronger gate than the skill; running the skill in front of the human wastes tokens and adds latency without raising precision.
- **Drafts without a `prospect_evidence` pack.** Without the evidence the upstream model used, the skill cannot verify claims. It returns `insufficient_evidence` rather than guessing. Fix upstream — get the AI SDR to expose its retrieval context — not by loosening the rubric.

## Inputs

Required:

- `draft.subject` — string. The proposed subject line.
- `draft.body` — string. The proposed plain-text body. HTML drafts are rejected; convert upstream.
- `draft.from` — string. The literal `From:` line that will appear in the sent email.
- `sender_domain` — string. The sending domain (used for the deliverability rubric's identity check).
- `recipient.country` — ISO 3166-1 alpha-2 country code. Drives jurisdictional profile selection in the compliance rubric.
- `prospect_evidence` — object. The exact enrichment payload the upstream AI SDR used. Required shape: an array of `{ source, retrieved_at, claim_text, citation_url? }` entries. Every claim the AI SDR made in the draft must trace to an entry here.

Optional:

- `recipient.us_state` — ISO 3166-2 subdivision code. Required for the US profile when CCPA-aligned opt-out applies.
- `brand_guide` — string. Path to or inline contents of a brand voice file with banned phrasings beyond the defaults. Loaded alongside the deliverability rubric.
- `cache_key_prefix` — string. Optional prompt-cache prefix for batch runs; see the cache-key convention below.
- `request_rewrite` — boolean. Default `false`. When `true`, the skill returns a rewritten draft alongside the verdict on `edit` or `block`.

## Reference files

Load these from `references/` before first run. The four rubric files are stable across calls within a deployment — cache them.

- `references/1-claim-rubric.md` — what counts as a claim, the evidence-pack contract, per-axis pass/block thresholds. `claim_block_threshold` is set here.
- `references/2-personalization-rubric.md` — grounded vs ungrounded specifics, the 0-5 scoring scale with example outputs at each score. `personalization_block_below` is set here.
- `references/3-compliance-rubric.md` — per-jurisdiction profiles (US CAN-SPAM, RFC 8058 one-click unsubscribe, EU GDPR legitimate interest, NYC LL144 awareness, French Loi Hamon, California CCPA-aligned opt-out).
- `references/4-sample-output.md` — literal `send`, `edit`, and `block` outputs plus the structured-field contract for parsers.

## Method

Run these steps in order. Earlier steps gate later steps.

### 1. Input validation

Reject the call if any required field is missing or malformed. Return `result: insufficient_input` with the specific field name. Do not score on a partial record. A malformed `prospect_evidence` pack (missing the array, entries missing `source` or `claim_text`) is a hard rejection — the verifier cannot run without the contract.

### 2. Claim extraction and verification

Extract every factual claim about the prospect, the prospect's company, or a public event the draft references. Examples: "I saw your Series B announcement", "your hiring spike on the data team", "your podcast appearance with Lenny last month", "since you moved to [Company] in March".

For each claim:

- Match against the `prospect_evidence` pack. A claim is **grounded** if at least one entry in the pack supports it (same entity, consistent date, consistent fact).
- If no entry supports the claim, mark it **ungrounded**.
- A grounded claim with a stale `retrieved_at` (older than 90 days for company facts, older than 30 days for hiring or product-launch facts) is downgraded to **stale_grounded** and flagged as an edit-tier finding.

Apply the threshold from `references/1-claim-rubric.md`: `claim_block_threshold` ungrounded claims (default 1) trips a block.

### 3. Personalization scoring

Score the draft on the 0-5 scale defined in `references/2-personalization-rubric.md`:

- **Grounded specifics** — entities, events, or properties tied to a citation in the evidence pack. Each counts toward the score.
- **Ungrounded specifics** — references to "your industry", "your role", "your team", "your company" without a tied citation. These count zero.

Apply `personalization_block_below` (default 2). Drafts under the threshold are blocked.

The grounded/ungrounded separation is the guard against score gaming — if the rubric rewarded specificity alone, the upstream AI SDR would learn to stuff specific-looking tokens. A "Snowflake" mention without a current-employment citation reads as ungrounded.

### 4. Compliance scan

Read `recipient.country` (and `recipient.us_state` if present). Load the matching jurisdictional profile from `references/3-compliance-rubric.md`. If no profile matches, return `result: insufficient_compliance_context` — do not fall back to a generic profile.

For the matched profile, check every required element:

- US CAN-SPAM floor: physical sender address in the footer, visible unsubscribe link, sender identity matching the `From:` line.
- RFC 8058 (Google + Yahoo bulk-sender requirement since February 2024): the `List-Unsubscribe` header must include both `mailto:` and `https://` options, and the `List-Unsubscribe-Post: List-Unsubscribe=One-Click` header must be present. The skill cannot inspect headers directly; it requires the calling agent to pass `email_headers` or to confirm `headers_compliant: true`.
- EU GDPR profile: legitimate interest basis documented, opt-out language present, no third-country transfers without standard contractual clauses noted in the evidence pack.
- France Loi Hamon: B2B opt-out language present.
- California: CCPA-aligned "Do Not Sell or Share" link or its B2B equivalent.
- NYC LL144 awareness: if the draft references a hiring or recruiting action and the recipient is in NYC, flag for human review.

Missing any required element for the matched profile is a block.

### 5. Deliverability and voice scan

Run the bundled checks:

- Spam-trigger phrasings — "guaranteed", "free money", "act now", "click here now", "100% free", "no obligation", excessive currency symbols.
- Subject line over 70 characters or in all caps.
- Body under 40 words or over 250 words.
- Image-only body (no plain text content).
- More than 3 outbound links.
- Link-cloaking patterns (link text that does not match the destination domain).
- Stock AI tells — "I hope this email finds you well", "I wanted to reach out", "I came across your profile" (these read as AI-generated to trained recipients and lower reply rate).
- Banned phrasings from `brand_guide` if supplied.

A single flag triggers an `edit` verdict. Two or more flags stacked trigger a `block`.

### 6. Verdict assembly

Return one verdict:

- `send` — no blocks, no edit-tier flags. The draft is releasable.
- `edit` — one or more edit-tier flags. The draft is releasable after applying the suggested rewrites (returned inline when `request_rewrite: true`).
- `block` — one or more blocking issues. The draft must not send. The blocking axis is named; the suggested fix is included.

The output format is in `references/4-sample-output.md`.

## Output format

Literal JSON the skill emits for a `block` verdict:

```json
{
"verdict": "block",
"result": "ok",
"blocking_issues": [
{
"axis": "claim_accuracy",
"finding": "Ungrounded claim: 'I saw your Series B announcement last week'. No entry in prospect_evidence supports a recent Series B.",
"fix": "Remove the claim or attach a citation to prospect_evidence and re-run."
}
],
"edit_flags": [
{
"axis": "voice",
"finding": "Stock opener detected: 'I hope this email finds you well'",
"fix": "Replace with a grounded opener tied to a specific entry in prospect_evidence."
}
],
"personalization_score": 3,
"rewritten_draft": null,
"qa_metadata": {
"model": "claude-sonnet-4-6",
"input_tokens": 2840,
"output_tokens": 420,
"rubric_version": "1.0.0"
}
}
```

A `send` verdict has empty `blocking_issues` and empty `edit_flags`. An `edit` verdict has empty `blocking_issues` and a populated `edit_flags` (plus `rewritten_draft` when `request_rewrite: true`).

## Cache-key convention

The four rubric files are stable across calls within a deployment. To use Claude prompt caching:

- Cache prefix: the concatenation of `references/1-claim-rubric.md` + `references/2-personalization-rubric.md` + `references/3-compliance-rubric.md` + `references/4-sample-output.md` is the cacheable prefix. Mark it with `cache_control: { type: "ephemeral" }` in the Anthropic SDK call.
- The variable suffix is the draft, the prospect evidence pack, and the recipient context.
- Expected cost reduction at production volume: 30-50% on input tokens. At 50,000 calls per month and an average 2,500 input tokens, that is roughly $1,500/month in savings against Sonnet 4.x list pricing.

## Watch-outs

- **False blocks on legitimate AI-pulled specifics.** If the upstream AI SDR retrieved a recent press release the evidence pack does not include, the skill flags the claim as ungrounded. **Guard:** the skill verifies against the supplied evidence pack only, never against model knowledge. The contract is that the AI SDR includes everything it used to write the draft in the pack. The fix is upstream, not loosening the rubric.
- **Personalization score gaming.** A skill that rewards specificity teaches the upstream model to stuff specific-looking tokens. **Guard:** grounded and ungrounded specifics score separately. A named entity counts only if a citation in the pack supports it; a stale specific without a current-employment citation is ungrounded.
- **Compliance creep across jurisdictions.** Different rules per recipient. **Guard:** per-jurisdiction profiles; missing context returns `insufficient_compliance_context` rather than falling back to a generic profile.
- **The skill becomes the bottleneck.** At 50,000 sends per month and a 3-second p95 per draft, serial QA adds roughly 42 hours of wall-clock. **Guard:** parallelize per-draft (20-50 in flight), cache the rubrics, alert when p95 climbs above 5 seconds.
- **Hallucinated compliance.** The skill could claim a header is present when it is not. **Guard:** the skill requires the calling agent to pass `email_headers` or set `headers_compliant: true` — it does not infer header state from the body.

# Claim rubric — TEMPLATE

> Replace this file's contents with your team's calibrated thresholds.
> The ai-sdr-draft-qa skill reads this file before every run. A blank or
> default version is usable, but the defaults below are conservative and
> will likely over-block on a high-volume SMB deployment.

## What counts as a claim

A claim is any factual assertion the draft makes about the prospect, the prospect's company, or a public event referenced as context. Examples:

- "I saw your Series B announcement last Tuesday." → claim about a funding event.
- "Your team just hired three data engineers." → claim about a hiring event.
- "Since you moved to Snowflake in March." → claim about the prospect's current employment.
- "Your CEO mentioned the migration on the Lenny podcast." → claim about a public statement.

Not a claim (do not extract):

- Generic industry observation ("RevOps teams are spending more on signal tools").
- A question to the prospect ("Are you still running the manual scoring on weekly leads?") — this is a question, not an assertion.
- A statement about the sender ("We worked with three companies in your space last quarter").

## The evidence-pack contract

`prospect_evidence` is an array of entries shaped:

```json
{
"source": "linkedin_profile|crunchbase|company_blog|news_api|gong_call|crm_note|press_release",
"retrieved_at": "ISO 8601 timestamp",
"claim_text": "the literal evidence supporting the claim",
"citation_url": "https://... (optional but recommended)"
}
```

The upstream AI SDR is responsible for emitting this pack alongside the draft. If a claim in the draft cannot be matched to any entry, the claim is ungrounded.

## Matching rules

A claim is **grounded** if at least one evidence entry meets all three:

1. **Entity match.** Same person, company, product, or event named in the claim and the evidence.
2. **Fact match.** Consistent fact (a "Series B" claim matched against a Series B entry, not a Series A entry).
3. **Freshness.** `retrieved_at` is within the per-fact-type freshness window:
- Company-level facts (HQ, employee band, public funding stage) — 90 days.
- Hiring or product-launch facts — 30 days.
- Prospect employment or role — 60 days.

A grounded claim outside the freshness window is downgraded to `stale_grounded` and surfaced as an edit-tier finding (suggested fix: refresh the evidence pack and re-run, or remove the time-sensitive specific).

## Thresholds

```yaml
claim_block_threshold: 1 # number of ungrounded claims that trips a block verdict
stale_grounded_block_threshold: 3 # number of stale_grounded findings that escalate from edit to block
```

The conservative default of 1 ungrounded claim → block surfaces every hallucinated claim. Raise to 2 only if you are tolerant of some hallucinated rate in exchange for fewer blocks (high-volume SMB deployments selling at low ACV may justify this).

## What the skill does NOT do

The claim rubric is a verifier, not a fact-checker. It does not call out to the live web, hit news APIs, or query LinkedIn. It only verifies the draft against the supplied evidence pack. If the upstream AI SDR's enrichment was wrong (the pack itself contains a hallucinated Series B), the skill will treat the claim as grounded. The fix lives upstream — pick an enrichment vendor whose retrieval the skill can trust.

## Last edited

{YYYY-MM-DD} — by {RevOps team member name}

# Personalization rubric — TEMPLATE

> Replace this file's contents with your team's calibrated rubric.
> The defaults work as a starting point but the score-to-block threshold
> matters more than the rubric itself.

## The two-pole scoring rule

Personalization is scored on a 0-5 scale. The scale separates **grounded specifics** from **ungrounded specifics** so the upstream AI SDR cannot game the score by stuffing tokens.

- **Grounded specific** — a named entity, event, or property tied to a citation in `prospect_evidence`. Examples: a podcast episode the prospect appeared on, a tool the prospect's team adopted, a specific job posting on the prospect's careers page, a thread the prospect wrote on LinkedIn last week.
- **Ungrounded specific** — a reference to "your industry", "your role", "your team", "your company" without a tied citation. Also: stale references to a prior employer presented as current ("your work at Snowflake" when the prospect moved 18 months ago and no current-employment citation is present).

Only grounded specifics count toward the score. Ungrounded specifics count zero — they read as personalized to a casual reader but add no real signal.

## Score scale

| Score | Description | Example draft excerpt |
|---|---|---|
| 0 | No specifics, only template placeholders. | "Hi {first_name}, I help companies like yours scale outbound." |
| 1 | One ungrounded specific only. | "Hi Maria, I noticed Acme is in the fintech space." |
| 2 | One grounded specific. | "Hi Maria, I read your post on outbound attribution from last Tuesday." |
| 3 | Two grounded specifics. | "Hi Maria, your post on outbound attribution last Tuesday plus the SDR job posting on Acme's careers page suggest you're scaling the team." |
| 4 | Two grounded specifics + one used as the connective tissue of the ask. | "Hi Maria — the SDR job posting on Acme's careers page reads like the same gap your attribution post described. Worth a 15-min walkthrough of how Northwind solved this?" |
| 5 | Three or more grounded specifics, tied together into a single coherent ask, with the ask landing on the prospect's named priority. | (See sample-output.md for a literal example.) |

## Threshold

```yaml
personalization_block_below: 2
```

Drafts that score 0 or 1 are blocked. A score of 2 (one grounded specific) is the floor for a releasable cold draft. Below that, the draft reads as a template — generic openers, ungrounded "your industry" references, no concrete tie to the prospect.

## When to raise the threshold

Raise `personalization_block_below` to 3 for:

- Enterprise outbound where ACV > $50K and deal velocity is slow.
- Re-engagement of warm-but-quiet prospects (the second-touch context is already there; a single grounded specific reads thin).
- Outbound to known personas with high inbox volume (CTOs, CFOs) where reply rates depend on visibly higher effort.

Keep at 2 for high-volume SMB outbound where the volume math justifies some thinner drafts.

## Score-gaming patterns to refuse

The upstream AI SDR will try to inflate the score. Watch for:

- **Stale specifics presented as current.** "Your work at Snowflake" when the prospect moved. **Rule:** an employment-specific is grounded only if a current-employment citation is present in the pack.
- **Public-figure-style references that anyone could write.** "Your work in the SaaS space" with the prospect's company swapped in. **Rule:** the specific must be unique to this prospect, not a generic fact about their industry.
- **Citation-shaped phrasings without a real citation.** "Per your LinkedIn post on Wednesday" with no Wednesday LinkedIn post in the evidence pack. **Rule:** every citation-shaped phrasing must match an entry in the pack.

## Last edited

{YYYY-MM-DD} — by {RevOps team member name}

# Compliance rubric — TEMPLATE

> Replace this file's contents with profiles tuned to your sending footprint.
> The defaults below cover the common jurisdictions for B2B outbound in 2026.
> Confirm with legal before relying on them for production sends.
>
> The ai-sdr-draft-qa skill reads `recipient.country` from the input and
> applies the matching profile. If no profile matches, the skill returns
> `result: insufficient_compliance_context`. The skill does not fall back
> to a generic profile silently — that is a banned behavior in this rubric.

## Required elements (US floor — CAN-SPAM)

Applied to all US recipients regardless of state. Every send must include:

| Element | Where it lives | What the skill checks |
|---|---|---|
| Visible unsubscribe link | Email body footer | A clickable URL whose link text contains "unsubscribe" or equivalent. |
| Physical sender address | Email body footer | A street address line in the footer block. |
| Truthful sender identity | The `From:` line | `draft.from` must match `sender_domain` (no spoofing). |
| Subject line not deceptive | The `draft.subject` field | No subject line that promises a relationship that does not exist ("Re: your reply", "Per our call yesterday") unless those events actually occurred. |

## RFC 8058 — one-click unsubscribe (Google + Yahoo bulk-sender requirement)

Effective February 2024 for any sender exceeding 5,000 messages per day to Gmail or Yahoo addresses. The skill cannot inspect raw email headers; it requires the calling agent to pass either `email_headers` (the literal header block) or set `headers_compliant: true` after the agent's own verification.

Required headers:

- `List-Unsubscribe: <mailto:unsubscribe@yourdomain.com>, <https://yourdomain.com/unsub?id=XYZ>`
- `List-Unsubscribe-Post: List-Unsubscribe=One-Click`

Missing either is a block when sending to Gmail or Yahoo. The skill checks the recipient TLD/domain to determine applicability — if the recipient is on Google Workspace or Yahoo Mail, the requirement applies.

## EU GDPR profile

Applied when `recipient.country` is in the EU/EEA. Required elements:

- **Legitimate interest basis documented in the evidence pack.** A `legitimate_interest_basis` field in any `prospect_evidence` entry, with a non-empty string explaining the basis (e.g., "B2B contact from publicly listed business email, role-aligned to product use case").
- **Visible opt-out language in the body.** Not just an unsubscribe link — an explicit sentence the prospect can read inline: "Reply STOP or click below to opt out of future emails."
- **No personal-data claims beyond what the legitimate interest basis covers.** Hiring intent inferred from "your company is hiring" without a published job posting in the pack is a block — the inference is personal data processing without basis.

Missing any required element → block.

## France Loi Hamon (B2B addition to GDPR)

Applied when `recipient.country` is France. On top of the EU profile:

- Explicit B2B opt-out language stating the recipient can refuse further commercial solicitation.

## California profile (US + state-specific)

Applied when `recipient.us_state` is `US-CA`. On top of the US floor:

- A CCPA-aligned opt-out reference. For B2B, this is the "Do Not Sell or Share My Personal Information" link, the equivalent under CPRA, or an explicit B2B opt-out sentence.

## NYC LL144 awareness (hiring-adjacent outreach only)

Applied when `recipient.us_state` is `US-NY` AND the draft references a hiring, sourcing, or recruiting action by the sender. NYC LL144 governs Automated Employment Decision Tools used in hiring decisions; outbound that references the sender's hiring workflow needs human review for LL144 alignment.

The skill does not block — it flags `human_review: ll144_hiring_outreach` and routes the draft to a reviewer queue. This is a routing decision, not a compliance verdict.

## Profile selection logic

```
function selectProfile(recipient):
  if recipient.country in EU_EEA:
    profile = "eu_gdpr"
    if recipient.country == "FR":
      profile += "+france_loi_hamon"
  elif recipient.country == "US":
    profile = "us_can_spam + rfc_8058"
    if recipient.us_state == "US-CA":
      profile += "+california_ccpa"
    if recipient.us_state == "US-NY" and draft_mentions_hiring:
      profile += "+nyc_ll144_awareness"
  elif recipient.country == "CA":
    profile = "canada_casl"     # not detailed here; CASL has its own consent rules
  elif recipient.country in ["GB", "CH", "NO"]:
    profile = "eu_gdpr_equivalent"
  else:
    return insufficient_compliance_context
```

## Profiles not covered by defaults

Brazil LGPD, India DPDP, Australia Spam Act, Singapore PDPA, Japan APPI — add these as separate profiles if your sending footprint covers those countries. Each needs its own required-elements table. Do not collapse them into a "global" fallback; the variance between regimes is too large.

## Last edited

{YYYY-MM-DD} — by {legal-ops team member name}

# Sample output — for parser wiring and integration tests

> Literal examples of the three verdicts the skill emits. Use these
> when wiring the pre-send webhook return-path, the parser that pushes
> the verdict back into 11x / Artisan / aisdr / Unify, or the integration
> tests that exercise the QA gate.

## verdict: send

A clean draft. No blocking issues, no edit flags. The calling agent releases the send.

```json
{
  "verdict": "send",
  "result": "ok",
  "blocking_issues": [],
  "edit_flags": [],
  "personalization_score": 3,
  "claim_findings": {
    "grounded": 2,
    "ungrounded": 0,
    "stale_grounded": 0
  },
  "compliance_profile_applied": "us_can_spam + rfc_8058",
  "rewritten_draft": null,
  "qa_metadata": {
    "model": "claude-sonnet-4-6",
    "input_tokens": 2410,
    "output_tokens": 280,
    "rubric_version": "1.0.0",
    "ran_at": "2026-05-27T15:42:11Z"
  }
}
```

## verdict: edit

Releasable after the edit flags are applied. The calling agent either applies the suggested fixes automatically (when `request_rewrite: true` returns a `rewritten_draft`) or routes to a reviewer to apply by hand.

```json
{
  "verdict": "edit",
  "result": "ok",
  "blocking_issues": [],
  "edit_flags": [
    {
      "axis": "voice",
      "finding": "Stock AI opener: 'I hope this email finds you well'",
      "fix": "Replace with a grounded opener tied to a specific entry in prospect_evidence (e.g., a recent LinkedIn post by the prospect)."
    },
    {
      "axis": "deliverability",
      "finding": "Subject line is 78 characters (threshold: 70).",
      "fix": "Trim to under 70 characters. Suggested: 'Acme's hiring spike — quick question on attribution'"
    }
  ],
  "personalization_score": 2,
  "claim_findings": {
    "grounded": 1,
    "ungrounded": 0,
    "stale_grounded": 0
  },
  "compliance_profile_applied": "us_can_spam + rfc_8058",
  "rewritten_draft": {
    "subject": "Acme's hiring spike — quick question on attribution",
    "body": "Hi Maria — your post on outbound attribution last Tuesday lined up with the SDR job posting on Acme's careers page. Worth a 15-min walkthrough of how Northwind solved the same gap?\n\nReply STOP to opt out.\n\nOoligo, Inc. · 100 Market St, San Francisco, CA 94105"
  },
  "qa_metadata": {
    "model": "claude-sonnet-4-6",
    "input_tokens": 2620,
    "output_tokens": 540,
    "rubric_version": "1.0.0",
    "ran_at": "2026-05-27T15:43:02Z"
  }
}
```

## verdict: block

Not releasable. The blocking axis is named; the calling agent must regenerate, route to a human, or hard-fail the send.

```json
{
  "verdict": "block",
  "result": "ok",
  "blocking_issues": [
    {
      "axis": "claim_accuracy",
      "finding": "Ungrounded claim: 'I saw your Series B announcement last week'. No entry in prospect_evidence supports a recent Series B.",
      "fix": "Remove the claim, or attach a Series B citation to prospect_evidence and re-run."
    },
    {
      "axis": "personalization",
      "finding": "Score 1 — single ungrounded specific ('your industry') only. Threshold for releasable: 2.",
      "fix": "Add at least one grounded specific tied to a citation in prospect_evidence."
    }
  ],
  "edit_flags": [
    {
      "axis": "voice",
      "finding": "Stock AI opener: 'I wanted to reach out'",
      "fix": "Replace with a grounded opener."
    }
  ],
  "personalization_score": 1,
  "claim_findings": {
    "grounded": 0,
    "ungrounded": 2,
    "stale_grounded": 0
  },
  "compliance_profile_applied": "us_can_spam + rfc_8058",
  "rewritten_draft": null,
  "qa_metadata": {
    "model": "claude-sonnet-4-6",
    "input_tokens": 2480,
    "output_tokens": 460,
    "rubric_version": "1.0.0",
    "ran_at": "2026-05-27T15:44:18Z"
  }
}
```

## result: insufficient_input

Returned when a required input field is missing. The skill does not score; the calling agent must fix the call.

```json
{
  "verdict": null,
  "result": "insufficient_input",
  "missing_field": "prospect_evidence",
  "message": "prospect_evidence pack is required. The skill cannot verify claims against general model knowledge.",
  "qa_metadata": {
    "model": "claude-sonnet-4-6",
    "input_tokens": 320,
    "output_tokens": 80,
    "rubric_version": "1.0.0",
    "ran_at": "2026-05-27T15:45:00Z"
  }
}
```

## result: insufficient_compliance_context

Returned when `recipient.country` (or required state) maps to no jurisdictional profile. The skill refuses to score rather than falling back to a generic profile.

```json
{
  "verdict": null,
  "result": "insufficient_compliance_context",
  "missing_field": "recipient.country profile",
  "message": "No jurisdictional profile matched recipient.country='SG'. Add a Singapore PDPA profile to references/3-compliance-rubric.md.",
  "qa_metadata": {
    "model": "claude-sonnet-4-6",
    "input_tokens": 2380,
    "output_tokens": 110,
    "rubric_version": "1.0.0",
    "ran_at": "2026-05-27T15:45:42Z"
  }
}
```

## Field contract for parsers

If the calling agent consumes the JSON directly:

- `verdict` — enum: `send` / `edit` / `block` / `null` (null when `result` is non-ok).
- `result` — enum: `ok` / `insufficient_input` / `insufficient_compliance_context` / `insufficient_evidence`.
- `blocking_issues[]` — array of `{ axis, finding, fix }`. Axes: `claim_accuracy`, `personalization`, `compliance`, `deliverability`.
- `edit_flags[]` — same shape. Axes: `voice`, `deliverability`, `claim_accuracy` (for stale_grounded).
- `personalization_score` — integer 0-5.
- `claim_findings` — object: `{ grounded, ungrounded, stale_grounded }` counts.
- `compliance_profile_applied` — string identifying the matched profile.
- `rewritten_draft` — object `{ subject, body }` or null. Populated only when `request_rewrite: true`.
- `qa_metadata` — `{ model, input_tokens, output_tokens, rubric_version, ran_at }` for cost accounting and audit.