A Claude Skill that takes a recruiter’s reference-call notes (raw transcript or recorded summary), the candidate’s resume, and the role rubric, and produces a structured reference report: a per-dimension assessment with verbatim quotes, contradictions between references, areas the references didn’t cover (so the recruiter knows what to ask the next reference), and a confidence band per dimension — never a hire/no-hire recommendation. It replaces the recruiter’s 90-minute write-up with a 15-minute review-and-edit loop while preserving the auditability of the reference data.
## When to use

- You have completed two or more reference calls and have either a transcript (Fathom, Gong call recordings, or detailed notes) or call summaries.
- The role has a written rubric (the same one used in structured interviewing), so the synthesis can be dimension-aware.
- You want the references’ claims to be auditable later — every assertion in the report must trace to a verbatim quote from the call notes, with the reference’s name and the call timestamp.
## When NOT to use

- **Generating a hire/no-hire recommendation.** The skill produces a structured assessment with confidence per dimension. The hire decision sits with the hiring manager and the interview debrief. Wiring the skill’s output to a decision triggers the same automated-decision-making concerns as auto-rejection in screening.
- **Replacing the reference call itself.** The skill processes notes; it does not interview references. Auto-emailing references a form (“AI-generated reference questionnaire”) produces low-quality data and erodes the reference’s willingness to speak candidly on future calls.
- **Recording calls without consent.** Most US states are one-party consent, so the recruiter may record; a few (CA, IL, FL, MD, MA, MI, MT, NH, PA, WA) require two-party consent. In the EU, GDPR applies — recorded calls need an explicit lawful basis. The skill processes notes regardless of how they were captured; it does not authorize recording.
- **Backchannel references the candidate didn’t approve.** Different consent posture, different workflow, different legal exposure.
## Setup

1. **Drop the bundle.** Place `apps/web/public/artifacts/reference-check-summary-skill/SKILL.md` into your Claude Code skills directory.
2. **Reuse the role rubric.** The skill reads the same rubric file used for screening and structured interviews. If your team doesn’t have a shared rubric, the interview question bank pack is the prerequisite.
3. **Configure the consent record.** The skill writes a `consent_check` field per reference (was the call recorded? did the candidate authorize the reference? did the reference consent to processing of the notes?). If any answer is no or unknown, the report is flagged with a consent-warning header.
4. **Dry-run on a closed hire.** Process the references for a candidate hired last quarter, compare the skill’s report to your own contemporaneous write-up, and tune the rubric anchors if the skill weighs dimensions differently than the team did.
## What the skill actually does

Five steps, and the order matters: consent and rubric grounding happen before the synthesis, because a synthesis without them is just a re-narration of the calls.
1. **Validate consent.** Check `consent_check` per reference. Missing or unknown consent → emit a warning header on the report (“Consent not recorded for reference R2 — verify before sharing report”) and continue. Do not block; the recruiter may know consent was given verbally and forgot to log it.
2. **Ground in the rubric.** Read the role rubric. The synthesis dimensions are the rubric dimensions, not generic ones (“communication,” “leadership”). If the rubric has `skill_match`, `level_fit`, `ownership_signal`, `team_collaboration`, those are the report’s headings.
3. **Per-dimension synthesis.** For each rubric dimension, extract every quote from the call notes that bears on the dimension. Group by reference. Tag each quote with strength (strong-positive, weak-positive, neutral, weak-negative, strong-negative). Quotes are verbatim from the notes; paraphrasing is not allowed because it strips the auditability the skill exists to provide.
4. **Surface contradictions and gaps.** Identify dimensions where two references diverge (one strong-positive, another weak-negative) and surface the contradiction explicitly. Identify dimensions the references didn’t cover (no quote found) and surface those as gaps, so the recruiter knows what to ask the next reference, or what the rubric ranking step has to lean on instead.
5. **Confidence band per dimension, no overall recommendation.** For each dimension, return a confidence band: high (multiple references converge with strong-positive or strong-negative), medium (mixed but convergent), low (single reference or contradiction), not assessed. Do not return an overall hire/no-hire score. The decision sits with the hiring manager.
## Cost reality

Per candidate report (typically 2-4 references, 60-90 minutes of total call time, 4-8k words of notes), on Claude Sonnet 4.6:

- **LLM tokens** — typically 12-20k input (notes + rubric + skill instructions) and 2-4k output (structured report). At Sonnet 4.6 list pricing, roughly $0.10-0.18 per candidate. A team running 20 reference cycles per month spends $2-4 in model cost.
- **Recruiter time — the win.** Hand-writing a structured reference report takes 60-90 minutes per candidate. Reviewing the skill’s report and editing tone or adding context takes 15-25 minutes. The bigger time saver is the contradictions section, which a recruiter often misses on a first pass through their own notes.
- **Setup time** — 30 minutes once, for the rubric integration and consent-check format. Each role’s rubric is reused, so the marginal setup per role is zero.
## Success metric

Track two numbers:

- **Hiring-manager satisfaction with the report** — a 1-5 score the hiring manager gives after the debrief, on whether the report surfaced the right dimensions and didn’t bury the contradictions. Should sit at 4+ on a calibrated rubric.
- **Reference-cycle time** — wall-clock time from “last reference completed” to “hiring manager has the report.” Should drop from 1-2 days to under 2 hours.
## vs alternatives

- **vs hand-written report.** Hand-written is the right call for the highest-stakes hires (executive, board-facing) where the recruiter’s narrative voice is the deliverable. The skill earns its setup cost on the 80% of hires where the structured artifact is what the team needs.
- **vs ATS-native reference automation (Greenhouse Reference Check, Crosschq, SkillSurvey).** Those products own the reference collection (questionnaire-style references via email). Pick them if your firm prefers async questionnaire references; pick this skill if your team prefers live calls and the bottleneck is the synthesis afterward. The two are complementary; the skill works on questionnaire output too.
- **vs ChatGPT-style “summarize these reference notes.”** Generic chat returns a paragraph that reads well and buries the contradictions. The skill is structurally different: it forces per-dimension grouping, requires verbatim quotes, and refuses to author an overall recommendation.
## Watch-outs

- **Hindsight bias on high-confidence references.** *Guard:* the report’s structure forces per-dimension grouping rather than a reference-led narrative, which makes it harder for one strongly opinionated reference to dominate the read.
- **Hallucinated quotes.** *Guard:* the skill is constrained to verbatim extraction. Quotes that don’t appear in the call notes verbatim are forbidden; the prompt explicitly directs the model to omit a dimension if no quote can be cited, rather than paraphrase.
- **Over-weighting one reference.** *Guard:* contradictions are surfaced explicitly, with both quotes side by side. The report’s confidence-band logic downgrades to low on dimensions where references diverge, which prevents a confident-but-mistaken read.
- **Implicit hire recommendation through ordering.** *Guard:* the report orders dimensions by the rubric, not by the references’ enthusiasm. Strong-positive quotes do not float to the top; they land in the dimension they belong to.
- **Consent and recording exposure.** *Guard:* the consent-check field per reference is required input; missing consent triggers a warning header. The skill processes notes regardless of recording status, but it does not absolve the recruiter of the underlying consent obligation.
- **Bias in the underlying rubric carrying through.** *Guard:* if the rubric has dimensions that fail a fairness check (“culture fit” without anchors, school-tier scoring), the synthesis inherits the bias. Run the rubric through the diversity slate auditor for the role’s pool first.
## Stack

The skill bundle lives at `apps/web/public/artifacts/reference-check-summary-skill/` and contains:

- `SKILL.md` — the skill instructions (method, inputs, watch-outs)
- `references/1-report-format.md` — the literal report template
- `references/2-consent-checklist.md` — the consent-check schema and warning-header rules

Tools the workflow assumes you use: Claude (the model). Optional: Fathom or Gong for call recording; Ashby for the candidate record. For the parallel interview-debrief workflow, see the interview debrief summary skill.
---
name: reference-check-summary
description: Take reference-call notes (transcript or summary) plus the role rubric, and produce a structured per-dimension reference report with verbatim quotes, contradictions surfaced, and per-dimension confidence bands. Never authors an overall hire/no-hire recommendation — the decision sits with the hiring manager.
---
# Reference-check synthesis
## When to invoke
Use this skill when a recruiter has completed two or more reference calls and has notes (transcript, recorded call summary, or detailed manual notes) plus the role rubric. Take the notes plus rubric as input and return a structured Markdown report.
Do NOT invoke this skill for:
- **Generating a hire/no-hire recommendation.** This skill produces structured assessment with confidence per dimension. The hire decision sits with the hiring manager and the interview debrief.
- **Replacing the reference call itself.** This skill processes notes; it does not interview references. AI-generated reference questionnaires erode the reference's willingness to speak candidly.
- **Recording calls without consent.** The skill processes notes regardless of recording status, but does not authorize recording. Two-party-consent jurisdictions and EU GDPR have explicit lawful-basis requirements.
- **Backchannel references the candidate did not approve.** Different consent posture, different workflow.
## Inputs
- Required: `notes_dir` — path to a directory of per-reference Markdown files. Each file: `R1.md`, `R2.md`, etc., with the reference's name, role, relationship, call date, and notes.
- Required: `rubric` — path to the role rubric file. The rubric's dimensions become the report's headings.
- Required: `consent_log` — path to a per-reference consent record (see `references/2-consent-checklist.md`).
- Optional: `candidate_resume` — path to the resume. Used to ground statements like "the reference confirmed the deal mentioned on the resume" rather than re-narrating the resume.
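A plausible on-disk layout for these inputs, reusing the paths from the provenance example later in this document (the file names under `notes_dir` are illustrative):

```
data/
  rubrics/
    senior-backend-engineer.json      # rubric
  references/
    jamie-liu/                        # notes_dir
      R1.md                           # per-reference notes
      R2.md
      consent.json                    # consent_log
```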
## Reference files
Always read these from `references/`:
- `references/1-report-format.md` — the literal output format. Per-dimension headings come from the rubric, not from this file.
- `references/2-consent-checklist.md` — the consent-check schema and the warning-header rules.
## Method
Five steps, in order.
### 1. Validate consent
Open `consent_log`. For each reference, check four fields: `candidate_authorized` (the candidate gave the recruiter permission to call this person), `recording_consent` (if the call was recorded), `notes_processing_consent` (the reference was told the notes might be processed by AI), `jurisdiction` (which state / country the reference was in during the call).
If any field is missing or `unknown`, do NOT halt — emit a warning header at the top of the report and continue. The recruiter may have collected consent verbally and forgotten to log it; the warning surfaces the gap for them to verify before sharing the report. Explicit `no` values may trigger the halt conditions in `references/2-consent-checklist.md`; otherwise they also surface in the warning header.
If `recording_consent: no` and `jurisdiction` is in `[CA, IL, FL, MD, MA, MI, MT, NH, PA, WA]` or any EU country, the warning header upgrades to a halt: "Two-party consent jurisdiction; recording without consent is illegal. The skill will not process the notes from this reference. Verify consent and re-run with `consent_log` updated, or omit this reference."
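The step-1 logic — warn on gaps, halt only on the illegal-recording case — can be sketched as below. The `US-XX` / `EU-XX` jurisdiction-code convention is an assumption for illustration, and only the recording halt from this step is shown (the full halt list lives in `references/2-consent-checklist.md`):

```python
# Sketch of step 1. Jurisdiction-code format ("US-CA", "EU-DE") and the
# exact field handling are illustrative assumptions.
TWO_PARTY = {"US-CA", "US-IL", "US-FL", "US-MD", "US-MA",
             "US-MI", "US-MT", "US-NH", "US-PA", "US-WA"}

CONSENT_FIELDS = ("candidate_authorized", "recording_consent",
                  "notes_processing_consent", "jurisdiction")

def consent_action(record: dict) -> str:
    """Return 'halt', 'warn', or 'ok' for one reference's consent record."""
    juris = str(record.get("jurisdiction") or "")
    two_party = juris in TWO_PARTY or juris.startswith("EU-")
    if record.get("recorded") is True and record.get("recording_consent") is False and two_party:
        return "halt"  # illegal recording: refuse to process this reference
    if any(record.get(f) in (None, "unknown") for f in CONSENT_FIELDS):
        return "warn"  # surface a warning header, but keep processing
    return "ok"
```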
### 2. Ground in the rubric
Read the rubric. The synthesis dimensions ARE the rubric dimensions, not generic ones. If the rubric has `skill_match`, `level_fit`, `ownership_signal`, `team_collaboration`, those are the report's section headings.
If the rubric has dimensions that fail a fairness check (school-tier scoring, "culture fit" without anchors, employment-gap penalties), surface them but proceed — the rubric is upstream of this skill, and the right fix is at the rubric layer, not by silently dropping dimensions here.
### 3. Per-dimension synthesis
For each rubric dimension, read every reference's notes and extract every quote that bears on the dimension. A quote is a verbatim string from the notes; paraphrasing is not allowed. If you cannot extract a verbatim quote for a reference's view on a dimension, the cell stays empty and the dimension's confidence band reflects the gap.
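The no-paraphrasing rule is mechanically checkable: a candidate quote either appears in the notes or it is dropped. A minimal sketch, with whitespace normalization as an assumption (so line wrapping in the notes alone does not disqualify a quote):

```python
import re

def keep_verbatim(candidates: list[str], notes: str) -> list[str]:
    """Drop any candidate quote that is not a verbatim substring of the notes.

    Whitespace runs are collapsed before matching (an assumption), so line
    wrapping alone cannot disqualify a quote. Failed quotes are dropped,
    never paraphrased; the dimension's confidence band reflects the gap.
    """
    collapse = lambda s: re.sub(r"\s+", " ", s).strip()
    haystack = collapse(notes)
    return [q for q in candidates if collapse(q) in haystack]
```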
Tag each quote with strength on a 5-level scale:
- `strong-positive` — explicit named outcome, clear ownership, the reference stakes their credibility on it.
- `weak-positive` — observed positive behavior but no named outcome or scope.
- `neutral` — descriptive without judgment.
- `weak-negative` — observed gap or hesitation, qualified.
- `strong-negative` — explicit disqualifying behavior named, with scope.
### 4. Surface contradictions and gaps
For each dimension, compare the per-reference assessments. If two references diverge by ≥2 levels (e.g. one `strong-positive`, one `weak-negative`), surface the contradiction explicitly with both quotes side by side. Do NOT average or smooth — the contradiction IS the signal.
Separately, identify coverage gaps: dimensions no reference covered (no verbatim quote extracted for any reference). List them in a "Coverage gaps" section. The recruiter uses this to decide what to ask the next reference, or what the rubric ranking step has to lean on instead.
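The ≥2-level rule presumes an ordering on the strength scale. A sketch, mapping the five tags to integers one level apart (the numeric values are an illustrative choice; only their spacing matters):

```python
# Illustrative level mapping: one unit per level on the 5-point scale.
LEVEL = {"strong-negative": -2, "weak-negative": -1, "neutral": 0,
         "weak-positive": 1, "strong-positive": 2}

def contradictions(assessments: dict[str, str]) -> list[tuple[str, str]]:
    """Return reference pairs diverging by >= 2 levels on one dimension.

    `assessments` maps reference id -> strength tag, e.g.
    {"R1": "strong-positive", "R2": "weak-negative"}.
    """
    refs = sorted(assessments)
    return [(a, b)
            for i, a in enumerate(refs) for b in refs[i + 1:]
            if abs(LEVEL[assessments[a]] - LEVEL[assessments[b]]) >= 2]
```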
### 5. Confidence band per dimension
For each dimension, return a confidence band:
- `high` — multiple references converge with strong-positive or strong-negative quotes.
- `medium` — references mostly converge, weak-positive / weak-negative quotes, no contradictions.
- `low` — a single reference covered the dimension, or a contradiction was surfaced.
- `not assessed` — no reference covered the dimension.
Do NOT return an overall hire/no-hire score. The report ends after the last dimension's confidence band.
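Taken together, the banding rules reduce to a small pure function. A simplified sketch of the rules as stated, not the skill's exact logic:

```python
def confidence_band(strengths: list[str], contradiction: bool) -> str:
    """Band one dimension from its per-reference strength tags.

    Simplified reading of the rules above: no coverage -> not assessed;
    a contradiction or single-reference coverage caps the band at low;
    multiple converging strong tags earn high; everything else is medium.
    """
    if not strengths:
        return "not assessed"
    if contradiction or len(strengths) < 2:
        return "low"
    if all(s.startswith("strong-") for s in strengths):
        return "high"
    return "medium"
```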
## Output format
See `references/1-report-format.md` for the literal template. The shape is:
```
# Reference report — {Candidate name} — {Role}
[CONSENT WARNING HEADER if any reference's consent is missing]
## References
| ID | Name | Role | Relationship | Call date |
|---|---|---|---|---|
| R1 | ... | ... | ... | ... |
## Per-dimension synthesis
### {Dimension 1 from rubric}
**Confidence: {band}**
| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "..." |
| R2 | weak-positive | "..." |
[CONTRADICTION block if R1 and R2 diverge ≥2 levels]
### {Dimension 2 from rubric} ...
## Coverage gaps
Dimensions no reference addressed:
- {dimension X} — recruiter to ask R3 or rely on rubric ranking step.
## Provenance
- Rubric: `{path}` — SHA `{short}`
- Notes: `{notes_dir}` — N references processed
- Generated: `{ISO timestamp}`
```
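The provenance block pins the report to the exact rubric version via a short content hash. One way such a hash might be computed — the 16-character truncation matches the SHA shown in the template, but treat the convention as an assumption:

```python
import hashlib

def short_sha(path: str, length: int = 16) -> str:
    """Short SHA-256 of a file's bytes, for the report's provenance block."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()[:length]
```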
## Watch-outs
- **Hallucinated quotes.** *Guard:* the prompt forbids paraphrasing; quotes must appear verbatim in the input notes. If you cannot find a verbatim quote for a reference's view on a dimension, the cell is empty and the confidence band drops.
- **Hindsight bias.** *Guard:* the report is structured per-dimension, not per-reference. A strongly opinionated reference cannot dominate the narrative because the report doesn't have a narrative — it has a table per dimension.
- **Implicit recommendation via ordering.** *Guard:* dimensions are ordered by rubric, not by reference enthusiasm. Strong-positive quotes do not float to the top.
- **Consent gaps.** *Guard:* warning header on missing consent; halt on illegal recording in two-party jurisdictions.
- **Bias inheritance from rubric.** *Guard:* surfaced but not silently dropped — the right fix is at the rubric layer, upstream of this skill.
# Reference report format
This is the literal output template the skill writes. Every report follows this shape so downstream consumers (hiring manager, recruiting coordinator, audit reviewer) read predictable structure.
## Template
```markdown
# Reference report — {Candidate name} — {Role title}
Generated: {ISO timestamp} · Rubric SHA: {short hash} · Skill version: 1.0
{CONSENT WARNING HEADER — present only if any reference has missing consent — see consent-checklist.md}
## References
| ID | Name | Role | Relationship to candidate | Call date | Duration |
|---|---|---|---|---|---|
| R1 | Jamie Liu | VP Eng, Acme Fintech | Direct manager (2y) | 2026-04-28 | 45m |
| R2 | Sam Park | Senior IC peer, Acme Fintech | Cross-team collaborator (1y) | 2026-04-30 | 30m |
## Per-dimension synthesis
### Skill match — production Go and distributed-systems experience
**Confidence: high**
| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Owned the entire payments routing rewrite in Go — moved from synchronous to event-driven, took our P99 from 800ms to 180ms over Q3." |
| R2 | strong-positive | "When we needed someone to actually understand the consensus layer in our state machine, Jamie was the only person who could explain why the failover semantics were broken." |
### Level fit — Senior IC scope, cross-team influence
**Confidence: medium**
| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Was effectively the tech lead on the routing team — running the design reviews, mentoring two juniors." |
| R2 | weak-positive | "Came over to our team for the integration work — drove the meetings but it was a smaller scope, just three of us." |
*Note: confidence is medium because R2's scope was a single integration; R1's scope was a multi-quarter team-leadership signal. The strong-positive on team-lead scope only comes from R1.*
### Team collaboration — handles disagreement well
**Confidence: low**
| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Pushed back on a design I'd already approved, with data — turned out he was right and we caught a P0 before it shipped." |
| R2 | weak-negative | "Sometimes the pushback comes across as harsh in the moment — I had to mediate once between Jamie and one of our front-end folks." |
**⚠️ Contradiction surfaced.** R1 and R2 diverge by 2 levels on this dimension. R1's framing is that the pushback is principled and outcome-positive; R2's framing is that the delivery has interpersonal cost. Recruiter to surface this in the hiring-manager debrief.
### Ownership signal — sees work through to outcome
**Confidence: high**
| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Stayed on the routing project through the post-launch operational phase — wasn't the kind of engineer who hands off after launch." |
| R2 | strong-positive | "When the integration work hit a snag with our auth team, Jamie went and unblocked it himself rather than escalating." |
## Coverage gaps
Dimensions the references did not address (no verbatim quote found):
- **Response to ambiguity** — neither reference described a situation where the candidate had to act under unclear requirements. Recruiter to ask R3, or rely on the structured-interview step that probes this.
- **Customer-facing scope** — no quotes on the candidate's interaction with customers or with non-technical stakeholders. If the role requires customer-facing work, this gap matters.
## Provenance
- Rubric: `data/rubrics/senior-backend-engineer.json` — SHA `a3f2b1c4d5e6f7a8`
- Notes: `data/references/jamie-liu/` — 2 references processed
- Consent log: `data/references/jamie-liu/consent.json`
- Generated by: `reference-check-summary` skill v1.0 on Claude Sonnet 4.6
- Generated at: 2026-05-03T14:00:00Z
```
## Notes on the template
- **No overall hire/no-hire recommendation.** The report ends after the last per-dimension table and the coverage-gaps section. The decision sits with the hiring manager.
- **Dimension order matches the rubric.** The skill does NOT reorder by reference enthusiasm or by confidence band. The rubric's ordering reflects the team's prioritization; the report respects that.
- **Quotes are verbatim.** No paraphrasing, no smoothing. If a reference said "kinda harsh" the report says "kinda harsh," not "somewhat harsh."
- **Contradictions surface inline.** A separate "contradictions" section at the end is harder to read than inline notes per dimension.
# Consent checklist for reference processing
The reference-check-summary skill requires a per-reference consent log as input. This file documents the schema, the warning-header rules, and the halt conditions.
## Per-reference consent record
For each reference, the consent log contains:
```json
{
"reference_id": "R1",
"candidate_authorized": true,
"recording_consent": true,
"notes_processing_consent": true,
"jurisdiction": "US-NY",
"recorded": true,
"consent_collected_at": "2026-04-28T14:00:00Z",
"consent_collected_by": "recruiter-email@firm.com"
}
```
### Field definitions
- `candidate_authorized` — the candidate told the recruiter "you can call this person." Without this, the reference call should not have happened. Halt if any reference's value is `false`.
- `recording_consent` — if the call was recorded, the reference consented to recording. The skill needs this only if `recorded: true`.
- `notes_processing_consent` — the reference was told that the notes from the call may be processed by AI to generate a structured report. This is the explicit consent for the skill's processing path under GDPR Art. 6 lawful-basis requirements.
- `jurisdiction` — the state or country the reference was physically in during the call. This determines recording-consent law.
- `recorded` — whether the call was recorded.
## Warning-header rules
If any reference's consent record is missing or has `unknown`/`null` values, the report's top-of-page warning header reads:
```
⚠️ CONSENT WARNING
The following references have incomplete consent records:
- R2: notes_processing_consent is unknown.
- R3: candidate_authorized is unknown.
Verify consent before sharing this report. The skill processed the
notes regardless of the gap; the warning surfaces the gap for the
recruiter to confirm with the candidate and reference.
```
The warning is informational. The skill continues to the report. The recruiter is responsible for either confirming the missing consent (and updating the log for next time) or omitting the affected reference from the shared report.
## Halt conditions
Halt processing for a reference (skip it, do not include in the report) when:
1. **`candidate_authorized: false`** — the reference call should not have happened. Including the reference in the report would compound the underlying consent failure. Surface to the recruiter as a gap to address.
2. **`recorded: true` AND `recording_consent: false` AND `jurisdiction` is in a two-party-consent jurisdiction.** Two-party-consent jurisdictions (CA, IL, FL, MD, MA, MI, MT, NH, PA, WA in the US, plus all EU countries under GDPR) make recording without consent illegal. Processing the recorded notes compounds the violation. The skill refuses to process the reference and surfaces the issue to the recruiter.
```
HALT: R2 was recorded in CA without consent. Recording is illegal
in CA without two-party consent. The skill will not process this
reference's notes. Either delete the recording and re-interview the
reference (with consent this time), or omit the reference from the
report.
```
3. **`notes_processing_consent: false`** — the reference explicitly declined to have notes processed by AI. The skill respects that. The reference's notes can still inform the recruiter's own write-up, but they are not run through the skill.
## Why this matters
GDPR Art. 6 requires a lawful basis for processing personal data. A reference's notes ARE personal data (the reference's, and the candidate's). The lawful basis for AI processing is most commonly explicit consent or legitimate interest with a balancing test. In either case, the reference must have been informed.
NYC LL 144 and the EU AI Act focus on the candidate side, but reference data falls in the same processing pipeline. A defensible recruiting AI posture handles consent on both sides.
The skill cannot enforce that the recruiter actually collected consent. What it can enforce is that the consent is logged before processing, and that missing or contradictory consent surfaces to the recruiter rather than getting buried.
## What goes in the consent log when you didn't collect consent properly
The honest answer: omit the reference from this skill's processing. Use your own write-up. The skill's auditability comes from the consent record being trustworthy; populating it with `unknown` to make the skill run defeats the purpose.
Update your reference-call intake script to collect the four fields above as part of the call opening. The marginal time cost is 30 seconds per call.