A Claude Skill that takes the recruiter's notes from reference calls (raw transcript or recorded summary), the candidate's resume, and the role rubric, and produces a structured reference report: a per-dimension assessment with verbatim quotes, contradictions between references, areas the references did not cover (so the recruiter knows what to ask the next reference), and a confidence band per dimension. It never produces a hire/no-hire recommendation. It replaces the recruiter's 60-90 minute write-up with a 15-25 minute review-and-edit cycle while preserving the auditability of the reference data.
When to use it
You have completed two or more reference calls and have a transcript (Fathom, Gong call recordings, or detailed notes) or summaries of the calls.
The role has a written rubric (the same one used in structured interviewing), so the synthesis can be dimension-aware.
You want the references' claims to be auditable later: every claim in the report must trace back to a verbatim quote from the call notes, with the reference's name and the call timestamp.
When NOT to use it
Generating a hire/no-hire recommendation. The skill produces a structured assessment with per-dimension confidence. The hiring decision stays with the hiring manager and the interview debrief. Wiring the skill's output into a decision raises the same automated-decision-making concerns as automatic rejection at screening.
Replacing the reference call itself. The skill processes notes; it does not interview references. Automatically emailing references a form (an “AI-generated reference questionnaire”) produces low-quality data and erodes the reference's willingness to speak candidly on future calls.
Recording calls without consent. Most US states are one-party consent, so the recruiter can record; some (CA, IL, FL, MD, MA, MI, MT, NH, PA, WA) are two-party. The EU is governed by the GDPR, under which recorded calls need an explicit lawful basis. The skill processes notes regardless of how they were captured; it does not authorize recording.
Backchannel references the candidate did not approve. Different consent posture, different workflow, different legal exposure.
Setup
Place the bundle. Put apps/web/public/artifacts/reference-check-summary-skill/SKILL.md in your Claude Code skills directory.
Reuse the role rubric. The skill reads the same rubric file used for screening and structured interviews. If your team has no shared rubric, the interview question bank pack is the prerequisite.
Set up the consent log. The skill reads a consent_check record per reference (was the call recorded? did the candidate authorize the reference? did the reference consent to the notes being processed?). If any answer is no or unknown, the report is flagged with a consent warning header.
Dry-run on a closed hire. Process the references of a candidate hired last quarter. Compare the skill's report with the write-up you produced at the time. Adjust the rubric anchors if the skill weights the dimensions differently than the team did.
What the skill actually does
Five steps. Order matters: consent validation and rubric grounding happen before synthesis, because a synthesis without consent or rubric grounding is just a retelling of the calls.
Validate consent. Check consent_check for each reference. Missing or unknown consent → emit a warning header in the report (“Consent not recorded for reference R2; verify before sharing the report”) and continue. It does not block; the recruiter may know that consent was given verbally and simply forgot to log it. The exceptions are the halt conditions defined in the bundle's consent checklist, such as a call recorded without consent in a two-party jurisdiction.
Ground in the rubric. Read the role rubric. The synthesis dimensions are the rubric's dimensions, not generic ones (“communication”, “leadership”). If the rubric has skill_match, level_fit, ownership_signal, and team_collaboration, those are the report's headings.
Per-dimension synthesis. For each rubric dimension, extract every quote from the call notes that provides evidence on that dimension. Group by reference. Tag each quote with its strength (strong-positive, weak-positive, neutral, weak-negative, strong-negative). Quotes are verbatim from the notes; paraphrasing is not allowed because it strips the report of the auditability that is the skill's reason to exist.
Surface contradictions and gaps. Identify dimensions where two references diverge (one strong-positive, the other weak-negative) and state the contradiction explicitly. Identify dimensions the references did not cover (no quotes found) and list them as gaps so the recruiter knows what to ask the next reference, or what the rubric ranking step will have to lean on instead.
Per-dimension confidence band, no overall recommendation. For each dimension, return a confidence band: high (several references converge on strong-positive or strong-negative), medium (mixed but convergent), low (single reference or contradiction), not assessed. It does not return an overall hire/no-hire score. The decision stays with the hiring manager.
Real cost
Per candidate report (typically 2-4 references, 60-90 minutes of total call time, 4-8K words of notes), with Claude Sonnet 4.6:
LLM tokens: typically 12-20k input (notes + rubric + skill instructions) and 2-4k output (structured report). At Sonnet 4.6 list price, roughly $0.10-0.18 per candidate. A team running 20 reference cycles per month spends $2-4 on model cost.
Recruiter time: this is the win. Writing a structured reference report by hand takes 60-90 minutes per candidate. Reviewing the skill's report and editing the tone or adding context takes 15-25 minutes. The biggest saving comes from the surfaced contradictions, which a recruiter often misses on a first read of their own notes.
Setup time: 30 minutes, once, for the rubric integration and the consent-check format. Each role's rubric is reused, so the marginal setup per role is zero.
Success metric
Track two numbers:
Hiring manager satisfaction with the report: a 1-5 score the hiring manager gives after the debrief, on whether the report surfaced the right dimensions and did not bury the contradictions. It should land at 4+ with a calibrated rubric.
Reference cycle time: wall-clock time from “last reference completed” to “the hiring manager has the report”. It should drop from 1-2 days to under 2 hours.
vs alternatives
vs a hand-written report. Hand-written is the right choice for the highest-stakes hires (executive, board-facing) where the recruiter's narrative voice is the deliverable. The skill pays back its setup cost on the 80% of hires where the structured artifact is what the team needs.
vs ATS-native reference automation (Greenhouse Reference Check, Crosschq, SkillSurvey). Those products own reference collection (questionnaire-style references by email). Choose them if your company prefers asynchronous, questionnaire-style references. Choose this skill if your team prefers live calls and the bottleneck is the synthesis afterward. The two are complementary; the skill also works on questionnaire output.
vs ChatGPT-style “summarize these reference notes”. Generic chat returns a paragraph that reads well and buries the contradictions. The Skill is structurally different: it forces grouping by dimension, requires verbatim quotes, and refuses to write an overall recommendation.
What to watch for
Hindsight bias with high-confidence references. Guard: the report's structure forces grouping by dimension rather than a reference-led narrative, which makes it harder for a strongly opinionated reference to dominate the reading.
Hallucinated quotes. Guard: the skill is restricted to verbatim extraction. Quotes that do not appear verbatim in the call notes are forbidden; the prompt explicitly instructs the model to leave a dimension out, rather than paraphrase, if no phrase can be quoted.
Overweighting a single reference. Guard: contradictions are stated explicitly, with the two quotes side by side. The report's confidence-band logic drops to low on dimensions where references diverge, which prevents a confident but wrong reading.
Implicit hiring recommendation through ordering. Guard: the report orders dimensions by the rubric, not by reference enthusiasm. Strong-positive quotes do not float to the top; they land in the dimension they belong to.
Consent and recording exposure. Guard: the per-reference consent-check field is a required input; missing consent triggers a warning header. The skill processes notes regardless of recording status, but that does not relieve the recruiter of the underlying consent obligation.
Bias carried over from the underlying rubric. Guard: if the rubric has dimensions that fail a fairness check (“culture fit” without anchors, scoring by university prestige), the synthesis inherits the bias. Run the rubric through the diversity slate auditor for the role's pool first.
Stack
The skill bundle lives in apps/web/public/artifacts/reference-check-summary-skill/ and contains:
SKILL.md: the skill definition
references/1-report-format.md: the literal output template (per-dimension headings, confidence-band scale, inline contradiction notes)
references/2-consent-checklist.md: the consent-check schema and the warning-header rules
Tools the workflow assumes you use: Claude (the model). Optional: Fathom or Gong for call recording; Ashby for the candidate record. For the parallel interview-debrief workflow, see interview debrief summary skill.
---
name: reference-check-summary
description: Take reference-call notes (transcript or summary) plus the role rubric, and produce a structured per-dimension reference report with verbatim quotes, contradictions surfaced, and per-dimension confidence bands. Never authors an overall hire/no-hire recommendation — the decision sits with the hiring manager.
---
# Reference-check synthesis
## When to invoke
Use this skill when a recruiter has completed two or more reference calls and has notes (transcript, recorded call summary, or detailed manual notes) plus the role rubric. Take the notes plus rubric as input and return a structured Markdown report.
Do NOT invoke this skill for:
- **Generating a hire/no-hire recommendation.** This skill produces structured assessment with confidence per dimension. The hire decision sits with the hiring manager and the interview debrief.
- **Replacing the reference call itself.** This skill processes notes; it does not interview references. AI-generated reference questionnaires erode the reference's willingness to speak candidly.
- **Recording calls without consent.** The skill processes notes regardless of recording status, but does not authorize recording. Two-party-consent jurisdictions and EU GDPR have explicit lawful-basis requirements.
- **Backchannel references the candidate did not approve.** Different consent posture, different workflow.
## Inputs
- Required: `notes_dir` — path to a directory of per-reference Markdown files. Each file: `R1.md`, `R2.md`, etc., with the reference's name, role, relationship, call date, and notes.
- Required: `rubric` — path to the role rubric file. The rubric's dimensions become the report's headings.
- Required: `consent_log` — path to a per-reference consent record (see `references/2-consent-checklist.md`).
- Optional: `candidate_resume` — path to the resume. Used to ground statements like "the reference confirmed the deal mentioned on the resume" rather than re-narrating the resume.
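As an illustration of the `notes_dir` convention, the sketch below orders per-reference files by their number. The helper names are hypothetical (the bundle defines no such functions), and it assumes file names follow the `R<number>.md` pattern exactly.

```python
import re
from pathlib import Path

# Hypothetical helpers, not part of the skill bundle.
_REF_NAME = re.compile(r"R(\d+)\.md")

def order_reference_files(filenames):
    """Keep names shaped like R<number>.md and sort them by that number."""
    numbered = []
    for name in filenames:
        match = _REF_NAME.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    # Numeric sort, so R10.md lands after R2.md rather than before it.
    return [name for _, name in sorted(numbered)]

def list_reference_files(notes_dir):
    """Return the ordered reference files found directly inside notes_dir."""
    root = Path(notes_dir)
    names = [p.name for p in root.iterdir() if p.is_file()]
    return [root / name for name in order_reference_files(names)]
```

Anything that does not match the pattern (a stray `notes.txt`, a backup file) is ignored rather than treated as a reference.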
## Reference files
Always read these from `references/`:
- `references/1-report-format.md` — the literal output format. Per-dimension headings come from the rubric, not from this file.
- `references/2-consent-checklist.md` — the consent-check schema and the warning-header rules.
## Method
Five steps, in order.
### 1. Validate consent
Open `consent_log`. For each reference, check four fields: `candidate_authorized` (the candidate gave the recruiter permission to call this person), `recording_consent` (if the call was recorded), `notes_processing_consent` (the reference was told the notes might be processed by AI), `jurisdiction` (which state / country the reference was in during the call).
If any field is `unknown` or `no`, do NOT halt by default: emit a warning header at the top of the report and continue. The recruiter may have collected consent verbally and forgotten to log it; the warning surfaces the gap for them to verify before sharing the report. The exceptions are the halt conditions listed in `references/2-consent-checklist.md`, including the one below.
If `recording_consent: no` and `jurisdiction` is in `[CA, IL, FL, MD, MA, MI, MT, NH, PA, WA]` or any EU country, the warning header upgrades to a halt: "Two-party consent jurisdiction; recording without consent is illegal. The skill will not process the notes from this reference. Verify consent and re-run with `consent_log` updated, or omit this reference."
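The jurisdiction test can be sketched as a small lookup. This is a minimal sketch under one assumption the skill text does not state: `jurisdiction` values use `US-<state>` for US states (as in the consent log example, `US-NY`) and a bare ISO 3166-1 alpha-2 code for countries. That convention matters because bare `CA` and `MT` would otherwise be ambiguous with Canada and Malta. The function name is hypothetical.

```python
# US states this skill lists as two-party recording-consent states.
US_TWO_PARTY_STATES = {"CA", "IL", "FL", "MD", "MA", "MI", "MT", "NH", "PA", "WA"}

# ISO 3166-1 alpha-2 codes of the 27 EU member states.
EU_COUNTRIES = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
}

def is_two_party_jurisdiction(jurisdiction: str) -> bool:
    """True if the skill's halt rule for unconsented recordings applies here.

    Assumed format: "US-CA" for US states, "DE" (or "DE-BY") for countries.
    """
    country, _, region = jurisdiction.strip().upper().partition("-")
    if country == "US":
        return region in US_TWO_PARTY_STATES
    return country in EU_COUNTRIES
```

With this convention `US-MT` (Montana) and `MT` (Malta) both return `True`, while bare `CA` is read as Canada and returns `False`.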
### 2. Ground in the rubric
Read the rubric. The synthesis dimensions ARE the rubric dimensions, not generic ones. If the rubric has `skill_match`, `level_fit`, `ownership_signal`, `team_collaboration`, those are the report's section headings.
If the rubric has dimensions that fail a fairness check (school-tier scoring, "culture fit" without anchors, employment-gap penalties), surface them but proceed — the rubric is upstream of this skill, and the right fix is at the rubric layer, not by silently dropping dimensions here.
### 3. Per-dimension synthesis
For each rubric dimension, read every reference's notes and extract every quote that bears on the dimension. A quote is a verbatim string from the notes; paraphrasing is not allowed. If you cannot extract a verbatim quote for a reference's view on a dimension, the cell stays empty and the dimension's confidence band reflects the gap.
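A post-hoc check on the verbatim rule might look like the sketch below. It is not part of the skill; the function names are hypothetical, and collapsing whitespace before comparing is an assumption made so that line wraps in a transcript do not cause false rejections. No other normalization is applied, so a paraphrase still fails.

```python
import re

def _collapse_whitespace(text: str) -> str:
    """Turn every run of whitespace into one space; change nothing else."""
    return re.sub(r"\s+", " ", text).strip()

def is_verbatim(quote: str, notes: str) -> bool:
    """True if quote appears word for word in notes (whitespace aside)."""
    needle = _collapse_whitespace(quote)
    # An empty quote is never valid evidence.
    return bool(needle) and needle in _collapse_whitespace(notes)
```

Running every extracted quote through a check like this before the report is shared is one way to catch a paraphrase that slipped through.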
Tag each quote with strength on a 5-level scale:
- `strong-positive` — explicit named outcome, clear ownership, the reference stakes their credibility on it.
- `weak-positive` — observed positive behavior but no named outcome or scope.
- `neutral` — descriptive without judgment.
- `weak-negative` — observed gap or hesitation, qualified.
- `strong-negative` — explicit disqualifying behavior named, with scope.
### 4. Surface contradictions and gaps
For each dimension, compare the per-reference assessments. If two references diverge by ≥2 levels (e.g. one `strong-positive`, one `weak-negative`), surface the contradiction explicitly with both quotes side by side. Do NOT average or smooth — the contradiction IS the signal.
Then identify coverage gaps: rubric dimensions that no reference covered. List them in a "Coverage gaps" section. The recruiter uses this to decide what to ask the next reference, or what the rubric ranking step has to lean on instead.
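The divergence rule and the gap listing can be sketched as follows. This is one reading, not the skill's implementation: it assumes the five tags map to integer levels -2 to 2, that each reference carries a single tag per dimension, and that "≥2 levels" is measured on that integer scale. The function names are hypothetical.

```python
from itertools import combinations

# Assumed ordinal mapping of the five strength tags.
STRENGTH_LEVEL = {
    "strong-negative": -2,
    "weak-negative": -1,
    "neutral": 0,
    "weak-positive": 1,
    "strong-positive": 2,
}

def find_contradictions(tags, min_gap=2):
    """tags maps reference ID -> strength tag for ONE dimension.

    Returns (ref_a, ref_b, gap) for every pair at least min_gap levels apart.
    """
    found = []
    for (ref_a, tag_a), (ref_b, tag_b) in combinations(sorted(tags.items()), 2):
        gap = abs(STRENGTH_LEVEL[tag_a] - STRENGTH_LEVEL[tag_b])
        if gap >= min_gap:
            found.append((ref_a, ref_b, gap))
    return found

def coverage_gaps(rubric_dimensions, tags_by_dimension):
    """Rubric dimensions with no tagged quote, in rubric order."""
    return [d for d in rubric_dimensions if not tags_by_dimension.get(d)]
```

On this scale `strong-positive` and `weak-negative` sit three levels apart, so the pair in the example above clears the threshold. A literal reading also flags `strong-positive` against `neutral` (two levels); whether that should count is a judgment call for the team.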
### 5. Confidence band per dimension
For each dimension, return a confidence band:
- `high` — multiple references converge with strong-positive or strong-negative quotes.
- `medium` — references mostly converge, weak-positive / weak-negative quotes, no contradictions.
- `low` — single reference, contradiction surfaced, or only weak-strength quotes.
- `not assessed` — no reference covered the dimension.
Do NOT return an overall hire/no-hire score. The report ends after the last dimension's confidence band.
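The band rules leave some room for interpretation, for example whether two `weak-positive` quotes count as `medium` or `low`. The sketch below is one possible reading, not the skill's definitive logic: it assumes one tag per reference per dimension, treats a spread of two or more levels as a contradiction, and lets "only weak-strength quotes" take precedence over `medium`. The function name is hypothetical.

```python
# Assumed ordinal mapping of the five strength tags.
LEVEL = {
    "strong-negative": -2,
    "weak-negative": -1,
    "neutral": 0,
    "weak-positive": 1,
    "strong-positive": 2,
}

def confidence_band(tags):
    """tags maps reference ID -> strength tag for ONE dimension."""
    if not tags:
        return "not assessed"          # no reference covered the dimension
    levels = [LEVEL[tag] for tag in tags.values()]
    if len(levels) == 1:
        return "low"                   # single reference
    if max(levels) - min(levels) >= 2:
        return "low"                   # contradiction surfaced
    if all(abs(level) < 2 for level in levels):
        return "low"                   # only weak or neutral quotes
    if all(level == 2 for level in levels) or all(level == -2 for level in levels):
        return "high"                  # every reference strong, same direction
    return "medium"                    # convergent, but mixed strength
```

Under this reading, `medium` means the references agree in direction but at least one quote is weaker than the rest.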
## Output format
See `references/1-report-format.md` for the literal template. The shape is:
```
# Reference report — {Candidate name} — {Role}
[CONSENT WARNING HEADER if any reference's consent is missing]
## References
| ID | Name | Role | Relationship | Call date |
|---|---|---|---|---|
| R1 | ... | ... | ... | ... |
## Per-dimension synthesis
### {Dimension 1 from rubric}
**Confidence: {band}**
| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "..." |
| R2 | weak-positive | "..." |
[CONTRADICTION block if R1 and R2 diverge ≥2 levels]
### {Dimension 2 from rubric} ...
## Coverage gaps
Dimensions no reference addressed:
- {dimension X} — recruiter to ask R3 or rely on rubric ranking step.
## Provenance
- Rubric: `{path}` — SHA `{short}`
- Notes: `{notes_dir}` — N references processed
- Generated: `{ISO timestamp}`
```
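For readers who post-process reports, one dimension block of this shape could be rendered as in the sketch below. The function is hypothetical and not part of the bundle. Escaping `|` is an assumption made so that a quote containing a pipe does not break the Markdown table; the rendered text still reads verbatim.

```python
def render_dimension(name, band, rows):
    """Render one dimension block.

    rows is a list of (reference_id, strength, quote) tuples.
    """
    lines = [
        f"### {name}",
        f"**Confidence: {band}**",
        "| Reference | Strength | Quote |",
        "|---|---|---|",
    ]
    for ref_id, strength, quote in rows:
        # Escape pipes so the quote stays inside its table cell.
        cell = quote.replace("|", "\\|")
        lines.append(f'| {ref_id} | {strength} | "{cell}" |')
    return "\n".join(lines)
```

Quotes that span several lines would need separate handling, which this sketch leaves out.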
## Watch-outs
- **Hallucinated quotes.** *Guard:* the prompt forbids paraphrasing; quotes must appear verbatim in the input notes. If you cannot find a verbatim quote for a reference's view on a dimension, the cell is empty and the confidence band drops.
- **Hindsight bias.** *Guard:* the report is structured per-dimension, not per-reference. A strongly opinionated reference cannot dominate the narrative because the report doesn't have a narrative — it has a table per dimension.
- **Implicit recommendation via ordering.** *Guard:* dimensions are ordered by rubric, not by reference enthusiasm. Strong-positive quotes do not float to the top.
- **Consent gaps.** *Guard:* warning header on missing consent; halt on illegal recording in two-party jurisdictions.
- **Bias inheritance from rubric.** *Guard:* surfaced but not silently dropped — the right fix is at the rubric layer, upstream of this skill.
# Reference report format
This is the literal output template the skill writes. Every report follows this shape so downstream consumers (hiring manager, recruiting coordinator, audit reviewer) read predictable structure.
## Template
```markdown
# Reference report — {Candidate name} — {Role title}
Generated: {ISO timestamp} · Rubric SHA: {short hash} · Skill version: 1.0
{CONSENT WARNING HEADER — present only if any reference has missing consent — see consent-checklist.md}
## References
| ID | Name | Role | Relationship to candidate | Call date | Duration |
|---|---|---|---|---|---|
| R1 | Dana Ortiz | VP Eng, Acme Fintech | Direct manager (2y) | 2026-04-28 | 45m |
| R2 | Sam Park | Senior IC peer, Acme Fintech | Cross-team collaborator (1y) | 2026-04-30 | 30m |
## Per-dimension synthesis
### Skill match — production Go and distributed-systems experience
**Confidence: high**
| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Owned the entire payments routing rewrite in Go — moved from synchronous to event-driven, took our P99 from 800ms to 180ms over Q3." |
| R2 | strong-positive | "When we needed someone to actually understand the consensus layer in our state machine, Jamie was the only person who could explain why the failover semantics were broken." |
### Level fit — Senior IC scope, cross-team influence
**Confidence: medium**
| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Was effectively the tech lead on the routing team — running the design reviews, mentoring two juniors." |
| R2 | weak-positive | "Came over to our team for the integration work — drove the meetings but it was a smaller scope, just three of us." |
*Note: confidence is medium because R2's scope was a single integration; R1's scope was a multi-quarter team-leadership signal. The strong-positive on team-lead scope only comes from R1.*
### Team collaboration — handles disagreement well
**Confidence: low**
| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Pushed back on a design I'd already approved, with data — turned out he was right and we caught a P0 before it shipped." |
| R2 | weak-negative | "Sometimes the pushback comes across as harsh in the moment — I had to mediate once between Jamie and one of our front-end folks." |
**⚠️ Contradiction surfaced.** R1 and R2 diverge by 3 levels on this dimension (strong-positive vs weak-negative). R1's framing is that the pushback is principled and outcome-positive; R2's framing is that the delivery has interpersonal cost. Recruiter to surface this in the hiring-manager debrief.
### Ownership signal — sees work through to outcome
**Confidence: high**
| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Stayed on the routing project through the post-launch operational phase — wasn't the kind of engineer who hands off after launch." |
| R2 | strong-positive | "When the integration work hit a snag with our auth team, Jamie went and unblocked it himself rather than escalating." |
## Coverage gaps
Dimensions the references did not address (no verbatim quote found):
- **Response to ambiguity** — neither reference described a situation where the candidate had to act under unclear requirements. Recruiter to ask R3, or rely on the structured-interview step that probes this.
- **Customer-facing scope** — no quotes on the candidate's interaction with customers or with non-technical stakeholders. If the role requires customer-facing work, this gap matters.
## Provenance
- Rubric: `data/rubrics/senior-backend-engineer.json` — SHA `a3f2b1c4d5e6f7a8`
- Notes: `data/references/jamie-liu/` — 2 references processed
- Consent log: `data/references/jamie-liu/consent.json`
- Generated by: `reference-check-summary` skill v1.0 on Claude Sonnet 4.6
- Generated at: 2026-05-03T14:00:00Z
```
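The Provenance block records a short rubric SHA, but the template does not say which hash it is. As a sketch under that caveat, the snippet below assumes SHA-256 over the rubric file's bytes, truncated to 16 hex characters to match the length shown in the example. Both function names are hypothetical.

```python
import hashlib
from pathlib import Path

def short_sha(data: bytes, length: int = 16) -> str:
    """Hex SHA-256 digest of data, truncated to `length` characters."""
    return hashlib.sha256(data).hexdigest()[:length]

def rubric_sha(rubric_path: str) -> str:
    """Fingerprint the rubric file's exact bytes for the Provenance block."""
    return short_sha(Path(rubric_path).read_bytes())
```

Hashing the raw bytes means any edit to the rubric, including whitespace changes, produces a different fingerprint, so an audit reviewer can tell which rubric version a report was generated against.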
## Notes on the template
- **No overall hire/no-hire recommendation.** The report ends after the last per-dimension table and the coverage-gaps section. The decision sits with the hiring manager.
- **Dimension order matches the rubric.** The skill does NOT reorder by reference enthusiasm or by confidence band. The rubric's ordering reflects the team's prioritization; the report respects that.
- **Quotes are verbatim.** No paraphrasing, no smoothing. If a reference said "kinda harsh" the report says "kinda harsh," not "somewhat harsh."
- **Contradictions surface inline.** A separate "contradictions" section at the end is harder to read than inline notes per dimension.
# Consent checklist for reference processing
The reference-check-summary skill requires a per-reference consent log as input. This file documents the schema, the warning-header rules, and the halt conditions.
## Per-reference consent record
For each reference, the consent log contains:
```json
{
"reference_id": "R1",
"candidate_authorized": true,
"recording_consent": true,
"notes_processing_consent": true,
"jurisdiction": "US-NY",
"recorded": true,
"consent_collected_at": "2026-04-28T14:00:00Z",
"consent_collected_by": "recruiter-email@firm.com"
}
```
### Field definitions
- `candidate_authorized` — the candidate told the recruiter "you can call this person." Without this, the reference call should not have happened. Halt processing for that reference if the value is `false`.
- `recording_consent` — if the call was recorded, the reference consented to recording. The skill needs this only if `recorded: true`.
- `notes_processing_consent` — the reference was told that the notes from the call may be processed by AI to generate a structured report. This is the explicit consent for the skill's processing path under GDPR Art. 6 lawful-basis requirements.
- `jurisdiction` — the state or country the reference was physically in during the call. This determines recording-consent law.
- `recorded` — whether the call was recorded.
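A check for incomplete records could be sketched as below. The field tuple and function name are hypothetical; the sketch assumes a missing key, JSON `null`, and the string `"unknown"` all count as incomplete, and it skips `recording_consent` when `recorded` is `false`, following the field definition above.

```python
import json

# Fields inspected by this sketch, in the order they appear in the schema.
CONSENT_FIELDS = (
    "candidate_authorized",
    "recording_consent",
    "notes_processing_consent",
    "jurisdiction",
    "recorded",
)

def missing_consent_fields(record: dict) -> list:
    """Return the consent fields that are absent, null, or 'unknown'."""
    missing = []
    for field in CONSENT_FIELDS:
        # recording_consent only matters when the call was recorded.
        if field == "recording_consent" and record.get("recorded") is False:
            continue
        if record.get(field) in (None, "unknown"):
            missing.append(field)
    return missing

example = json.loads("""
{"reference_id": "R2", "candidate_authorized": true,
 "recording_consent": null, "notes_processing_consent": "unknown",
 "jurisdiction": "US-NY", "recorded": false}
""")
```

For `example`, only `notes_processing_consent` is reported, because the call was not recorded.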
## Warning-header rules
If any reference's consent record is missing or has `unknown`/`null` values, the report's top-of-page warning header reads:
```
⚠️ CONSENT WARNING
The following references have incomplete consent records:
- R2: notes_processing_consent is unknown.
- R3: candidate_authorized is unknown.
Verify consent before sharing this report. The skill processed the
notes regardless of the gap; the warning surfaces the gap for the
recruiter to confirm with the candidate and reference.
```
The warning is informational. The skill continues to the report. The recruiter is responsible for either confirming the missing consent (and updating the log for next time) or omitting the affected reference from the shared report.
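Rendering the header from a map of incomplete fields might look like the sketch below. The function name and input shape are hypothetical, and the closing line is shortened relative to the full wording shown above.

```python
def consent_warning(missing_by_ref: dict) -> str:
    """Build the warning header, or return '' when nothing is missing.

    missing_by_ref maps a reference ID to its list of incomplete fields.
    """
    bullets = []
    # Plain string sort: fine for R1..R9, but R10 would sort before R2.
    for ref_id in sorted(missing_by_ref):
        for field in missing_by_ref[ref_id]:
            bullets.append(f"- {ref_id}: {field} is unknown.")
    if not bullets:
        return ""
    header = [
        "⚠️ CONSENT WARNING",
        "The following references have incomplete consent records:",
    ]
    footer = ["Verify consent before sharing this report."]
    return "\n".join(header + bullets + footer)
```

Returning an empty string when every record is complete lets the caller prepend the result unconditionally.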
## Halt conditions
Halt processing for a reference (skip it, do not include in the report) when:
1. **`candidate_authorized: false`** — the reference call should not have happened. Including the reference in the report would compound the underlying consent failure. Surface to the recruiter as a gap to address.
2. **`recorded: true` AND `recording_consent: false` AND `jurisdiction` is in a two-party-consent jurisdiction.** Two-party-consent jurisdictions (CA, IL, FL, MD, MA, MI, MT, NH, PA, WA in the US, plus all EU countries under GDPR) make recording without consent illegal. Processing the recorded notes compounds the violation. The skill refuses to process the reference and surfaces the issue to the recruiter.
```
HALT: R2 was recorded in CA without consent. Recording is illegal
in CA without two-party consent. The skill will not process this
reference's notes. Either delete the recording and re-interview the
reference (with consent this time), or omit the reference from the
report.
```
3. **`notes_processing_consent: false`** — the reference explicitly declined to have notes processed by AI. The skill respects that. The reference's notes can still inform the recruiter's own write-up, but they are not run through the skill.
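The three halt conditions can be expressed as one decision function. This is a minimal sketch, not the skill's code: the function name and return shape are hypothetical, and `two_party` is assumed to be computed by the caller from the record's `jurisdiction`.

```python
def consent_action(record: dict, two_party: bool) -> tuple:
    """Return ("halt", reason) or ("continue", "") for one reference.

    Only explicit False values halt; unknown or missing values fall
    through to "continue" and are left to the warning-header rules.
    """
    if record.get("candidate_authorized") is False:
        return ("halt", "candidate did not authorize this reference")
    if (
        record.get("recorded") is True
        and record.get("recording_consent") is False
        and two_party
    ):
        return ("halt", "recorded without consent in a two-party jurisdiction")
    if record.get("notes_processing_consent") is False:
        return ("halt", "reference declined AI processing of the notes")
    return ("continue", "")
```

Using `is False` rather than a truthiness test keeps `unknown` and `null` out of the halt path, which matches the split between warnings and halts described in this file.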
## Why this matters
GDPR Art. 6 requires a lawful basis for processing personal data. A reference's notes ARE personal data (the reference's, and the candidate's). The lawful basis for AI processing is most commonly explicit consent or legitimate interest with a balancing test. In either case, the reference must have been informed.
NYC LL 144 and the EU AI Act focus on the candidate side, but reference data falls in the same processing pipeline. A defensible recruiting AI posture handles consent on both sides.
The skill cannot enforce that the recruiter actually collected consent. What it can enforce is that the consent is logged before processing, and that missing or contradictory consent surfaces to the recruiter rather than getting buried.
## What goes in the consent log when you didn't collect consent properly
The honest answer: omit the reference from this skill's processing. Use your own write-up. The skill's auditability comes from the consent record being trustworthy; populating it with `unknown` to make the skill run defeats the purpose.
Update your reference-call intake script to collect the consent fields above as part of the call opening. The marginal time cost is 30 seconds per call.