claude-skill

Diversity slate auditor with Claude

Difficulty

advanced

Setup time

45min

For

recruiter · sourcer · talent-acquisition · dei-leader

Recruiting and TA

A Claude Skill that audits a candidate slate (the interview lineup the recruiter intends to send, the full sourced pool, or the application pool) against the reference pool of the labor market relevant to the role, surfaces composition gaps, and emits a structured audit record, without running statistical inference on individual candidates and without recommending which candidates to add or remove. The output is decision support for the recruiter and the DEI lead, not an automated decision system.

When to use it

You are cutting a slate from a sourced pool to send to the hiring manager and want to know, before sending it, whether the slate's composition reflects the role's reference labor-market pool.
You are closing the quarter and need an aggregate audit across roles for the DEI program review.
You are preparing a submission under NYC Local Law 144 (bias audit) and need an internal pre-review of slate composition before the formal independent audit.

When NOT to use it

Identifying individual candidates' protected-class membership. The skill processes only aggregate, self-reported demographic data. It refuses to infer demographics from name, photo, school, or any candidate-level signal.
Auto-rejecting candidates to “rebalance” a slate. Rejecting a candidate to hit a composition number is reverse discrimination and triggers the same legal exposure as the original imbalance. The skill surfaces the gap; the fix is upstream (sourcing channels, search query, JD language), not at the slate-cutting step.
Composition data the candidate did not consent to share. Self-ID data has its own consent flow, captured by the company's ATS under the candidate's authorization (Ashby, Greenhouse, and Lever all expose it). The skill processes only the data the candidate agreed to share, in aggregate.
Single-role slates with fewer than 5 candidates. The smaller the slate, the less the audit signal means. The skill warns below 5 and refuses to compute composition statistics below 3.

Setup

Drop in the bundle. Place apps/web/public/artifacts/diversity-slate-auditor-skill/SKILL.md in your Claude Code skills directory.
Configure the reference-pool source. The skill needs a reference pool to compare against, usually BLS occupational statistics (free, public), augmented with industry-specific data where available. The reference-pool selector in references/1-reference-pools.md documents which BLS table maps to which role family.
Connect the ATS export. Both Ashby and Greenhouse expose self-ID exports through their APIs (Ashby /candidate.list with self-ID columns; the Greenhouse applications endpoint with EEOC fields). The skill reads the export; it does not call the ATS directly. Because of this separation, data minimization happens at export time and the skill never sees raw candidate records.
Set the slate-size guardrails. Defaults: warn below 5, refuse below 3. Adjust per role family if your team's typical slate sizes differ.
Dry-run on a closed slate. Audit the slate for a role you closed last quarter. Compare the skill's gap analysis with your DEI lead's read of the same slate. The skill surfaces composition deltas; whether those deltas matter is a judgment the skill does not make.

What the skill actually does

Six steps. The skill is structured to keep inference at the aggregate level, never the candidate level, and to surface gaps without recommending interventions, because the right intervention depends on the source of the gap and is never the slate-cutting step.

Load the slate (the candidates you intend to interview, the sourced pool, or the application pool, depending on what the recruiter wants to audit). The skill expects an aggregate-level export: per-candidate self-ID is read but used only to compute aggregates; no per-candidate analysis is emitted.
Load the reference pool for the role family. BLS occupational statistics are the default; the role-family-to-BLS-table mapping lives in references/1-reference-pools.md. The recruiter can substitute industry-specific reference pools (e.g. the Stack Overflow Developer Survey for software engineering).
Compute composition deltas, slate vs. reference pool. For each demographic dimension on which the slate has self-ID data (gender, race/ethnicity per the EEOC categories, veteran status, disability status; only the dimensions the company collects), compute the slate percentage and the reference-pool percentage. Compute the delta in percentage points (slate minus reference).
Surface per-dimension gaps with a confidence band. A 5pp delta on a slate of 50 means more than the same delta on a slate of 8. The confidence band reflects slate size and the specificity of the reference pool.
Surface upstream gap candidates. For each surfaced delta, list 3-5 likely upstream causes the recruiter can investigate: sourcing channel mix, search query language (the Boolean search builder's fairness pre-flight catches some), JD language, hiring-manager screen language. It does NOT rank or recommend; it lists candidates for the recruiter and DEI lead to investigate.
Emit the audit record. One signed JSONL line with the slate composition, the reference pool used, the computed deltas, and the skill version. No PII. The audit record is what makes a NYC LL 144 submission or an internal DEI review defensible.

Cost reality

Per slate audit, on Claude Sonnet 4.6:

LLM tokens: 5-10k input (slate aggregates + reference-pool table + skill instructions) and 2-3k output (per-dimension gap analysis + upstream candidates). Roughly $0.05-0.10 per audit.
Reference-pool data: BLS data is free. The Stack Overflow Developer Survey is free. Industry-specific datasets vary; the BLS-only path costs $0.
Recruiter / DEI lead time: the win. Composition audits usually get skipped because they are tedious; the skill makes the audit the default rather than an extra step. Expect 5-10 minutes per slate to read the audit, plus 20-40 minutes per quarter to investigate the upstream gap candidates it surfaces.
Setup time: 45 minutes, one time, for the reference-pool mapping and the ATS export connection.

Success metric

Track three things, monthly rather than per slate:

Composition-delta drift over time: is the slate-vs-reference-pool gap closing on tracked roles? If not, the upstream interventions are not working.
Sourcing channel-mix shift: when the audit surfaces a sourcing-channel gap candidate, does the channel mix actually move the following quarter? If sourcing keeps recommending the same channels, the audit's upstream findings are not reaching sourcing.
NYC LL 144 / internal DEI audit gap: when the formal annual bias audit happens, do its findings match what the slate-by-slate audits surfaced during the year? If the formal audit surfaces gaps the slate audits missed, the reference-pool mapping or the tracked dimensions are incomplete.

vs alternatives

vs ATS-native diversity dashboards (Greenhouse Inclusion, Ashby's diversity reporting). ATS-native dashboards show composition; they do not compute deltas against a reference pool or surface upstream candidates. Pick the ATS-native option if you only need reporting. Pick the skill if you need per-slate decision support.
vs Crosschq Diversity / SeekOut DEI / Eightfold's diversity layer. These are deeper products with their own reference pools and analysis layers. Pick them if the budget supports a platform play and you want a managed product. Pick the skill if you want the audit logic in your repo, the reference-pool mapping under your control, and a portable audit record.
vs hand-computed composition statistics. Computing them by hand is fine for the annual DEI review but breaks down at slate cadence; nobody computes them by hand per slate. The skill makes the audit cheap enough to run on every slate.
vs no audit. The default, and it carries legal exposure under NYC LL 144 (mandatory annual bias audit for AI tools used in hiring in NYC). The skill is the cheapest defensible posture.

Things to watch

Reverse discrimination from “rebalancing.” Guard: the skill never recommends adding or removing individual candidates. Adjusting a slate by dropping candidates to hit composition numbers is reverse discrimination and creates the same legal exposure as the original imbalance. The audit surfaces; the fix is upstream.
Inferring demographics from candidate signals. Guard: the skill processes only self-ID data the candidate consented to share. It refuses to infer race/ethnicity from name, gender from pronouns, or age from graduation year, and it makes no other candidate-level inference. The reference pools used for comparison are aggregate statistics, not candidate-level features.
Small-slate noise. Guard: slate sizes below 5 produce a warning header on the audit; below 3 the skill refuses to compute composition statistics.
Stale reference pools. Guard: the reference-pool mapping in references/1-reference-pools.md carries a last_verified date per source. Sources older than 18 months trigger a warning to refresh the mapping.
Audit-trail tampering. Guard: audit records are append-only JSONL with the skill version embedded. Modification breaks the file's signing chain. Routine audit-record retention should be at least as long as the company's hiring-record retention (typically 2-7 years).
DEI data exfiltration risk. Guard: the audit record contains aggregates and deltas, not per-candidate fields. The skill refuses to write per-candidate self-ID data into the audit record.

Stack

The skill bundle lives in apps/web/public/artifacts/diversity-slate-auditor-skill/ and contains:

SKILL.md: the skill definition
references/1-reference-pools.md: the role-family-to-reference-pool mapping (BLS, Stack Overflow Developer Survey, etc.)
references/2-audit-record-format.md: the literal output format for the JSONL audit record

Tools the workflow assumes you use: Claude (the model), Ashby or Greenhouse (the ATS, for the self-ID export). For the parallel sourcing-channel audit, see the Boolean search builder; its fairness pre-flight catches some upstream gap causes.

Related concepts: diversity recruiting, AI screening bias, structured interviewing.

Files in this artifact

---
name: diversity-slate-auditor
description: Audit a candidate slate's composition against a reference labor-market pool, surface per-dimension gaps with confidence bands, list upstream gap candidates for the recruiter to investigate, and emit an audit record. Never makes per-candidate inferences; never recommends adding or removing individual candidates from a slate.
---

# Diversity slate auditor

## When to invoke

Use this skill when a recruiter or DEI lead has a candidate slate (interview lineup, sourced pool, application pool) and wants the slate's composition audited against the role's reference labor-market pool. Take an aggregate-level slate export plus a reference-pool mapping as input and return a structured audit report plus an append-only JSONL audit record.

Do NOT invoke this skill for:

- **Identifying individual candidates' protected-class membership.** This skill processes self-reported aggregate data only. It refuses to infer demographics from name, photo, school, or any candidate-level signal.
- **Auto-rejecting candidates to "rebalance" a slate.** The skill surfaces gaps; it never recommends adding or dropping individual candidates. Rebalancing by candidate-level removal is reverse discrimination.
- **Composition data candidates have not consented to share.** Self-ID flows in Ashby/Greenhouse/Lever capture explicit consent. The skill processes only consented data.
- **Slates of <3 candidates.** Composition statistics are not meaningful at that size.

## Inputs

- Required: `slate_export` — path to a per-role aggregate export from the ATS. The export should contain self-ID counts per dimension at the slate level, NOT per-candidate rows. Example: `{ "gender": {"woman": 4, "man": 7, "non_binary": 1, "decline_to_state": 2}, "race_ethnicity": {...}, ... }`. If the export is per-candidate, the skill aggregates first and discards the per-row data before any analysis.
- Required: `role_family` — string identifying the role (e.g. `senior-software-engineer`, `account-executive`). Used to look up the reference pool in `references/1-reference-pools.md`.
- Optional: `reference_pool_override` — path to a custom reference-pool file (e.g. industry-specific data). If absent, defaults to BLS for the mapped occupation.
- Optional: `slate_label` — free-text label for the audit record (e.g. `Q2-2026-senior-eng-onsite-slate`).

## Reference files

- `references/1-reference-pools.md` — role-family-to-reference-pool mapping with sources, dates, and the BLS occupation codes.
- `references/2-audit-record-format.md` — the literal JSONL schema for the audit record.

## Method

Six steps.

### 1. Load the slate

Open `slate_export`. If the export is per-candidate, aggregate immediately and discard the per-row data — DO NOT pass per-candidate self-ID through any subsequent step.

If the slate has <3 candidates, halt: "Slate too small for audit. Composition statistics on <3 candidates are not meaningful and risk identifying individuals."

If the slate has 3-4 candidates, emit a warning header on the audit but continue: "Small slate — composition deltas have wide confidence bands."
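
A minimal Python sketch of this step, assuming a per-candidate export arrives as a list of dicts. The names `aggregate_slate` and `check_slate_size`, and the choice to count missing answers as `decline_to_state`, are illustrative assumptions and not part of the skill bundle.

```python
from collections import Counter

MIN_SLATE_SIZE = 3   # below this the audit halts
WARN_SLATE_SIZE = 5  # below this the audit carries a warning header


def aggregate_slate(rows, dimensions):
    """Collapse per-candidate self-ID rows into slate-level counts.

    `rows` is a list of dicts, one per candidate (hypothetical export shape).
    Only the slate size and the counts are returned; the rows are not kept.
    """
    counts = {dim: Counter() for dim in dimensions}
    for row in rows:
        for dim in dimensions:
            # Assumption: a missing answer is counted as decline_to_state.
            counts[dim][row.get(dim, "decline_to_state")] += 1
    return len(rows), {dim: dict(c) for dim, c in counts.items()}


def check_slate_size(n):
    """Return a warning string, or None; raise if the slate is too small."""
    if n < MIN_SLATE_SIZE:
        raise ValueError("Slate too small for audit.")
    if n < WARN_SLATE_SIZE:
        return "Small slate: composition deltas have wide confidence bands."
    return None
```

Returning only counts from `aggregate_slate` keeps per-candidate rows out of every later step, which is the property this step is meant to guarantee.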

### 2. Load the reference pool

Read `references/1-reference-pools.md` and map `role_family` to the appropriate BLS occupation code (or other source). Load the reference pool's per-dimension percentages.

If the reference pool's `last_verified` date is older than 18 months, emit a freshness warning on the audit. Continue.

If `reference_pool_override` is provided, use that file instead and skip the BLS mapping.
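
The 18-month freshness check could look like the sketch below. It assumes `last_verified` has already been parsed into a `datetime.date`; `add_months` and `freshness_warning` are hypothetical helper names, and the return values reuse the `ok` / `over_18_months` strings from the audit-record schema.

```python
import calendar
from datetime import date

MAX_AGE_MONTHS = 18


def add_months(d, months):
    """Return `d` shifted forward by whole calendar months, clamping the day."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def freshness_warning(last_verified, today):
    """'over_18_months' once `today` is past last_verified + 18 months."""
    if today > add_months(last_verified, MAX_AGE_MONTHS):
        return "over_18_months"
    return "ok"
```

For the entries in this bundle (`last_verified: 2026-01-15`), the warning would first appear on 2027-07-16 under this reading of "older than 18 months".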

### 3. Compute composition deltas

For each dimension where both the slate AND the reference pool have data:

- Slate percentage = slate_count / slate_total
- Reference percentage = reference value
- Delta = slate_pct - reference_pct (signed; negative = under-representation in slate)

Round to 1 decimal place. Do NOT compute statistical-significance scores at the per-dimension level — slate sizes are too small for the inferential framing to mean anything.
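
As an illustration, the arithmetic above can be sketched in Python for a single dimension. The function name and the dict shapes are assumptions; the denominator is the full slate total, including `decline_to_state`, which is what reproduces the 28.6% figure (4 of 14) in the output example.

```python
def composition_deltas(slate_counts, reference_pct):
    """Slate %, reference %, and signed delta (pp) for one dimension.

    slate_counts:  {"woman": 4, "man": 7, ...} counts from the slate.
    reference_pct: {"woman": 21.8, "man": 76.5} percentages from the pool.
    """
    total = sum(slate_counts.values())
    rows = []
    for category, ref in reference_pct.items():
        if category not in slate_counts:
            continue  # compare only categories that both sides report
        slate = 100.0 * slate_counts[category] / total
        rows.append({
            "category": category,
            "slate_pct": round(slate, 1),
            "reference_pct": round(ref, 1),
            # Signed: negative means under-representation in the slate.
            "delta_pp": round(slate - ref, 1),
        })
    return rows
```

With the example slate from the Inputs section (`woman: 4, man: 7, non_binary: 1, decline_to_state: 2`) and reference values of 21.8 and 76.5, this yields deltas of +6.8pp and -26.5pp.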

### 4. Surface gaps with confidence bands

For each dimension with `|delta| >= 5pp`, emit a "gap" entry with:

- Direction (under or over)
- Magnitude (in percentage points)
- Confidence band based on slate size:
  - `n >= 30` → `medium-high` confidence
  - `10 <= n < 30` → `medium` confidence
  - `5 <= n < 10` → `low` confidence
  - `3 <= n < 5` → `informational only`

Do NOT label gaps as "concerning" or "fine." That judgment is the DEI lead's, not the skill's.
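
The band table and the 5pp threshold translate directly into code. The sketch below is one possible reading, assuming delta rows shaped as dicts with `category` and `delta_pp` keys; `confidence_band` and `surface_gaps` are hypothetical names.

```python
GAP_THRESHOLD_PP = 5.0


def confidence_band(n):
    """Map slate size to the confidence band listed above."""
    if n < 3:
        raise ValueError("slate too small for audit")
    if n < 5:
        return "informational only"
    if n < 10:
        return "low"
    if n < 30:
        return "medium"
    return "medium-high"


def surface_gaps(delta_rows, n):
    """Keep rows with |delta| >= 5pp and attach direction and band.

    No row is labeled as good or bad; that judgment stays with the DEI lead.
    """
    band = confidence_band(n)
    gaps = []
    for row in delta_rows:
        if abs(row["delta_pp"]) >= GAP_THRESHOLD_PP:
            gaps.append({
                "category": row["category"],
                "direction": "under" if row["delta_pp"] < 0 else "over",
                "magnitude_pp": abs(row["delta_pp"]),
                "confidence": band,
            })
    return gaps
```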

### 5. Surface upstream gap candidates

For each dimension with a gap, list 3-5 likely upstream causes the recruiter and DEI lead can investigate:

- **Sourcing channel mix**: which channels did the slate come from? Channels have their own composition skews; LinkedIn surfaces a different mix than Stack Overflow Jobs, which differs again from employee referrals.
- **Search query language**: does the [Boolean search builder](/en/workflows/boolean-search-builder-claude-skill/) fairness pre-flight surface anything when run against the role intake?
- **JD language**: masculine-coded language ("rockstar," "ninja," "competitive") has a measurable effect on application-stage composition. The JD audit is a separate workflow.
- **Hiring-manager screen language**: what questions did the screen include? Did any function as a proxy filter?
- **Application drop-off**: at which stage did the under-represented group drop off most? If at sourcing, the channel mix is the likely cause; if at screen, the screen rubric is.

DO NOT rank these. The right intervention varies by gap source. Listing them is decision support.

### 6. Emit audit record

Append one JSONL line to `audit/<YYYY-MM>.jsonl` matching the schema in `references/2-audit-record-format.md`. The record contains:

- `audit_id` (uuid), `timestamp`, `slate_label`, `role_family`
- `slate_size`, `dimensions_audited`, per-dimension `slate_pct` / `reference_pct` / `delta` / `confidence`
- `reference_pool_source`, `reference_pool_last_verified`
- `skill_version`, `model`

NO PII. NO per-candidate fields. The audit record is what makes a NYC LL 144 submission or annual DEI review defensible; it must be immune to candidate re-identification.
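
A minimal sketch of the append step, assuming the caller has already built the aggregate-only record dict. `append_audit_record` and its `now` parameter (injectable so the month file is predictable in tests) are illustrative assumptions; signing, if used, is not shown.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def append_audit_record(audit_dir, record, now=None):
    """Append one JSONL line to <audit_dir>/<YYYY-MM>.jsonl and return the path."""
    now = now or datetime.now(timezone.utc)
    line = dict(record)  # copy so the caller's dict is not mutated
    line["audit_id"] = str(uuid.uuid4())
    line["timestamp"] = now.isoformat()
    path = Path(audit_dir) / f"{now:%Y-%m}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file; nothing is rewritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(line, sort_keys=True) + "\n")
    return path
```

Opening the file in append mode is what keeps earlier records untouched; corrections go in as new lines rather than edits.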

## Output format

```markdown
# Slate audit — {slate_label}

Audited: {ISO timestamp} · Role family: {role_family} · Slate size: {n}

{SMALL-SLATE WARNING if 3-4 candidates}
{REFERENCE-POOL FRESHNESS WARNING if >18 months old}

## Reference pool

- Source: {BLS table / Stack Overflow Developer Survey 2024 / etc.}
- Last verified: {date}

## Composition deltas

| Dimension | Slate % | Reference % | Delta | Confidence |
|---|---|---|---|---|
| Gender — woman | 28.6% | 21.8% | +6.8pp | medium |
| Gender — man | 50.0% | 76.5% | -26.5pp | medium |
| Race — Asian | 35.7% | 19.3% | +16.4pp | medium |
| Race — Black | 0.0% | 8.5% | -8.5pp | medium |
| Race — Hispanic/Latino | 7.1% | 7.6% | -0.5pp | medium |
...

## Gaps surfaced (|delta| >= 5pp)

### Race — Black: under-represented by 8.5pp (medium confidence)

Upstream gap candidates to investigate:
- Sourcing channel mix — what share of the slate came from referral vs. inbound vs. cold sourcing? Referral pools tend to mirror existing team composition.
- Search query language — run the role intake through the Boolean search builder's fairness pre-flight.
- Application drop-off — at which funnel stage is the gap widest?
- Outreach response rate — does outreach response by demographic show the gap originating in candidate engagement vs. sourcing reach?
- JD language — does the JD use language that has measured composition impact on application stage?

### Race — Asian: over-represented by 16.4pp (medium confidence)
{same shape, repeated for every other row at or above the threshold}

## Audit record

Appended to `audit/2026-05.jsonl` — record id `{uuid}`.
```

## Watch-outs

- **Reverse discrimination from "rebalancing."** *Guard:* skill never recommends per-candidate adds/removes. Output is composition deltas + upstream gap candidates only.
- **Per-candidate inference.** *Guard:* skill processes aggregate data only; per-candidate exports are aggregated and discarded immediately on load.
- **Small-slate noise.** *Guard:* refuses at <3, warns at 3-4 (informational only), low confidence at 5-9.
- **Stale reference pools.** *Guard:* freshness warning at >18 months on the source.
- **Audit-record retention.** *Guard:* records are append-only JSONL with skill version embedded. Recruiters / DEI leads handle retention per firm hiring-record policy (typically 2-7 years).

# Reference-pool mapping

The diversity slate auditor compares slate composition to a reference labor-market pool. This file maps each role family to the appropriate reference source.

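The defaults are BLS Current Population Survey (CPS) annual-average tables (free, US-only, updated annually), which is where the demographic breakdown by occupation at each entry's `bls_table_url` comes from. Industry-specific overrides are listed where stronger sources exist.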

## Format

Each entry has:

- `role_family` — the string the recruiter passes to the skill
- `bls_occupation_code` — the BLS SOC (Standard Occupational Classification) code
- `bls_table_url` — the canonical BLS table URL for the occupation's demographic breakdown
- `last_verified` — when this entry was confirmed against the BLS source
- `recommended_override` — a stronger source where one exists
- `notes` — caveats specific to this role family

## Mappings

### Software engineering

```yaml
role_family: senior-software-engineer
bls_occupation_code: "15-1252" # Software Developers
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: stack-overflow-developer-survey
notes: |
  BLS lumps all software developer levels together. For senior+ roles,
  the Stack Overflow Developer Survey breaks down by years of experience
  and tends to surface a different demographic mix at 10+ years vs. all
  developers. For roles requiring 8+ years experience, the SO override
  is more representative.
```

```yaml
role_family: junior-software-engineer
bls_occupation_code: "15-1252"
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  Junior roles draw heavily from CS programs. The CRA Taulbee Survey
  has CS-bachelor's demographics that may be a better fit for new-grad
  hiring slates.
```

```yaml
role_family: engineering-manager
bls_occupation_code: "11-9041" # Architectural and Engineering Managers
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  Management roles have substantially different demographic distributions
  from IC roles. Use this code (not the IC code) for EM/Director slates.
```

### Sales

```yaml
role_family: account-executive
bls_occupation_code: "41-3091" # Sales Representatives, Services
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  Tech-AE roles and SaaS-AE roles tend to have different demographics
  from the broader services-sales population the BLS code covers.
  Industry-specific data is hard to come by; treat the BLS reference
  as a floor.
```

```yaml
role_family: sales-development
bls_occupation_code: "41-3091"
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  SDR roles are entry-level; the BLS code includes career sales reps,
  which skews older. Adjust expectations for early-career composition.
```

### Customer success

```yaml
role_family: customer-success-manager
bls_occupation_code: "13-1151" # Training and Development Specialists
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  No clean BLS code for CSM. The training-and-development code is the
  closest occupational analog by job content; the customer-service-rep
  code is too entry-level. Treat with caveat.
```

### Recruiting / HR

```yaml
role_family: recruiter
bls_occupation_code: "13-1071" # Human Resources Specialists
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: null
```

### Marketing

```yaml
role_family: marketing-manager
bls_occupation_code: "11-2021" # Marketing Managers
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: null
```

### Data / analytics

```yaml
role_family: data-scientist
bls_occupation_code: "15-2051" # Data Scientists
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  Data scientist is a relatively new BLS code (added 2021). The
  demographic data is thinner than for established occupations.
```

```yaml
role_family: data-analyst
bls_occupation_code: "15-2098" # Data Analysts
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: null
```

### Legal

```yaml
role_family: in-house-counsel
bls_occupation_code: "23-1011" # Lawyers
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: aba-profile-of-the-legal-profession
notes: |
  ABA's annual Profile of the Legal Profession has more granular
  partnership/in-house/government breakdowns than BLS. For in-house
  roles specifically, the ABA override is more representative.
```

## Adding a role family

To add a new role family:

1. Find the BLS SOC code that best matches the role's actual job content (not the marketing title).
2. Confirm the BLS demographic table for that occupation has the dimensions you need.
3. Add the entry to this file with `last_verified` set to today.
4. If a stronger industry-specific source exists (industry survey, professional association data), note it under `recommended_override`.
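
Before committing a new entry, a small check can catch missing keys or a malformed SOC code. The sketch below assumes the YAML entry has already been loaded into a Python dict by whatever parser you use; `validate_mapping_entry` is a hypothetical helper, and the `NN-NNNN` pattern reflects the code format used by the entries in this file.

```python
import re

REQUIRED_KEYS = (
    "role_family",
    "bls_occupation_code",
    "bls_table_url",
    "last_verified",
    "recommended_override",
    "notes",
)
SOC_PATTERN = re.compile(r"^\d{2}-\d{4}$")


def validate_mapping_entry(entry):
    """Return a list of problems; an empty list means the entry looks well-formed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    code = entry.get("bls_occupation_code")
    if code is not None and not SOC_PATTERN.match(str(code)):
        problems.append(f"bls_occupation_code is not in NN-NNNN form: {code!r}")
    return problems
```

This only checks shape. Whether the code actually matches the role's job content (step 1) still needs a human read of the BLS occupation definition.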

## Refresh cadence

BLS publishes Current Population Survey demographic tables annually. This file should be re-verified every 12 months. Sources older than 18 months trigger a freshness warning in the auditor's output.

# Audit-record JSONL schema

The diversity slate auditor appends one JSONL line per audit to `audit/<YYYY-MM>.jsonl`. This file documents the schema. The format is fixed because external readers (NYC LL 144 audit submission, internal DEI program review, legal discovery) need to parse the records reliably.

## Schema

```json
{
  "audit_id": "uuid-v4",
  "timestamp": "ISO-8601 UTC",
  "skill_version": "1.0",
  "model": "claude-sonnet-4-6",
  "slate_label": "free-text identifier",
  "role_family": "string from references/1-reference-pools.md",
  "slate_size": "integer",
  "slate_size_warning": "ok | small_slate_warning | informational_only",
  "reference_pool": {
    "source": "BLS-15-1252 | stack-overflow-developer-survey-2024 | ...",
    "last_verified": "ISO-8601 date",
    "freshness_warning": "ok | over_18_months"
  },
  "dimensions": [
    {
      "dimension": "gender",
      "category": "woman",
      "slate_pct": 28.6,
      "reference_pct": 21.8,
      "delta_pp": 6.8,
      "confidence": "informational only | low | medium | medium-high"
    },
    {
      "dimension": "race_ethnicity",
      "category": "Black",
      "slate_pct": 0.0,
      "reference_pct": 8.5,
      "delta_pp": -8.5,
      "confidence": "informational only | low | medium | medium-high"
    }
  ],
  "gaps_surfaced": [
    {
      "dimension": "race_ethnicity",
      "category": "Black",
      "direction": "under",
      "magnitude_pp": 8.5,
      "confidence": "medium",
      "upstream_candidates": [
        "sourcing-channel-mix",
        "search-query-language",
        "application-drop-off",
        "outreach-response-rate",
        "jd-language"
      ]
    }
  ]
}
```

## Field-by-field

- `audit_id` — uuid v4. Stable for the audit's lifetime; allows downstream systems to deduplicate.
- `timestamp` — ISO-8601 UTC of when the audit was generated, NOT when the slate was assembled.
- `skill_version` — version of this skill (semver). Allows downstream readers to handle schema evolution.
- `model` — exact model ID used (e.g. `claude-sonnet-4-6`). Required for NYC LL 144 reproducibility — the audit must identify the model that processed the data.
- `slate_label` — free-text label. Recruiter chooses; suggested format `<quarter>-<role-family>-<stage>` (e.g. `Q2-2026-senior-eng-onsite-slate`).
- `role_family` — must match a key in `references/1-reference-pools.md`. Required for the reference-pool validation chain.
- `slate_size` — integer count of the slate.
- `slate_size_warning` is `ok` if `n >= 5` and `small_slate_warning` if `3 <= n < 5`. The value `informational_only` is reserved for `n < 3` and does not appear in written records, because the auditor halts at load time, before computing deltas or writing anything.
- `reference_pool` — object. `source` is the named source string. `last_verified` is when the role-to-pool mapping was last confirmed against the source. `freshness_warning` is `over_18_months` if the source's `last_verified` is older than 18 months.
- `dimensions` — array of per-dimension/category records. Every dimension/category pair the slate has data for AND the reference pool has data for. Pairs missing from either side are silently skipped (the audit does not assert about dimensions it cannot compare).
- `gaps_surfaced` — array of dimensions with `|delta_pp| >= 5`. Empty array if no gaps cross the threshold. Each gap entry includes the upstream-candidate keys for the recruiter / DEI lead to investigate; the upstream candidates are NOT recommendations but a list of investigation surfaces.

## What the schema deliberately does NOT include

- **Per-candidate fields.** No candidate IDs, no per-candidate self-ID, no per-candidate scores. The skill's design point is aggregate-only inference; the audit record reflects that.
- **Statistical-significance scores.** Slate sizes are too small for inferential framing to mean anything, and surfacing a p-value invites the wrong kind of reading. The confidence band (`informational only | low | medium | medium-high`, set by slate size alone) is a coarser, more honest summary.
- **Recommendations.** The skill surfaces gaps and lists upstream candidates. It does not say "you should hire more X" or "the slate is unbalanced" — those judgments are the DEI lead's, and the skill's role is decision support, not decision automation.
- **Identifying information about the recruiter or DEI lead.** The audit record is about the slate, not about who ran the audit. Operator identity belongs in the audit log of the system that called the skill (your ATS, your scheduling tool), not in the skill's own record.

## Retention

The audit records should be retained for at least as long as the firm retains hiring records — typically 2-7 years for affirmative-action-program firms (under 41 CFR 60-1.12), longer in some EU jurisdictions. NYC LL 144 requires the bias-audit results be made publicly available; the per-slate audit records support the annual aggregation that goes public.

The skill writes append-only JSONL with the skill version embedded. Modification breaks readability of the file; prefer correction-via-superseding-record (write a new audit with `slate_label` referencing the original) over editing.

## Reading the records

Downstream readers (the firm's annual DEI report, the NYC LL 144 submission, an external auditor) parse the JSONL by line. The schema is forward-compatible: new optional fields can be added in future skill versions; consumers that don't recognize new fields ignore them.
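
A tolerant reader along these lines might look as follows. It is a sketch, not the bundle's own parser: `read_audit_records` is a hypothetical name, and `KNOWN_FIELDS` simply lists the top-level keys from the schema above.

```python
import json

KNOWN_FIELDS = {
    "audit_id", "timestamp", "skill_version", "model", "slate_label",
    "role_family", "slate_size", "slate_size_warning", "reference_pool",
    "dimensions", "gaps_surfaced",
}


def read_audit_records(lines):
    """Yield one dict per non-empty JSONL line, dropping unrecognized fields."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        yield {key: value for key, value in record.items() if key in KNOWN_FIELDS}
```

Usage: `with open("audit/2026-05.jsonl", encoding="utf-8") as fh: records = list(read_audit_records(fh))`.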

For the annual aggregation, group by `role_family` and quarter, then for each `(role_family, quarter)` compute:

- Mean delta per dimension/category over all slates
- Total gaps surfaced and per-gap counts
- Trend in delta over the past four quarters

That aggregation lives outside this skill — it's a separate report. The audit records exist so that aggregation is possible.
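
For orientation, the first bullet (mean delta per dimension/category) could be computed with a sketch like this. It assumes records already parsed into dicts with the schema's `timestamp`, `role_family`, and `dimensions` fields; the function names are hypothetical, and the mean is unweighted across slates, which is a simplifying assumption.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def quarter_of(timestamp):
    """Turn an ISO-8601 timestamp into a label such as '2026-Q2'."""
    dt = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return f"{dt.year}-Q{(dt.month - 1) // 3 + 1}"


def mean_delta_by_quarter(records):
    """Mean delta_pp keyed by (role_family, quarter, dimension, category)."""
    buckets = defaultdict(list)
    for rec in records:
        base = (rec["role_family"], quarter_of(rec["timestamp"]))
        for dim in rec["dimensions"]:
            key = base + (dim["dimension"], dim["category"])
            buckets[key].append(dim["delta_pp"])
    # Unweighted: a slate of 8 counts as much as a slate of 50.
    return {key: round(mean(values), 1) for key, values in buckets.items()}
```

Weighting by `slate_size` is a reasonable alternative when slate sizes vary widely within a quarter.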