claude-skill

Structured interview loop builder with Claude

Difficulty

intermediate

Setup time

30min

For

recruiter · hiring-manager · talent-acquisition

Recruiting & TA

Stack

A Claude Skill that takes a job description, the role’s level, the must-have competencies, and the eligible interviewer pool with each interviewer’s calibrated strengths, and produces a complete loop design — stage progression, per-stage rubric with anchor descriptions, behavioral questions per dimension, and an interviewer-assignment table with the rationale for each pick. Then stops at a hiring-manager review gate before anything is configured in the ATS. Replaces “we’ll figure out the loop when a candidate is in screen” with a 30-minute design pass that produces operational discipline.

When to use

You have an approved JD, a confirmed level, and a list of must-have competencies that differentiate hire from no-hire on this role.
You have a structured interviewing rubric library with anchor descriptions per score level per level band. The competency template at apps/web/public/artifacts/interview-loop-builder-skill/references/1-competency-library.md shows the shape; if you cannot fill it in, you do not yet have a rubric this skill can pull from.
You have an interviewer pool with calibrated strengths recorded per competency per level band — see references/2-interviewer-strengths.md in the bundle for the matrix.
A hiring manager will review the loop before it is configured in Ashby or Greenhouse. The skill writes files and stops; it does not push to the ATS.

When NOT to use

Auto-scheduling. This skill designs the loop. It does not schedule interviews, match calendars, or send candidate-facing booking links. That is Goodtime, Ashby Scheduling, or Greenhouse Scheduling. Coupling design and scheduling in one skill couples two failure modes that should fail independently.
Replacing the rubric design with the hiring manager. The skill emits anchor descriptions per score level by pulling from the competency library, but the library itself — what a 5 looks like for systems-design at IC5 — is owned by the hiring manager and the head of function. If the library is empty or all-template, the skill refuses and surfaces a TODO rather than inventing rubric anchors for a function it has no calibrated signal on.
Generic templated loops without role-specific calibration. If the inputs do not name the level, the must-have competencies, or the eligible interviewer pool, the skill refuses. A four-stage loop with generic “behavioral”, “technical”, “system design”, “leadership” labels reads as structured but is not. Every candidate gets the same questions regardless of role priorities, which defeats the point of structure.
Roles below a defined complexity threshold. A two-week contractor role does not need a four-stage loop. The skill warns and suggests a one-stage screen if the role is contract, hourly, or under 6 months expected tenure.
Replacing behavioral interviewing training. The questions emitted by the skill follow the situation/behavior/outcome shape, but interviewers still need trained calibration to score consistently. The skill is the scaffold; the training is the prerequisite.

Setup

Drop the bundle. Place apps/web/public/artifacts/interview-loop-builder-skill/SKILL.md into your Claude Code skills directory (or claude.ai custom Skills). The skill exposes one callable function: design_loop.
Fill in the competency library. Copy references/1-competency-library.md to your team repo. Replace every placeholder with your real competencies, definitions, level bands covered, and anchor descriptions per score level. The skill refuses to run if the library is template-only.
Fill in the interviewer-strengths matrix. Copy references/2-interviewer-strengths.md. List each eligible interviewer, their team, and the level bands they are calibrated to score each competency on. The “Last interview” column is the trigger to re-calibrate at 6 months idle.
Configure inputs per role. For a given role, pass the JD path, the level, an array of 3-6 competency IDs, and a path to the filled interviewer-strengths matrix. The skill emits loop.md and per-stage scorecard scaffolds under scorecards/.
Dry-run on a closed loop. Run on a role you designed manually in the last quarter. Compare the skill’s stage mapping and interviewer assignments to the manual design. If they diverge, the competency library or interviewer matrix is usually the thing that needs tuning, not the skill prompt.

What the skill actually does

Six steps, in order. The order matters: deterministic validation and mapping happen before the LLM generates rubric anchors and questions, and the candidate-experience pass at the end re-reads the assembled loop to catch overload that is invisible while assigning each stage in isolation.

Validate inputs. Each competency ID exists in the library; the interviewer pool has at least one calibrated person per must-have competency at the role’s level; the level falls inside the library’s covered bands. Halt with explicit TODOs if any check fails. Designing a Director loop with an IC-only library produces inflated rubrics — this is the step that prevents it.
Map competencies to stages. Recruiter screen evaluates fit and basics (never on-rubric). HM screen takes the top 1-2 differentiating competencies. On-site loop spreads the rest one-per-interview where possible. The one-competency-per-interview rule is opinionated — bundling two competencies into a 60-minute interview produces shallower signal on both, and it makes the rubric harder to apply in the moment.
Generate the per-stage rubric. For each post-screen stage, pull the anchor descriptions for the candidate’s level band from the competency library. Generate 3-5 behavioral questions per dimension following the situation/behavior/outcome shape, plus one suggested probing follow-up per question. Hypothetical “what would you do” questions are excluded by default — they reward articulate guessing over evidenced experience.
Assign interviewers with rationale. For each post-screen stage, propose 1-3 interviewers from the pool. Match by calibration fit (hard requirement), load (no interviewer in more than one stage of the same loop), and diversity of perspective (at least one interviewer outside the hiring team where the pool allows). Each assignment ships with an explicit rationale string.
Candidate-experience pass. Re-read the assembled loop. Total active interview time over 5 hours for IC or 7 for leadership → flag and suggest a take-home. More than 6 distinct interviewers → flag loop fatigue. Two stages probing the same competency → flag redundant signal. Cross-timezone stages without an accommodation → surface a TODO.
Hiring-manager review gate. Write loop.md and scorecards/<NN>-<stage-id>.md. Stop. The skill defines no “publish to ATS” action. The HM opens the file, edits, and configures the loop in Ashby or Greenhouse themselves.

The literal output format and scorecard scaffold layout live in references/3-loop-output-format.md in the bundle. The format is fixed because downstream consumers — interviewer reading the scorecard, debrief facilitator collating scores — need predictable structure.

Cost reality

Per loop design, on Claude Sonnet 4.5:

LLM tokens — typically 30-60k input tokens (JD plus competency library plus interviewer matrix plus skill instructions) and 10-20k output tokens (loop plus 3-5 scorecard scaffolds with anchors and questions). On Sonnet 4.5 that is roughly $0.20-0.40 per loop design. A function hiring 8 roles a quarter spends under $5 in model cost on this skill.
Recruiter and hiring-manager time — the win lives here. A manual loop design from scratch with calibrated rubric pulls is 90-120 minutes of HM plus recruiter time on the design call, another 30-60 minutes documenting questions and assignments. The skill compresses that to a 30-minute review pass on the generated loop. Per role, that is roughly 90 minutes of senior IC or manager time saved.
Setup time — 30 minutes per role once the competency library and interviewer matrix are filled in. The library and matrix are the prerequisite — net-new, those take a calibration session per competency band, which is a structured interviewing investment, not this skill’s setup.
Compounding benefit — structured loops produce better quality of hire than ad-hoc loops on every published study going back twenty years. The skill’s win is making “structured” the default rather than the exception by removing the per-role design overhead.

Success metric

Track three numbers per role per quarter, in the ATS:

Loop design lead time — hours from “role approved” to “loop configured in ATS”. Should drop materially after the skill is in the loop. If it does not, the bottleneck is HM review, not design — surface the loop earlier in the role-kickoff sequence.
Inter-rater agreement on the rubric — per competency dimension, how often do interviewers’ independent scores fall within one point. Should hit 80% or above on calibrated competencies. Below that, the anchor descriptions in the competency library are the thing to tune, not the skill.
Quality of hire at 12 months — the long-arc metric the loop is meant to move. Compare cohorts hired through skill-designed loops vs ad-hoc loops on the same role family. If the skill-designed cohort does not outperform, re-examine the competency-to-stage mapping rather than abandoning structure.

vs alternatives

vs Ashby’s structured interviewing templates — Ashby owns the configured loop, scorecard rendering, and debrief in one product. Pick Ashby’s templates if you want a managed UX and your team will live in the ATS. Pick this skill if you want the rubric anchors, interviewer-strength matrix, and competency-to-stage mapping in your own repo, version-controlled, with the design step swappable as the competency library evolves. The skill’s output is the input to Ashby’s loop configuration, not a replacement for it.
vs generic templated loops — every ATS ships default four-stage templates (“phone screen, HM screen, technical, on-site panel”). They pass for structured at first glance but are not. The same template gets applied to a Backend IC4 and a CS Manager M2, with the same generic questions, regardless of which competencies actually differentiate hire from no-hire on each role. The skill earns its 30 minutes of setup on the second role because the design is per-role-calibrated rather than one-size-fits-all.
vs hiring-manager DIY loop design — a senior HM can design a good loop from scratch in 90-120 minutes. They tend not to, because under deadline pressure they reuse the last loop they ran, regardless of fit. The skill’s win is not “designs better than an experienced HM at peak”; it is “designs as well as an experienced HM consistently, across all roles and all weeks”. The consistency is the compounding benefit.
vs no structured loop at all — the published meta-analyses on structured interviewing put structured interviews at roughly twice the predictive validity of unstructured ones for job performance. If your status quo is unstructured, the skill is not the question — adopting structure is. The skill is how to make structure cheap enough to actually ship on every role.

Watch-outs

Interviewer overload from the same person being assigned everywhere. Guard: the assignment step in the skill enforces “no interviewer in more than one stage of the same loop” as a hard rule. The assignment table surfaces a backup interviewer per stage so the recruiter has a fallback when the primary is unavailable, rather than re-using the primary in two stages.
Redundant signal across stages. Guard: the candidate-experience pass re-reads the assembled loop and flags any competency probed in more than one stage. The competency-to-stage mapping table at the top of the loop output makes redundancy visible to the hiring manager at review time.
Candidate experience neglected. Guard: the candidate-experience pass is a separate, named step in the skill rather than a sentence at the bottom of the loop. It enforces total-time caps (5 hours IC, 7 hours leadership), distinct- interviewer caps (6), take-home suggestions for competencies that bloat the loop, and timezone accommodation TODOs. Without that pass, “one more 30-minute conversation” accumulates invisibly.
Calibration drift inside a single loop. Guard: the rubric block emitted per stage includes anchor descriptions per score level pulled from the competency library, not free-text “rate 1 to 5”. Anchors are the thing that holds calibration when the same candidate is scored by four different interviewers in the same loop. Vague rubric → vague scores → debrief that re-litigates every dimension by anecdote.
Hiring manager rubber-stamps the design. Guard: the skill stops at the review gate and writes to files. There is no “publish to ATS” action. The HM has to open the file and edit it before configuring the loop — that friction is intentional. If HMs start signing off without reading, the loop content drifts away from role priorities and the skill stops earning its time saved.
Stale interviewer calibration. Guard: the interviewer matrix has a “Last interview” column. Cells aged over 6 months trigger re-calibration before the interviewer is assigned again. When interview intelligence reveals questions that are not producing useful signal, update the competency library’s anchors and the skill emits the new anchors on the next run.

Stack

The skill bundle lives at apps/web/public/artifacts/interview-loop-builder-skill/ and contains:

SKILL.md — the skill definition with frontmatter, when-to-invoke rules, inputs, method, and watch-outs paired with guards
references/1-competency-library.md — the competency taxonomy with anchor descriptions per score level per level band; fill in per function before running
references/2-interviewer-strengths.md — the eligible interviewer pool matrix with calibrated coverage per competency; fill in per team before running
references/3-loop-output-format.md — the literal Markdown format the skill emits, including scorecard scaffold layout

Tools the workflow assumes you already use: Claude (the model), Ashby or Greenhouse (the ATS where the HM configures the designed loop), BrightHire or Metaview (interview intelligence whose signal feeds back into the competency library’s anchor tuning). Pairs directly with the JD writer upstream and the interview debrief summary downstream.

Edit this page on GitHub

Files in this artifact

Download all (.zip)

---
name: interview-loop-builder
description: Take a job description, level, must-have competencies, and an interviewer pool with strengths, and produce a complete interview loop design — stage progression, per-stage rubric, behavioral questions per dimension, and interviewer assignments paired with the rubric dimension each one is scoring. Stops at a hiring-manager review gate before the loop is configured in the ATS.
---

# Interview loop builder

## When to invoke

Use this skill when a recruiter or hiring manager has a confirmed opening, an approved JD, and needs the structured interview loop designed before the first candidate moves into the screen stage. Take the JD, the role's level, the must-have competencies, and the eligible interviewer pool with each interviewer's calibrated strengths, and produce a Markdown loop design plus per-stage scorecard scaffolds.

Do NOT invoke this skill for:

- **Auto-scheduling.** This skill designs the loop; it does not schedule. Calendar coordination, interviewer-availability matching, and candidate-facing booking links are Ashby Scheduling, Greenhouse Scheduling, or Goodtime's job. Mixing design and scheduling in one skill couples two failure modes that should fail independently.
- **Replacing the rubric design with the hiring manager.** The skill maps competencies to a rubric *dimension* and writes anchor descriptions per score level, but the actual rubric per role family is owned by the hiring manager and the head of the function. If `references/1-competency-library.md` is empty or all-template, the skill refuses and surfaces a TODO rather than inventing a rubric for a function it has no calibrated signal on.
- **Generic templated loops without role-specific calibration.** If the inputs do not name the level, the must-have competencies, or the interviewer pool, the skill refuses. A four-stage loop with generic "behavioral", "technical", "system design", "leadership" labels passes for structured but is not — every candidate gets the same questions regardless of role priorities, which defeats the point.
- **Roles below a defined complexity threshold.** A two-week contractor role does not need a four-stage loop. The skill warns and suggests a one-stage screen if the role is contract, hourly, or under 6 months expected tenure.

## Inputs

- Required: `job_description` — path to a Markdown file (typically the output of the JD-writer skill, or a manually authored JD).
- Required: `level` — one of `IC1`, `IC2`, `IC3`, `IC4`, `IC5`, `IC6`, `M1`, `M2`, `M3`, `Director`, `VP`. Drives loop length and the competency-to-stage mapping.
- Required: `must_have_competencies` — array of 3-6 competency IDs drawn from `references/1-competency-library.md`. The skill maps each to a stage and refuses if more than 6 are provided (loop length blows out, signal per interview drops).
- Required: `interviewer_pool` — path to `references/2-interviewer-strengths.md` filled in for the role's function, listing each eligible interviewer with their calibrated strengths (which competencies they have been calibrated to score on what level bands).
- Optional: `loop_length_max` — hard cap on number of post-screen stages, default 4. Above 5, the skill warns about candidate-experience cost.
- Optional: `time_zone` — interviewer/candidate timezone hint used in the candidate-experience pass to flag any cross-timezone stages that need an explicit accommodation.

## Reference files

Always read these from `references/` before generating the loop. They contain the user's competency taxonomy, the interviewer pool, and the output format. Without them, the loop is a template, not a design.

- `references/1-competency-library.md` — the team's calibrated competency taxonomy with rubric dimensions and behavioral-anchor descriptions per score level. Replace the template with your real library before running.
- `references/2-interviewer-strengths.md` — the eligible interviewer pool with each interviewer's calibrated strengths. The skill matches competencies to interviewers using this matrix.
- `references/3-loop-output-format.md` — the literal Markdown format the skill emits, including the per-stage rubric block, the interviewer-assignment table with rationale, and the candidate-experience summary.

## Method

Run these six steps in order. Do not parallelize — later steps depend on context from earlier steps, and the candidate-experience pass at the end deliberately re-reads the assembled loop to catch overload that is invisible while assigning each stage in isolation.

### 1. Validate inputs

Open `job_description`, `references/1-competency-library.md`, and `references/2-interviewer-strengths.md`. Confirm:

- Each ID in `must_have_competencies` exists in the competency library. Unknown IDs → stop and surface them.
- The interviewer pool contains at least one interviewer calibrated on each must-have competency at the role's level. If a competency has no calibrated interviewer, surface a TODO ("hire / calibrate an interviewer for {competency} at {level} before designing this loop") and stop.
- The level falls inside the range the competency library has anchor descriptions for. Designing a Director loop with an IC-only library produces inflated rubrics; refuse.

### 2. Map competencies to stages

For each must-have competency, decide the stage where it is best evaluated. The mapping is opinionated:

- **Recruiter screen** evaluates fit, interest, comp alignment, and basic must-have-skill confirmation. Never a competency dimension on the rubric — recruiters do not score the rubric.
- **Hiring-manager screen** evaluates the top 1-2 competencies that most differentiate hire / no-hire on this role. The HM is the highest-signal interviewer; spending HM time on lower-priority competencies wastes calibration.
- **On-site loop** spreads remaining competencies one-per-interview where possible. One competency per interview is the engineering choice — bundling two competencies into one 60-minute interview produces shallower signal on both, and it makes the rubric harder for the interviewer to apply in the moment.
- **Working-session / take-home** (optional, only for IC4+ technical roles) evaluates competencies that need extended time or written artefact (system design, written communication, code review depth).

The output of this step is a `competency → stage` table with the rationale for each placement.

### 3. Design the per-stage rubric

For each post-screen stage, generate the rubric block. Each rubric dimension corresponds to one competency from step 2. For each dimension:

- Pull the anchor descriptions from `references/1-competency-library.md` for the candidate's level band.
- Generate 3-5 behavioral questions that probe the dimension. Each question must reference the situation / behavior / outcome shape; hypothetical "what would you do if…" questions are excluded by default because they reward articulate guessing over evidenced experience.
- Generate one suggested probing follow-up per question for the interviewer to use when the candidate's first answer is shallow.

The choice to include anchors and follow-ups in the output (rather than just the questions) is what separates a usable scorecard from "here are some questions, score it from 1 to 5". Without anchors, calibration drifts within a single loop.

### 4. Assign interviewers with rationale

For each post-screen stage, propose 1-3 interviewer candidates from the eligible pool. Match by:

- Calibration fit — the interviewer is calibrated on this competency at this level band. Hard requirement.
- Load balance — no interviewer is assigned to more than one stage in the same loop. Prevents the "only one person actually evaluates this candidate" failure mode where a single interviewer dominates the debrief.
- Diversity of perspective — where the eligible pool allows, propose at least one interviewer from outside the hiring team to reduce consensus bias in the debrief.

Output: an assignment table with the rationale per assignment ("Jamie calibrated at IC4 on systems-design; Priya outside the hiring team, last interviewed at this level 6 weeks ago").

### 5. Candidate-experience pass

Re-read the assembled loop and check, in order:

- Total interview time across all stages. Above 5 hours active for an IC role or 7 hours for a leadership role, flag and suggest moving one competency to a take-home.
- Number of distinct interviewers the candidate meets. Above 6, flag ("loop fatigue").
- Stages that require the candidate to repeat the same story. If two stages probe the same competency, surface as redundant.
- Cross-timezone stages without an accommodation note. Surface a TODO for the recruiter.

The choice to do this as a separate step rather than during stage design is deliberate: while assigning each stage in isolation it is easy to add "one more 30-minute conversation"; only by re-reading the full assembled loop does the candidate-side cost become legible.

### 6. Hiring-manager review gate

Stop. Write the loop design to `loop.md` per the format in `references/3-loop-output-format.md`. Write each stage's scorecard scaffold to `scorecards/<stage>.md`. Do not push anything to Ashby, Greenhouse, or Lever. Do not mark the role as "loop ready" in the ATS. Surface the path to both files and exit.

The hiring manager's job from here: read the loop, validate the competency-to-stage mapping reflects the actual role priorities, edit the questions, and configure the loop in the ATS. The skill does not re-enter the loop until the hiring manager confirms changes for a v2 design.

## Output format

```markdown
# Interview loop — {Role title} ({level})

Generated: {ISO timestamp} · Competencies: {n} · Stages: {n} · Total active time: {hours}

## Competency → stage mapping

| Competency | Stage | Rationale |
|---|---|---|
| Systems design | On-site Interview B | Highest differentiator at IC5; needs 60min for depth |
| Stakeholder influence | HM screen | Top hire/no-hire signal for this role |
| ... | ... | ... |

## Stage 1: Recruiter screen (30 min)

- Goal: confirm fit, interest, must-have-skill basics, comp alignment
- Key questions: ...
- Disqualifying signals: ...
- Not on rubric (screen, not scored)

## Stage 2: Hiring-manager screen (45 min)

- Goal: depth on {top 1-2 competencies}
- Rubric dimensions: {dim 1}, {dim 2}
- Behavioral questions:
1. {Question} — Probe: {follow-up}
2. ...
- Scorecard scaffold: `scorecards/02-hm-screen.md`

## Stage 3: On-site Interview A — {Competency} (60 min)

- Rubric dimension: {dim}
- Anchor descriptions ({level band}):
- 5 — {anchor}
- 4 — {anchor}
- 3 — {anchor}
- 2 — {anchor}
- 1 — {anchor}
- Behavioral questions:
1. {Question} — Probe: {follow-up}
2. ...
- Scorecard scaffold: `scorecards/03-onsite-a.md`

## Suggested interviewer assignments

| Stage | Primary | Backup | Rationale |
|---|---|---|---|
| HM screen | {HM name} | — | hiring manager |
| Onsite A — Systems design | Jamie L. | Priya R. | Jamie calibrated IC5 systems-design; Priya outside hiring team |
| ... | ... | ... | ... |

## Stage N: Debrief

- Format: independent scoring submitted before discussion
- Decision criteria: {explicit thresholds — e.g. "no rubric dimension < 3, aggregate >= 16"}

## Candidate-experience pass

- Total active interview time: {hours}
- Distinct interviewers: {n}
- Cross-timezone stages: {none | list with accommodation TODO}
- Redundant signal flagged: {none | list}
- Take-home recommendation: {none | move {competency} to take-home}

## Open TODOs for hiring manager

- ...
```

## Watch-outs

- **Interviewer overload from the same person being assigned everywhere.** *Guard:* step 4 enforces "no interviewer in more than one stage of the same loop" as a hard rule. The assignment table surfaces backup interviewers per stage so the recruiter has a fallback when the primary is unavailable, rather than re-using the primary in two stages.
- **Redundant signal across stages.** *Guard:* the candidate-experience pass in step 5 re-reads the loop and flags any competency probed in more than one stage. The competency-to-stage table in the output makes redundancy visible to the hiring manager in review.
- **Candidate experience neglected.** *Guard:* the candidate-experience pass in step 5 is a separate, named step rather than a sentence at the bottom of the loop. It enforces total-time caps, distinct-interviewer caps, take-home suggestions for competencies that bloat the loop, and timezone accommodation TODOs.
- **Calibration drift inside a single loop.** *Guard:* the rubric block emitted in step 3 includes anchor descriptions per score level pulled from the competency library, not free-text "rate 1 to 5". Anchors are the thing that holds calibration when the same candidate is scored by four different interviewers.
- **Hiring manager rubber-stamps the design.** *Guard:* skill stops at the review gate in step 6 and writes to files. There is no "publish to ATS" action defined anywhere in this skill. The HM has to open the file and edit it before configuring the loop.
- **Generic loops where role specificity matters.** *Guard:* step 1 refuses to run if `must_have_competencies` is empty or if the interviewer pool is missing calibrated coverage. The skill never falls back to a "default loop" for the function.

# Competency library — TEMPLATE

> Replace this template's contents with your team's actual competency
> library. The interview-loop-builder skill reads this file on every
> run; without your real library, the loop output is generic and the
> rubric anchors are uncalibrated.

## How to use

Each competency has:

- A short ID (used by `must_have_competencies` in the skill input)
- A one-sentence definition
- The level bands it has anchor descriptions for
- Anchor descriptions per score level (1-5) per level band

The skill maps each competency to a stage and emits the anchors for the candidate's level band into the per-stage rubric.

## Coverage matrix

| Competency ID | Definition | Bands covered |
|---|---|---|
| systems-design | Designs systems that meet current requirements while preserving headroom for known future needs. | IC3, IC4, IC5, IC6 |
| stakeholder-influence | Builds shared understanding and commitment across stakeholders without formal authority. | IC4, IC5, IC6, M1, M2, M3 |
| technical-depth | Reasons from first principles in the candidate's primary technical domain. | IC1, IC2, IC3, IC4, IC5, IC6 |
| ownership | Drives ambiguous problems to resolution, including the parts not in the original scope. | IC2, IC3, IC4, IC5, IC6, M1, M2 |
| communication-written | Writes documents that move decisions forward without a meeting. | IC3, IC4, IC5, IC6, M1, M2, M3 |
| people-leadership | Hires, develops, and retains a high-performing team. | M1, M2, M3, Director, VP |
| strategic-thinking | Identifies the right problem to solve before optimizing the solution. | IC5, IC6, M2, M3, Director, VP |

## Per-competency anchors

### systems-design (IC4 band)

- **5** — Designs the system end-to-end including failure modes, upgrade paths, and observability. Names trade-offs explicitly with reasoning per axis (latency, cost, complexity, blast radius).
- **4** — Designs the happy path and the most likely failure modes. Names trade-offs but reasoning is uneven across axes.
- **3** — Designs the happy path. Identifies failure modes when prompted. Trade-offs implicit, surfaced under follow-up questioning.
- **2** — Designs a workable system but misses obvious failure modes or scaling cliffs. Trade-offs not articulated.
- **1** — Designs a system that does not meet stated requirements, or cannot articulate why a design is structured as proposed.

> Replace the IC4 anchors above with your real anchors. Add anchor
> blocks for IC3, IC5, and IC6 in the same shape.

### stakeholder-influence (M2 band)

- **5** — Names the specific stakeholders, their incentives, the surface area of the disagreement, and the sequence of conversations that produced commitment. Outcome was a documented decision change.
- **4** — Names the stakeholders and incentives, walks through conversations, but the outcome description is fuzzy.
- **3** — Walks through one or two conversations. Stakeholder incentives are inferred under prompting.
- **2** — Describes the disagreement and the resolution but cannot describe the work between them.
- **1** — Describes a meeting outcome with no underlying stakeholder work.

> Replace the M2 anchors above with your real anchors. Add anchor
> blocks for IC4, IC5, IC6, M1, M3 in the same shape.

## Calibration discipline

When you add or change anchors:

- Run a calibration session with at least 3 interviewers scoring the same recorded interview. If their scores diverge by more than 1 point, the anchors are not calibrated yet.
- Date each change. The skill does not version anchors automatically; if you change an anchor mid-loop, the consistency of the loop is on you.
- Retire anchors that the calibration set cannot reproduce. Vague anchors are worse than no anchors — they create the illusion of structure.

## Last edited

{YYYY-MM-DD} — update on every material change.

# Interviewer strengths matrix — TEMPLATE

> Replace this template's contents with your team's actual interviewer
> pool and calibrated strengths. The interview-loop-builder skill
> reads this file on every run; without it, the assignment table is
> guessed rather than matched.

## How to use

Each row is one eligible interviewer. Columns are the competency IDs from `1-competency-library.md`. Each cell is the level band(s) the interviewer is calibrated to score that competency on. Empty cell = not calibrated; the skill will not assign.

The skill reads this matrix in step 4 (interviewer assignment) and matches by:

1. Calibration fit (interviewer is calibrated on the competency at the candidate's level band).
2. Load — at most one stage per loop per interviewer.
3. Diversity — at least one interviewer from outside the hiring team when the eligible pool allows.

## Pool

| Interviewer | Team | systems-design | stakeholder-influence | technical-depth | ownership | communication-written | people-leadership | strategic-thinking | Last interview (date) |
|---|---|---|---|---|---|---|---|---|---|
| Jamie L. | Platform | IC4, IC5 | — | IC3, IC4 | IC4 | — | — | — | 2026-04-19 |
| Priya R. | Data | — | IC4, IC5, M1 | IC4 | IC4, IC5 | IC4, IC5 | — | — | 2026-04-22 |
| Marcus T. | Product | — | IC5, M1, M2 | — | IC5 | M1, M2 | M1, M2 | M2 | 2026-04-12 |
| Aiko S. | Eng leadership | IC5, IC6 | M1, M2, M3 | IC5 | IC5, M1 | M2, M3 | M1, M2, M3 | M2, M3, Director | 2026-04-26 |

> Replace the rows above with your real interviewer pool. The columns
> are the competency IDs you defined in `1-competency-library.md`.

## Calibration discipline

When you add an interviewer or extend an interviewer's coverage:

- They sit shadow on at least 2 interviews at the new level band, then reverse-shadow (they score, the calibrated interviewer audits) on at least 2 more.
- Both calibrated interviewer and new interviewer sign off in the matrix before the cell is added.
- Retire a calibration band if the interviewer has not used it in 6 months. The "Last interview" column is the trigger to re-calibrate.

## Outside-the-hiring-team rule

The skill prefers at least one interviewer outside the hiring team per loop, to reduce consensus bias in debriefs. To make this work:

- Tag each interviewer's `Team` accurately.
- Ensure the eligible pool for each role family includes at least 2 outside-team interviewers per must-have competency at the role's level. If it does not, the skill will assign in-team only and surface a TODO ("expand cross-team calibration for {competency}").

## Last edited

{YYYY-MM-DD} — update on every calibration change.

# Loop output format — TEMPLATE

> The interview-loop-builder skill emits its output in the format
> below. This file documents the format so the team can adjust before
> running the skill at scale; the skill reads this file as the literal
> template, not a guideline.

## File layout

The skill writes:

- `loop.md` — the loop design, in the format below.
- `scorecards/<NN>-<stage-id>.md` — one scorecard scaffold per post-screen stage, with the rubric block prefilled and an empty scoring section.

## loop.md format

```markdown
# Interview loop — {Role title} ({level})

Generated: {ISO timestamp}
Competencies: {n}
Stages: {n}
Total active interview time: {hours}h
Distinct interviewers: {n}

## Inputs summary

- JD: {path}
- Level: {level}
- Must-have competencies: {comma-separated IDs}
- Interviewer pool: {path to filled interviewer-strengths matrix}
- Loop length cap: {n}
- Time zone hint: {tz | not provided}

## Competency → stage mapping

| Competency | Stage | Rationale |
|---|---|---|

## Stage 1: Recruiter screen (30 min)

Goal, key questions, disqualifying signals. Not on rubric.

## Stage 2: Hiring-manager screen (45 min)

Goal, rubric dimensions, behavioral questions with probes, scorecard
scaffold link.

## Stage 3..N: On-site interviews (60 min each)

For each stage:

- Rubric dimension (single competency)
- Anchor descriptions for the candidate's level band (5 lines, one
  per score level, pulled from competency library)
- Behavioral questions with probes (3-5)
- Scorecard scaffold link

## Suggested interviewer assignments

| Stage | Primary | Backup | Rationale |
|---|---|---|---|

## Stage N+1: Debrief

Format (independent scoring before discussion), decision criteria
(explicit thresholds), debrief facilitator.

## Candidate-experience pass

- Total active interview time
- Distinct interviewers
- Cross-timezone stages with accommodation TODOs
- Redundant signal flagged
- Take-home recommendation if loop is over the time cap

## Open TODOs for hiring manager

- ...
```

## scorecards/<NN>-<stage-id>.md format

```markdown
# Scorecard — {Stage} ({Competency})

**Candidate:** {name}
**Interviewer:** {name}
**Date:** {YYYY-MM-DD}
**Level:** {level}

## Rubric dimension: {Competency}

Anchor descriptions ({level band}):
- 5 — {anchor}
- 4 — {anchor}
- 3 — {anchor}
- 2 — {anchor}
- 1 — {anchor}

## Behavioral questions

1. {Question}
   - Probe: {follow-up}
   - Notes:
2. ...

## Scoring

- Score: ___ / 5
- Evidence (cite candidate's words for the score):
- Hire / no-hire on this dimension: ___
- One-line rationale:

## Submit

Independent scoring submitted to debrief before discussion.
```

## What the format enforces

- Every rubric dimension has anchor descriptions inline. No "rate 1 to 5" without anchors.
- Every behavioral question has a probe. Interviewers do not have to invent follow-ups in the moment.
- Every interviewer assignment has a rationale. The hiring manager can audit why a person was assigned without re-running the skill.
- The candidate-experience pass is a section, not a sentence.
- Open TODOs are explicit. The skill never silently leaves a gap.

## Last edited

{YYYY-MM-DD}