A Claude Skill that audits which Salesforce opportunities genuinely meet the exit criteria of the stage they just moved into. For every opp that progressed in the prior week, the Skill checks the deterministic rules (required fields, logged activities, stakeholder roles), then cross-references the rep’s qualitative claims against Gong call transcripts. The output is a coaching queue for the RevOps weekly review, not an enforcement gate that rolls deals back automatically.
The artifact bundle ships at `apps/web/public/artifacts/stage-progression-validator-skill/` and contains `SKILL.md` plus three reference templates: `references/1-stage-criteria-template.md` (the team’s stage rubric), `references/2-methodology-mapping-template.md` (how MEDDPICC, MEDDIC, SPICED, BANT, or a custom framework maps onto your Salesforce fields and Gong phrase patterns), and `references/3-sample-output-format.md` (the exact Markdown the Skill emits).
## When to use
Run this on the cadence of your forecast meeting. The canonical pattern is a Sunday-night batch keyed to `week_ending`, with the report dropping into a Slack channel ahead of the Monday-morning manager huddle. Single-opp mode is also valid — a deal-desk reviewer can run the Skill against one `Opportunity.Id` before a pricing-approval meeting, or a manager can run it against a single deal before a 1:1 to ground the conversation in the specific gaps rather than a vague “this feels stuck” impression.
The qualitative-claim check is the part that pays for itself. Salesforce already enforces required-field validation rules; what it cannot do is notice that the rep claimed “buyer agreed on success criteria” and then no Gong call in the last 45 days actually captured that conversation. The Skill is methodology-aware about how it searches — for MEDDPICC’s economic buyer, it looks for the buyer’s name within twelve tokens of decision-language (“approve”, “sign off”, “budget owner”) rather than just any mention of the name. That distinction is what separates a useful flag from a false positive that reps learn to ignore.
## When NOT to use
- Auto-rollback. Do not wire the Skill’s output into a Salesforce update that demotes deals on a `fail` verdict. The verdict is one input among several; the manager owns the demotion decision with full context the Skill cannot see (off-Gong meetings, side-channel commitments, customer-side procurement quirks).
- Performance management. A single `fail` on a single deal is noise. The signal is patterns over weeks — the rep whose `fail` rate climbs from 5% to 30% over a quarter while peers hold steady. Using a one-shot verdict in a PIP collapses rep trust and the Skill stops working.
- Comp inputs. Stage drives forecast, sometimes drives accelerators. If validator output flows into comp calculations, you have created a direct incentive for reps to game the inputs — refuse Gong recording, omit notes, store data in side-of-desk spreadsheets. Keep the validator output in the coaching channel and out of the comp pipeline.
- Stages without a documented rubric. If `references/1-stage-criteria-template.md` has no entry for the stage being validated, the Skill emits `needs_methodology` rather than guessing. Do not “tune” the Skill to score those stages with a default — fix the rubric instead.
- Teams that store nothing structured. A team running MEDDPICC in slides and not in Salesforce will fail every qualitative check. Run the Skill in dry-run mode for two weeks; if more than 40% of opps land in `needs_methodology` or score below 0.2 on qualitative checks across the board, the methodology mapping doc is fictional. Fix the doc or instrument the missing fields before going live.
## Setup
- Document the stages. Open `references/1-stage-criteria-template.md` and replace the template contents with your team’s real rubric, stage by stage. Each stage has three rule buckets: `field_rules` (a Salesforce field must hold a non-default value), `activity_rules` (a logged activity of a specified type must exist within a recency window), and `stakeholder_rules` (`OpportunityContactRole` must include a contact with a role matching a regex). Mark fields as `evidence_required: gong` when you want a Gong-transcript cross-check on the qualitative claim (a sketch of one stage entry follows this list).
- Map the methodology. Edit `references/2-methodology-mapping-template.md` to match your team’s framework. The file ships with worked examples for MEDDPICC, MEDDIC, and SPICED — copy whichever matches, then adjust the Salesforce field names to your org’s actual API names. The phrase-patterns column is what tells the Skill what counts as Gong evidence; do not leave it as the template default unless your fields genuinely match the example mappings.
- Install the Skill. Drop the bundle into `~/.claude/skills/stage-progression-validator/`. Set `SFDC_TOKEN` (read-only on `Opportunity`, `OpportunityFieldHistory`, `Task`, `Event`, `OpportunityContactRole`) and `GONG_API_KEY` (with `calls/extensive` and `deals` scopes). Read-only is the right scope; the Skill must not write back to Salesforce.
- Schedule the weekly run. A simple cron is fine — `claude run stage-progression-validator week_ending=$(date -d 'sunday' +%F)` on Sunday at 22:00. Pipe the output to your Slack channel or a weekly-digest email.
- Pair it with a coaching ritual. The verdict queue is useless if nobody opens it. Standing 30-minute Monday slot, manager walks the `fail` and `needs_manager_review` rows with each rep. After eight weeks, the volume in those buckets should drop — that is the success metric.
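To make the rule buckets concrete, here is a minimal sketch of how one stage entry might be shaped, expressed as Python data rather than the Markdown the template actually uses. The stage name, the `Success_Criteria__c` field, the activity window, and the role regex are placeholder assumptions for illustration, not values the Skill ships with.

```python
# Hypothetical rubric entry for a "Stage 4 - Negotiation" gate.
# Field and role names are placeholders; swap in your org's real API names.
STAGE_4_RULES = {
    "field_rules": [
        # Field must hold a non-default value; evidence_required: gong asks
        # the Skill to cross-check the claim against Gong transcripts.
        {"field": "Economic_Buyer__c", "evidence_required": "gong"},
        {"field": "Success_Criteria__c", "evidence_required": "gong"},
    ],
    "activity_rules": [
        # A logged activity of this type must exist inside the recency window.
        {"activity_type": "Meeting", "within_days": 30},
    ],
    "stakeholder_rules": [
        # OpportunityContactRole must include a contact whose role matches.
        {"role_regex": r"Economic Buyer|Executive Sponsor"},
    ],
}
```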
## What the skill actually does
For each progression in the window, the Skill computes two scores. The deterministic score is the fraction of methodology rules satisfied — three of five rules pass, the score is 0.6. The rubric is structured rather than free-form natural language by design: free-form criteria force the model to interpret edge cases inconsistently across runs, and reps cannot predict what will trip a `fail`, which kills the trust the tool depends on.
The qualitative score is the fraction of `evidence_required: gong` claims that find supporting transcript evidence inside the relevant window. The phrase matching is methodology-aware. For MEDDPICC’s economic buyer, the Skill looks for the buyer’s name within twelve tokens of decision-language. For SPICED’s critical event, it looks for date-bounded urgency language with consequence verbs (“miss”, “slip”, “risk”) nearby. A naive “any mention of the buyer’s name counts” check produces too many false passes — the rep mentioning the buyer in passing on a call to a different stakeholder is not evidence of buyer commitment.
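A minimal sketch of the token-window check described above, assuming a plain-text transcript. The function name, the decision-term list, and the match-on-last-name simplification are illustrative, not the Skill’s actual implementation.

```python
import re

# Illustrative decision-language terms; the real patterns live in
# references/2-methodology-mapping-template.md.
DECISION_TERMS = {"approve", "approves", "approval", "sign-off", "budget"}

def buyer_evidence(transcript: str, buyer_name: str, window: int = 12) -> bool:
    """True when a decision-language token appears within `window` tokens
    of the buyer's name; a bare mention of the name does not count."""
    tokens = re.findall(r"[\w'-]+", transcript.lower())
    last_name = buyer_name.lower().split()[-1]
    name_idx = [i for i, tok in enumerate(tokens) if tok == last_name]
    term_idx = [i for i, tok in enumerate(tokens) if tok in DECISION_TERMS]
    return any(abs(n - t) <= window for n in name_idx for t in term_idx)
```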
The two scores combine into one of five verdicts: `pass` (both at 1.0), `flag` (one bucket strong, the other weak), `fail` (both below the borderline threshold, default 0.6), `needs_manager_review` (the borderline band between flag and fail — neither score clearly bad nor clearly good), or `needs_methodology` (the rubric has no entry for this stage). The `needs_manager_review` bucket exists because forcing every borderline deal into a binary flag-versus-fail produces noise that reps learn to dismiss; the borderline rows go to a separate queue the manager hand-resolves, which preserves the signal in the other buckets.
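A minimal sketch of one plausible reading of that mapping, assuming the default 0.6 borderline; the exact band boundaries are the Skill’s to define, not this snippet’s.

```python
BORDERLINE = 0.6  # default threshold from the stage rubric

def deterministic_score(rules_passed: int, rules_total: int) -> float:
    # Fraction of methodology rules satisfied: 3 of 5 -> 0.6.
    return rules_passed / rules_total

def verdict(det: float, qual: float, has_rubric: bool) -> str:
    if not has_rubric:
        return "needs_methodology"      # no rubric entry for this stage
    if det == 1.0 and qual == 1.0:
        return "pass"
    if det < BORDERLINE and qual < BORDERLINE:
        return "fail"                   # both buckets clearly weak
    if min(det, qual) < BORDERLINE:
        return "flag"                   # one bucket strong, the other weak
    return "needs_manager_review"       # borderline: neither clearly bad nor good
```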
## Cost reality
Claude Sonnet 4 at current pricing runs roughly 15-25 cents per validated opportunity, dominated by reading Gong transcripts (typical 30-day window covers 4-8 calls per active deal at 5-15K tokens each, plus a few hundred tokens of methodology rubric loaded from references). A 50-deal weekly batch costs around 7-12 USD in API spend.
The time saved is the case for the Skill. A RevOps lead doing this audit by hand spends 20-30 minutes per deal — pulling the stage history, opening each Gong call, scanning for the buyer’s name and the success-criteria conversation. At 50 deals that is two full days of hand-audit per week, which is why almost no team actually does it. The Skill collapses that to a 4-6 minute report-review pass on the digest, with deeper inspection only on the rows in the `fail` and `needs_manager_review` buckets — typically 5-10 deals out of 50, so 30-60 minutes of focused review. Net: 12-15 RevOps hours per week back, for under 15 USD in API cost.
## Success metric
Track two metrics over an eight-week ramp. First, the `fail` rate — the share of weekly progressions that land in `fail`. A healthy ramp shows it dropping from a baseline (often 25-40% in the first run) to under 10% as reps internalize what the rubric requires before they advance a deal. If it does not drop, either the rubric is too strict (reps physically cannot satisfy it without buyer conversations the deal is not ready for) or the coaching loop is not happening. Second, the median stage age in the stage immediately before the strictest gate. If that median balloons — meaning reps are parking deals one stage below their reality to dodge the gate — the rubric is wrong, not the reps. Tune the rubric down before you keep running the Skill.
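A minimal sketch of the two trend computations, assuming you keep each week’s verdict list and the days-in-stage of deals currently sitting before the strictest gate; the function and field names are illustrative, not the Skill’s output schema.

```python
from statistics import median

def weekly_fail_rate(verdicts: list[str]) -> float:
    """Share of the week's validated progressions that landed in fail."""
    return verdicts.count("fail") / len(verdicts)

def pre_gate_stage_age(days_in_stage: list[int]) -> float:
    """Median days deals have sat in the stage before the strictest gate;
    a steady climb suggests reps are parking deals to dodge the gate."""
    return median(days_in_stage)
```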
## vs alternatives
- Salesforce validation rules — these enforce field presence at the record level (you cannot save an opp in Stage 4 without `Economic_Buyer__c` populated). They cannot do the qualitative check: a rep can type any name into the field, validation rules pass, the Skill catches that no Gong call supports the claim. Validation rules are also a blunt instrument because they reject the save outright; the Skill produces a graded verdict the manager works with.
- Clari, Gong Forecast, and similar AI-forecasting tools — these do stage validation as part of a much bigger product surface (forecast, deal review, conversation analytics, coaching). Expect 50-150 USD per rep per month versus this Skill’s roughly 10-15 USD per week of API cost. Pick the platform if you also need its forecasting and conversation-analytics layers; pick this Skill if your gap is specifically the stage-progression audit and you already have Salesforce and Gong.
- Manual deal-desk reviews — a human RevOps lead reading every progression. The right tool for high-ACV enterprise teams where deals are few and consequential. Wrong tool for SMB or volume midmarket where the audit cost (12-15 hours per week) means it does not happen at all and bad progressions ship to forecast.
- Doing nothing — the actual baseline at most teams. Forecast accuracy at most B2B SaaS orgs sits somewhere between mediocre and embarrassing precisely because the stages on which the forecast is built are not validated. The cost of doing nothing shows up in CFO reaction to a bad-quarter print, which is a worse moment to discover the input data was untrustworthy.
## Watch-outs
- Over-strict validation pushes reps to game stages. Guard: instrument median stage age in the stage immediately before the strictest gate. If it balloons after the Skill ships, the rubric is wrong; tune it down before continuing.
- Methodology mismatch between slides and Salesforce. Guard: dry-run for two weeks. If `needs_methodology` plus low qualitative scores cover more than 40% of opps, fix the methodology mapping or the underlying field instrumentation before treating any verdict as actionable.
- Validator drift from real exit criteria. Sales leaders quietly redefine stage meanings in QBRs; the rubric file does not get updated. Guard: the rubric carries a `last_reviewed` field; the Skill prepends a warning to every report when the date is older than 90 days.
- Gong recording-coverage gaps look like rep dishonesty. Guard: the methodology-mapping file declares a per-stage `recording_coverage_floor`. Deals below the floor land in `needs_manager_review` with the coverage gap surfaced explicitly, not in `fail`.
- Rep pushback on a `fail` verdict. Guard: the report includes the deterministic-rule misses verbatim and the unmatched phrase patterns. The conversation grounds in the specific gap, which the rep can fix by updating the field and re-running, or push back on with off-Gong evidence the manager accepts.
## Stack
- Salesforce — stage history, deal fields, contact roles, logged activities
- Gong — recorded conversation transcripts, deal-level call lists
- Claude (Sonnet 4) — methodology-aware phrase matching against transcripts, verdict synthesis
- Cron / scheduler of choice — the weekly trigger
- Slack or email — the digest channel where the report lands ahead of the manager huddle