A customer health score is a single composite number, usually 0-100 or a red/yellow/green band, that rolls up five families of signal (product usage, engagement, sentiment, support, and outcomes) into one indicator of how likely an account is to renew, expand, or churn. It exists so a customer success manager (CSM) owning 40-80 accounts can triage attention without reading every account by hand, and so CS leadership can forecast net revenue retention from something other than gut feel.
What it is not: it is not a churn prediction model, and it is not NPS. A churn model outputs a probability from a trained classifier; a health score is a transparent, hand-weighted rollup a CSM can explain to a customer. NPS is one sentiment input into health, not a substitute for it. Treating the score as ground truth rather than a prioritization aid is the most common way teams misuse it.
The five signal families
- Usage — logins, feature adoption breadth, seats activated vs. provisioned, depth on the features that map to your value proposition. The strongest leading signal for most SaaS.
- Engagement — QBR attendance, email open/reply, exec sponsor responsiveness, community or training participation.
- Sentiment — NPS, CSAT, CES, plus qualitative CSM-logged sentiment. The softest and most gameable input.
- Support — ticket volume, severity, time-to-resolution, escalations, bug counts against this account.
- Outcomes — whether the customer has hit its success-plan milestones, realized the ROI it bought for, and reached time-to-value (TTV). The hardest to instrument and the most predictive of renewal.
The formula
A health score is a weighted sum of normalized component scores:
Health = Σ (component_score_i × weight_i) where Σ weight_i = 1.0
Each component is normalized to 0-100 first (so a raw login count and a raw NPS land on the same scale), then weighted. A defensible starting weight set for a seat-based B2B SaaS product:
| Signal family | Weight |
|---|---|
| Usage | 0.35 |
| Outcomes | 0.25 |
| Engagement | 0.15 |
| Sentiment | 0.15 |
| Support | 0.10 |
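The normalize-then-weight step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the min-max bounds passed to `normalize` are assumptions chosen for the example, and only the weights come from the table above.

```python
from typing import Mapping

# Starting weights from the table above; they must sum to 1.0.
WEIGHTS = {
    "usage": 0.35,
    "outcomes": 0.25,
    "engagement": 0.15,
    "sentiment": 0.15,
    "support": 0.10,
}


def normalize(raw: float, low: float, high: float, invert: bool = False) -> float:
    """Min-max scale a raw metric to 0-100, clamping values outside [low, high].

    `low` and `high` are hypothetical bounds you pick per metric.
    Use invert=True where a higher raw value is worse (e.g. open escalations).
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (raw - low) / (high - low) * 100.0
    scaled = max(0.0, min(100.0, scaled))
    return 100.0 - scaled if invert else scaled


def health_score(components: Mapping[str, float],
                 weights: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted sum of component scores that are already on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = set(weights) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(components[name] * weight for name, weight in weights.items())
```

For instance, `normalize(40, -100, 100)` places an NPS of 40 on the same 0-100 scale as a login-based usage metric before either is weighted. Min-max scaling is only one option; percentile ranks within a segment are a common alternative when raw metrics are heavily skewed.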
Banding: 70 and above is green, 40 up to (but not including) 70 is yellow, under 40 is red. Calibrate the cutoffs against your own renewal data: run the score retrospectively against the last 12 months of renewals and churns, and move the green/yellow line to where it actually separates renewers from churners.
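A rough sketch of banding and retrospective calibration follows. It assumes you can assemble a list of `(score, renewed)` pairs from past renewal cycles; the function names, the candidate cutoffs, and the choice of plain accuracy as the selection criterion are all assumptions made for illustration. The sample history is made-up data, not a benchmark.

```python
from typing import Iterable, List, Tuple


def band(score: float, green_min: float = 70.0, yellow_min: float = 40.0) -> str:
    """Map a 0-100 score to a color band using the default cutoffs."""
    if score >= green_min:
        return "green"
    if score >= yellow_min:
        return "yellow"
    return "red"


def best_green_cutoff(history: List[Tuple[float, bool]],
                      candidates: Iterable[int] = range(40, 96, 5)) -> int:
    """Pick the green/yellow cutoff that best separates renewers from churners.

    `history` holds (score before renewal, renewed?) pairs. A cutoff is judged
    by how often "score >= cutoff" agrees with the actual renewal outcome.
    Ties go to the lowest candidate.
    """
    if not history:
        raise ValueError("history must not be empty")

    def accuracy(cutoff: int) -> float:
        correct = sum((score >= cutoff) == renewed for score, renewed in history)
        return correct / len(history)

    return max(candidates, key=accuracy)


# Illustrative, made-up history: three renewals and three churns.
history = [(85, True), (78, True), (76, True), (72, False), (66, False), (55, False)]
print(best_green_cutoff(history))  # 75
```

Here the default line at 70 would have labeled the churned account at 72 as green, so the search moves the cutoff to 75. With real data, accuracy may be the wrong criterion if churn is rare; weighting missed churns more heavily than false alarms is a reasonable variation.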
Leading vs. lagging
This is the distinction that makes a score useful. A leading signal moves before the renewal outcome and is intervenable — declining weekly active usage, a champion who left, slipping QBR attendance. A lagging signal confirms what already happened — a submitted CSAT after a bad quarter, a non-renewal notice. Weight leading signals higher: usage and outcome-progress are leading; a closed support ticket and a survey response are lagging. A score dominated by lagging inputs tells you an account is unhealthy the week it churns, which is too late to act.
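One quick way to check whether a weight set is dominated by lagging inputs is to total the weight carried by leading families. The sketch below tags usage, outcomes, and engagement as leading and sentiment and support as lagging. That family-level tagging is a coarse assumption drawn from the examples in this section; in practice individual metrics inside a family can fall on either side.

```python
from typing import Mapping, Set

WEIGHTS = {
    "usage": 0.35,
    "outcomes": 0.25,
    "engagement": 0.15,
    "sentiment": 0.15,
    "support": 0.10,
}

# Assumption: family-level tags. Real scorecards should tag individual metrics.
LEADING: Set[str] = {"usage", "outcomes", "engagement"}


def leading_share(weights: Mapping[str, float], leading: Set[str] = LEADING) -> float:
    """Fraction of total weight assigned to leading signal families."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive total")
    return sum(w for name, w in weights.items() if name in leading) / total


print(round(leading_share(WEIGHTS), 2))  # 0.75
```

Under these tags, the starting weight set puts three quarters of the score on signals a CSM can still act on.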
Worked example
An account: usage normalized to 80, outcomes to 50, engagement to 90, sentiment to 70, support to 60.
Health = 80×0.35 + 50×0.25 + 90×0.15 + 70×0.15 + 60×0.10
= 28 + 12.5 + 13.5 + 10.5 + 6
= 70.5 → green (barely)
The score is green, but outcomes at 50 is the load-bearing weakness — strong product usage and a happy sponsor are masking the fact that the customer has not realized the ROI they bought. This is exactly the account a usage-only score would mislabel as safe. The CSM action is a success-plan reset, not a check-in.
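The same arithmetic can be reproduced in code, along with a per-component breakdown that shows why outcomes is the weak point. The "shortfall" measure used here, (100 - component score) × weight, is an illustrative diagnostic introduced for this sketch rather than a standard metric; it is simply the number of points each component leaves on the table.

```python
WEIGHTS = {
    "usage": 0.35,
    "outcomes": 0.25,
    "engagement": 0.15,
    "sentiment": 0.15,
    "support": 0.10,
}

# Normalized component scores from the worked example.
account = {"usage": 80, "outcomes": 50, "engagement": 90, "sentiment": 70, "support": 60}

score = sum(account[name] * weight for name, weight in WEIGHTS.items())

# Points lost per component: how far each is from 100, scaled by its weight.
shortfall = {name: (100 - account[name]) * WEIGHTS[name] for name in WEIGHTS}
weakest = max(shortfall, key=shortfall.get)

print(round(score, 1))                 # 70.5
print(weakest)                         # outcomes
print(round(shortfall[weakest], 1))    # 12.5
print(account["usage"])                # 80, what a usage-only score would report
```

Outcomes accounts for 12.5 of the 29.5 points missing from a perfect score, more than any other component, while a usage-only score of 80 would sit comfortably in the green band.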
Common pitfalls
- Usage-only scores. Easy to instrument, so teams ship them and stop. A heavy power user mid-onboarding can show high usage while the renewal is already lost on a missing outcome. Guard: force a non-zero outcomes weight even if you can only proxy it (success-plan milestone completion).
- Set-and-forget weights. Weights drift from reality as the product and segment mix change. Guard: re-run the score against actual renewal/churn outcomes quarterly and re-fit the weights; if the green band isn’t separating renewers from churners, it’s miscalibrated.
- Score laundering. When CSM-entered sentiment is a heavy input, reps inflate it to keep their book green. Guard: cap the combined weight of subjective inputs at 0.15-0.20 and audit CSM-logged sentiment against objective signals such as usage trends and survey results.
- One score for all segments. A 12-seat SMB account and a 4,000-seat enterprise account don’t share a usage curve. Guard: maintain per-segment weight sets and bands, not one global formula.
- No action mapping. A score nobody acts on is a dashboard ornament. Guard: every band transition (green→yellow, yellow→red) fires a named play with an owner, not just a color change.
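Several of these guards can be enforced as configuration checks instead of being left to convention. The sketch below is one possible shape, assuming weights are stored as a plain dictionary per segment; the function names, play names, and owner roles are hypothetical placeholders, not features of any particular platform.

```python
from typing import Dict, List, Mapping, Optional, Sequence, Tuple

SUBJECTIVE_CAP = 0.20  # upper end of the 0.15-0.20 range suggested above


def validate_weights(weights: Mapping[str, float],
                     subjective: Sequence[str] = ("sentiment",)) -> List[str]:
    """Return a list of guard violations for one weight set (empty means OK)."""
    problems = []
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        problems.append("weights do not sum to 1.0")
    if weights.get("outcomes", 0.0) <= 0.0:
        problems.append("outcomes weight must be non-zero")
    subjective_total = sum(weights.get(name, 0.0) for name in subjective)
    if subjective_total > SUBJECTIVE_CAP + 1e-9:
        problems.append("subjective inputs exceed cap")
    return problems


# Hypothetical plays: each downgrade maps to a named play and an owner role.
PLAYS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("green", "yellow"): ("adoption_review", "csm"),
    ("yellow", "red"): ("exec_escalation", "cs_director"),
}


def play_for_transition(previous: str, current: str) -> Optional[Tuple[str, str]]:
    """Look up the (play, owner) to fire when an account changes band."""
    return PLAYS.get((previous, current))


usage_only = {"usage": 1.0}
print(validate_weights(usage_only))             # ['outcomes weight must be non-zero']
print(play_for_transition("green", "yellow"))   # ('adoption_review', 'csm')
```

Running `validate_weights` over every per-segment weight set during the quarterly re-fit catches usage-only and sentiment-heavy configurations before they reach a dashboard.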
Related
- NRR vs GRR — the retention metrics a health score is built to predict
- Gainsight and Planhat — platforms with configurable health scorecards
- ChurnZero and Vitally — usage-driven health and playbook automation