
Salesforce data cleanup with a Claude Skill

Difficulty: advanced
Setup time: 90 min
For: RevOps

A Claude Skill that scans Salesforce for the data garbage quietly distorting your reporting — duplicate accounts, orphan contacts, junk leads, malformed phones, account-contact mismatches, and stage values that violate the funnel definition — then proposes fixes as a CSV the operator approves before any write lands. The Skill never writes without an explicit dry-run plus human approval, and every applied change is logged to a custom audit object so it can be reverted.

The full bundle lives at apps/web/public/artifacts/salesforce-data-cleanup-skill/. The SKILL.md carries the inputs, method, and output format the Skill follows. Three reference files act as the operator’s fillable scaffolding: dedup-rules.md for match keys and similarity thresholds, stage-definitions.md for the per-stage required-field set, and survivor-ranking.md for the weights used to pick which record wins a merge.

When to use

Reach for this Skill when reporting has stopped being trustworthy because the underlying object data has decayed faster than the team can clean it. Specific triggers: a board ARR number disagrees with the CRO’s pipeline view by more than a couple of percent; the SDR team complains about hitting the same prospect under three different Account records; a marketing-attribution dashboard is double-counting because contacts exist on the wrong accounts; an annual ICP re-segmentation is blocked because firmographic fields are missing on a quarter of accounts. In all of these cases, the bottleneck is data hygiene, not strategy.

The Skill is also the right call when an incumbent dedup tool has become shelfware — RevOps has a license but nobody trusts its proposals enough to act on them. The Skill’s discriminator is that every proposed merge ships with a per-pair rationale line citing the deterministic key that fired and the survivor-selection signals that drove the choice. That auditability is what unblocks the human approval the cleanup needs.

When NOT to use

Do not use this Skill if any of the following holds.

You need a real-time dedup-on-create gate at the point of lead capture. The Skill is a batch tool that scans in chunks, not a synchronous validation rule. For point-of-create dedup, configure Salesforce’s native Duplicate Rules.

You need the Skill to auto-apply writes. There is no auto mode by design. Every fix passes through a dry-run CSV the operator must mark Approve=Y on before apply_fix will touch a row. If the operating model requires unsupervised writes, the Skill is the wrong shape and the right answer is a deterministic ETL job with explicit owner sign-off in the change-management process.

You are responding to a GDPR or CCPA right-to-erasure request. Use the platform’s documented PII purge flow, which routes through legal and produces the right paper trail. Do not improvise around it with a cleanup tool.

You want hard-deletes that bypass the recycle bin. The Skill has no hard-delete code path. Recycle-bin discipline is non-negotiable; permanent purges are a deliberate manual platform action.

The first run is against production with a write-scoped token. Two read-only scan cycles are required before the Skill will accept write credentials, and even then the first write run should be a sandbox rehearsal of an account merge.

Setup

  1. Copy the bundle at apps/web/public/artifacts/salesforce-data-cleanup-skill/ into ~/.claude/skills/salesforce-data-cleanup/. The Skill loader picks up SKILL.md and the references/ directory automatically.
  2. Set SFDC_TOKEN to a read-only Connected App token. Set SFDC_INSTANCE_URL to the sandbox endpoint, not production. The Skill defaults sandbox=true and refuses to flip without an explicit override flag.
  3. Replace the contents of references/dedup-rules.md, references/stage-definitions.md, and references/survivor-ranking.md with the team’s real rules. The templates are scaffolding; running against them on a live org will produce a high false-positive rate by design.
  4. Provision the Cleanup_Audit__c custom SObject in the sandbox and production orgs using the field shape documented in SKILL.md under “Method, step 5”. The audit log is what makes runs reversible — without it, do not run apply_fix.
  5. Run the first discovery scan: scan_data_health(scope="Account,Contact,Lead,Opportunity"). Expect the scan to surface flaws in the dedup ruleset on the first pass — that is the point of the read-only cycles.
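The sandbox-default posture in step 2 can be sketched as a small connection guard. The function name and override flag below are illustrative, not the Skill's actual API; the point is the refusal behavior: production is rejected unless the operator explicitly overrides.

```python
def resolve_connection(instance_url: str, sandbox: bool = True,
                       allow_production: bool = False) -> dict:
    """Sandbox-first connection guard (illustrative sketch).

    Defaults to sandbox=True and refuses to flip to production
    without an explicit override flag, mirroring the Skill's
    documented refusal behavior.
    """
    if not sandbox and not allow_production:
        raise RuntimeError(
            "refusing production endpoint without explicit override flag"
        )
    return {"instance_url": instance_url, "sandbox": sandbox}
```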

What the Skill actually does

The Skill runs five steps in order, documented in full in SKILL.md. Discovery scan pulls each in-scope SObject via the Bulk API in chunks, because a single REST query against a 100k-Account org will hit governor limits and chunked Bulk avoids the timeout ceiling on large pulls.
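The chunked read loop might look like the sketch below. `fetch_page` is a hypothetical stand-in for a Bulk API query-job page fetch, since the doc does not name a specific client; the structure is what matters: fixed-size chunks instead of one giant synchronous query.

```python
from typing import Callable, Dict, Iterator, List

def scan_in_chunks(
    fetch_page: Callable[[str, int, int], List[Dict]],
    sobject: str,
    total: int,
    chunk_size: int = 10_000,
) -> Iterator[List[Dict]]:
    # Pull each in-scope SObject in fixed-size chunks rather than a
    # single synchronous REST query. fetch_page(sobject, offset, limit)
    # is a hypothetical stand-in for a Bulk API page fetch.
    for offset in range(0, total, chunk_size):
        yield fetch_page(sobject, offset, min(chunk_size, total - offset))
```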

Dedup uses a two-pass hybrid. Pass one is deterministic — lowercased domain, E.164-normalized phone, NFKD-normalized name with corporate suffixes stripped. Exact matches on a single strong key go to the proposal CSV at high confidence. Pass two is a Claude semantic-similarity comparison, but only on candidate pairs that already share a weak deterministic signal (same first six phone digits, same first name token, same parent-domain TLD). The narrow-then-rank approach is what holds per-scan token cost to under five dollars on a 100k-Account org; pure-pairwise semantic over N^2 records is both expensive and noisy on common names.
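A minimal sketch of the pass-one deterministic keys and the pass-two blocking signal. The phone normalizer here is deliberately naive (a production pass would use a library such as `phonenumbers`), and the corporate-suffix list is an illustrative subset, not the Skill's actual ruleset.

```python
import re
import unicodedata

CORP_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}  # illustrative subset

def norm_domain(email_or_domain: str) -> str:
    # Lowercased domain; strips a leading mailbox part if present.
    return email_or_domain.lower().strip().split("@")[-1]

def norm_phone(raw: str, default_country: str = "+1") -> str:
    # Naive E.164-style normalization for illustration only.
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    return default_country + (digits[-10:] if len(digits) >= 10 else digits)

def norm_name(name: str) -> str:
    # NFKD-normalize, drop diacritics, strip corporate suffixes.
    s = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = [t for t in re.sub(r"[^\w\s]", " ", s.lower()).split()
              if t not in CORP_SUFFIXES]
    return " ".join(tokens)

def weak_block_key(phone: str, name: str) -> tuple:
    # Pass-two blocking signal: candidate pairs must share a weak key
    # like this before any semantic comparison runs.
    name_norm = norm_name(name)
    return (norm_phone(phone)[:7], name_norm.split()[0] if name_norm else "")
```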

Survivor selection for merges uses a composite score: 0.4 weight on activity recency in the last 90 days of Tasks and Events, 0.3 weight on attached contact count, 0.2 weight on Opportunity history (count plus log of Amount), 0.1 weight on whether LastModifiedById is the integration user. No single signal is reliable on its own — most-recent-modification often points at a backfill, contact count favors crusty old records, and Opportunity Amount alone discards the active relationship. The composite tracks where the team is actually working today.
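The composite can be sketched as below with the stated weights. The per-component scaling to [0, 1] is an assumption of this sketch, since the doc specifies the weights but not the scaling.

```python
import math
from typing import Dict

WEIGHTS = {"recency": 0.4, "contacts": 0.3, "opps": 0.2, "integration": 0.1}

def survivor_score(rec: Dict) -> float:
    # Each component is scaled into [0, 1]; the exact scaling here is an
    # illustrative choice, not the Skill's documented formula.
    recency = rec["activities_last_90d"] / (rec["activities_last_90d"] + 5)
    contacts = min(1.0, rec["contact_count"] / 20)
    opps = min(1.0, (rec["opp_count"] + math.log1p(rec["opp_amount_total"])) / 20)
    # Penalize records last touched by the integration user.
    integration = 0.0 if rec["last_modified_by_integration"] else 1.0
    return (WEIGHTS["recency"] * recency + WEIGHTS["contacts"] * contacts
            + WEIGHTS["opps"] * opps + WEIGHTS["integration"] * integration)

def pick_survivor(a: Dict, b: Dict) -> Dict:
    return a if survivor_score(a) >= survivor_score(b) else b
```

Note how the composite behaves: a record with recent activity but few contacts outranks a stale, integration-touched record with a large contact list, which matches the "where the team is working today" intent.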

Dry-run emits a CSV with Operation, Field, Old_Value, New_Value, Confidence, Survivor_Id, Rationale, and an Approve column the operator must set. Apply reads the approved CSV and writes via Bulk API, logging every change to Cleanup_Audit__c with both the prior and new JSON values so a revert(run_id) companion can re-apply the originals.
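A sketch of the dry-run CSV contract, assuming the column order given above and a blank Approve cell the operator fills in. The helper names are illustrative; the apply side reads back only rows explicitly marked Approve=Y.

```python
import csv
import io

DRY_RUN_COLUMNS = ["Operation", "Field", "Old_Value", "New_Value",
                   "Confidence", "Survivor_Id", "Rationale", "Approve"]

def write_dry_run(rows, out) -> None:
    # Emit proposals with an empty Approve cell the operator must set.
    w = csv.DictWriter(out, fieldnames=DRY_RUN_COLUMNS)
    w.writeheader()
    for row in rows:
        w.writerow({**row, "Approve": ""})

def read_approved(inp) -> list:
    # apply_fix consumes only rows the operator explicitly approved.
    return [r for r in csv.DictReader(inp)
            if r.get("Approve", "").strip().upper() == "Y"]
```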

Cost reality

A discovery scan against a 100k-Account, 500k-Contact org runs in roughly twenty minutes wall-clock and consumes about three to five dollars of Claude API tokens for the semantic-similarity pass. Bulk API call quota usage is in the low hundreds of calls per scan, well under any standard org’s daily ceiling. The applied write run is itself the smaller cost — Bulk API writes do not consume Claude tokens; they only burn a few additional API calls per chunk of approved rows.

The headcount math is the real story. A typical RevOps cleanup sprint at the sizes above runs two or three weeks of one analyst’s time per quarter, plus a couple of days from a Salesforce admin. The Skill collapses that to roughly two days per quarter — a discovery scan, a half-day reviewing the dry-run CSVs, a sandbox rehearsal of any account merges, and an apply run. On a fully-loaded RevOps salary, that is a meaningful saving across a year.

The cost the Skill does not eliminate is the rep-communication overhead. A merge run with no comms still burns trust faster than the bad data did, and the change_brief.md the Skill emits alongside every applied run is a template the operator still has to send.

Success metric

Watch one number per scan: the share of high-confidence proposals the operator approves on first review. On the first run that number is typically under fifty percent — that is the dedup ruleset getting tuned, not the Skill underperforming. Within three or four scan cycles, with the rules adjusted, that number should land above eighty percent. Below that floor at cycle four, the dedup rules in references/dedup-rules.md are still mismatched to the data and need another pass before any further write runs.
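The metric is a one-liner over the dry-run rows; the field names follow the CSV columns described earlier, and the function name is illustrative.

```python
def first_review_approval_rate(rows) -> float:
    # Share of high-confidence proposals the operator approved on
    # first review; rows come from the dry-run CSV.
    high = [r for r in rows if r["Confidence"].strip().lower() == "high"]
    if not high:
        return 0.0
    approved = sum(1 for r in high if r["Approve"].strip().upper() == "Y")
    return approved / len(high)
```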

A secondary metric: stage-violation count over time. A healthy org with a real funnel definition should see this number trend down month over month as RevOps fixes the upstream causes — required-field validation rules, stage-transition automations, lead-routing logic. If stage-violation count is flat across cleanup cycles, the dirty-data problem is process not data.

vs alternatives

DemandTools is the incumbent in this space. It is a mature, deterministic, GUI-driven tool that RevOps teams have used for a decade. It is excellent at high-volume deterministic dedup; it is weaker on the survivor-rationale audit trail this Skill emits, and it cannot do the semantic-similarity pass for fuzzy company names without a separate scripting layer. If the team is already paying for DemandTools and the dedup ruleset is mature, stay there and consider this Skill only for the semantic dedup edge cases and the audit-log discipline.

Cloudingo is the closer point comparison — it has fuzzy matching and a review-then-apply workflow that resembles what the Skill produces. Cloudingo is more user-friendly for a non-technical RevOps lead. The Skill’s edge is the per-pair rationale line and the reference-file model that lets the team version-control its dedup rules in git alongside the rest of the RevOps config. If RevOps is allergic to git, Cloudingo wins.

A manual RevOps-led cleanup sprint is the status-quo alternative for teams without a dedup tool. It works, but it consumes the analyst time documented above and produces no reusable artifact — the next sprint starts from scratch. The Skill’s discovery scan is the same artifact every time, which makes the work compoundable.

Watch-outs

Granting write access on the first run is the most common failure. The first scan surfaces flaws in the dedup ruleset as much as in the data; if the Skill applies them, the false positives turn into real, audited writes. The guard: the Skill refuses apply_fix when the configured token has write scope and the audit log shows zero prior dry-run rows for the run’s scope. Two read-only cycles minimum, regardless of how confident the rules look.

Account merges cascading to the wrong records is the most expensive failure. A wrong survivor takes the wrong Opportunities, Tasks, Events, and Contact Roles with it. The guard: apply_fix for any dedup_account row refuses to run unless a sandbox rehearsal with the same Run_Id prefix has happened in the last fourteen days, and the operator has set --rehearsed=true. The sandbox rehearsal is not optional ceremony — it is where the cascading side-effects of any given merge actually get observed.
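The two guards described above, the read-only-cycle check and the rehearsal-freshness check, can be sketched together. Names and signature are illustrative, not the Skill's actual API.

```python
from datetime import date, timedelta
from typing import Optional

def check_apply_guards(token_write_scope: bool, prior_dry_runs: int,
                       operation: str, last_rehearsal: Optional[date],
                       rehearsed_flag: bool, today: date) -> None:
    # Guard 1: a write-scoped token with no prior dry-run rows for this
    # scope means the read-only cycles were skipped.
    if token_write_scope and prior_dry_runs == 0:
        raise PermissionError(
            "write-scoped token with no prior dry-run rows for this scope")
    # Guard 2: account merges require a sandbox rehearsal within the
    # last fourteen days plus an explicit --rehearsed flag.
    if operation == "dedup_account":
        fresh = (last_rehearsal is not None
                 and today - last_rehearsal <= timedelta(days=14))
        if not (fresh and rehearsed_flag):
            raise PermissionError(
                "account merge requires a sandbox rehearsal within "
                "14 days and --rehearsed=true")
```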

Reps waking up to merged accounts they never heard about is the cultural failure that kills future cleanup runs. The guard: the Skill emits a change_brief.md alongside every applied run, listing the merge map, owner emails, and count of moved Opportunities, ready to paste into a Slack channel before reps log in. Send it. Skipping the comms step burns trust faster than the bad data ever did.

Hard-deletes bypassing the recycle bin is a request that comes up but should be refused. The guard: the Skill has no hard-delete code path. soft_delete is the only delete operation; anyone wanting a permanent purge does it through the platform’s manual workflow with the appropriate sign-off.

Stack

  • Salesforce — source of truth and target of writes; Cleanup_Audit__c custom object holds the reversible audit log
  • Claude — runs the semantic-similarity pass and emits the per-pair rationale lines that make merges auditable
  • Bulk API — used for both reads (chunked discovery) and writes (chunked apply); never the synchronous REST query API for full scans

Files in this artifact
