An n8n flow that pulls outside-counsel invoices from your e-billing system, parses LEDES 1998B line items, applies your billing guidelines as deterministic rules, asks Claude for a second pass on anomalies that resist rules (duplicative timekeepers, scope creep, off-engagement-letter work), then routes each invoice into one of four buckets — auto-approve, auto-deduct with notice, Slack reviewer queue, or director escalation — with every decision written to an idempotent audit log. Recovers the 5-15% of outside-counsel spend that line-by-line manual review misses, at the cost of about $0.04 of Claude inference per invoice.
The complete workflow ships in apps/web/public/artifacts/legal-spend-anomaly-n8n/legal-spend-anomaly-n8n.json (15 nodes, single trigger). Setup notes and credential instructions live in the sibling _README.md.
When to use
You have a steady volume of outside-counsel invoices — at least 50 a month across more than three firms — flowing through an e-billing system that exposes LEDES via API (Brightflag, Onit, BusyLamp, SimpleLegal, or a self-hosted equivalent). You have written billing guidelines and a rate card per firm, and someone on the team is already doing line-item review well enough that you can validate the flow’s flags against their catches. The win is shifting that reviewer from “scan every line” to “decide on the flagged items,” which typically lands at three to five times the throughput per reviewer hour.
When NOT to use
Skip this if your invoice volume is under twenty a month — the calibration overhead exceeds the recoverable spend. Skip it if you do not have a rate card and approved-timekeeper list per matter; the flow leans on those tables for the rule-based checks, and without them the AI pass is doing all the work and will hallucinate violations. Skip it if your firms send PDF invoices only; this flow assumes LEDES, and the PDF-extraction variant is a different workflow with much weaker recall. Skip it if your legal-ops function is one person who reviews everything personally and trusts their own pattern recognition more than they would trust a tuned model — in that case the flow adds latency without adding judgement.
Setup
The flow assumes four supporting Postgres tables (matters, matter_approved_timekeepers, firm_billing_guidelines, invoice_audit_log) — the README spells out the columns and the indexes that make the upserts and the watermark cheap. Stand those up first, populate them from your existing matter-management system or rate-card spreadsheets, then import legal-spend-anomaly-n8n.json into n8n. Wire the four placeholder credentials (Brightflag/your e-billing system, Postgres, Anthropic, Slack) per the README. Run the six-step verification sequence in the README before flipping the cron trigger to active; do not skip the idempotency check, since a duplicate audit-log row will throw off the next watermark.
Calibration is the part most teams underweight. Pull a hundred historical invoices your team has already reviewed manually, run them through the flow with the cron disabled, and compare the flow’s decision against your team’s actual disposition. Expect to retune the AI system prompt in Claude — Anomaly Detection and the thresholds in Score + Route at least twice before the routing distribution looks like your team’s. The thresholds in the bundle are starting points (AI severity ≥ 0.8 escalates, rule value share ≥ 15% escalates, AI flag count > 0 routes to the reviewer queue) — they will move once you see your distribution.
What the flow does
Daily Cron — 7am Mon-Fri fires the run. Lookup Watermark reads the most recent checked_at from invoice_audit_log and falls back to seven days before the current run if the table is empty, so re-runs after an outage do not double-process. Brightflag — List New Invoices queries the e-billing system for invoices submitted since the watermark; Split Invoices fans out one execution per invoice. Fetch LEDES File downloads the LEDES 1998B blob and Parse LEDES (a Code node) splits it into structured line items: timekeeper id, classification, rate, units, task code, activity code, narrative, line total. Load Matter + Rate Card pulls the matter, the approved-timekeeper list with rate caps, and the firm’s billing guidelines in a single round-trip.
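The parsing step can be sketched as a plain function. This is a minimal illustration, not the code that ships in the bundle: it assumes the standard LEDES 1998B layout (a LEDES1998B marker record, a header record, then pipe-delimited line items, each record ending in []), and the output property names (timekeeperId, rateCents and so on) are hypothetical.

```javascript
// Sketch of a LEDES 1998B parser. Columns are looked up by header name, so the
// function tolerates firms that reorder fields. Output property names are
// illustrative, not taken from the bundled Parse LEDES node.
function parseLedes(text) {
  const malformed = { parse_error: 'empty_or_malformed_ledes', lineItems: [] };
  if (typeof text !== 'string') return malformed;

  // Every LEDES 1998B record is terminated by "[]".
  const records = text
    .split('[]')
    .map((r) => r.trim())
    .filter((r) => r.length > 0);

  // Record 0 is the format marker, record 1 the header, the rest are line items.
  if (records.length < 3 || records[0] !== 'LEDES1998B') return malformed;

  const header = records[1].split('|').map((h) => h.trim());
  const toCents = (value) => Math.round(parseFloat(value) * 100);

  const lineItems = records.slice(2).map((record) => {
    const cells = record.split('|');
    const row = {};
    header.forEach((name, i) => {
      row[name] = (cells[i] || '').trim();
    });
    return {
      lineNumber: Number(row.LINE_ITEM_NUMBER),
      timekeeperId: row.TIMEKEEPER_ID,
      classification: row.TIMEKEEPER_CLASSIFICATION,
      rateCents: toCents(row.LINE_ITEM_UNIT_COST),
      units: parseFloat(row.LINE_ITEM_NUMBER_OF_UNITS),
      taskCode: row.LINE_ITEM_TASK_CODE,
      activityCode: row.LINE_ITEM_ACTIVITY_CODE,
      narrative: row.LINE_ITEM_DESCRIPTION,
      lineTotalCents: toCents(row.LINE_ITEM_TOTAL),
    };
  });

  return { parse_error: null, lineItems };
}
```

Returning a parse_error value instead of throwing matches the behaviour described under Watch-outs, which is why the caller has to check that field before routing.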
Rule-Based Checks is a deterministic pass: it flags unapproved timekeepers, rates above the card, block-billing (units above the firm’s threshold with a short narrative), vague descriptions matching the firm’s keyword list, and partner-classified travel time when the firm’s no-travel-class rule applies. Each flag carries a severity (0-1) and an estimated dollar impact, summed into rule_value_cents. Claude — Anomaly Detection then makes a single Anthropic API call against claude-sonnet-4-6 with the line items, the matter scope, and the firm guidelines as context, returning a JSON array of findings the rules cannot easily express — duplicative timekeepers on the same task on the same day, time disproportionate to scope, scope-creep narrative, off-engagement-letter work. The system prompt explicitly forbids inventing line indexes or claiming violations not tied to a specific line, which is the most common failure mode of LLM-based invoice review.
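A deterministic pass of this kind might look like the sketch below. It covers four of the five checks (the travel-class rule is omitted for brevity). The column names come from the tables described in the README; the severity values, the 40-character "short narrative" cutoff, the treatment of block_billing_min_units as an inclusive threshold, and the line-item property names are all assumptions made for illustration.

```javascript
// Sketch of the rule pass. Severities and the narrative-length cutoff are
// placeholder values, not the ones in the bundled Code node.
const SHORT_NARRATIVE_CHARS = 40;

function runRuleChecks(lineItems, approvedTimekeepers, guidelines) {
  const approvedById = new Map(approvedTimekeepers.map((t) => [t.timekeeper_id, t]));
  const flags = [];

  for (const item of lineItems) {
    const approved = approvedById.get(item.timekeeperId);

    if (!approved) {
      // Assumption: the whole line is at risk when the timekeeper is not approved.
      flags.push({ line: item.lineNumber, rule: 'unapproved_timekeeper', severity: 0.7, impact_cents: item.lineTotalCents });
      continue;
    }

    if (item.rateCents > approved.max_rate_cents) {
      // Impact is the excess over the rate card multiplied by the units billed.
      const excess = Math.round((item.rateCents - approved.max_rate_cents) * item.units);
      flags.push({ line: item.lineNumber, rule: 'rate_above_card', severity: 0.4, impact_cents: excess });
    }

    const narrative = (item.narrative || '').toLowerCase();

    if (item.units >= guidelines.block_billing_min_units && narrative.length < SHORT_NARRATIVE_CHARS) {
      flags.push({ line: item.lineNumber, rule: 'block_billing', severity: 0.5, impact_cents: 0 });
    }

    if (guidelines.vague_keywords.some((k) => narrative.includes(k.toLowerCase()))) {
      flags.push({ line: item.lineNumber, rule: 'vague_description', severity: 0.3, impact_cents: 0 });
    }
  }

  const rule_value_cents = flags.reduce((sum, f) => sum + f.impact_cents, 0);
  return { flags, rule_value_cents };
}
```

Block-billing and vague-description flags carry no dollar impact in this sketch because the recoverable amount depends on reviewer judgement; a real guideline may specify a fixed percentage reduction instead.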
Score + Route combines the two passes into a single decision. The four buckets (auto_approve, auto_deduct, reviewer_queue, escalate_director) are routed via two if nodes. Escalations land in #legal-ops-escalations with a Slack Block Kit payload showing the top five rule and AI findings; reviewer-queue decisions land in #legal-ops-invoice-review. As shipped, auto-deduct and auto-approve write the audit log only; the README notes that you can extend the Slack — Reviewer Queue body to cover auto_deduct if you want a notice posted. Every branch terminates at Audit Log Insert, which upserts on invoice_id so re-runs are safe.
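The routing decision can be sketched as one pure function. The thresholds are the starting points quoted in the Setup section (AI severity of 0.8, rule value share of 15%), and low_value_rule_flags_only is the reason string the README expects on the auto-deduct path. The other reason strings, the input property names, and the precedence order between checks are assumptions.

```javascript
// Sketch of the scoring and routing logic. Checks run from most to least
// severe, so the first match wins.
const THRESHOLDS = { aiSeverityEscalate: 0.8, ruleValueShareEscalate: 0.15 };

function scoreAndRoute({ invoiceTotalCents, ruleFlags, ruleValueCents, aiFlags }) {
  const maxAiSeverity = aiFlags.reduce((max, f) => Math.max(max, f.severity), 0);
  const ruleValueShare = invoiceTotalCents > 0 ? ruleValueCents / invoiceTotalCents : 0;

  if (maxAiSeverity >= THRESHOLDS.aiSeverityEscalate) {
    return { decision: 'escalate_director', reason: 'high_severity_ai_flag' };
  }
  if (ruleValueShare >= THRESHOLDS.ruleValueShareEscalate) {
    return { decision: 'escalate_director', reason: 'high_rule_value_share' };
  }
  if (aiFlags.length > 0) {
    // Any AI finding goes to a human; an AI flag never drives a deduction by itself.
    return { decision: 'reviewer_queue', reason: 'ai_flags_present' };
  }
  if (ruleFlags.length > 0) {
    return { decision: 'auto_deduct', reason: 'low_value_rule_flags_only' };
  }
  return { decision: 'auto_approve', reason: 'no_flags' };
}
```

Keeping the thresholds in one object makes the calibration loop cheaper: replaying historical invoices with different values only requires changing that object.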
Cost reality
Per invoice: one Claude Sonnet 4.6 call at roughly 4-6k input tokens (line items + matter + guidelines) and 500-1000 output tokens, so about $0.04 each at current pricing. At 500 invoices a month that is around $20 of inference. The Postgres queries are cheap (single-row reads on indexed columns plus one upsert). The e-billing API calls and LEDES fetches fall inside your existing vendor contract at no extra charge. Self-hosted n8n is a fixed infrastructure cost regardless of volume; n8n Cloud Starter at $24/month covers this volume with room to spare.
The labour math is what makes this pay back. A reviewer doing line-by-line takes 10-15 minutes per invoice; the flow drops that to 2-4 minutes on the queued items (read the Slack summary, click into the audit log, decide), and zero on the auto-approve and auto-deduct paths. At 500 invoices a month with a 60/30/10 split across auto-approve, reviewer queue, and escalation, the flow saves roughly 50 hours of reviewer time a month against an inference cost of $20 plus an hour or two of operator time tuning thresholds. The recovered spend itself is the bigger line: 5-15% of monthly outside-counsel spend is the band reported in vendor case studies (Brightflag, Onit) and our own back-tests, and that swamps the operating cost by two orders of magnitude on any portfolio above $200k/month.
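The arithmetic behind those figures can be checked with a few lines. The sketch uses only the numbers quoted above and assumes that escalated invoices take the same 2-4 minutes as queued ones, which is probably optimistic.

```javascript
// Worked arithmetic for the labour and inference figures, using the stated
// volume, split and per-invoice timings.
const invoicesPerMonth = 500;
const split = { autoApprove: 0.6, reviewerQueue: 0.3, escalation: 0.1 };
const manualMinutesPerInvoice = { low: 10, high: 15 };
const flaggedMinutesPerInvoice = { low: 2, high: 4 };
const inferenceCentsPerInvoice = 4;

// Only queued and escalated invoices still need a human.
const flaggedInvoices = Math.round(invoicesPerMonth * (split.reviewerQueue + split.escalation));

// Pessimistic saving: fastest manual baseline against slowest flagged review.
const savedHoursLow =
  (invoicesPerMonth * manualMinutesPerInvoice.low - flaggedInvoices * flaggedMinutesPerInvoice.high) / 60;

// Optimistic saving: slowest manual baseline against fastest flagged review.
const savedHoursHigh =
  (invoicesPerMonth * manualMinutesPerInvoice.high - flaggedInvoices * flaggedMinutesPerInvoice.low) / 60;

const inferenceDollars = (invoicesPerMonth * inferenceCentsPerInvoice) / 100;

console.log({ flaggedInvoices, savedHoursLow, savedHoursHigh, inferenceDollars });
```

Under these inputs the saving lands between about 70 and 118 hours a month, so the 50-hour figure quoted above leaves headroom for escalations that run long, spot checks on the automated paths, and threshold tuning.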
Be honest about the time-to-payback. The first month is calibration, not recovery. Months two and three are when the routing distribution stabilises and the recovered spend starts showing up in your AP variance.
Success metric
Track recovered spend per month — the dollar value of auto_deduct plus the dollar value of reviewer-confirmed deductions out of the queue, divided by total outside-counsel spend that month. The number to beat is whatever your manual baseline was. If the flow is not pulling at least 3% in month three you have a calibration problem, not a flow problem; pull the audit log, sample 30 invoices, and compare against your team’s manual notes.
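As a formula, the metric is a simple ratio. The sketch below assumes all three inputs are in cents and come from one calendar month; the function name and the example figures are hypothetical.

```javascript
// Recovered spend as a share of total outside-counsel spend for one month.
function recoveredSpendRate({ autoDeductCents, reviewerConfirmedCents, totalSpendCents }) {
  if (totalSpendCents <= 0) return 0;
  return (autoDeductCents + reviewerConfirmedCents) / totalSpendCents;
}

// Example: $4,000 auto-deducted and $5,000 reviewer-confirmed on $200,000 of spend.
const monthThree = recoveredSpendRate({
  autoDeductCents: 400000,
  reviewerConfirmedCents: 500000,
  totalSpendCents: 20000000,
});

// The 3% floor is the month-three calibration check described above.
const meetsMonthThreeFloor = monthThree >= 0.03;
console.log((monthThree * 100).toFixed(1) + '%', meetsMonthThreeFloor);
```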
Secondary metric: reviewer time per flagged invoice. If it is climbing instead of falling, the Slack messages are not giving the reviewer enough context to decide quickly — adjust the Block Kit payload in Slack — Reviewer Queue to include the specific line numbers and dollar deltas, not just the flag categories.
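One way to build a payload with that level of detail is sketched below. It uses standard Block Kit header and section blocks in the shape Slack's chat.postMessage method accepts; the finding fields (line, rule, impact_cents) and the function name are assumptions rather than the bundle's actual body.

```javascript
// Sketch of a reviewer-queue message that names the line and the dollar delta
// for each finding instead of listing flag categories only.
function buildReviewerMessage(invoice, findings) {
  const dollars = (cents) => `$${(cents / 100).toFixed(2)}`;

  // Show the five findings with the largest dollar impact first.
  const top = findings
    .slice()
    .sort((a, b) => b.impact_cents - a.impact_cents)
    .slice(0, 5);

  const lines = top.map((f) => `• Line ${f.line}: ${f.rule} (${dollars(f.impact_cents)})`);
  const title = `Invoice ${invoice.id} needs review`;

  return {
    channel: '#legal-ops-invoice-review',
    text: title, // Plain-text fallback for notifications.
    blocks: [
      { type: 'header', text: { type: 'plain_text', text: title } },
      {
        type: 'section',
        // Slack rejects an empty section, so fall back to a placeholder line.
        text: { type: 'mrkdwn', text: lines.length > 0 ? lines.join('\n') : 'No findings attached.' },
      },
    ],
  };
}
```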
vs alternatives
Versus the e-billing vendor’s built-in compliance engine (Brightflag’s “AI review”, Onit’s rules engine): the vendor’s rules are competent but their AI pass is opaque, you cannot tune the prompt, and you cannot add custom checks without paying for a professional-services engagement. This flow gives you the prompt, the thresholds, and the audit log — all editable. Versus a DIY Python script: same logic, much higher operational burden (you own the cron, the retries, the credential rotation, the observability) and no visual debugger when a LEDES file from a new firm parses oddly. Versus the status quo of a paralegal reading every invoice: the paralegal is more accurate on novel patterns for the first month, after which the flow’s recall on the codified rules is higher and the paralegal’s time is freed for the genuinely judgement-call items.
The case for the n8n version specifically over a Lambda or a Make.com build is the visual graph plus the per-node retry semantics: when the Anthropic API rate-limits you on a busy morning, the Retry On Fail setting on the httpRequest node (a maximum number of tries with a wait between them) handles it without code, provided it is enabled on that node, and you can see the retry happen.
Watch-outs
Auto-deductions communicated badly damage firm relationships. Guard: the Slack — Reviewer Queue payload always includes the reasoning chain from both the rule pass and the AI pass, and the audit log retains the full rule_flags_json and ai_flags_json. Before any auto-deduction is communicated to the firm, generate the firm-facing note from the audit log row, not from a templated “we deducted X” message — firms accept reductions when they see the specific line, the specific guideline, and the specific dollar impact.
Threshold tuning is matter-type-sensitive. Litigation invoices have different patterns (large discovery batches look like block billing but are not) than transactional ones (any block billing is suspicious). Guard: the Load Matter + Rate Card query returns matter_type, and the Rule-Based Checks Code node is the place to branch on it. Ship the v1 flow with global thresholds, then specialise within four weeks.
Novel firms produce false positives until you have a baseline. Guard: add a WHERE invoices_seen_count < 5 check upstream and force decision = reviewer_queue for any firm under that threshold, regardless of what the rules and AI say. The bundle does not include this check by default; add it before going live if you onboard new firms more than once a quarter.
LEDES parsing breaks silently when a firm sends a malformed file. Guard: the Parse LEDES Code node returns parse_error: 'empty_or_malformed_ledes' rather than throwing, and the downstream nodes will write a row to the audit log with decision: auto_approve (the default) — which is wrong. Add an if node after Parse LEDES that routes parse errors to #legal-ops-escalations with the firm name and invoice id so a human can chase the firm for a clean file.
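The new-firm guard and the parse-error guard could be combined into one step that runs before the routed decision is accepted. This is a sketch of that idea, not something the bundle contains: invoices_seen_count is the counter named above, while the function name, the new_firm_baseline reason, and the reuse of the escalate_director bucket for parse errors are assumptions.

```javascript
// Sketch of two pre-routing guards. Each returns an override when it fires;
// otherwise the decision from the scoring step passes through unchanged.
const NEW_FIRM_MIN_INVOICES = 5;

function applyPreRoutingGuards(parsed, firm, routed) {
  // A malformed LEDES file must never fall through to the auto_approve default.
  if (parsed.parse_error) {
    return {
      decision: 'escalate_director',
      reason: parsed.parse_error,
      channel: '#legal-ops-escalations',
    };
  }

  // Firms without a baseline always get a human review.
  if (firm.invoices_seen_count < NEW_FIRM_MIN_INVOICES) {
    return { decision: 'reviewer_queue', reason: 'new_firm_baseline' };
  }

  return routed;
}
```

If director escalation is too heavy for a file-format problem, a fifth decision value for parse errors would keep the audit log cleaner, at the cost of one more branch in the routing nodes.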
Claude can hallucinate violations on a busy invoice. Guard: the system prompt forbids inventing line indexes; the Score + Route node treats AI findings as advisory unless severity ≥ 0.8 (escalation) or AI count > 0 alongside rule findings (reviewer queue). Never let an AI-only flag drive an auto_deduct.
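A code-side check can back up the prompt instruction by discarding any finding that points at a line the parser never produced. The sketch below is an addition to what the bundle describes, and the finding fields (line, severity) are assumed names.

```javascript
// Sketch of a post-processing filter for AI findings. A finding is kept only
// if it references a real line number and carries a severity between 0 and 1.
function filterAiFindings(aiFindings, lineItems) {
  const knownLines = new Set(lineItems.map((item) => item.lineNumber));
  const kept = [];
  const dropped = [];

  for (const finding of aiFindings) {
    const valid =
      Number.isInteger(finding.line) &&
      knownLines.has(finding.line) &&
      typeof finding.severity === 'number' &&
      finding.severity >= 0 &&
      finding.severity <= 1;
    (valid ? kept : dropped).push(finding);
  }

  // Dropped findings are returned too, so they can be logged for prompt tuning.
  return { kept, dropped };
}
```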
Stack
n8n (cloud or self-hosted) is the orchestrator. Claude Sonnet 4.6 via the Anthropic Messages API does the anomaly pass. Postgres holds the matter database, rate cards, billing guidelines, and audit log. Slack receives the reviewer queue and director escalations. Your e-billing system (Brightflag in the bundle defaults; swap the host and path for Onit, BusyLamp, SimpleLegal, or a self-hosted endpoint) is the source of truth for new invoices and the eventual write-back target if you extend the flow to push deductions back rather than emailing them.
This flow is the operational layer of legal spend management; the policy layer is your written outside-counsel guidelines, which the rule-based checks encode. The two only work together — the guidelines without the flow are aspirational; the flow without the guidelines is a model trying to invent your policy.
# Outside-counsel invoice anomaly detection (n8n)
## What this flow does
Polls your e-billing system every weekday morning for newly submitted outside-counsel invoices, fetches the LEDES 1998B file for each one, parses every line item, runs deterministic billing-guideline checks against your matter database (approved timekeepers, rate cards, block-billing rules, vague-description keywords, no-travel-class rules), then asks Claude for a second pass over anomalies that are hard to express as rules (duplicative timekeepers on the same task, disproportionate task time relative to scope, scope-creep narrative, off-engagement-letter work). Each invoice is scored, routed to one of four buckets — auto-approve, auto-deduct with notice, reviewer queue in Slack, or director escalation — and written to an idempotent audit log.
The flow is single-trigger (the daily cron); the watermark on `invoice_audit_log.checked_at` makes re-runs safe. Every decision is reproducible from the audit log row.
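The watermark fallback can be expressed as a small helper. This is an illustrative sketch, assuming the latest `checked_at` arrives as an ISO timestamp (or `null` when the table is empty); the function name is hypothetical.

```javascript
// Sketch of the watermark rule: use the latest audit-log timestamp when one
// exists, otherwise look back seven days from the current run.
const FALLBACK_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function resolveWatermark(latestCheckedAt, now = new Date()) {
  if (latestCheckedAt) return new Date(latestCheckedAt);
  return new Date(now.getTime() - FALLBACK_DAYS * MS_PER_DAY);
}
```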
## Import
1. In your n8n instance, open **Workflows → Import from File** and select `legal-spend-anomaly-n8n.json`.
2. The workflow imports as inactive. Do not activate it yet — you need to wire credentials and create the supporting Postgres tables first.
3. Open workflow **Settings** and confirm `executionOrder: v1` and `timezone: America/New_York` (or change the timezone to match your billing day boundary). The `Daily Cron — 7am Mon-Fri` node inherits this timezone.
## Credentials
The workflow ships with four placeholder credential references. Each must be replaced with a real credential in n8n before the flow runs. In each node, open the credential picker and either select an existing credential of the right type or create a new one.
### `PLACEHOLDER_BRIGHTFLAG_CRED_ID` — Brightflag (or your e-billing system) API token
Used by the `Brightflag — List New Invoices` and `Fetch LEDES File` nodes. Type: **Header Auth**. Header name: `Authorization`. Header value: `Bearer <your-token>`. If you are on Onit, BusyLamp, SimpleLegal, or a self-hosted e-billing system, swap the host and path in the `Brightflag — List New Invoices` node URL and adjust the header to whatever your vendor expects. The downstream `Parse LEDES` and `Rule-Based Checks` nodes assume the list endpoint returns `{ invoices: [{ id, firm_id, matter_id, ledes_url, total_amount, currency }] }`; if your vendor's shape differs, add a `Code` node after the list call to normalise.
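A normalising step might look like the sketch below. The input shape (`results`, `invoiceId`, `vendor.id`, `matterRef`, `files.ledes`, `amount`) is entirely hypothetical and stands in for whatever your vendor returns; only the output shape is the one the downstream nodes expect.

```javascript
// Sketch of a normaliser from a hypothetical vendor response to the
// { invoices: [...] } shape the rest of the flow assumes.
function normaliseInvoice(raw) {
  return {
    id: String(raw.invoiceId),
    firm_id: String(raw.vendor.id),
    matter_id: String(raw.matterRef),
    ledes_url: raw.files.ledes,
    total_amount: raw.amount.value,
    currency: raw.amount.currency,
  };
}

function normaliseListResponse(response) {
  return { invoices: (response.results || []).map(normaliseInvoice) };
}
```

Inside an n8n `Code` node the same functions would be applied to the incoming item's JSON and the result returned as the node's output items.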
### `PLACEHOLDER_POSTGRES_CRED_ID` — Postgres for matter database + audit log
Used by `Lookup Watermark`, `Load Matter + Rate Card`, and `Audit Log Insert`. Type: **Postgres**. The flow expects four tables: `matters` (matter_id, matter_type, budget_remaining_cents, scope_summary), `matter_approved_timekeepers` (matter_id, timekeeper_id, max_rate_cents, classification), `firm_billing_guidelines` (law_firm_id, block_billing_min_units, vague_keywords text[], after_hours_window, no_travel_class text[]), and `invoice_audit_log` (id serial pk, invoice_id unique, plus the columns the `Audit Log Insert` node writes). Add a unique index on `invoice_audit_log.invoice_id` so the `ON CONFLICT` clause works, and indexes on `matter_approved_timekeepers.matter_id` and `firm_billing_guidelines.law_firm_id`.
### `PLACEHOLDER_ANTHROPIC_CRED_ID` — Anthropic API key
Used by `Claude — Anomaly Detection`. Type: **Header Auth**. Header name: `x-api-key`. Header value: your Anthropic API key. The node targets `claude-sonnet-4-6`; switch to a smaller model only after you have calibrated against historical invoices, since the recall on subtle scope-creep narratives degrades quickly with cheaper models.
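For reference, the request this node sends has roughly the shape sketched below. The endpoint, the `x-api-key` and `anthropic-version` headers, and the `model`, `max_tokens`, `system` and `messages` fields are part of the Anthropic Messages API; the `max_tokens` value, the function name, and the layout of the user message are assumptions, not the bundle's exact body.

```javascript
// Sketch of the Messages API request for the anomaly pass. The caller supplies
// the system prompt; the user message carries the invoice context as JSON.
function buildAnomalyRequest({ apiKey, systemPrompt, lineItems, matter, guidelines }) {
  return {
    method: 'POST',
    url: 'https://api.anthropic.com/v1/messages',
    headers: {
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json',
    },
    body: {
      model: 'claude-sonnet-4-6',
      max_tokens: 1500, // Assumed ceiling; should exceed the longest findings array you expect.
      system: systemPrompt,
      messages: [
        {
          role: 'user',
          content: JSON.stringify({
            line_items: lineItems,
            matter_scope: matter.scope_summary,
            guidelines,
          }),
        },
      ],
    },
  };
}
```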
### `PLACEHOLDER_SLACK_CRED_ID` — Slack bot token
Used by `Slack — Escalate to Director` and `Slack — Reviewer Queue`. Type: **Header Auth**. Header name: `Authorization`. Header value: `Bearer xoxb-...`. The bot needs `chat:write` and must be invited into both `#legal-ops-escalations` and `#legal-ops-invoice-review` (or whatever channels you rename them to in the two Slack node bodies).
## First-run verification
Before you flip the schedule trigger to active, walk every branch on a small set of inputs.
1. **Empty list path.** Temporarily edit the `Brightflag — List New Invoices` URL to query a status that returns no invoices. Run the workflow manually. Expected: `Split Invoices` produces zero items, the rest of the flow short-circuits, and no rows appear in `invoice_audit_log`.
2. **Clean invoice path.** Pick a known-clean historical invoice (no rate breaches, all timekeepers on the approved list, no vague descriptions). Run the workflow manually with that invoice's `ledes_url` injected. Expected: `Score + Route` returns `decision: auto_approve`; one row in `invoice_audit_log` with `rule_flag_count = 0` and `ai_flag_count = 0`.
3. **Rule-only flag path.** Pick an invoice where you know one timekeeper billed slightly above the rate card. Expected: `decision: auto_deduct` with `reason: low_value_rule_flags_only`; the `Reviewer or Deduct?` node routes straight to the audit log and no Slack message goes out. If you prefer a notice on this path, change the `Slack — Reviewer Queue` body to also handle `auto_deduct`.
4. **AI-flag path.** Run a historical invoice your team manually flagged for scope creep. Expected: `decision: reviewer_queue` and a Slack message in `#legal-ops-invoice-review` with both rule and AI findings. Cross-check the AI findings against your team's manual notes; if Claude misses items your team caught, tighten the system prompt before going further.
5. **Escalation path.** Run the most egregious historical invoice you have (large overrun, off-scope work). Expected: `decision: escalate_director` and a Slack message in `#legal-ops-escalations`. Confirm the `:rotating_light:` block format renders correctly.
6. **Idempotency.** Re-run any of the above with the same invoice. Expected: the existing `invoice_audit_log` row is updated in place (the `ON CONFLICT (invoice_id) DO UPDATE` clause), not duplicated. The watermark advances correctly on the next scheduled run.
Once all six checks behave as expected, activate the workflow. The `Daily Cron — 7am Mon-Fri` node will then drive everything from there. Watch the audit log for the first two weeks; expect to retune the AI system prompt and the `Score + Route` thresholds at least twice before the routing distribution stabilises.