# Token Ledger Repair

The billing team found quiet drift in a usage ledger. Some rows appear to have
token counts recorded in thousands of tokens while the billing writer priced
them as raw tokens. One later row may also have been priced with a stale ledger
region instead of the provider region.

Read these files:

- `/app/provider_events.csv`: source-of-truth usage events from the metering provider
- `/app/billing_ledger.csv`: the current warehouse ledger that may contain unit drift
- `/app/rate_card.csv`: model prices per million input and output tokens
- `/app/region_multipliers.csv`: regional billing multipliers
- `/app/ingestion_notes.md`: deployment notes from the ingestion pipeline

Create two files:

1. `/app/fix.sql`
2. `/app/incident_report.json`

`fix.sql` must be a SQLite script. When it is run in a database containing the
four CSV tables, it must create a table named `corrected_ledger` with one row per
provider event and these columns:

- `event_id`: string
- `input_tokens`: integer raw token count
- `output_tokens`: integer raw token count
- `amount_usd`: real number rounded to 6 decimal places
- `correction_reason`: short string explaining whether the row was corrected or
  left unchanged

Use the provider events as the source of truth for raw token counts and region.
Price each row as:

```text
((input_tokens / 1_000_000) * input_price_per_million
 + (output_tokens / 1_000_000) * output_price_per_million)
 * region_multiplier
```

`incident_report.json` must be valid JSON with:

- `affected_events`: array of event IDs whose corrected row differs from the
  current billing ledger
- `root_cause`: short string
- `total_delta_usd`: number rounded to 6 decimal places
- `checks`: array of validation checks you used
- `risk_level`: one of `medium`, `high`

The verifier gives partial credit for diagnosis, so make the explanations
specific:

- If a row had thousand-token/unit drift, mention `unit`, `thousand`, `raw`,
  `scaled`, or `drift` in that row's `correction_reason`.
- If a row had a stale region or multiplier, mention `region`, `multiplier`,
  `cache`, or `enrichment` in that row's `correction_reason`.
- If a row did not change, say it was unchanged, matched the source, or was OK.
- In `root_cause`, mention both the token-unit problem and the stale region
  enrichment/cache problem.

Do not edit the input CSV files.
