# 8. Scaling study

The scaling study is the manuscript's main empirical result. **Same
architecture, same hyperparameters, same evaluation** — only the training-set
size varies.

## Tiers

| Experiment | Tier | n samples | Manifest |
|---|---|---|---|
| (Phase 1 baseline) | 545 | 545 | (legacy) |
| exp 32 | 5K | 5,000 | `assets/kpsc_expansion_subset_5k.tsv` |
| exp 33 | 10K | 10,000 | `assets/kpsc_expansion_subset_10k.tsv` |
| exp 34 | 20K | 20,000 | `assets/kpsc_expansion_subset_20k.tsv` |
| exp 36 | 40K | 40,000 | `assets/kpsc_expansion_subset_40k.tsv` |

Each expansion manifest is a **strict superset** of the previous one
(5K ⊂ 10K ⊂ 20K ⊂ 40K), so a larger tier only adds samples and never drops
any; see {doc}`04_subset_selection`.
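
The nesting property can be checked directly from the manifests. The sketch
below is illustrative only: it assumes each TSV has a header row and a sample
identifier column, and the column name `sample_id` is a hypothetical
placeholder, not something this page specifies.

```python
import csv


def load_ids(path, id_column="sample_id"):
    """Read one manifest TSV and return its set of sample IDs.

    `id_column` is an assumed column name; adjust it to the real header.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[id_column] for row in reader}


def check_nested(id_sets):
    """Return True if every set is a strict superset of the one before it."""
    return all(smaller < larger for smaller, larger in zip(id_sets, id_sets[1:]))


if __name__ == "__main__":
    tiers = ("5k", "10k", "20k", "40k")
    paths = [f"assets/kpsc_expansion_subset_{tier}.tsv" for tier in tiers]
    print(check_nested([load_ids(path) for path in paths]))
```

Using Python's `<` on sets tests for a proper subset, so two identical
manifests would fail the check, which matches the "strict" wording above.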

## Headline results

```{note}
The results tables below are filled in as experiments complete. Exp 32–34 are
scheduled to retrain at a mapping-quality threshold of MQ=20 once the 20K
download finishes (estimated ~3 days from 2026-05-17); exp 36 follows.
```

### Chromosomal blaSHV (extra-copy)

| Tier | MCC | FNR | PPV | call_rate | n_eval |
|---|---|---|---|---|---|
| 545 (Phase 1) | — | — | — | — | — |
| 5K | — | — | — | — | — |
| 10K | — | — | — | — | — |
| 20K | — | — | — | — | — |
| 40K | — | — | — | — | — |
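
For readers unfamiliar with the column headings, here is a minimal sketch of
how these metrics can be derived from confusion-matrix counts. MCC, FNR, and
PPV follow their standard definitions. The treatment of `call_rate` (fraction
of evaluated samples that receive a definite call) and of `n_eval` (called
plus no-call samples) is an assumption made for illustration; this page does
not define them, and the function and argument names are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn, n_no_call=0):
    """Compute the table's metrics from confusion counts.

    call_rate and n_eval use assumed definitions (see lead-in).
    Undefined ratios are returned as NaN rather than raising.
    """
    n_called = tp + fp + tn + fn
    n_eval = n_called + n_no_call  # assumption: no-calls count toward n_eval
    nan = float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "MCC": (tp * tn - fp * fn) / denom if denom else nan,
        "FNR": fn / (fn + tp) if (fn + tp) else nan,
        "PPV": tp / (tp + fp) if (tp + fp) else nan,
        "call_rate": n_called / n_eval if n_eval else nan,
        "n_eval": n_eval,
    }
```

Reporting `call_rate` next to MCC matters because a model can score a high MCC
on the samples it does call while declining to call many others.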

### Plasmid genes (presence)

For each tier we report per-gene MCC across {`blaKPC`, `blaCTX-M`, `blaNDM`,
`blaOXA-48`, `qnrB1`, `aac6-Ib-cr`}.

| Tier | blaKPC | blaCTX-M | blaNDM | blaOXA-48 | qnrB1 | aac6-Ib-cr |
|---|---|---|---|---|---|---|
| 5K | — | — | — | — | — | — |
| 10K | — | — | — | — | — | — |
| 20K | — | — | — | — | — | — |
| 40K | — | — | — | — | — | — |

## Reading the curve

We expect:

- **Monotonic improvement in MCC** for genes whose positives are
  under-represented (rare STs, rare plasmid carriage) as the training set
  grows.
- **Plateau** at some n* between 10K and 40K, where n* is the per-gene
  saturation point beyond which additional samples no longer improve MCC.
- For genes already at MCC ≥ 0.9 at 545 samples (e.g. blaKPC in the Phase 1
  cohort), **little to no headroom**, so scaling gives diminishing returns.
  Scaling matters most for the long tail.
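
One way to make the saturation point n* concrete is to pick the smallest tier
after which every further step improves MCC by less than a tolerance. The
sketch below is one possible operationalisation, not the manuscript's
definition; the function name and the `min_gain` threshold of 0.01 are
hypothetical choices.

```python
def saturation_tier(curve, min_gain=0.01):
    """Return the smallest n after which all MCC gains are below min_gain.

    curve: list of (n_samples, mcc) pairs sorted by n_samples.
    Returns None if the curve is still improving at the largest tier,
    or if there are fewer than two tiers to compare.
    """
    sizes = [n for n, _ in curve]
    scores = [mcc for _, mcc in curve]
    # gains[k] is the MCC change going from tier k to tier k + 1.
    gains = [later - earlier for earlier, later in zip(scores, scores[1:])]
    for i in range(len(gains)):
        if all(gain < min_gain for gain in gains[i:]):
            return sizes[i]
    return None


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not study results.
    toy = [(545, 0.50), (5000, 0.70), (10000, 0.80), (20000, 0.805), (40000, 0.806)]
    print(saturation_tier(toy))
```

With only four or five tiers the estimate of n* is coarse: it can only land on
a tier boundary, and a `None` result means the plateau, if any, lies beyond
the largest tier tested.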

## Early observation (MQ=40 baseline, since superseded)

The initial exp 32 / 33 runs used MQ=40, which is too strict for the
multi-mapping plasmid reads in the extended reference (see
{doc}`09_methods`). Plasmid detection in those runs was therefore unreliable,
but the chromosomal blaSHV `call_rate` already showed a **striking jump**
between tiers:

| Tier | blaSHV call_rate (MQ=40 baseline) |
|---|---|
| 5K | 0.44 |
| 10K | 0.91 |

Doubling the training set from 5K to 10K roughly doubled `call_rate` (0.44 to
0.91, +0.47 absolute). We expect the rerun at MQ=20 to produce comparable
chromosomal numbers and, in addition, working plasmid detection.

## Reproducibility note

Every experiment's `evaluation.txt` is committed to `data/results/{exp}/`.
The commit hash that produced each result is recorded in the experiment
log. See {doc}`10_reproducibility` for the exact recipe to regenerate.
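
Before regenerating anything, it can help to confirm that each experiment's
committed `evaluation.txt` is actually present. This is a small sketch under
the layout stated above; the experiment directory names passed in (for example
`"exp32"`) are hypothetical, since this page does not spell out how `{exp}` is
written on disk.

```python
from pathlib import Path


def missing_evaluations(exp_names, results_root="data/results"):
    """Return the experiment names with no evaluation.txt under results_root."""
    root = Path(results_root)
    return [
        exp for exp in exp_names
        if not (root / exp / "evaluation.txt").is_file()
    ]


if __name__ == "__main__":
    # Hypothetical directory names; substitute the real ones.
    print(missing_evaluations(["exp32", "exp33", "exp34", "exp36"]))
```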