CNVRock

Pipeline

  • 1. Overview
    • Problem
    • Approach
    • Architecture
    • Why the scaling study
  • 2. Data acquisition
    • Why Aspera, not wget
    • Setup on HPC
    • Per-task pipeline
    • SLURM submission
    • Throughput
    • Compliance with HPC network policy
  • 3. Reference and intervals
    • HS11286_extended.fasta
    • 1 kb interval list
    • Plasmid gene coordinates
    • Mapping quality threshold
  • 4. Subset selection
    • Eligibility pipeline
    • Stratification
    • Nesting trick
    • Anchoring to in-flight data
    • Composition of the 5K seed set
  • 5. NPY stores
    • Chromosome NPY store
    • Plasmid-family NPY store
    • Build all four tiers
  • 6. Training
    • Entry point
    • SLURM wrapper
    • Config schema
    • Architecture: 1D Conv-VAE
    • Segmenter: Gaussian HMM
    • Output artefacts
  • 7. Evaluation
    • Ground truth
    • Metrics
    • Two-sided evaluation
    • Hold-out subset
    • Output

Manuscript

  • 8. Scaling study
    • Tiers
    • Headline results
      • Chromosomal blaSHV (extra-copy)
      • Plasmid genes (presence)
    • Reading the curve
    • Early observation (MQ=40 baseline, since superseded)
    • Reproducibility note
  • 9. Methods (parameter choices)
    • Hybrid mapping-quality thresholds: chromosome at MQ ≥ 20, plasmid at MQ = 0
      • Manuscript wording
    • Earlier MQ choice (legacy, MQ ≥ 0 single-pass)
      • Why MQ = 0
      • Why this is methodologically sound
      • Why gene-family aggregation, not individual genes
    • Concurrency cap (10 simultaneous ascp)
    • Stratification: species × Bla_Carb × ST cap
    • VAE β-warmup
    • CNV-pattern auxiliary loss
    • HMM 6 states with self-transition 0.80
    • Per-gene PCN thresholds
    • Chromosomal CRR thresholds
    • Reproducibility seed
  • 10. Reproducibility
    • Software
    • HPC environment
    • End-to-end recipe
    • Commit hashes
    • Random seeds
    • Storage


© Copyright 2026, Louise Cerdeira et al.