v0.28.1 — corrected σ-profiles + honest benchmark gap-close

Three correctness fixes (charge conservation, NaN-safe moments, segment-set filter) plus three benchmark-gap fields (direct DFT dipole, bias shift, 5-param translator). Honest out-of-distribution (OOD) result on FreeSolv ΔG-hydration: rank #6/12 with MAE 0.87 kcal/mol, using raw DFT plus a universal +0.51 kcal/mol bias shift. The translator's leave-one-out (LOO) MAE is 1.23 kcal/mol, worse than the raw 0.945, so it is research-only and not for production.

⚠ HISTORICAL: this documents the v0.28.1 release (April 2026). Current production is v0.28.3, with MAE 0.85 kcal/mol and a +1.71 bias shift on 28 FreeSolv anchors; see PROJECT_SNAPSHOT and the /benchmarks page.

Correctness: the v0.28 DFT path under-represented HB donors and over-counted polarisation on large flexible polymers because it lacked a COSMO charge-conservation renormalization. v0.28.1 enforces ∫σ·dA = −kεps · Q_solute via the standard area-weighted uniform-σ shift used by every production COSMO code, applied to the segment set the downstream pipeline actually keeps (not the full Lebedev grid). HB donor mass is restored, and |ΣQ_seg| < 0.05 e holds for every entry.
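
The uniform-σ shift described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the release's implementation: the function name, the argument names, and the reading of kεps as a scalar dielectric scaling factor (k_eps below) are all hypothetical. The idea is to add one constant to every kept segment's σ so that the area-weighted sum matches the target charge.

```python
def renormalize_sigma(sigma, area, q_solute, k_eps=1.0):
    """Shift sigma uniformly so that sum(sigma_i * area_i) == -k_eps * q_solute.

    sigma    : per-segment surface charge densities (e.g. e/A^2), kept segments only
    area     : per-segment areas (e.g. A^2), same length and order as sigma
    q_solute : net solute charge in e
    k_eps    : assumed scalar scaling factor (hypothetical name for the
               document's "kεps"); 1.0 is a placeholder, not a recommended value
    """
    if len(sigma) != len(area):
        raise ValueError("sigma and area must have the same length")
    total_area = sum(area)
    if total_area <= 0:
        raise ValueError("total segment area must be positive")

    target = -k_eps * q_solute
    current = sum(s * a for s, a in zip(sigma, area))
    # One constant added to every sigma changes each segment's charge by
    # shift * area_i, so the correction is distributed in proportion to area.
    shift = (target - current) / total_area
    return [s + shift for s in sigma]


def charge_residual(sigma, area, q_solute, k_eps=1.0):
    """Signed residual (in e) between the segment charge sum and the target."""
    return sum(s * a for s, a in zip(sigma, area)) + k_eps * q_solute


# Synthetic three-segment example for a neutral solute.
areas = [1.0, 2.0, 1.5]
fixed = renormalize_sigma([0.010, -0.004, 0.002], areas, q_solute=0.0)
print(abs(charge_residual(fixed, areas, q_solute=0.0)) < 0.05)
```

Computing the sums over the kept segments only, rather than over the full grid, is what makes the conservation hold for the data the downstream pipeline actually consumes.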

Benchmark gap-close: every .mfsig.json now carries five additional fields. The production-recommended ΔG number is g_polar_bias_corrected_kcal_mol (raw value plus the universal +0.511 kcal/mol shift), which gives MAE 0.87 kcal/mol on 18 anchors, rank #6/12. The 5-param translator (g_total_kcal_mol_translator) is exposed for research/SAR but is NOT recommended on unknown drugs: its LOO MAE is 1.23 kcal/mol (worse than the raw 0.945) because it overfits the 15-anchor calibration cohort. Train on more than ~100 anchors before trusting any multi-parameter ΔG correction. The Atlas χ distribution is now physically meaningful: range [−0.01, +1.11], mean +0.22, 27% χ-pass cells (versus 92.5% for the broken cohort, an artefact of degenerate σ-profiles).
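
To make the MAE and LOO MAE comparison above concrete, here is a minimal sketch of how a one-parameter bias shift can be scored with leave-one-out. It rests on assumptions: the shift is fitted as the mean signed residual (the release does not state its fit rule), the function names are hypothetical, and the numbers are synthetic, not FreeSolv data.

```python
def mae(pred, ref):
    """Mean absolute error between two equal-length lists."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def fit_shift(raw, ref):
    """One-parameter correction: mean signed residual (ref - raw).
    Assumed fit rule, chosen for illustration."""
    return sum(r - x for x, r in zip(raw, ref)) / len(ref)


def loo_mae_shift(raw, ref):
    """Leave-one-out MAE: refit the shift without anchor i, then score anchor i."""
    errors = []
    for i in range(len(raw)):
        train_raw = raw[:i] + raw[i + 1:]
        train_ref = ref[:i] + ref[i + 1:]
        shift = fit_shift(train_raw, train_ref)
        errors.append(abs(raw[i] + shift - ref[i]))
    return sum(errors) / len(errors)


# Synthetic values in kcal/mol, for illustration only.
raw = [-4.2, -7.9, -2.1, -10.3]
ref = [-3.6, -7.5, -1.4, -9.9]

shift = fit_shift(raw, ref)
corrected = [x + shift for x in raw]
print("raw MAE:      ", mae(raw, ref))
print("in-sample MAE:", mae(corrected, ref))
print("LOO MAE:      ", loo_mae_shift(raw, ref))
```

A correction whose LOO MAE is worse than the uncorrected MAE is fitting the calibration anchors rather than the underlying trend. With one parameter the gap between in-sample and LOO error stays small; with five parameters on 15 anchors it need not.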

Cohort coverage + known limits

20 / 20 drugs PASS 7/7 gates. 19 / 20 polymers PASS 7/7 gates. HPMCAS_L (123 atoms, def2-SVP) is in mfsig_deferred_queue.jsonl, awaiting the Heavy-Track tier (high-memory tier or CPU fallback). HB correlations: donor↔HBD r=+0.65, acceptor↔HBA r=+0.72 (PASS).

Known limits: (a) The χ kernel produces near-zero magnitudes on absolute cavity areas, so χ is screening-only until the per-anchor K_chi refit. (b) The σ-invariant T01 (tighter than the 0.05 e cohort gate) shows a size-scaling charge residual on large polymers (8/19 polymer all-pass).

Note on LVPP: we do NOT treat LVPP-DFT as ground truth. It publishes no geometries, no cavity construction details, and no σ-moments; cohort r² vs LVPP = 0.20 even after our best DFT relaxation (structural, not numerical; see refs/BENCHMARK_AND_COMPETITORS.md §4). The MolForge cohort .mfsig.json, with DFT-relaxed geometries, explicit cavity + bin edges, σ-moments, and a SHA-256 audit, is the reproducible replacement reference.
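
For readers who want to recompute the r and r² figures quoted in this section, the sketch below shows a plain Pearson correlation. That the reported values are Pearson coefficients is an assumption, the function name is hypothetical, and the inputs are synthetic.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Synthetic example: a per-molecule donor-side sigma measure vs. HBD count.
donor_measure = [0.1, 0.4, 0.3, 0.9, 0.7]
hbd_count = [0, 1, 1, 3, 2]
r = pearson_r(donor_measure, hbd_count)
print(r, r * r)  # r and r^2
```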

Provenance + σ-cache validation

FreeSolv anchor file: database.txt v0.52, SHA-256 prefix 2d13f095713bc39b, 639 entries. Cohort match: 15 / 20 drugs (paracetamol, diazepam, amlodipine, atorvastatin, and water are absent from v0.52). Fit split: 12 train / 3 held-out test, seeded. Direct σ-cache cosine validation against the production reference: caffeine 0.86, diazepam 0.95. This is the cleanest direct confirmation that the corrected v028 σ-profiles agree with production. Previously, 0 of 87 entries reached a cosine ≥ 0.90.
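
Both provenance checks above are easy to reproduce. The sketch below shows a SHA-256 prefix check and a cosine similarity between two σ-profiles. The function names are hypothetical, the profiles are synthetic, and it assumes both profiles are histograms on identical bin edges.

```python
import hashlib
import math


def sha256_prefix(data: bytes, length: int = 16) -> str:
    """First `length` hex characters of the SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()[:length]


def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must share the same bins")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("cosine is undefined for an all-zero profile")
    return dot / (norm_p * norm_q)


# Hash check: compare the prefix of the file bytes with the recorded value.
# For a real file, pass pathlib.Path("database.txt").read_bytes() as `data`.
recorded_prefix = "2d13f095713bc39b"
print(len(recorded_prefix), len(sha256_prefix(b"example bytes")))

# Synthetic sigma-profiles on the same bins, for illustration only.
corrected = [0.0, 1.2, 3.4, 2.1, 0.3]
reference = [0.1, 1.0, 3.6, 1.9, 0.2]
print(cosine_similarity(corrected, reference) >= 0.90)
```

Cosine similarity ignores overall scale, so it compares profile shape only. Two profiles with different total areas can still score close to 1.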