Correctness — the v0.28 DFT path under-represented HB donors and over-counted polarisation on large flexible polymers because it was missing the COSMO charge-conservation renormalization. v0.28.1 enforces ∫σ dA = −k_ε · Q_solute, where k_ε is the ε-dependent COSMO screening factor. It does this with the standard area-weighted uniform σ shift used in production COSMO codes, applied to the segment set the downstream pipeline actually keeps rather than to the full Lebedev grid. HB donor mass is restored, and |ΣQ_seg| < 0.05 e for every entry.
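A minimal sketch of an area-weighted uniform σ shift, assuming σ in e/Å², segment areas in Å², and a caller-supplied k_eps (for example the common COSMO form f(ε) = (ε − 1)/(ε + x)). The function names are hypothetical and are not the MolForge API. A constant shift δ changes the integrated charge by δ·ΣA_i, so a single division restores the target total.

```python
from typing import Sequence


def segment_charge_sum(sigma: Sequence[float], area: Sequence[float]) -> float:
    """Integrated screening charge Σ σ_i·A_i over the given segments (e)."""
    return sum(s * a for s, a in zip(sigma, area))


def renormalize_sigma(
    sigma: Sequence[float],
    area: Sequence[float],
    q_solute: float,
    k_eps: float,
) -> list:
    """Shift every kept segment's σ by one constant so Σ σ_i·A_i = −k_eps·Q_solute.

    Hypothetical helper: sigma and area describe ONLY the segments the
    downstream pipeline keeps, not the full surface grid.
    """
    if len(sigma) != len(area):
        raise ValueError("sigma and area must have the same length")
    total_area = sum(area)
    if total_area <= 0.0:
        raise ValueError("total segment area must be positive")
    q_current = segment_charge_sum(sigma, area)
    q_target = -k_eps * q_solute
    # A uniform shift delta adds delta * total_area to the integrated charge.
    delta = (q_target - q_current) / total_area
    return [s + delta for s in sigma]


# Usage: a neutral solute, so the kept segments must integrate to zero charge.
kept_sigma = [0.010, -0.004, 0.002]
kept_area = [1.5, 2.0, 0.5]
fixed = renormalize_sigma(kept_sigma, kept_area, q_solute=0.0, k_eps=0.98)
assert abs(segment_charge_sum(fixed, kept_area)) < 0.05
```

Passing only the kept segments is the point of the design: renormalizing on the full grid and then discarding segments would leave the retained set with a residual charge again.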
Benchmark gap-close — every .mfsig.json now carries five additional fields. The production-recommended ΔG is g_polar_bias_corrected_kcal_mol (raw ΔG plus a universal +0.511 kcal/mol shift): MAE 0.87 kcal/mol on 18 anchors, rank #6/12. The 5-parameter translator (g_total_kcal_mol_translator) is exposed for research and SAR work but is NOT recommended for unknown drugs. Its leave-one-out (LOO) MAE is 1.23 kcal/mol, worse than the raw 0.945 kcal/mol, because it overfits the 15-anchor calibration cohort. Calibrate against more than ~100 anchors before trusting any multi-parameter ΔG correction. The atlas χ distribution is now physically meaningful: range [−0.01, +1.11], mean +0.22, 27% χ-pass cells (vs 92.5% in the broken cohort, an artefact of degenerate σ-profiles).
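The sketch below shows how a constant bias correction and a leave-one-out check fit together. It is a hedged illustration: the helper names are hypothetical, the note does not say how the +0.511 kcal/mol shift was fitted (the mean residual is assumed here), and the 5-parameter translator's functional form is not reproduced. The LOO harness accepts any fit/predict pair, and because each prediction is made on an anchor excluded from its own fit, it is the metric that exposes overfitting.

```python
from statistics import fmean
from typing import Callable, Sequence

UNIVERSAL_SHIFT_KCAL_MOL = 0.511  # shift quoted in the release note


def bias_corrected(raw_dg: Sequence[float], shift: float = UNIVERSAL_SHIFT_KCAL_MOL) -> list:
    """Apply the universal additive shift to raw ΔG values (kcal/mol)."""
    return [g + shift for g in raw_dg]


def mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error between predictions and reference values."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same length")
    return fmean(abs(p - r) for p, r in zip(pred, ref))


def fit_constant_shift(raw: Sequence[float], ref: Sequence[float]) -> float:
    """Assumed one-parameter fit: the mean residual (least-squares constant offset)."""
    return fmean(r - g for g, r in zip(raw, ref))


def predict_constant_shift(shift: float, raw_value: float) -> float:
    return raw_value + shift


def loo_mae(
    raw: Sequence[float],
    ref: Sequence[float],
    fit: Callable,
    predict: Callable,
) -> float:
    """Leave-one-out MAE: fit on n-1 anchors, score the held-out anchor."""
    raw, ref = list(raw), list(ref)
    errors = []
    for i in range(len(raw)):
        params = fit(raw[:i] + raw[i + 1:], ref[:i] + ref[i + 1:])
        errors.append(abs(predict(params, raw[i]) - ref[i]))
    return fmean(errors)
```

A model with more free parameters can drive the in-sample MAE down while its LOO MAE rises, which is the pattern the note reports for the translator.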
Cohort coverage + known limits
20 / 20 drugs PASS 7/7 gates. 19 / 20 polymers PASS 7/7 gates. HPMCAS_L (123 atoms, def2-SVP) is in mfsig_deferred_queue.jsonl, awaiting the Heavy-Track tier (high-memory tier or CPU fallback). HB correlations: donor↔HBD r = +0.65, acceptor↔HBA r = +0.72 (PASS).
Known limits: (a) the χ kernel produces near-zero magnitudes on absolute cavity areas, so χ is screening-only until a per-anchor K_chi refit. (b) The σ-invariant T01 (tighter than the 0.05 e cohort gate) shows a charge residual that scales with size on large polymers (8/19 polymers all-pass).
Note on LVPP: we do NOT treat LVPP-DFT as ground truth. It publishes no geometries, no cavity construction details and no σ-moments, and cohort r² vs LVPP stays at 0.20 even after our best DFT relaxation; the gap is structural, not numerical (see refs/BENCHMARK_AND_COMPETITORS.md §4). The MolForge cohort .mfsig.json, with DFT-relaxed geometries, explicit cavity + bin edges, σ-moments and a SHA-256 audit, is the reproducible replacement reference.
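A hedged sketch of two of the checks named above: the |ΣQ_seg| < 0.05 e charge gate and the Pearson r used for the donor↔HBD and acceptor↔HBA correlations. The other six gates are not described here and are not shown. Defining "HB donor mass" as the segment area with σ < −σ_hb, with the COSMO-RS value σ_hb = 0.0084 e/Å², is an assumption, and all function names are hypothetical.

```python
import math
from typing import Sequence

CHARGE_GATE_E = 0.05   # |ΣQ_seg| threshold quoted in the release note
SIGMA_HB = 0.0084      # assumed HB cutoff in e/Å² (COSMO-RS convention)


def passes_charge_gate(segment_charges: Sequence[float], tol: float = CHARGE_GATE_E) -> bool:
    """True when the summed segment charge stays inside the cohort gate."""
    return abs(math.fsum(segment_charges)) < tol


def hb_donor_mass(sigma: Sequence[float], area: Sequence[float], sigma_hb: float = SIGMA_HB) -> float:
    """Assumed descriptor: total area of strongly negative-σ (donor-like) segments."""
    return math.fsum(a for s, a in zip(sigma, area) if s < -sigma_hb)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mx, my = math.fsum(x) / n, math.fsum(y) / n
    sxy = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = math.fsum((a - mx) ** 2 for a in x)
    syy = math.fsum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("zero variance: correlation undefined")
    return sxy / math.sqrt(sxx * syy)
```

In use, hb_donor_mass would be evaluated per cohort entry and correlated against each molecule's HBD count with pearson_r.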
Provenance + σ-cache validation
FreeSolv anchor file: database.txt v0.52, SHA-256 prefix 2d13f095713bc39b, 639 entries. Cohort match: 15 / 20 drugs; paracetamol, diazepam, amlodipine, atorvastatin and water are absent from v0.52. Fit split: 12 train / 3 held-out test, seeded. Direct σ-cache cosine validation against the production reference gives caffeine 0.86 and diazepam 0.95. This is the cleanest direct confirmation that the corrected v0.28 σ-profiles agree with production; previously, 0 of 87 compared profiles reached a cosine of ≥ 0.90.
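The provenance and validation steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the MolForge implementation: the helper names are hypothetical, the actual seeding and splitting scheme is not documented (sorting before shuffling is assumed so the split does not depend on input order), and the cosine check assumes both σ-profiles were binned on identical bin edges, which is why the .mfsig.json records them explicitly.

```python
import hashlib
import math
import random
from typing import Sequence

FREESOLV_SHA256_PREFIX = "2d13f095713bc39b"  # prefix quoted in the release note


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sha256_prefix_matches(hex_digest: str, expected_prefix: str) -> bool:
    return hex_digest.lower().startswith(expected_prefix.lower())


def seeded_split(items: Sequence[str], n_test: int, seed: int):
    """Reproducible train/test split: canonical sort, then a seeded shuffle."""
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine between two σ-profiles binned on the same edges."""
    if len(p) != len(q):
        raise ValueError("profiles must share identical bin edges")
    dot = math.fsum(a * b for a, b in zip(p, q))
    norm = math.sqrt(math.fsum(a * a for a in p)) * math.sqrt(math.fsum(b * b for b in q))
    if norm == 0.0:
        raise ValueError("cannot compare an all-zero profile")
    return dot / norm
```

For example, sha256_prefix_matches(file_sha256("database.txt"), FREESOLV_SHA256_PREFIX) would confirm the anchor file before fitting, and seeded_split(matched_names, n_test=3, seed=...) would give a 12 / 3 split for 15 matched drugs.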