Status
Every figure below is read at build time from the actual artefact files in refs/ (audit + 3-gate seal + 3 True-OOD reports + glob counts). Regenerate with python refs/generate_status_inventory.py.
MolForge is the largest continuously audited database of pure ab-initio thermodynamic descriptors for drug, polymer, and formulation discovery. It holds 6,876 production fingerprints at v0.91.1 (wB97M-V / def2-SVP / C-PCM), feeding 8 published kernels under nested cross-validation plus 4 honest under-fits. Eleven audit rounds completed. Single source of truth: the recipe is hash-pinned, and both the recipe and the vendor patches are SHA-validated on every compute.
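As an illustration of what SHA validation before a compute could look like, here is a minimal sketch. It assumes the pins are kept as a mapping from relative file path to expected SHA-256 hex digest; the function names and the pin layout are hypothetical and not taken from the MolForge code base.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_pins(pins: dict, root: Path) -> list:
    """Return the pinned paths that are missing or whose digest differs.

    `pins` maps a path relative to `root` to its expected hex digest
    (hypothetical layout). An empty result means every pin matched.
    """
    mismatches = []
    for rel_path, expected in pins.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatches.append(rel_path)
    return mismatches
```

A compute job would call validate_pins first and refuse to start if the returned list is non-empty.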
Inventory generated 2026-06-03T10:30:00Z · schema mfsig/v0.91 (9-d canonical feature vector)
Quality-tier shadow sweep at v0.31-gold-svp — wB97M-V / def2-SVP + DF-K + C-PCM ε=80.
Production kernels (Tm 2Fb-fusion v2.0, ASD miscibility v1.0, 2Fb-solubility v1.0) remain certified at v0.29-directqm-svp / v0.28.3. The v0.31-gold-svp sweep writes only to the isolated refs/atlas_v031_shadow/, refs/pockets_v031_shadow/, and refs/dimers_v031_shadow/ stores; no kernel JSON is overwritten. True-OOD re-verification on v0.31 data is staged: the ligand side is complete, the pocket diverse-200 set is due around 22:00 UTC tonight, and the dimer AAA pipeline is streaming.
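The isolation rule for the shadow sweep can be pictured as a guard on the write path. The sketch below is an assumption about how such a guard might be written, not the project's actual mechanism: the three store paths come from the text above, while the function names and the refs/kernels/ path used in the usage note are hypothetical.

```python
import json
from pathlib import Path

# Shadow stores named in the status text; paths are relative to the repo root.
SHADOW_STORES = (
    "refs/atlas_v031_shadow",
    "refs/pockets_v031_shadow",
    "refs/dimers_v031_shadow",
)


def is_shadow_path(target, repo_root) -> bool:
    """True if `target` resolves to a location inside one of the shadow stores."""
    root = Path(repo_root).resolve()
    resolved = (root / target).resolve()  # collapses any ".." components
    return any(resolved.is_relative_to(root / store) for store in SHADOW_STORES)


def write_shadow_json(target, payload, repo_root) -> Path:
    """Write `payload` as JSON, refusing any path outside the shadow stores."""
    if not is_shadow_path(target, repo_root):
        raise PermissionError(f"refusing to write outside shadow stores: {target}")
    out = Path(repo_root) / target
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return out
```

With this guard, a write to refs/atlas_v031_shadow/mol_0001.json succeeds, while a write to a hypothetical refs/kernels/tm.json raises PermissionError. Path.is_relative_to requires Python 3.9 or later.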
Live state: refs/_v3_shadow_queue.json + refs/_v3_pocket_fleet_queue.json + refs/_v3_dimer_fleet_queue.json + refs/_v3_pocket_chained_status.json · audits re-runnable via refs/_deep_audit_atlas_v031.py + refs/_v3_dimer_deep_audit.py
Pure wB97M-V inventory
100 % wB97M-V (def2-SVP or def2-TZVPP) · zero semi-empirical leakage
| Cohort | n files | Recipe | Methodology | Purpose |
|---|---|---|---|---|
| Production atlas v0.91.1 (wB97M-V/SVP) | 6,876 | v0.91.1 | wB97M-V / def2-SVP + C-PCM ε=80 + Bondi P1 (gpu4pyscf, SHA-pinned) | Production drug atlas — feeds 8 nested-CV kernels + 4 honest under-fits + ESOL_logS v2 first feature extension |
| TZVPP head-to-head subset (v0.92.0, decision pending) | 55 | v0.92.0 (TZVPP) | wB97M-V / def2-TZVPP + C-PCM (head-to-head subset) | TZVPP basis-set head-to-head test against the SVP atlas — no production kernel uses it yet (decision pending after the paired Δŷ report). |
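
The status text says the file counts are glob counts read from the artefact files at build time. A minimal sketch of such a count is shown here; the cohort names and glob patterns are placeholders, since the actual directory layout under refs/ and the logic of refs/generate_status_inventory.py are not given.

```python
from pathlib import Path


def count_cohorts(root, patterns: dict) -> dict:
    """Count regular files matching each cohort's glob pattern under `root`.

    `patterns` maps a cohort label to a glob pattern relative to `root`;
    both are supplied by the caller (hypothetical values in this sketch).
    """
    base = Path(root)
    return {
        cohort: sum(1 for path in base.glob(pattern) if path.is_file())
        for cohort, pattern in patterns.items()
    }
```

For example, count_cohorts("refs", {"atlas": "atlas/*.json"}) would return a dictionary with one integer per cohort, which a build step could then write into the table.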
Executed 2026-06-03T11:00:00Z
True-OOD validations · strict held-out tests
Zero leakage between train and test · seeded splits for reproducibility · the only metric we publish to claim "no overfit".
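A seeded split with an explicit overlap check can be sketched as follows. This is a simplified illustration that assumes each molecule has a unique identifier; it uses a plain random hold-out, whereas the actual True-OOD protocol (for example, how out-of-distribution test sets are chosen) is not described here and may differ.

```python
import random


def seeded_split(ids, test_fraction=0.2, seed=0):
    """Split unique identifiers into (train, test) reproducibly.

    Identifiers are de-duplicated and sorted first so that the result
    depends only on the seed, not on input order.
    """
    unique = sorted(set(ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    return unique[n_test:], unique[:n_test]


def assert_no_leakage(train, test):
    """Raise ValueError if any identifier appears in both train and test."""
    overlap = set(train) & set(test)
    if overlap:
        raise ValueError(f"{len(overlap)} identifiers leak between train and test")
```

Calling seeded_split twice with the same seed yields the same partition, and assert_no_leakage can be run as a gate before any metric is reported.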
mfsig/v0.91 (9-d canonical feature vector)
- Canonical drug recipe: v0.91.1 (wB97M-V / def2-SVP + C-PCM ε=80 + Bondi P1 + Klamt-purge, via gpu4pyscf)
- Legacy production recipe
- Leakage policy: no semi-empirical (xtb / GFN*) data is allowed in production mfsig stores. ESOL_logS v2 ships an added topological-complexity term (NRotB) computed at predict time; the mfsig schema and the other 7 kernels stay unchanged. All pre-v0.91.1 recipes are retired from the production write path.
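
A check for semi-empirical leakage could be as simple as scanning each stored record for an xtb or GFN method label. The sketch below assumes each fingerprint is a JSON object with a top-level "method" string; that key name and the file layout are hypothetical, since the mfsig file format is not described here.

```python
import json
import re
from pathlib import Path

# Matches "xtb" or any "GFN..." label at a word boundary, case-insensitively.
SEMI_EMPIRICAL = re.compile(r"\b(?:xtb|gfn)", re.IGNORECASE)


def find_semi_empirical(store, key="method"):
    """Return JSON files under `store` whose method label looks semi-empirical.

    `key` is the assumed (hypothetical) field holding the level of theory.
    Records that are not JSON objects, or that lack the key, are not flagged.
    """
    flagged = []
    for path in sorted(Path(store).rglob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        method = str(record.get(key, "")) if isinstance(record, dict) else ""
        if SEMI_EMPIRICAL.search(method):
            flagged.append(path)
    return flagged
```

An audit step would treat any non-empty result as a failure. A stricter variant might also flag records with no method label at all.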