mfsig.com
System & cohort transparency

Status

Every figure below is read at build time from the actual artefact files in refs/ (audit + 3-gate seal + 3 True-OOD reports + glob counts). Regenerate with python refs/generate_status_inventory.py.
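The "glob counts" part of that build step can be pictured with a minimal sketch. The function name `count_signed_files` and the `*.mfsig` pattern are assumptions for illustration; the actual logic lives in refs/generate_status_inventory.py and is not reproduced here.

```python
from pathlib import Path


def count_signed_files(root: Path, pattern: str = "*.mfsig") -> int:
    """Recursively count regular files under `root` whose name matches `pattern`.

    Hypothetical helper: the real inventory script may count differently.
    """
    return sum(1 for p in root.rglob(pattern) if p.is_file())
```

Counting from the files on disk at build time, rather than from a hand-maintained number, is what lets the published figure and the store stay in agreement.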

FORMAT SEALED · 2026-05-18

MolForge is the largest continuously-audited database of pure ab-initio thermodynamic descriptors for drug, polymer, and formulation discovery. It holds 6,876 production fingerprints at v0.91.1 (wB97M-V / def2-SVP / C-PCM), feeding 8 published kernels under nested cross-validation plus 4 honest under-fits. Eleven audit rounds. Single source of truth and a hash-pinned recipe: recipe and vendor patches are SHA-validated on every compute.
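A hash pin of this kind is typically a stored digest compared against the bytes actually loaded. The sketch below assumes SHA-256 and uses hypothetical names (`sha256_hex`, `verify_pinned`); the page does not state which SHA variant or which files are pinned.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(data: bytes, pinned_digest: str) -> None:
    """Raise ValueError if `data` does not hash to `pinned_digest`.

    Assumption: the pin is a SHA-256 hex digest recorded when the recipe was sealed.
    """
    actual = sha256_hex(data)
    if actual != pinned_digest.lower():
        raise ValueError(f"digest mismatch: expected {pinned_digest}, got {actual}")
```

Running the check before every compute means a silently edited recipe or vendor patch fails loudly instead of producing fingerprints under an unrecorded method.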

Inventory generated 2026-06-03T10:30:00Z · schema mfsig/v0.91 (9-d canonical feature vector)

V3 ATLAS PILOT · IN PROGRESS · NOT CERTIFIED
sidecar · SEAL stores untouched

Quality-tier shadow sweep at v0.31-gold-svp — wB97M-V / def2-SVP + DF-K + C-PCM ε=80.

Ligand v0.31 mfsigs: 4,529 (drained · audited · ready)
Pocket sprint: diverse-200 (stratified 11 classes · chain auto-fires full 1,704 after)
Hetero dimers (AAA): 90 × 8 SCFs (Boys-Bernardi BSSE-CP · Ed25519 · 5-pillar audit)
Deep audit: 1,013 / 1,013 (pristine · JAX validated ±0.0005 kcal)
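For readers unfamiliar with the Boys-Bernardi counterpoise (BSSE-CP) correction named above: the interaction energy is formed from the dimer and both monomers, all evaluated in the full dimer basis (each monomer computed with ghost functions on its partner). The sketch below shows only that arithmetic; the function and argument names are illustrative, and how the 8 SCFs per dimer are allocated is not stated on this page.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard conversion factor


def counterpoise_interaction_energy(
    e_dimer: float, e_a_in_dimer_basis: float, e_b_in_dimer_basis: float
) -> float:
    """Counterpoise-corrected interaction energy, in the units of the inputs.

    E_int = E_AB(AB basis) - E_A(AB basis) - E_B(AB basis)
    """
    return e_dimer - e_a_in_dimer_basis - e_b_in_dimer_basis


def hartree_to_kcal_per_mol(energy_hartree: float) -> float:
    """Convert an energy from hartree to kcal/mol."""
    return energy_hartree * HARTREE_TO_KCAL_PER_MOL
```

Because all three terms share one basis, the artificial stabilisation a monomer gains by borrowing its partner's basis functions cancels out of the difference.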

Production kernels (Tm 2Fb-fusion v2.0, ASD miscibility v1.0, 2Fb-solubility v1.0) remain certified at v0.29-directqm-svp / v0.28.3. The v0.31-gold-svp sweep writes only to the isolated refs/atlas_v031_shadow/ + refs/pockets_v031_shadow/ + refs/dimers_v031_shadow/ stores; no kernel JSON is overwritten. True-OOD re-verification on v0.31 data is staged: ligand-side complete, pocket diverse-200 due ~22 h UTC tonight, dimer AAA pipeline streaming.

Live state: refs/_v3_shadow_queue.json + refs/_v3_pocket_fleet_queue.json + refs/_v3_dimer_fleet_queue.json + refs/_v3_pocket_chained_status.json · audits re-runnable via refs/_deep_audit_atlas_v031.py + refs/_v3_dimer_deep_audit.py

Pure B3LYP inventory

100 % B3LYP/def2-SVP · zero semi-empirical leakage

6,931
signed .mfsig files
Cohort | n files | Recipe | Methodology | Purpose
Production atlas v0.91.1 (wB97M-V/SVP) | 6,876 | v0.91.1 | wB97M-V / def2-SVP + C-PCM ε=80 + Bondi P1 (gpu4pyscf, SHA-pinned) | Production drug atlas; feeds 8 nested-CV kernels + 4 honest under-fits + ESOL_logS v2 first feature extension
TZVPP head-to-head subset (v0.92.0, decision pending) | 55 | v0.92.0 (TZVPP) | wB97M-V / def2-TZVPP + C-PCM (head-to-head subset) | TZVPP basis-set head-to-head test against the SVP atlas; no production kernel uses it yet (decision pending after the paired Δŷ report)
Zero-Corruption audit
xtb (semi-empirical) leakage purge
Scanned: 6,931 · Deleted: 0 · Kept: 6,931

Executed 2026-06-03T11:00:00Z
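A purge scan like this reduces to checking each file's recorded method against a list of semi-empirical markers. The sketch below is an assumption-heavy illustration: it takes a plain mapping of file name to method string, since the .mfsig on-disk layout is not described here, and the marker list and function names are hypothetical.

```python
# Assumed markers for semi-empirical methods (xtb / GFN* family).
SEMI_EMPIRICAL_MARKERS = ("xtb", "gfn")


def is_semi_empirical(method: str) -> bool:
    """True if the method string mentions any semi-empirical marker (case-insensitive)."""
    lowered = method.lower()
    return any(marker in lowered for marker in SEMI_EMPIRICAL_MARKERS)


def audit_methods(methods_by_file: dict) -> dict:
    """Summarise a scan: how many entries were seen, flagged, and kept."""
    flagged = [name for name, method in methods_by_file.items() if is_semi_empirical(method)]
    return {
        "scanned": len(methods_by_file),
        "flagged": len(flagged),
        "kept": len(methods_by_file) - len(flagged),
    }
```

A clean store is one where the scan flags nothing, so the scanned and kept counts coincide, as in the figures reported above.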

3-gate verification
refs/atlas_v031_v091_1
Gate A: B3LYP/def2-SVP purity · Gate B: charge conservation · Gate C: chem-aware moments
6,876 / 6,876
pass all three gates
FORMAT OFFICIALLY SEALED
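As an illustration of what a charge-conservation gate such as Gate B might check: the atomic partial charges of a fingerprint should sum to the molecule's formal charge within a tolerance. The function name and the tolerance value are assumptions; the gate's actual definition is not given on this page.

```python
import math


def charge_conserved(partial_charges, total_charge: int, tol: float = 1e-3) -> bool:
    """True if the partial charges sum to `total_charge` within `tol` (in units of e).

    Assumption: `tol` is an illustrative threshold, not the sealed gate's value.
    """
    # math.fsum avoids accumulating rounding error over many atoms.
    return abs(math.fsum(partial_charges) - total_charge) <= tol
```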

True-OOD validations · strict held-out tests

Zero leakage between train and test · seeded splits for reproducibility · the only metric we publish to claim "no overfit".

Bradley_Tm 2-term family (cavity × sigma_variance, 14-pair plateau)
Nested-CV grouped-LOO by inchi_key_14 · 1183 anchors · round-9 plateau diagnostic
Production ship at the 14-pair family level. Round 11 surfaced a topological-complexity gap that ESOL_logS v2 captures kernel-specifically; Tm extension via lattice descriptors (Axis B) is the next motivated lever.
n train: 1,183
n test: 1,183
nested_cv_r: 0.83
mae_K: 41
mae_lift_vs_null_pct: 46
OK
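Grouping by inchi_key_14 presumably means grouping on the first 14 characters of the InChIKey, the block that hashes molecular connectivity, so that stereoisomers and other variants of one skeleton never straddle train and test. A minimal sketch of such a grouped leave-one-out splitter follows; the function names are hypothetical and the keys in the usage example are synthetic placeholders, not real compounds.

```python
from collections import defaultdict


def group_key(inchi_key: str) -> str:
    """First 14 characters of an InChIKey (the connectivity block)."""
    return inchi_key[:14]


def grouped_loo_splits(inchi_keys):
    """Yield (group, train_indices, test_indices), holding out one group at a time."""
    groups = defaultdict(list)
    for index, key in enumerate(inchi_keys):
        groups[group_key(key)].append(index)
    all_indices = set(range(len(inchi_keys)))
    for group, test_indices in groups.items():
        train_indices = sorted(all_indices - set(test_indices))
        yield group, train_indices, list(test_indices)
```

Usage, with synthetic keys:

```python
keys = [
    "AAAAAAAAAAAAAA-XXXXXXXXSA-N",
    "AAAAAAAAAAAAAA-YYYYYYYYSA-N",  # same skeleton, different second block
    "BBBBBBBBBBBBBB-XXXXXXXXSA-N",
]
for group, train, test in grouped_loo_splits(keys):
    print(group, train, test)
```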
ASD miscibility χ (drug × polymer hetero kernel)
True-OOD strict drug-out (n=2 of 9), seed 20260518
Holds up under 2-drug-out — ranking quality preserved (Spearman ρ=0.75, top-1 hit 100%)
n train
n test
chi_mae: 1.2
chi_rmse: 1.434
chi_pearson_r: 0.344
OK
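A strict drug-out split with a fixed seed can be sketched as follows: pick the held-out drugs once from the sorted drug list, then route every drug × polymer row by drug, so no held-out drug appears in training. The function name and row layout are assumptions; only the seed value and the 2-of-9 design come from the report above.

```python
import random


def drug_out_split(rows, n_holdout: int, seed: int):
    """Split (drug, polymer) rows so that `n_holdout` drugs appear only in the test set.

    Sorting the drug names first makes the draw depend only on the seed,
    not on the order in which rows happen to arrive.
    """
    drugs = sorted({drug for drug, _polymer in rows})
    held_out = set(random.Random(seed).sample(drugs, n_holdout))
    train = [row for row in rows if row[0] not in held_out]
    test = [row for row in rows if row[0] in held_out]
    return train, test, held_out
```

Usage, with placeholder drug and polymer names:

```python
rows = [(f"drug_{i}", p) for i in range(9) for p in ("polymer_a", "polymer_b")]
train, test, held_out = drug_out_split(rows, n_holdout=2, seed=20260518)
print(sorted(held_out), len(train), len(test))
```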
ESOL_logS v2 (cavity + acceptor + NRotB, 2026-06-03 ship)
Nested-CV grouped-LOO 954 anchors · paired bootstrap Δr CI by drug · negative control kernel (MNSol_water) flat
ESOL_logS v2 SHIPS. First gated feature extension since the 9-d freeze. Topological-complexity term (NRotB) added at predict-time, kernel-specific to ESOL — the mfsig schema is unchanged.
n train: 954
n test: 954
nested_cv_r_2t_shipped: 0.844
nested_cv_r_3t_v2: 0.859
delta_r_paired: 0.015
OK
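The reported delta_r_paired is the difference of the two correlations (0.859 − 0.844 = 0.015). A paired bootstrap "by drug" can be sketched as below: whole drugs are resampled with replacement, and both models are scored on the same resample so the difference is paired. This is a minimal stdlib sketch with hypothetical names and a plain percentile interval, not the project's actual procedure.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_bootstrap_delta_r(groups, y_true, pred_a, pred_b,
                             n_boot=1000, seed=0, alpha=0.05):
    """Percentile CI for r(pred_b) - r(pred_a), resampling whole groups (drugs)."""
    by_group = {}
    for i, g in enumerate(groups):
        by_group.setdefault(g, []).append(i)
    names = sorted(by_group)
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_boot):
        # Draw groups with replacement; both models see the same indices.
        idx = [i for g in rng.choices(names, k=len(names)) for i in by_group[g]]
        yt = [y_true[i] for i in idx]
        try:
            r_a = pearson_r(yt, [pred_a[i] for i in idx])
            r_b = pearson_r(yt, [pred_b[i] for i in idx])
        except ZeroDivisionError:
            continue  # skip degenerate resamples with zero variance
        deltas.append(r_b - r_a)
    deltas.sort()
    n = len(deltas)
    lower = deltas[int((alpha / 2) * n)]
    upper = deltas[min(n - 1, int((1 - alpha / 2) * n))]
    return lower, upper
```

Resampling drugs rather than individual rows keeps correlated measurements of one compound together, which avoids an artificially narrow interval.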
Sealed recipe

mfsig/v0.91 (9-d canonical feature vector)

Canonical drug recipe
v0.91.1 (wB97M-V / def2-SVP + C-PCM ε=80 + Bondi P1 + Klamt-purge, via gpu4pyscf)
Legacy production recipe
Leakage policy
ZERO semi-empirical (xtb / GFN*) leakage allowed in production mfsig stores. ESOL_logS v2 ships an added topological-complexity term (NRotB) computed at predict-time — the mfsig schema and the other 7 kernels stay unchanged. All pre-v0.91.1 recipes are retired from the production write path.