Anatomy of a .mfsig
Five pillars. Every parameter declared, typed, gate-checked, and tagged with a confidence band. What's in the file is what you can defend.
mfsig/v0.91.1 · generated by MFFactory v0.91.1, the unified DFT-first pipeline
v0.91.1 unifies the five pre-v29 SCF pipelines (drug atlas, pocket fragments, dimer BSSE-CP, hetero pair, drug-as-pocket) into a single source of truth: molforge.MFFactory.compute(). Every mfsig field is a first-principles DFT observable: no Klamt empirical thresholds, no CPCM-X spatial averaging, no external-package calibration constants. Each file carries a 4-way provenance stamp (mf_factory_version · vendor_tree_sha256 · registry_recipe_sha256 · image_tag) so a reader twenty years from now can reconstruct the exact compute context byte-for-byte.
Legacy schemas v2.5-multivariant and earlier are kept readable for back-compat (their archived files retain their original signatures); all new mfsigs ship in v0.91.1 from MFFactory.
The functional itself — not the basis-set fallback chain, not UHF-broken-symmetry — is the load-bearing fix. Range-separation + built-in VV10 NLC place electrons in the correct LUMO on charged-pocket dimers that B3LYP folded into closed-shell solutions with vanishing gap.
A .mfsig.json is a signed, schema-validated, version-locked snapshot of a single molecule's σ-profile, cavity, energy, and provenance — packaged so that any reader, twenty years from now, can re-verify every claim against the bytes without contacting us. Five pillars below; each field is typed, gate-checked, and tagged with a confidence band.
File family
Three .mfsig formats sharing one Ed25519 chain — all now produced by the unified MFFactory v0.91.1 orchestrator. Each extends the previous; same audit, same renderer, same 4-way provenance stamp.
.mfsig.json (monomer): One drug, σ-profile + cavity + atomic charges. Used by COSMO-RS, solubility, ΔG_solv kernel. Generated by MFFactory.compute_drug().
.mfsig.assembled.json (self-dimer): Drug × drug homodimer. ΔE_cohesion + H-bond / π-stack descriptors. Used for melting-point + crystal cohesion. Generated by MFFactory.compute_dimer_bsse_cp().
.mfsig.hetero_dimer.json (drug × polymer): Drug × polymer-segment heterodimer. ΔE_interaction + Flory-Huggins χ. Used for ASD miscibility screening. Generated by MFFactory.compute_hetero_pair().
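
A reader that receives a mixed batch of files can tell the three family members apart from the filename suffix alone. The following is a minimal sketch of that dispatch; the function name `mfsig_family` and the label strings are illustrative and are not part of the MFFactory API.

```python
from pathlib import PurePath

# Suffix -> family label, taken from the three formats listed above.
# The label strings are illustrative choices, not schema values.
MFSIG_FAMILIES = {
    ".mfsig.assembled.json": "self-dimer",
    ".mfsig.hetero_dimer.json": "drug x polymer",
    ".mfsig.json": "monomer",
}


def mfsig_family(filename: str) -> str:
    """Return the family label for a filename, or raise ValueError."""
    name = PurePath(filename).name
    for suffix, family in MFSIG_FAMILIES.items():
        if name.endswith(suffix):
            return family
    raise ValueError(f"not a .mfsig file: {filename!r}")
```

The three suffixes never overlap, so the lookup order does not matter.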
audit_and_trust
Cryptographic identity + 4-way MFFactory provenance. Tamper-evident, ALCOA-compatible, byte-for-byte reproducible.
| Field | Type | Conf. | What it means |
|---|---|---|---|
sha256_checksum | string · hex64 | HIGH | SHA-256 of the canonicalised payload. Recomputed on load. Mismatch = file rejected. |
uuid | string · uuidv4 | HIGH | Stable identifier minted at write time. Travels with the file forever. |
timestamp_utc | ISO 8601 | HIGH | UTC moment the .mfsig was sealed. |
molecule_name | string | HIGH | Trivial name from the request — for humans, not a unique key. |
mf_factory_version | string | HIGH | Version of the unified compute pipeline, e.g. "0.91.1". Pins the orchestrator + SCF fallback chain + sigma-moment formulas. First leg of the 4-way provenance stamp. |
vendor_tree_sha256 | string · hex64 | HIGH | SHA-256 of the vendored gpu4pyscf tree at compute time (patched C-PCM + ghost-exclusion). Asserts that the exact patched binary that produced the numbers can be re-materialised. Second leg of the provenance stamp. |
registry_recipe_sha256 | string · hex64 | HIGH | SHA-256 of the entry in MF_FACTORY_REGISTRY.json under versions.0.91.1 — the immutable recipe (functional / basis / dispersion / radii / sigma-moment formulas / SCF policy). Third leg. |
image_tag | string | HIGH | Docker image tag + sha256 of the worker that ran the SCF (e.g. worker:v29 @ sha256:2d45d796…). Fourth leg — locks the OS, CUDA, and dependency layer. |
image_patch_chain | string[] | HIGH | Ordered list of patch markers baked into the image (Klamt-purge, ghost-exclusion P1, C-PCM L2bis cavity fix, wB97M-V default, …). Lets a future reader reconstruct WHY this image differs from upstream gpu4pyscf. |
reproducibility | object | HIGH | engine version + build hash + library lock — replay anywhere. Redundant with the 4-way stamp above; kept for legacy readers. |
lineage | object | HIGH | Append-only history of every derivation event. tier, generation, parent_sha256, genesis_sha256, history[] — a git for σ-profiles. Each entry references the mf_factory_version that produced it. |
ownership | object | HIGH | license_id, license_type, organization, signing_key_id, molforge_signature (Ed25519). Public key at /.well-known/mfsig-keys.json. |
recipe | object | HIGH | Snapshot of the named recipe ("mf-factory-v0.91.1"): functional = wB97M-V, basis = def2-SVP (def2-TZVPP L3 fallback), dispersion = built-in VV10 NLC, radii = Bondi (P1-patched), counterpoise = Boys-Bernardi where applicable. Also carries validation_at_compute (anchor set + MAE + n_molecules). |
derived_grade | string | HIGH | MF1 / MF2 / MF3 / MF4 — the customer-facing grade derived from the recipe and the validation context. |
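
The "recomputed on load" rule for sha256_checksum can be sketched with the standard library. Two things here are assumptions, not schema facts: the canonicalisation rule (sorted keys, compact separators, UTF-8) and the choice to exclude only the sha256_checksum field from the hashed payload. The real rules are whatever the mfsig schema defines. Ed25519 signature verification is not shown.

```python
import copy
import hashlib
import json


def canonical_bytes(payload: dict) -> bytes:
    # ASSUMPTION: sorted keys, compact separators, UTF-8 encoding.
    # The authoritative canonicalisation is defined by the mfsig schema.
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def payload_digest(doc: dict) -> str:
    # ASSUMPTION: the checksum field itself is left out of the hashed payload.
    body = copy.deepcopy(doc)
    body.get("audit_and_trust", {}).pop("sha256_checksum", None)
    return hashlib.sha256(canonical_bytes(body)).hexdigest()


def verify_checksum(doc: dict) -> bool:
    """True when the stored checksum matches the recomputed digest."""
    stored = doc.get("audit_and_trust", {}).get("sha256_checksum")
    return stored == payload_digest(doc)
```

A loader following the table above would reject the file whenever `verify_checksum` returns False.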
chemistry_and_geometry
What the molecule is, where every atom sits.
| Field | Type | Conf. | What it means |
|---|---|---|---|
smiles | string · canonical | HIGH | RDKit-canonical SMILES of the input. Lossless round-trip with the structure. |
inchi_key_14 | string · 14-char | HIGH | Structure-derived hash. Use this as the unique molecule key, not the name. |
atomic_numbers | int[N] | HIGH | Element list. N matches positions array exactly. |
positions_aa | float[N][3] | HIGH | Cartesian coordinates in Ångström, relaxed geometry. ‖∇E‖ < 1 mEh/Bohr at write time. |
n_heavy / n_total | int | HIGH | Heavy-atom and total-atom counts — used for tier pricing and benchmark cohorts. |
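
The consistency rules in this pillar (N atoms, N positions, matching counts) are easy to check on load. A minimal sketch, assuming "heavy atom" means any atom other than hydrogen and that the pillar is passed in as a plain dict; the function name is hypothetical.

```python
def check_geometry(chem: dict) -> None:
    """Raise ValueError if the chemistry_and_geometry pillar is inconsistent."""
    z = chem["atomic_numbers"]
    xyz = chem["positions_aa"]
    if len(z) != len(xyz):
        raise ValueError("atomic_numbers and positions_aa differ in length")
    if any(len(p) != 3 for p in xyz):
        raise ValueError("every position must have exactly 3 coordinates")
    # ASSUMPTION: heavy atom = any element other than hydrogen (Z > 1).
    n_heavy = sum(1 for a in z if a > 1)
    if chem["n_total"] != len(z):
        raise ValueError("n_total does not match the atom list")
    if chem["n_heavy"] != n_heavy:
        raise ValueError("n_heavy does not match the atom list")
```
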
quantum_and_thermodynamics
The physics: density, surface, energy, σ-signature. v0.91.1 MFFactory generation — pure DFT-derived, zero Klamt empirical thresholds.
| Field | Type | Conf. | What it means |
|---|---|---|---|
scf.converged | bool | HIGH | Energy gate. False = file rejected at sealing time, never shipped. |
scf.method | string | HIGH | L0 = DIIS, L1 = DIIS + SOSCF (Newton), L2 = LS=0.5 + coarse grid, L3 = DAMP + SOSCF. v0.91.1 production: 96/96 drug SCFs converge at L0. |
scf.fallback_level | int (0-3) | HIGH | Which level of the L0→L3 fallback chain converged. 0 = clean DIIS. Higher = harder system. |
scf.cycles / scf.wall_s | int / float | HIGH | SCF iteration count + wall-clock seconds for forensic audit. |
energies.total_hartree | float | HIGH | Converged DFT total energy E[ρ]. |
energies.g_polar_hartree | float | HIGH | C-PCM polarization energy = mf.with_solvent.e (raw DFT, canonical). Always negative for neutral closed-shell drugs. |
energies.homo_hartree | float | HIGH | Highest occupied MO energy (occupation > 1e-6). v0.91.1 / wB97M-V: 0/96 drugs show pathological gap collapse. |
energies.lumo_hartree | float | HIGH | Lowest unoccupied MO energy. Strictly above HOMO under wB97M-V on this cohort. |
energies.gap_ev | float | HIGH | (lumo − homo) × 27.2114. v0.91.1 production range across 96 drugs: +5.7 → +13.9 eV. Negative gaps → file rejected. |
moments.dipole_debye | float[4] | MED | (μx, μy, μz, ‖μ‖) from converged density via mf.dip_moment(unit='Debye'). |
moments.atomic_charges_mulliken | float[N] | MED | Mulliken-partition atomic charges via diag(dm @ S). N matches atom count. |
cavity_and_sigma.cavity_area_aa2 | float | HIGH | Σ area_i over all surface tiles, mf.with_solvent.surface['area']. Bondi vdW radii after P1 ghost-exclusion patch. |
cavity_and_sigma.n_segments | int | HIGH | Number of Connolly surface tiles (typically ~3000-11000 depending on drug size). |
cavity_and_sigma.sigma_per_segment | float[Ns] | HIGH | Raw σ_i = q_sym_i / area_i per surface tile (e/Ų). Preserved verbatim — downstream can re-compute any moment without regen. |
cavity_and_sigma.segment_areas | float[Ns] | HIGH | Per-tile area in Ų; pairs 1-to-1 with sigma_per_segment. |
cavity_and_sigma.sigma_moments | object | HIGH | Area-weighted moments of RAW σ (no Klamt cutoff, no r_av averaging): { mean, variance, skewness, kurtosis }. Formulas in MF_FACTORY_REGISTRY.json versions.0.91.1.sigma_moments_formula. No HB-donor/acceptor mass split — the kernel discovers HB-like signal from variance + skewness. |
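
Because sigma_per_segment and segment_areas are stored verbatim, a reader can recompute the moments without regenerating the file. The sketch below uses the textbook area-weighted definitions (standardised third and fourth central moments, non-excess kurtosis). That choice is an assumption: the authoritative formulas are the ones in MF_FACTORY_REGISTRY.json, which are not reproduced here. The helper names are hypothetical.

```python
import math


def sigma_moments(sigma, areas):
    """Area-weighted mean, variance, skewness and kurtosis of raw sigma."""
    if len(sigma) != len(areas) or not sigma:
        raise ValueError("sigma and areas must be non-empty and equal length")
    total = math.fsum(areas)
    mean = math.fsum(a * s for a, s in zip(areas, sigma)) / total

    def central(k):
        # k-th area-weighted central moment
        return math.fsum(a * (s - mean) ** k for a, s in zip(areas, sigma)) / total

    var = central(2)
    if var == 0.0:
        skew = kurt = float("nan")  # undefined for a constant profile
    else:
        skew = central(3) / var ** 1.5
        kurt = central(4) / var ** 2  # ASSUMPTION: non-excess kurtosis
    return {"mean": mean, "variance": var, "skewness": skew, "kurtosis": kurt}


def net_surface_charge(sigma, areas):
    """Sum of sigma_i * area_i over all tiles, in units of e."""
    return math.fsum(a * s for a, s in zip(areas, sigma))
```

For a neutral molecule, `net_surface_charge` should come out close to zero, which gives a quick sanity check on the two arrays.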
ai_intelligence_layer
Pre-computed features ML pipelines can consume directly.
| Field | Type | Conf. | What it means |
|---|---|---|---|
embedding_vector | float[D] | MED | Fixed-length molecular embedding suitable for retrieval / similarity / classification heads. |
feature_provenance | object | HIGH | Which feature extractor + version produced the embedding. Frozen, replayable. |
downstream_tasks_supported | string[] | HIGH | Declared compatibility list — ΔG-solv · γ∞ · χ12 · SLE · LLE. |
legacy_vault
One-way bridge to incumbent σ-profile formats — for portability.
| Field | Type | Conf. | What it means |
|---|---|---|---|
cosmotherm_profile | string or null | HIGH | COSMOtherm-compatible σ-profile block. Drop-in replacement for legacy pipelines. |
turbomole_cosmo | string or null | HIGH | Turbomole .cosmo file body. Lossless re-export when source format permits. |
openCOSMO_rs | string or null | HIGH | openCOSMO-RS feed. For groups that already have an open ASD workflow. |
vendor_provenance | object | HIGH | Which incumbent each block targets, version range, known caveats. |
.mfsig.assembled.json · homodimer · mfsig/v0.91.1 · MFFactory
Drug × drug self-cohesion. Inherits all five monomer pillars (audit · chemistry · quantum · AI · legacy) for both copies, then adds the assembled_data block with the cohesion energy and dimer-only descriptors. Used for melting-point prediction, crystal cohesion screening, and π-stack / H-bond network analysis.
Field (under assembled_data) | Type | Conf. | What it means |
|---|---|---|---|
monomer_inchi_key_14 | string · 14-char | HIGH | InChI-key of the monomer being self-assembled — matches the chemistry_and_geometry pillar of the underlying .mfsig.json. |
dimer_geometry_aa | float[N×2][3] | HIGH | Cartesian coordinates of both A copies at the relaxed dimer geometry, Ångström. xtb_relax pre-step then DFT single-points (no PySCF gradient — bypasses geomeTRIC + PCM pathology). |
n_scfs | int · always 3 | HIGH | Boys-Bernardi symmetric counterpoise: SCF on isolated A, SCF on isolated A′ (ghost-A geometry), SCF on AB dimer. Three independent SCF cycles per artefact. |
E_monomer_a_hartree | float | HIGH | Total DFT energy of isolated copy A at the dimer geometry. |
E_monomer_b_hartree | float | HIGH | Total DFT energy of isolated copy A′ at the dimer geometry. |
E_dimer_hartree | float | HIGH | Total DFT energy of the AB pair at the relaxed dimer geometry. |
delta_e_cohesion_kcal_mol | float | HIGH | ΔE = E_dimer − E_a − E_b, converted to kcal/mol. The cohesion observable. BSSE-corrected via the counterpoise scheme. |
delta_e_bsse_kcal_mol | float | HIGH | Basis-set superposition error correction, separately reported for forensic audit. Always ≥ 0 by construction. |
hb_network | object | HIGH | H-bond network: pairs of (donor_atom_idx, acceptor_atom_idx, distance_aa, angle_deg, type) classified by geometric criteria. Counts: hb_donor_count, hb_acceptor_count, hb_total. |
pi_stack_descriptor | object | MED | π-stack analysis on aromatic ring centroids: stacked / T-shaped / parallel-displaced + centroid distances. Only emitted when the monomer has ≥ 1 aromatic ring. |
contact_area_aa2 | float | HIGH | Connolly surface contact area between A and B at the dimer geometry — the touching surface where cohesion is mediated. |
assembled_provenance | object | HIGH | Recipe used (mf-factory-v0.91.1), xc_functional = wB97M-V, basis = def2-SVP (def2-TZVPP L3 fallback), dispersion = built-in VV10 NLC, counterpoise_scheme = Boys-Bernardi, dimer_relax_method = xtb_relax + DFT single-points, methodology_hash + 4-way MFFactory provenance stamp. Frozen at write time. |
lineage_link | object | HIGH | Parent monomer .mfsig SHA-256 (parent_sha256) so the dimer is verifiably derived from a signed monomer. The lineage chain runs monomer → assembled, never the other direction. |
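
The cohesion observable follows directly from the three stored energies. A minimal sketch of the arithmetic in the delta_e_cohesion_kcal_mol row, assuming the usual conversion factor of 627.509474 kcal/mol per hartree; the function name is hypothetical, and how the pipeline applies the counterpoise correction before sealing is not shown.

```python
HARTREE_TO_KCAL_MOL = 627.509474  # 1 Eh expressed in kcal/mol


def cohesion_kcal_mol(e_dimer: float, e_a: float, e_b: float) -> float:
    """Delta E = E_dimer - E_a - E_b, converted from hartree to kcal/mol."""
    return (e_dimer - e_a - e_b) * HARTREE_TO_KCAL_MOL
```

A negative result means the dimer is bound relative to the two isolated copies: with made-up inputs, `cohesion_kcal_mol(-2.01, -1.0, -1.0)` is about −6.28 kcal/mol.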
.mfsig.hetero_dimer.json · drug × polymer · mfsig/v0.91.1 · MFFactory
Drug × polymer-segment interaction for amorphous-solid-dispersion (ASD) miscibility prediction. Inherits both partners' monomer pillars, then adds hetero_interaction with the asymmetric Boys-Bernardi counterpoise output plus the Flory-Huggins χ regression. Dual-Track storage: the raw unbounded χ is the canonical AI-internal field; a separate tanh-bounded χ_display is emitted ONLY for the frontend viewer.
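
The Dual-Track idea can be illustrated with a small sketch. The exact squashing function is not given in this spec; the form below, bound · tanh(χ / bound) with bound = 3, is an assumption chosen because it is smooth, monotonic, close to the identity near zero, and confined to [−3, +3]. The function name is hypothetical.

```python
import math

CHI_DISPLAY_BOUND = 3.0  # the [-3, +3] display band named in the spec


def chi_display(chi_raw: float, bound: float = CHI_DISPLAY_BOUND) -> float:
    # ASSUMPTION: display value = bound * tanh(chi / bound).
    # The raw chi is never modified; this is a one-way, display-only mapping.
    return bound * math.tanh(chi_raw / bound)
```

Small values pass through almost unchanged while large ones saturate at the band edge, which is why the display copy is unsuitable as a training target.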
Field (under hetero_interaction) | Type | Conf. | What it means |
|---|---|---|---|
drug_inchi_key_14 | string · 14-char | HIGH | InChI-key of the drug partner. |
polymer_segment_smiles | string | HIGH | SMILES of the polymer monomeric segment used in the pair (e.g. PVPVA-vinyl-acetate, Soluplus-vinylcaprolactam-co-PEG, HPMCAS-cellulose-acetate-succinate). The segment is documented and matched against the polymer registry. |
polymer_segment_mw_da | float | HIGH | Molar mass of the segment monomer in Daltons — needed by Flory-Huggins χ to convert ΔE_interaction (per pair) to per-lattice-site χ. |
dimer_geometry_aa | float[Na+Nb][3] | HIGH | Cartesian of the relaxed AB heterodimer. |
n_scfs | int · always 5 | HIGH | Asymmetric Boys-Bernardi: (1) A clean, (2) B clean, (3) AB dimer, (4) A in dimer-basis with ghost-B, (5) B in dimer-basis with ghost-A. Five independent SCF cycles. |
E_drug_clean_hartree | float | HIGH | DFT energy of the drug at the dimer geometry, monomer basis only. |
E_polymer_clean_hartree | float | HIGH | DFT energy of the polymer segment at the dimer geometry, monomer basis only. |
E_dimer_hartree | float | HIGH | DFT energy of the drug + polymer-segment dimer at the relaxed geometry. |
E_drug_in_dimer_basis | float | HIGH | DFT energy of the drug in the full dimer basis (with ghost-B atoms). Used to subtract BSSE attributable to the basis the drug sees in the dimer context. |
E_polymer_in_dimer_basis | float | HIGH | Mirror of the above for the polymer segment. |
delta_e_interaction_kcal_mol | float | HIGH | ΔE_interaction = E_dimer − E_drug_clean − E_polymer_clean, kcal/mol. The interaction observable. BSSE-corrected via the asymmetric counterpoise. |
delta_e_bsse_kcal_mol | float | HIGH | Basis-set superposition error for forensic audit, separately reported (always ≥ 0). |
flory_huggins_chi | float | HIGH | Raw, unbounded χ at 298 K. The CANONICAL AI-internal field: every downstream MolForge pipeline reads this value. Calibrated by closed-form ridge regression against the ASD miscibility cohort (90 heterodimers, 9 drugs × 10 pharmaceutical polymers, B3LYP+BSSE): LODO ranking top-1 78%, top-3 89%, Spearman ρ +0.76, versus ~30% top-1 for Hansen HSP. NO Hansen HSP heuristics, NO empirical group contributions. |
flory_huggins_chi_display | float | HIGH | tanh-bounded copy of χ clipped to [−3, +3] — emitted ONLY for the frontend Viewer's chip rows. Never trained on. Distinct field name so downstream consumers cannot confuse the two. |
asd_miscibility_tag | string · enum | MED | Human-readable classification: good / moderate / poor / phase_separating. Derived from χ ranges with the boundary justifications recorded in the recipe registry. |
contact_area_aa2 | float | HIGH | Connolly surface contact area between drug and polymer-segment at the dimer geometry. |
hb_network | object | HIGH | H-bond network across the interface: donor-on-drug × acceptor-on-polymer pairs, plus the inverse. |
chi_calibration_provenance | object | HIGH | ASD miscibility cohort SHA (refs/asd_miscibility_kernel.json), anchor count (n=90 heterodimers, 9 drugs × 10 polymers), ridge λ, LODO ranking metrics (top-1 78%, top-3 89%, Spearman ρ +0.76), feature set (8-d RDKit drug fingerprint + 10-d polymer one-hot). Frozen at calibration time. |
hetero_provenance | object | HIGH | Recipe = mf-factory-v0.91.1, xc_functional = wB97M-V, basis = def2-SVP (def2-TZVPP L3 fallback), dispersion = built-in VV10 NLC, counterpoise_scheme = 'asymmetric_bb', dimer_relax_method = 'xtb_relax + DFT_single_point', methodology_hash + 4-way MFFactory provenance stamp. |
lineage_link | object | HIGH | Parent drug .mfsig.json SHA-256 + parent polymer-segment .mfsig.json SHA-256 → the heterodimer is verifiably derived from two signed monomers. |
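
The five stored energies are enough to redo the counterpoise bookkeeping offline. The sketch below applies the textbook Boys-Bernardi relations: the uncorrected interaction uses the monomer-basis energies, the corrected one uses the dimer-basis (ghost-atom) energies, and their difference is the BSSE. Which of the two values the pipeline writes into delta_e_interaction_kcal_mol is not settled by this sketch, and the function name and output keys are hypothetical.

```python
HARTREE_TO_KCAL_MOL = 627.509474  # 1 Eh expressed in kcal/mol


def counterpoise_summary(h: dict) -> dict:
    """Recompute raw and counterpoise-corrected interaction energies.

    `h` is the hetero_interaction block; all inputs are in hartree,
    all outputs in kcal/mol.
    """
    e_ab = h["E_dimer_hartree"]
    # Monomers in their own basis: interaction still contaminated by BSSE.
    raw = e_ab - h["E_drug_clean_hartree"] - h["E_polymer_clean_hartree"]
    # Monomers in the full dimer basis (ghost partner): BSSE removed.
    corrected = e_ab - h["E_drug_in_dimer_basis"] - h["E_polymer_in_dimer_basis"]
    # Ghost functions can only lower a monomer energy, so this is >= 0.
    bsse = corrected - raw
    return {
        "raw_kcal_mol": raw * HARTREE_TO_KCAL_MOL,
        "cp_corrected_kcal_mol": corrected * HARTREE_TO_KCAL_MOL,
        "bsse_kcal_mol": bsse * HARTREE_TO_KCAL_MOL,
    }
```

A negative `bsse_kcal_mol` would indicate swapped or mismatched inputs, which makes it a cheap forensic check on a sealed file.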
flory_huggins_chi is what every downstream MolForge AI pipeline trains on; we preserve the full quantum signal. The tanh-clipped flory_huggins_chi_display exists only so frontend chip rows show a value within the legacy pharma [−3, +3] band. The two are separately named so a consumer can never accidentally pipe display values back into training.
chi_calibration_provenance is recorded on every file. Source: refs/asd_miscibility_kernel.json.
How .mfsig compares
Every format that touches σ-profile or molecular property pipelines, side-by-side. ✓ first-class · ◐ partial / extractable · ✗ absent. The .mfsig column is what you take home.
| Capability | .mfsig (this spec) | .cosmo (Turbomole) | CTD (COSMOtherm) | OpenCOSMO-RS feed | .fchk / .log (Gaussian) | ORCA out | NWChem out | xtb out | .sdf (MDL · V2000) | .mol (MDL · V2000) | .xyz (atoms only) | .pdb (RCSB) | SMILES string | InChI string |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| coverage | 25/25 | 6/25 +3◐ | 3/25 +4◐ | 3/25 +2◐ | 7/25 +2◐ | 6/25 +4◐ | 6/25 +4◐ | 6/25 +4◐ | 5/25 | 5/25 | 3/25 | 4/25 +2◐ | 4/25 | 4/25 +1◐ |
| Identity · Connectivity (SMILES / structure) | ✓ | ✗ | ✗ | ✗ | ◐ | ◐ | ◐ | ◐ | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ |
| Stable structure hash (InChI Key) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Atomic numbers + positions (Å) | ✓ | ✓ | ◐ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Bond table | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ◐ | ✓ | ◐ |
| σ-content · σ-profile histogram | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Per-segment σ-surface (position · normal · area · charge) | ✓ | ◐ | ✗ | ◐ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| σ-moments (mean · var · skew · kurt) | ✓ | ✗ | ◐ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| HB donor / acceptor mass | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Charge conservation gate (∫σ·dA ≈ 0) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Quantum · Converged SCF energy | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| g_polar (COSMO solvation free energy) | ✓ | ✓ | ✓ | ◐ | ✗ | ◐ | ◐ | ◐ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Bias-corrected ΔG (FreeSolv-anchored) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Direct DFT dipole archived | ✓ | ✗ | ✗ | ✗ | ✓ | ◐ | ◐ | ◐ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Atomic charges (Mulliken) | ✓ | ◐ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Forces ∂E/∂R at the reported geometry | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Trust · SHA-256 audit seal | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| 21 CFR Part 11 / ALCOA-compatible | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| UUID + ISO timestamp | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ◐ | ✗ | ✗ |
| Method + engine version pinned in file | ✓ | ◐ | ◐ | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Reproducibility metadata (lib lock) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| AI · Embedding vector (ML feature store) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Declared downstream tasks (γ∞ · χ12 · ΔG) | ✓ | ✗ | ◐ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Interop · Legacy vault (re-exports other vendor formats) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Open spec · no vendor licence required | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Single-file deliverable (no companion needed) | ✓ | ✓ | ✓ | ✓ | ◐ | ◐ | ◐ | ◐ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Sources: published format specs for Turbomole .cosmo, COSMOtherm CTD, Gaussian .fchk/.log, ORCA, NWChem, xtb, ChemDraw / MDL SDF (V2000 / V3000), Protein Data Bank PDB, InChI / SMILES strings, OpenCOSMO-RS feed. The comparison covers only the data persisted to a single deliverable file, not what the originating tool can compute.
One change in v0.92.0: basis: def2-SVP → def2-TZVPP
wB97M-V was parameterised against def2-TZVPP (Mardirossian & Head-Gordon, 2016). v0.91.1 ran on def2-SVP as a pragmatic v28-era speed compromise. v0.92.0 flips the default to TZVPP — the basis the functional was calibrated for — and bumps the L3 fallback to def2-QZVPP. Everything else is unchanged: same wB97M-V, same C-PCM ε=80, same Bondi (P1) radii, same raw-σ moments, same 4-way provenance stamp. This is the only recipe knob that moves.
refs/MF_FACTORY_REGISTRY.json versions.0.92.0 (SHA cd88bda5…) · full pipeline diagram + audit commands at /factory.
What changed from v2.5-multivariant to v0.91.1
The five pre-v29 SCF pipelines (drug atlas, pocket fragments, dimer BSSE-CP, hetero pair, drug-as-pocket) collapse into one unified entry point in src/molforge/mf_factory.py. Same SCF fallback chain, same vendor guard, same provenance stamp on every artefact.
Removed σ_HB threshold (0.0084 e/Ų), r_av spatial averaging (0.537 Å), hb_donor_mass / hb_acceptor_mass / polar_area_aa2 / halogen_mass empirical fields. The new sigma_moments block stores mean / variance / skewness / kurtosis of the raw σ_i = q_sym_i / area_i — the kernel discovers HB-like signal directly from the moments.
Functional flipped from B3LYP-D3BJ → wB97M-V (range-separated + built-in VV10 NLC). On the v29 sealed-holdout: 10/10 oracle pairs converge at L0 with healthy gaps (+1.4 → +3.1 eV), where B3LYP had 3/10 with negative HOMO-LUMO gap. Corroborated by 96/96 drug SCFs L0-clean. 146/146 wB97M-V SCFs total.
Every artefact now carries mf_factory_version · vendor_tree_sha256 · registry_recipe_sha256 · image_tag. Reading a file ten years from now reconstructs the exact compute context (Python source, vendored gpu4pyscf binary, immutable recipe, OS + CUDA + dependency layer) byte-for-byte.
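
A reader can check that all four legs of the stamp are present before trusting anything else in the file. A minimal sketch, assuming the two *_sha256 legs are full 64-character lowercase hex strings as the audit_and_trust table states; the function name is hypothetical.

```python
import re

# The four legs of the provenance stamp, as named in the spec.
STAMP_FIELDS = (
    "mf_factory_version",
    "vendor_tree_sha256",
    "registry_recipe_sha256",
    "image_tag",
)
HEX64 = re.compile(r"[0-9a-f]{64}")


def bad_stamp_legs(audit: dict) -> list:
    """Return the names of stamp legs that are missing or malformed."""
    problems = []
    for field in STAMP_FIELDS:
        value = audit.get(field)
        if not isinstance(value, str) or not value:
            problems.append(field)
        elif field.endswith("_sha256") and not HEX64.fullmatch(value):
            problems.append(field)
    return problems
```

An empty list means the stamp is structurally complete; it does not prove that the hashes match any real vendor tree or registry entry.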
The append-only lineage from v2.4 is unchanged: parent_sha256 → new_sha256 + history[]. Each entry now carries the producing mf_factory_version so a tier upgrade or recipe enrichment can be traced to the exact MFFactory call that produced it.
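
The append-only property can be checked by walking history[] and confirming that each entry points at the hash produced by the one before it. The sketch below assumes that every non-genesis entry carries its own parent_sha256 next to new_sha256; the sample file in this document only shows a genesis entry, so that per-entry layout is a guess. The function name is hypothetical.

```python
def check_lineage(lineage: dict) -> str:
    """Walk the history chain and return the most recent sha256.

    Raises ValueError if the chain is empty, does not start at genesis,
    or has a link whose parent does not match the previous entry.
    """
    history = lineage.get("history") or []
    if not history or history[0].get("op") != "genesis":
        raise ValueError("history must start with a genesis entry")
    head = history[0]["new_sha256"]
    if head != lineage["genesis_sha256"]:
        raise ValueError("genesis entry does not match genesis_sha256")
    for i, entry in enumerate(history[1:], start=1):
        # ASSUMPTION: each later entry records the hash it was derived from.
        if entry.get("parent_sha256") != head:
            raise ValueError(f"broken link at history[{i}]")
        head = entry["new_sha256"]
    return head
```
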
cavity_and_sigma.sigma_moments (raw moments) instead of the legacy chemistry-aware split.
{
"mfsig_version": "0.91.1",
"audit_and_trust": {
"sha256_checksum": "7e1f744c77d25dee78515d29ee505f55...",
"uuid": "b61705e5-b992-4a7a-8a3e-7762155cc178",
"timestamp_utc": "2026-05-27T18:42:11Z",
"molecule_name": "BAXOFTOLAUCFNW",
"mf_factory_version": "0.91.1",
"vendor_tree_sha256": "9c1a4f88b0e7d3a2c5e6f1b9d8a7e4f3...",
"registry_recipe_sha256": "fd2d7890ded98d3af5b577f46c28fdcb...",
"image_tag": "worker:v29 @ sha256:2d45d796275ef25e840b373cba842a87...",
"image_patch_chain": [
"klamt-purge:no-sigma-hb-threshold",
"klamt-purge:no-rav-spatial-averaging",
"ghost-exclusion:P1-bondi-vdw",
"pcm-cavity:L2bis-fix",
"scf-default:wB97M-V/def2-SVP"
],
"lineage": {
"tier": "Platinum",
"generation": 1,
"parent_sha256": null,
"genesis_sha256": "7e1f744c77d25dee78515d29ee505f55...",
"history": [
{ "op": "genesis",
"tier": "Platinum",
"tool": "MFFactory.compute_drug::v0.91.1",
"new_sha256": "7e1f744c..." }
]
},
"ownership": {
"license_type": "internal-rd",
"organization": "molforge.ai R&D",
"signing_key_id": "molforge-dev-2026",
"molforge_signature": "boAdABa/LLHnDa7Rb05TddTi6qKIzacSTgDx..."
},
"recipe": {
"name": "mf-factory-v0.91.1",
"functional": "wB97M-V",
"basis": "def2-SVP",
"dispersion": "built-in VV10 NLC",
"radii": "Bondi (P1-patched)",
"methodology_hash": "fd2d7890ded98d3af5b577f46c28fdcb...",
"validation_at_compute": {
"anchor_set": "v29-sealed-holdout-2026-05-27",
"MAE_dG_kcal_mol": 1.18,
"n_molecules": 96
}
},
"derived_grade": "MF3"
},
"chemistry_and_geometry": {
"smiles": "CC(C)N(C(C)C)C(=O)C...",
"inchi_key_14": "BAXOFTOLAUCFNW",
"atomic_numbers": [6, 6, 6, 7, ...],
"positions_aa": [[...], ...],
"n_heavy": 24,
"n_total": 38
},
"quantum_and_thermodynamics": {
"scf": {
"converged": true,
"method": "L0:DIIS",
"fallback_level": 0,
"cycles": 18,
"wall_s": 42.7
},
"energies": {
"total_hartree": -1234.5678,
"g_polar_hartree": -0.0163,
"homo_hartree": -0.2412,
"lumo_hartree": -0.0118,
"gap_ev": 6.24
},
"moments": {
"dipole_debye": [1.21, -0.84, 2.06, 2.55],
"atomic_charges_mulliken": [-0.31, 0.12, ...]
},
"cavity_and_sigma": {
"cavity_area_aa2": 711.42,
"n_segments": 4128,
"sigma_per_segment": [...],
"segment_areas": [...],
"sigma_moments": {
"mean": 4.79e-05,
"variance": 4.32e-05,
"skewness": -0.214,
"kurtosis": 3.067
}
}
},
"ai_intelligence_layer": {
"embedding_vector": [...],
"feature_provenance": { "extractor": "molforge.kernel.v29-9d", "stamp": "..." }
}
}
No physics_variants wrapper (single canonical SCF per file), no hb_donor_mass / hb_acceptor_mass / polar_area_aa2 / halogen_mass (Klamt-tainted, removed), no sigma_hb_threshold: 0.0084 (purged), no sigma_profile_101 binned histogram (raw per-segment σ is preserved instead).
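
The gates described for the quantum_and_thermodynamics pillar can be re-run by any reader against a loaded file. A minimal sketch using the conversion factor quoted in the spec (27.2114 eV per hartree); the tolerance of 0.01 eV and the function name are illustrative choices, not schema rules.

```python
HARTREE_TO_EV = 27.2114  # factor quoted in the gap_ev row of the spec


def gate_check(q: dict, tol_ev: float = 0.01) -> list:
    """Return a list of failed gates for a quantum_and_thermodynamics block."""
    failures = []
    if not q["scf"]["converged"]:
        failures.append("scf not converged")
    e = q["energies"]
    gap = (e["lumo_hartree"] - e["homo_hartree"]) * HARTREE_TO_EV
    if gap <= 0.0:
        failures.append("non-positive HOMO-LUMO gap")
    # ASSUMPTION: stored gap_ev may be rounded, so compare within tol_ev.
    if abs(gap - e["gap_ev"]) > tol_ev:
        failures.append("gap_ev does not match homo/lumo")
    cs = q["cavity_and_sigma"]
    n = len(cs["sigma_per_segment"])
    if n != len(cs["segment_areas"]) or n != cs["n_segments"]:
        failures.append("segment arrays inconsistent with n_segments")
    return failures
```

With the sample energies above (homo −0.2412, lumo −0.0118) the recomputed gap is about 6.242 eV, which agrees with the stored 6.24 within the tolerance.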