mfsig.com
The signed file, every field

Anatomy of a .mfsig

Five pillars. Every parameter declared, typed, gate-checked, and tagged with a confidence band. What's in the file is what you can defend.

current schema (production) · mfsig/v0.91.1 · generated by MFFactory v0.91.1, the unified DFT-first pipeline

v0.91.1 unifies the five pre-v29 SCF pipelines (drug atlas, pocket fragments, dimer BSSE-CP, hetero pair, drug-as-pocket) into a single source of truth: molforge.MFFactory.compute(). Every mfsig field is a first-principles DFT observable — no Klamt empirical thresholds, no CPCM-X spatial averaging, no external-package calibration constants. Each file carries a 4-way provenance stamp (mf_factory_version · vendor_tree_sha256 · registry_recipe_sha256 · image_tag) so a reader twenty years from now can reconstruct the exact compute context byte-for-byte.

Legacy schemas v2.5-multivariant and earlier are kept readable for back-compat (their archived files retain their original signatures); all new mfsigs ship in v0.91.1 from MFFactory.

headline science · v0.91.1
wB97M-V solves the deep-gap pathology that B3LYP couldn't
v28 (B3LYP)
3 of 10 oracle pairs had NEGATIVE HOMO-LUMO gap. BSSE_PCM range −29.6 → +34.9 kcal/mol with multiple state-flips.
v29 (wB97M-V, MFFactory)
10/10 pairs converge at L0 (no fallback). Gap range +1.4 → +3.1 eV (all healthy). BSSE in ±3.4 kcal/mol.
corroborated independently
96/96 drug SCFs (incl. 31 formally charged) all L0-converged with 0 negative gaps. 146/146 wB97M-V SCFs clean.

The functional itself — not the basis-set fallback chain, not UHF-broken-symmetry — is the load-bearing fix. Range-separation + built-in VV10 NLC place electrons in the correct LUMO on charged-pocket dimers that B3LYP folded into closed-shell solutions with vanishing gap.

What this file is, in 30 seconds

A .mfsig.json is a signed, schema-validated, version-locked snapshot of a single molecule's σ-profile, cavity, energy, and provenance — packaged so that any reader, twenty years from now, can re-verify every claim against the bytes without contacting us. Five pillars below; each field is typed, gate-checked, and tagged with a confidence band.

Confidence: HIGH = gate-checked, deterministic; MED = documented systematic limit; LOW = research only

File family

Three .mfsig formats sharing one Ed25519 chain — all now produced by the unified MFFactory v0.91.1 orchestrator. Each extends the previous; same audit, same renderer, same 4-way provenance stamp.

.mfsig.json · monomer

One drug, σ-profile + cavity + atomic charges. Used by COSMO-RS, solubility, ΔG_solv kernel. Generated by MFFactory.compute_drug().

mfsig/v0.91.1 · wB97M-V/def2-SVP
.mfsig.assembled.json · self-dimer

Drug × drug homodimer. ΔE_cohesion + H-bond / π-stack descriptors. Used for melting-point + crystal cohesion. Generated by MFFactory.compute_dimer_bsse_cp().

mfsig/v0.91.1 · 3-SCF Boys-Bernardi
.mfsig.hetero_dimer.json · drug × polymer

Drug × polymer-segment heterodimer. ΔE_interaction + Flory-Huggins χ. Used for ASD miscibility screening. Generated by MFFactory.compute_hetero_pair().

mfsig/v0.91.1 · 5-SCF asymmetric CP
I

audit_and_trust

Cryptographic identity + 4-way MFFactory provenance. Tamper-evident, ALCOA-compatible, byte-for-byte reproducible.

Field | Type | Conf. | What it means
sha256_checksum | string · hex64 | HIGH | SHA-256 of the canonicalised payload. Recomputed on load. Mismatch = file rejected.
uuid | string · uuidv4 | HIGH | Stable identifier minted at write time. Travels with the file forever.
timestamp_utc | ISO 8601 | HIGH | UTC moment the .mfsig was sealed.
molecule_name | string | HIGH | Trivial name from the request; for humans, not a unique key.
mf_factory_version | string | HIGH | Version of the unified compute pipeline, e.g. "0.91.1". Pins the orchestrator + SCF fallback chain + sigma-moment formulas. First leg of the 4-way provenance stamp.
vendor_tree_sha256 | string · hex64 | HIGH | SHA-256 of the vendored gpu4pyscf tree at compute time (patched C-PCM + ghost-exclusion). Asserts that the exact patched binary that produced the numbers can be re-materialised. Second leg of the provenance stamp.
registry_recipe_sha256 | string · hex64 | HIGH | SHA-256 of the entry in MF_FACTORY_REGISTRY.json under versions.0.91.1: the immutable recipe (functional / basis / dispersion / radii / sigma-moment formulas / SCF policy). Third leg.
image_tag | string | HIGH | Docker image tag + sha256 of the worker that ran the SCF (e.g. worker:v29 @ sha256:2d45d796…). Fourth leg: locks the OS, CUDA, and dependency layer.
image_patch_chain | string[] | HIGH | Ordered list of patch markers baked into the image (Klamt-purge, ghost-exclusion P1, C-PCM L2bis cavity fix, wB97M-V default, …). Lets a future reader reconstruct WHY this image differs from upstream gpu4pyscf.
reproducibility | object | HIGH | Engine version + build hash + library lock; replay anywhere. Redundant with the 4-way stamp above; kept for legacy readers.
lineage | object | HIGH | Append-only history of every derivation event: tier, generation, parent_sha256, genesis_sha256, history[] (a git for σ-profiles). Each entry references the mf_factory_version that produced it.
ownership | object | HIGH | license_id, license_type, organization, signing_key_id, molforge_signature (Ed25519). Public key at /.well-known/mfsig-keys.json.
recipe | object | HIGH | Snapshot of the named recipe ("mf-factory-v0.91.1"): functional = wB97M-V, basis = def2-SVP (def2-TZVPP L3 fallback), dispersion = built-in VV10 NLC, radii = Bondi (P1-patched), counterpoise = Boys-Bernardi where applicable. Also carries validation_at_compute (anchor set + MAE + n_molecules).
derived_grade | string | HIGH | MF1 / MF2 / MF3 / MF4: the customer-facing grade derived from the recipe and the validation context.
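The sha256_checksum row says the hash is recomputed over the canonicalised payload on load, but this page does not spell out the canonicalisation rule. The sketch below shows the general shape of such a check under a loud assumption: canonical form is taken to be JSON with sorted keys and compact separators, with the checksum field blanked before hashing. The function names are illustrative, not MFFactory API, and the real rule must come from the spec.

```python
import copy
import hashlib
import json


def canonical_bytes(doc: dict) -> bytes:
    """Serialise a parsed .mfsig to bytes (ASSUMED canonical form, see lead-in)."""
    payload = copy.deepcopy(doc)
    # Assumption: the checksum cannot hash itself, so it is blanked first.
    payload["audit_and_trust"]["sha256_checksum"] = ""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def recompute_checksum(doc: dict) -> str:
    """Return the hex SHA-256 of the assumed canonical form."""
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()


def verify_checksum(doc: dict) -> bool:
    """True when the stored checksum matches the recomputed one."""
    stored = doc["audit_and_trust"]["sha256_checksum"]
    return recompute_checksum(doc) == stored
```

A loader following the table would reject the file whenever verify_checksum returns False, before reading any other field.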
II

chemistry_and_geometry

What the molecule is, where every atom sits.

Field | Type | Conf. | What it means
smiles | string · canonical | HIGH | RDKit-canonical SMILES of the input. Lossless round-trip with the structure.
inchi_key_14 | string · 14-char | HIGH | Structure-derived hash. Use this as the unique molecule key, not the name.
atomic_numbers | int[N] | HIGH | Element list. N matches positions array exactly.
positions_aa | float[N][3] | HIGH | Cartesian coordinates in Ångström, relaxed geometry. ‖∇E‖ < 1 mEh/Bohr at write time.
n_heavy / n_total | int | HIGH | Heavy-atom and total-atom counts, used for tier pricing and benchmark cohorts.
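The table above implies a few cross-field invariants (N matches between atomic_numbers and positions_aa, and the counts agree with the element list). A minimal sketch of a reader-side consistency check follows; check_geometry is a hypothetical helper, and treating every atom with Z > 1 as heavy is the usual convention rather than something this page states.

```python
def check_geometry(chem: dict) -> list[str]:
    """Return a list of problems found in a chemistry_and_geometry block."""
    problems = []
    z = chem["atomic_numbers"]
    xyz = chem["positions_aa"]
    if len(z) != len(xyz):
        problems.append("atomic_numbers and positions_aa differ in length")
    if any(len(p) != 3 for p in xyz):
        problems.append("every position must have exactly 3 coordinates")
    if chem["n_total"] != len(z):
        problems.append("n_total does not match the atom count")
    # Assumed convention: heavy atom = anything other than hydrogen.
    if chem["n_heavy"] != sum(1 for n in z if n > 1):
        problems.append("n_heavy does not match the element list")
    return problems
```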
III

quantum_and_thermodynamics

The physics: density, surface, energy, σ-signature. v0.91.1 MFFactory generation — pure DFT-derived, zero Klamt empirical thresholds.

Field | Type | Conf. | What it means
scf.converged | bool | HIGH | Energy gate. False = file rejected at sealing time, never shipped.
scf.method | string | HIGH | L0 = DIIS, L1 = DIIS + SOSCF (Newton), L2 = level shift (LS = 0.5) + coarse grid, L3 = DAMP + SOSCF. v0.91.1 production: 96/96 drug SCFs converge at L0.
scf.fallback_level | int (0-3) | HIGH | Which level of the L0→L3 fallback chain converged. 0 = clean DIIS. Higher = harder system.
scf.cycles / scf.wall_s | int / float | HIGH | SCF iteration count + wall-clock seconds for forensic audit.
energies.total_hartree | float | HIGH | Converged DFT total energy E[ρ].
energies.g_polar_hartree | float | HIGH | C-PCM polarization energy = mf.with_solvent.e (raw DFT, canonical). Always negative for neutral closed-shell drugs.
energies.homo_hartree | float | HIGH | Highest occupied MO energy (occupation > 1e-6). v0.91.1 / wB97M-V: 0/96 drugs show pathological gap collapse.
energies.lumo_hartree | float | HIGH | Lowest unoccupied MO energy. Strictly above HOMO under wB97M-V on this cohort.
energies.gap_ev | float | HIGH | (lumo − homo) × 27.2114. v0.91.1 production range across 96 drugs: +5.7 → +13.9 eV. Negative gaps → file rejected.
moments.dipole_debye | float[4] | MED | (μx, μy, μz, ‖μ‖) from converged density via mf.dip_moment(unit='Debye').
moments.atomic_charges_mulliken | float[N] | MED | Mulliken-partition atomic charges from the AO populations diag(dm @ S). N matches atom count.
cavity_and_sigma.cavity_area_aa2 | float | HIGH | Σ area_i over all surface tiles, mf.with_solvent.surface['area']. Bondi vdW radii after P1 ghost-exclusion patch.
cavity_and_sigma.n_segments | int | HIGH | Number of Connolly surface tiles (typically ~3000-11000 depending on drug size).
cavity_and_sigma.sigma_per_segment | float[Ns] | HIGH | Raw σ_i = q_sym_i / area_i per surface tile (e/Ų). Preserved verbatim, so downstream can re-compute any moment without regen.
cavity_and_sigma.segment_areas | float[Ns] | HIGH | Per-tile area in Ų; pairs 1-to-1 with sigma_per_segment.
cavity_and_sigma.sigma_moments | object | HIGH | Area-weighted moments of RAW σ (no Klamt cutoff, no r_av averaging): { mean, variance, skewness, kurtosis }. Formulas in MF_FACTORY_REGISTRY.json versions.0.91.1.sigma_moments_formula. No HB-donor/acceptor mass split: the kernel discovers HB-like signal from variance + skewness.
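Because sigma_per_segment and segment_areas are stored raw, a reader can recompute the derived quantities in this pillar. The sketch below recomputes gap_ev from the formula in the table and the four σ moments as area-weighted standardised moments. The authoritative moment formulas live in the registry and are not reproduced on this page, so the definitions here (population variance, non-excess kurtosis) are assumptions, and the function names are illustrative.

```python
import math

HARTREE_TO_EV = 27.2114  # factor quoted in the gap_ev row


def gap_ev(homo_hartree: float, lumo_hartree: float) -> float:
    """HOMO-LUMO gap in eV, as (lumo - homo) * 27.2114."""
    return (lumo_hartree - homo_hartree) * HARTREE_TO_EV


def sigma_moments(sigma: list[float], areas: list[float]) -> dict:
    """Area-weighted mean, variance, skewness, kurtosis of raw sigma.

    ASSUMED definitions: weights are area_i / total area, variance is the
    weighted population variance, kurtosis is NOT excess kurtosis.
    """
    if len(sigma) != len(areas) or not sigma:
        raise ValueError("sigma and areas must be non-empty and equal length")
    total_area = math.fsum(areas)
    weights = [a / total_area for a in areas]
    mean = math.fsum(w * s for w, s in zip(weights, sigma))
    variance = math.fsum(w * (s - mean) ** 2 for w, s in zip(weights, sigma))
    std = math.sqrt(variance)
    if std == 0.0:
        skewness = kurtosis = float("nan")  # undefined for constant sigma
    else:
        skewness = math.fsum(w * ((s - mean) / std) ** 3
                             for w, s in zip(weights, sigma))
        kurtosis = math.fsum(w * ((s - mean) / std) ** 4
                             for w, s in zip(weights, sigma))
    return {"mean": mean, "variance": variance,
            "skewness": skewness, "kurtosis": kurtosis}
```

With the HOMO and LUMO values from the example file (−0.2412 and −0.0118 Eh), gap_ev returns about 6.24 eV, which matches the stored gap_ev.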
IV

ai_intelligence_layer

Pre-computed features ML pipelines can consume directly.

Field | Type | Conf. | What it means
embedding_vector | float[D] | MED | Fixed-length molecular embedding suitable for retrieval / similarity / classification heads.
feature_provenance | object | HIGH | Which feature extractor + version produced the embedding. Frozen, replayable.
downstream_tasks_supported | string[] | HIGH | Declared compatibility list: ΔG-solv · γ∞ · χ12 · SLE · LLE.
V

legacy_vault

One-way bridge to incumbent σ-profile formats — for portability.

Field | Type | Conf. | What it means
cosmotherm_profile | string or null | HIGH | COSMOtherm-compatible σ-profile block. Drop-in replacement for legacy pipelines.
turbomole_cosmo | string or null | HIGH | Turbomole .cosmo file body. Lossless re-export when source format permits.
openCOSMO_rs | string or null | HIGH | openCOSMO-RS feed. For groups that already have an open ASD workflow.
vendor_provenance | object | HIGH | Which incumbent each block targets, version range, known caveats.
VI

.mfsig.assembled.json · homodimer · mfsig/v0.91.1 · MFFactory

Drug × drug self-cohesion. Inherits all five monomer pillars (audit · chemistry · quantum · AI · legacy) for both copies, then adds the assembled_data block with the cohesion energy and dimer-only descriptors. Used for melting-point prediction, crystal cohesion screening, and π-stack / H-bond network analysis.

Field (under assembled_data) | Type | Conf. | What it means
monomer_inchi_key_14 | string · 14-char | HIGH | InChI-key of the monomer being self-assembled; matches the chemistry_and_geometry pillar of the underlying .mfsig.json.
dimer_geometry_aa | float[N×2][3] | HIGH | Cartesian coordinates of both A copies at the relaxed dimer geometry, Ångström. xtb_relax pre-step then DFT single-points (no PySCF gradient, which bypasses the geomeTRIC + PCM pathology).
n_scfs | int · always 3 | HIGH | Boys-Bernardi symmetric counterpoise: SCF on isolated A, SCF on isolated A′ (ghost-A geometry), SCF on AB dimer. Three independent SCF cycles per artefact.
E_monomer_a_hartree | float | HIGH | Total DFT energy of isolated copy A at the dimer geometry.
E_monomer_b_hartree | float | HIGH | Total DFT energy of isolated copy A′ at the dimer geometry.
E_dimer_hartree | float | HIGH | Total DFT energy of the AB pair at the relaxed dimer geometry.
delta_e_cohesion_kcal_mol | float | HIGH | ΔE = E_dimer − E_a − E_b, converted to kcal/mol. The cohesion observable. BSSE-corrected via the counterpoise scheme.
delta_e_bsse_kcal_mol | float | HIGH | Basis-set superposition error correction, separately reported for forensic audit. Always ≥ 0 by construction.
hb_network | object | HIGH | H-bond network: pairs of (donor_atom_idx, acceptor_atom_idx, distance_aa, angle_deg, type) classified by geometric criteria. Counts: hb_donor_count, hb_acceptor_count, hb_total.
pi_stack_descriptor | object | MED | π-stack analysis on aromatic ring centroids: stacked / T-shaped / parallel-displaced + centroid distances. Only emitted when the monomer has ≥ 1 aromatic ring.
contact_area_aa2 | float | HIGH | Connolly surface contact area between A and B at the dimer geometry: the touching surface where cohesion is mediated.
assembled_provenance | object | HIGH | Recipe used (mf-factory-v0.91.1), xc_functional = wB97M-V, basis = def2-SVP (def2-TZVPP L3 fallback), dispersion = built-in VV10 NLC, counterpoise_scheme = Boys-Bernardi, dimer_relax_method = xtb_relax + DFT single-points, methodology_hash + 4-way MFFactory provenance stamp. Frozen at write time.
lineage_link | object | HIGH | Parent monomer .mfsig SHA-256 (parent_sha256) so the dimer is verifiably derived from a signed monomer. The lineage chain runs monomer → assembled, never the other direction.
Why three SCFs, not two. For a homodimer, the counterpoise scheme still runs both monomer SCFs in the dimer-basis ghost-atom setup (so A is in A′-basis ghosts, and A′ is in A-basis ghosts), even though the two are chemically identical. This makes BSSE-correction explicit and replicable — the same as the heterodimer recipe with the symmetry simplification recorded in assembled_provenance.counterpoise_scheme.
VII

.mfsig.hetero_dimer.json · drug × polymer · mfsig/v0.91.1 · MFFactory

Drug × polymer-segment interaction for amorphous-solid-dispersion (ASD) miscibility prediction. Inherits both partners' monomer pillars, then adds hetero_interaction with the asymmetric Boys-Bernardi counterpoise output plus the Flory-Huggins χ regression. Dual-Track storage: the raw unbounded χ is the canonical AI-internal field; a separate tanh-bounded χ_display is emitted ONLY for the frontend viewer.

Field (under hetero_interaction) | Type | Conf. | What it means
drug_inchi_key_14 | string · 14-char | HIGH | InChI-key of the drug partner.
polymer_segment_smiles | string | HIGH | SMILES of the polymer monomeric segment used in the pair (e.g. PVPVA-vinyl-acetate, Soluplus-vinylcaprolactam-co-PEG, HPMCAS-cellulose-acetate-succinate). The segment is documented and matched against the polymer registry.
polymer_segment_mw_da | float | HIGH | Molar mass of the segment monomer in Daltons, needed by Flory-Huggins χ to convert ΔE_interaction (per pair) to per-lattice-site χ.
dimer_geometry_aa | float[Na+Nb][3] | HIGH | Cartesian coordinates of the relaxed AB heterodimer.
n_scfs | int · always 5 | HIGH | Asymmetric Boys-Bernardi: (1) A clean, (2) B clean, (3) AB dimer, (4) A in dimer-basis with ghost-B, (5) B in dimer-basis with ghost-A. Five independent SCF cycles.
E_drug_clean_hartree | float | HIGH | DFT energy of the drug at the dimer geometry, monomer basis only.
E_polymer_clean_hartree | float | HIGH | DFT energy of the polymer segment at the dimer geometry, monomer basis only.
E_dimer_hartree | float | HIGH | DFT energy of the drug + polymer-segment dimer at the relaxed geometry.
E_drug_in_dimer_basis | float | HIGH | DFT energy of the drug in the full dimer basis (with ghost-B atoms). Used to subtract BSSE attributable to the basis the drug sees in the dimer context.
E_polymer_in_dimer_basis | float | HIGH | Mirror of the above for the polymer segment.
delta_e_interaction_kcal_mol | float | HIGH | ΔE_interaction = E_dimer − E_drug_clean − E_polymer_clean, kcal/mol. The interaction observable. BSSE-corrected via the asymmetric counterpoise.
delta_e_bsse_kcal_mol | float | HIGH | Basis-set superposition error for forensic audit, separately reported (always ≥ 0).
flory_huggins_chi | float | HIGH | Raw, unbounded χ at 298 K. The CANONICAL AI-internal field: every downstream MolForge pipeline reads this value. Calibrated by closed-form ridge regression against the ASD miscibility cohort (90 hetero-dimers, 9 drugs × 10 pharma polymers, B3LYP+BSSE). LODO ranking: top-1 78%, top-3 89%, Spearman ρ +0.76, versus ~30% top-1 for Hansen HSP. NO Hansen HSP heuristics, NO empirical group contributions.
flory_huggins_chi_display | float | HIGH | tanh-bounded copy of χ clipped to [−3, +3], emitted ONLY for the frontend Viewer's chip rows. Never trained on. Distinct field name so downstream consumers cannot confuse the two.
asd_miscibility_tag | string · enum | MED | Human-readable classification: good / moderate / poor / phase_separating. Derived from χ ranges with the boundary justifications recorded in the recipe registry.
contact_area_aa2 | float | HIGH | Connolly surface contact area between drug and polymer-segment at the dimer geometry.
hb_network | object | HIGH | H-bond network across the interface: donor-on-drug × acceptor-on-polymer pairs, plus the inverse.
chi_calibration_provenance | object | HIGH | ASD miscibility cohort SHA (refs/asd_miscibility_kernel.json), anchor count (n=90 hetero-dimers, 9 drugs × 10 polymers), ridge λ, LODO ranking metrics (top-1 78%, top-3 89%, Spearman ρ +0.76), feature set (8-d RDKit drug fingerprint + 10-d polymer one-hot). Frozen at calibration time.
hetero_provenance | object | HIGH | Recipe = mf-factory-v0.91.1, xc_functional = wB97M-V, basis = def2-SVP (def2-TZVPP L3 fallback), dispersion = built-in VV10 NLC, counterpoise_scheme = 'asymmetric_bb', dimer_relax_method = 'xtb_relax + DFT_single_point', methodology_hash + 4-way MFFactory provenance stamp.
lineage_link | object | HIGH | Parent drug .mfsig.json SHA-256 + parent polymer-segment .mfsig.json SHA-256 → the heterodimer is verifiably derived from two signed monomers.
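The five stored energies are enough to redo the counterpoise arithmetic by hand. The sketch below uses the textbook Boys-Bernardi relations; how MFFactory itself combines the five SCFs into the shipped fields is not spelled out on this page, so treat the mapping as an assumption. The function name and the conversion constant (1 Eh ≈ 627.5095 kcal/mol) are not taken from the spec.

```python
HARTREE_TO_KCAL_MOL = 627.5095  # standard conversion, not quoted in the spec


def counterpoise_5scf(e_dimer: float,
                      e_drug_clean: float,
                      e_polymer_clean: float,
                      e_drug_in_dimer_basis: float,
                      e_polymer_in_dimer_basis: float) -> dict:
    """Textbook Boys-Bernardi quantities from five SCF energies (hartree in,
    kcal/mol out)."""
    # Uncorrected interaction energy: monomers in their own basis.
    raw = e_dimer - e_drug_clean - e_polymer_clean
    # BSSE: how much each monomer is lowered by borrowing ghost functions.
    bsse = ((e_drug_clean - e_drug_in_dimer_basis)
            + (e_polymer_clean - e_polymer_in_dimer_basis))
    # Counterpoise-corrected interaction: monomers in the full dimer basis.
    corrected = e_dimer - e_drug_in_dimer_basis - e_polymer_in_dimer_basis
    return {
        "delta_e_raw_kcal_mol": raw * HARTREE_TO_KCAL_MOL,
        "delta_e_bsse_kcal_mol": bsse * HARTREE_TO_KCAL_MOL,
        "delta_e_cp_kcal_mol": corrected * HARTREE_TO_KCAL_MOL,
    }
```

In this convention the corrected energy equals the raw energy plus the BSSE term, and the BSSE term is non-negative whenever the ghost basis can only lower each monomer energy, which is consistent with the "always ≥ 0" note in the table.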
Dual-Track storage
Raw χ is canonical. Display χ is presentational. The unbounded flory_huggins_chi is what every downstream MolForge AI pipeline trains on — we preserve the full quantum signal. The tanh-clipped flory_huggins_chi_display exists only so frontend chip rows show a value within the legacy pharma [−3, +3] band. The two are separately named so a consumer can never accidentally pipe display values back into training.
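The page says the display value is a tanh-bounded copy held within [−3, +3] but gives no formula. One common way to get that behaviour is a scaled tanh, sketched below as an assumption; the actual squashing function and scale used for flory_huggins_chi_display may differ.

```python
import math


def chi_display(chi_raw: float, bound: float = 3.0) -> float:
    """Squash raw chi into (-bound, +bound) with a scaled tanh.

    ASSUMED form: bound * tanh(chi / bound). Near zero it tracks the raw
    value closely; far from zero it saturates at +/- bound.
    """
    return bound * math.tanh(chi_raw / bound)
```

The mapping is monotonic but lossy at the extremes, which is the reason a consumer should train on flory_huggins_chi and never on the display field.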
ASD miscibility kernel · n=90
No Hansen HSP, no empirical group contributions. χ is fit by closed-form ridge regression against the ASD hetero-dimer cohort: 90 dimers computed at B3LYP/def2-SVP with counterpoise BSSE correction over 9 drugs × 10 standard pharma excipients (eudragit, hpmcas, peg400, plga, pva, pvp, soluplus_*). Blind LODO: top-1 hit 78%, top-3 hit 89%, Spearman ρ +0.76 (about 2.5× more accurate than the ~30% top-1 reported for Hansen HSP in the literature). The kernel uses 8 RDKit features (MolWt, LogP, MolMR, TPSA, NumHBD, NumHBA, NumRotBonds, NumAromRings) plus a one-hot encoding of the 10 polymers. The anchor SHA-256, ranking metrics, and feature set live in chi_calibration_provenance on every file. Source: refs/asd_miscibility_kernel.json.
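For reference, "closed-form ridge regression" usually means the standard normal-equation solution below. The shapes follow from the cohort described here (90 dimers, 8 drug features plus a 10-way polymer one-hot, so 18 columns). Whether the kernel adds an intercept column or scales the features is not stated on this page, so this is a sketch of the generic estimator rather than the exact fit.

```latex
\hat{w} = \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} y,
\qquad X \in \mathbb{R}^{90 \times 18},\quad y \in \mathbb{R}^{90},\quad \lambda > 0
```

Here X stacks one feature row per dimer, y holds the per-dimer target, and λ is the ridge penalty recorded in chi_calibration_provenance.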

How .mfsig compares

Every format that touches σ-profile or molecular property pipelines, side-by-side. ✓ first-class · ◐ partial / extractable · ✗ absent. The .mfsig column is what you take home.

Format | Origin | Coverage (first-class) | Partial (◐)
.mfsig | this spec | 25/25 | -
.cosmo | Turbomole | 6/25 | +3
CTD | COSMOtherm | 3/25 | +4
OpenCOSMO-RS | feed | 3/25 | +2
.fchk / .log | Gaussian | 7/25 | +2
ORCA out | ORCA | 6/25 | +4
NwChem out | NwChem | 6/25 | +4
xtb out | xtb | 6/25 | +4
.sdf | MDL · V2000 | 5/25 | -
.mol | MDL · V2000 | 5/25 | -
.xyz | atoms only | 3/25 | -
.pdb | RCSB | 4/25 | +2
SMILES | string | 4/25 | -
InChI | string | 4/25 | +1
Capabilities compared, by group:
Identity: Connectivity (SMILES / structure); Stable structure hash (InChI Key); Atomic numbers + positions (Å); Bond table
σ-content: σ-profile histogram; Per-segment σ-surface (position · normal · area · charge); σ-moments (mean · var · skew · kurt); HB donor / acceptor mass; Charge conservation gate (∫σ·dA ≈ 0)
Quantum: Converged SCF energy; g_polar (COSMO solvation free energy); Bias-corrected ΔG (FreeSolv-anchored); Direct DFT dipole archived; Atomic charges (Mulliken); Forces ∂E/∂R at the reported geometry
Trust: SHA-256 audit seal; 21 CFR Part 11 / ALCOA-compatible; UUID + ISO timestamp; Method + engine version pinned in file; Reproducibility metadata (lib lock)
AI: Embedding vector (ML feature store); Declared downstream tasks (γ∞ · χ12 · ΔG)
Interop: Legacy vault (re-exports other vendor formats); Open spec · no vendor licence required; Single-file deliverable (no companion needed)
Legend: ✓ first-class field · ◐ partial, extractable with effort · ✗ absent

Sources: published format specs for Turbomole .cosmo, COSMOtherm CTD, Gaussian .fchk/.log, ORCA, NwChem, xtb, ChemDraw / MDL SDF (V2000 / V3000), Protein Data Bank PDB, InChI / SMILES strings, OpenCOSMO-RS feed. Comparison is for the data persisted to a single deliverable file — not for what the originating tool can compute.

v0.92.0 SOTA migration · in flight · recipe added 2026-05-29

One change in v0.92.0: basis: def2-SVP → def2-TZVPP

wB97M-V was parameterised close to the basis-set limit (Mardirossian & Head-Gordon, 2016, trained with def2-QZVPPD). v0.91.1 ran on def2-SVP as a pragmatic v28-era speed compromise. v0.92.0 flips the default to def2-TZVPP, a triple-zeta basis much closer to the regime the functional was calibrated in, and bumps the L3 fallback to def2-QZVPP. Everything else is unchanged: same wB97M-V, same C-PCM ε=80, same Bondi (P1) radii, same raw-σ moments, same 4-way provenance stamp. This is the only recipe knob that moves.

accuracy
TZVPP: ~1 kcal/mol energetics uncertainty (drug-like organics). SVP: ~3-5 kcal/mol. TZVPP is the recommended basis for wB97M-V production work.
cost
~3× more basis functions/atom → ~9× SCF time. Drug atlas (4529) goes from ~3h (SVP) to ~7h (TZVPP) on 5× H100.
kernel impact
All closed-form kernels (PC-SAFT, LogP, hydration, binding, χ) re-fit against v0.92.0 atlas after regen. v0.91.1 mfsigs stay readable for back-compat.
Recipe is in refs/MF_FACTORY_REGISTRY.json versions.0.92.0 (SHA cd88bda5…) · full pipeline diagram + audit commands at /factory.
v0.91.1 changelog · MFFactory generation

What changed from v2.5-multivariant to v0.91.1

NEW · one orchestrator
MFFactory.compute()

The five pre-v29 SCF pipelines (drug atlas, pocket fragments, dimer BSSE-CP, hetero pair, drug-as-pocket) collapse into one unified entry point in src/molforge/mf_factory.py. Same SCF fallback chain, same vendor guard, same provenance stamp on every artefact.

NEW · Klamt purge
pure area-weighted moments

Removed σ_HB threshold (0.0084 e/Ų), r_av spatial averaging (0.537 Å), hb_donor_mass / hb_acceptor_mass / polar_area_aa2 / halogen_mass empirical fields. The new sigma_moments block stores mean / variance / skewness / kurtosis of the raw σ_i = q_sym_i / area_i — the kernel discovers HB-like signal directly from the moments.

NEW · wB97M-V default
deep-gap pathology solved

Functional flipped from B3LYP-D3BJ → wB97M-V (range-separated + built-in VV10 NLC). On the v29 sealed-holdout: 10/10 oracle pairs converge at L0 with healthy gaps (+1.4 → +3.1 eV), where B3LYP had 3/10 with negative HOMO-LUMO gap. Corroborated by 96/96 drug SCFs L0-clean. 146/146 wB97M-V SCFs total.

NEW · 4-way provenance stamp

Every artefact now carries mf_factory_version · vendor_tree_sha256 · registry_recipe_sha256 · image_tag. Reading a file ten years from now reconstructs the exact compute context (Python source, vendored gpu4pyscf binary, immutable recipe, OS + CUDA + dependency layer) byte-for-byte.

KEPT · lineage chain

The append-only lineage from v2.4 is unchanged: parent_sha256 → new_sha256 + history[]. Each entry now carries the producing mf_factory_version so a tier upgrade or recipe enrichment can be traced to the exact MFFactory call that produced it.
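A reader can check that a lineage block really forms a chain. The sketch below assumes each non-genesis history entry carries a parent_sha256 next to its new_sha256, and that the first entry is the genesis event; the example file only shows a single genesis entry, so the per-entry parent_sha256 key is an assumption, and verify_lineage is a hypothetical helper.

```python
def verify_lineage(lineage: dict) -> bool:
    """Check that history[] is an unbroken parent -> child hash chain."""
    history = lineage.get("history", [])
    if not history or history[0].get("op") != "genesis":
        return False
    # The first event must mint the hash recorded as genesis_sha256.
    if history[0].get("new_sha256") != lineage.get("genesis_sha256"):
        return False
    previous = history[0]["new_sha256"]
    for entry in history[1:]:
        # ASSUMED key: each later entry names the hash it was derived from.
        if entry.get("parent_sha256") != previous:
            return False
        previous = entry["new_sha256"]
    return True
```

Full, untruncated hashes are needed for this comparison; the trimmed values shown in the example file would not match each other as strings.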

Backward readable, not backward producible — the verifier still opens any v2.4 / v2.5-multivariant file and replays its signatures, but new mfsigs only ship in v0.91.1. The Klamt-tainted fields are intentionally absent from the new schema; downstream consumers must read from cavity_and_sigma.sigma_moments (raw moments) instead of the legacy chemistry-aware split.
Example mfsig/v0.91.1 · MFFactory generation · trimmed for readability
{
  "mfsig_version": "0.91.1",
  "audit_and_trust": {
    "sha256_checksum":          "7e1f744c77d25dee78515d29ee505f55...",
    "uuid":                     "b61705e5-b992-4a7a-8a3e-7762155cc178",
    "timestamp_utc":            "2026-05-27T18:42:11Z",
    "molecule_name":            "BAXOFTOLAUCFNW",

    "mf_factory_version":       "0.91.1",
    "vendor_tree_sha256":       "9c1a4f88b0e7d3a2c5e6f1b9d8a7e4f3...",
    "registry_recipe_sha256":   "fd2d7890ded98d3af5b577f46c28fdcb...",
    "image_tag":                "worker:v29 @ sha256:2d45d796275ef25e840b373cba842a87...",
    "image_patch_chain": [
      "klamt-purge:no-sigma-hb-threshold",
      "klamt-purge:no-rav-spatial-averaging",
      "ghost-exclusion:P1-bondi-vdw",
      "pcm-cavity:L2bis-fix",
      "scf-default:wB97M-V/def2-SVP"
    ],

    "lineage": {
      "tier":           "Platinum",
      "generation":     1,
      "parent_sha256":  null,
      "genesis_sha256": "7e1f744c77d25dee78515d29ee505f55...",
      "history": [
        { "op": "genesis",
          "tier": "Platinum",
          "tool": "MFFactory.compute_drug::v0.91.1",
          "new_sha256": "7e1f744c..." }
      ]
    },
    "ownership": {
      "license_type":      "internal-rd",
      "organization":      "molforge.ai R&D",
      "signing_key_id":    "molforge-dev-2026",
      "molforge_signature": "boAdABa/LLHnDa7Rb05TddTi6qKIzacSTgDx..."
    },
    "recipe": {
      "name":             "mf-factory-v0.91.1",
      "functional":       "wB97M-V",
      "basis":            "def2-SVP",
      "dispersion":       "built-in VV10 NLC",
      "radii":            "Bondi (P1-patched)",
      "methodology_hash": "fd2d7890ded98d3af5b577f46c28fdcb...",
      "validation_at_compute": {
        "anchor_set":      "v29-sealed-holdout-2026-05-27",
        "MAE_dG_kcal_mol": 1.18,
        "n_molecules":     96
      }
    },
    "derived_grade": "MF3"
  },

  "chemistry_and_geometry": {
    "smiles":         "CC(C)N(C(C)C)C(=O)C...",
    "inchi_key_14":   "BAXOFTOLAUCFNW",
    "atomic_numbers": [6, 6, 6, 7, ...],
    "positions_aa":   [[...], ...],
    "n_heavy":        24,
    "n_total":        38
  },

  "quantum_and_thermodynamics": {
    "scf": {
      "converged":      true,
      "method":         "L0:DIIS",
      "fallback_level": 0,
      "cycles":         18,
      "wall_s":         42.7
    },
    "energies": {
      "total_hartree":    -1234.5678,
      "g_polar_hartree":     -0.0163,
      "homo_hartree":        -0.2412,
      "lumo_hartree":        -0.0118,
      "gap_ev":               6.24
    },
    "moments": {
      "dipole_debye":              [1.21, -0.84, 2.06, 2.55],
      "atomic_charges_mulliken":   [-0.31, 0.12, ...]
    },
    "cavity_and_sigma": {
      "cavity_area_aa2":   711.42,
      "n_segments":        4128,
      "sigma_per_segment": [...],
      "segment_areas":     [...],
      "sigma_moments": {
        "mean":      4.79e-05,
        "variance":  4.32e-05,
        "skewness": -0.214,
        "kurtosis":  3.067
      }
    }
  },

  "ai_intelligence_layer": {
    "embedding_vector":   [...],
    "feature_provenance": { "extractor": "molforge.kernel.v29-9d", "stamp": "..." }
  }
}
Note the absent fields vs v2.5: no physics_variants wrapper (single canonical SCF per file), no hb_donor_mass / hb_acceptor_mass / polar_area_aa2 / halogen_mass (Klamt-tainted, removed), no sigma_hb_threshold: 0.0084 (purged), no sigma_profile_101 binned histogram (raw per-segment σ is preserved instead).
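A consumer migrating from v2.5 may want to confirm that a file really is free of the removed fields before reading it as v0.91.1. The sketch below walks a parsed file and reports any of the legacy key names listed in the note above; find_legacy_keys is a hypothetical helper, not part of MFFactory.

```python
# Key names taken from the note above on fields absent in v0.91.1.
LEGACY_KEYS = {
    "physics_variants",
    "hb_donor_mass",
    "hb_acceptor_mass",
    "polar_area_aa2",
    "halogen_mass",
    "sigma_hb_threshold",
    "sigma_profile_101",
}


def find_legacy_keys(node, path: str = "") -> list[str]:
    """Return dotted paths of every legacy key found anywhere in the document."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key in LEGACY_KEYS:
                hits.append(child)
            hits.extend(find_legacy_keys(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_legacy_keys(item, f"{path}[{index}]"))
    return hits
```

An empty result means none of the removed fields are present; any hit points at the exact location that still carries v2.5-era content.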
Every field in a .mfsig is sealed by the SHA-256 in pillar I. Tamper with one byte and the audit signature breaks. The same file opens twenty years from now in the open spec — that's the contract. No vendor lock-in, no rebuild required.