mfsig.com
MFFactory · v0.91.1 production · v0.92.0 TZVPP in validation

One factory. One recipe. Every mfsig reproducible byte-for-byte.

MFFactory is the unified DFT-first compute pipeline that produces every mfsig on the platform: monomers, dimers, hetero-pairs, pocket fragments. One recipe, one binary, one provenance stamp. Klamt-purged. ALCOA-grade audit trail. The production atlas is v0.91.1 (def2-SVP); the v0.92.0 TZVPP migration is smoke-tested but not yet rolled out at scale.

wB97M-V / def2-SVP (v0.91.1) | C-PCM ε=80 · Bondi | raw σ · no Klamt · no r_av | 4-way provenance · SHA-locked
The pipeline

Five input types, one factory, five output schemas

Every cohort flows through the same MFFactory.compute() dispatcher. No cohort-specific recipe overrides, no parallel pipelines, no copy-paste forks. Same recipe → same provenance leg → same kernel-input contract downstream.

Inputs
SMILES string · drug atlas · 4,529 anchors
PDB pocket fragment · {Z, positions_aa, charge, spin}
drug × polymer pair · {drug_atoms, polymer_atoms}
protein × ligand dimer · {frag_atoms, drug_atoms}
self-assembled dimer · {mono_a, mono_b}

MFFactory v0.92.0 · single source of truth
1. wB97M-V / def2-TZVPP / DF
2. C-PCM ε=80 · Bondi radii (P1)
3. SCF fallback L0→L1→L2→L3
4. raw σ_i = q_sym_i / area_i
5. area-weighted moments only
6. 4-way provenance stamp
7. SHA-256 seal · ALCOA-grade
no Klamt · no r_av · no empirical fit

Outputs
.mfsig.json · monomer · 1 SCF
.mfsig.json · pocket_fragment · 1 SCF
.mfsig.hetero_dimer.json · 5-SCF BSSE-CP
.mfsig.dimer.json · protein–ligand · 5-SCF BSSE-CP
.mfsig.assembled.json · homodimer · 5-SCF BSSE-CP

The factory's only conditional logic is around object_kind + phase_label: monomers run 1 SCF, BSSE-CP dimers run 5 SCFs in v28-compliant Boys-Bernardi order. Everything else (functional, basis, solvent, grid, convergence) is read from the registry and is identical across cohorts.
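The dispatch rule described here can be sketched as a small lookup table. This is an illustrative sketch, not MFFactory's source: `PHASE_PLAN`, `plan_scf_phases`, and the BSSE-CP phase names are hypothetical, and the phase order shown is an assumption, since the v28 ordering is not spelled out on this page. Only the `object_kind` values and the `single` phase label come from the text.

```python
# Hypothetical sketch of the object_kind -> SCF phase plan.
# Phase names other than "single" are illustrative, not the platform's labels.
SINGLE_PHASE = ("single",)

# Five Boys-Bernardi phases: the dimer, each monomer in the full dimer basis
# (partner atoms present as ghosts), and each monomer alone in its own basis.
# The order here is an assumption.
BSSE_CP_PHASES = (
    "dimer",
    "mono_a_in_dimer_basis",
    "mono_b_in_dimer_basis",
    "mono_a_alone",
    "mono_b_alone",
)

PHASE_PLAN = {
    "drug": SINGLE_PHASE,
    "pocket_fragment": SINGLE_PHASE,
    "hetero_phase": BSSE_CP_PHASES,
    "dimer_phase": BSSE_CP_PHASES,
}


def plan_scf_phases(object_kind: str) -> tuple[str, ...]:
    """Return the ordered phase labels to run for one object_kind."""
    try:
        return PHASE_PLAN[object_kind]
    except KeyError:
        raise ValueError(f"unknown object_kind: {object_kind!r}") from None
```

Keeping the branch in one table means everything else (functional, basis, solvent, grid) can be read from the registry without a per-cohort code path.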

The one recipe

MF_FACTORY_REGISTRY.json · versions.0.92.0

A single JSON file holds every knob the SCF pipeline reads. The file is SHA-256-hashed; that hash becomes the third leg of the four-way provenance stamp on every mfsig produced.

| knob | v0.92.0 value | why |
| --- | --- | --- |
| functional | wB97M-V | range-separated hybrid with built-in VV10 NLC dispersion · solves the deep-gap pathology that broke B3LYP on charged pockets |
| basis | def2-TZVPP | the basis wB97M-V was parameterised for (Mardirossian & Head-Gordon 2016) · ~1 kcal/mol energetics uncertainty on drug-like organics |
| aux_basis | def2-tzvp-jkfit | density fitting for J + K · ~3× SCF speedup, sub-mEh accuracy preserved |
| solvent | C-PCM | Conductor-like Polarizable Continuum Model · induces the polarization σ-charges on the cavity surface |
| epsilon | 80.0 | water-like dielectric · matches the cohort calibration anchors (FreeSolv, ASD) |
| radii | Bondi (P1-patched) | van-der-Waals radii table · P1 patch excludes ghost atoms from cavity construction |
| grid | (75, 302) Lebedev | atom-grid radial × angular · production-quality DFT grid |
| conv_tol | 1e-06 | SCF energy convergence (Hartree) · 6-digit DFT energies |
| density_fit | true | always on · TZVPP without DF would 10× the cost |
| L0 max_cycle | 60 (single) · 80 (BSSE) | DIIS budget before falling back to L1 |
| L3 fallback basis | def2-QZVPP | ONLY when L0/L1/L2 all fail · escape valve for pathological systems, NEVER silent |

Source of truth: refs/MF_FACTORY_REGISTRY.json · v0.92.0 SHA-256 cd88bda5… · predecessor: 0.91.1 (def2-SVP).
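As a minimal sketch of how the recipe leg could be recomputed from the registry file, the snippet below hashes the raw bytes with Python's standard `hashlib`. The helper names `file_sha256` and `matches_prefix` are hypothetical. Because this page publishes only the truncated digest `cd88bda5…`, the comparison is against a prefix rather than a full hash.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_prefix(path: str, expected_prefix: str) -> bool:
    """True if the file's SHA-256 starts with the published (truncated) digest."""
    return file_sha256(path).startswith(expected_prefix.lower())
```

For example, `matches_prefix("refs/MF_FACTORY_REGISTRY.json", "cd88bda5")` would be the Python counterpart of an `sha256sum` spot check. The hash covers bytes, so any whitespace or key-order change in the JSON produces a different leg.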

SCF guardrails

Four-level fallback chain — NEVER silent

The default DIIS solver at the production basis handles 96/96 production drugs at L0. When it fails, the orchestrator records the fallback level in the mfsig so the reader can audit WHICH solver produced the number — no opaque retries, no quietly-different basis.

L0 · DIIS: default · 96/96 drugs converge here (max_cycle = 60)
L1 · DIIS + SOSCF: Newton step · stiff systems (max_cycle = 30)
L2 · LS 0.5 · coarse: level-shift + coarse grid · multi-ref (max_cycle = 120)
L3 · DAMP + QZVPP: last-resort basis upgrade · NEVER silent (max_cycle = 100)
what gets stored

Every mfsig records scf.fallback_level (0–3) and scf.method (DIIS / DIIS+SOSCF / DAMP+SOSCF). If level > 0 for a "normal" drug, that's a red flag worth investigating before downstream use.

what we don't do

No silent basis upgrade. No hidden retries with relaxed thresholds. If all four levels fail the SCF returns converged=false and the file is REJECTED at sealing time — never shipped to disk, never enters a kernel.
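The "never silent" rule can be illustrated with a short control-flow sketch. It is an assumption-laden illustration rather than the orchestrator's code: `FallbackLevel`, `ScfRecord`, `run_with_fallback`, and the `run_scf` callable are hypothetical, the method labels are copied from the chain listing, and the cycle budgets are the single-molecule values quoted on this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FallbackLevel:
    level: int
    method: str
    max_cycle: int


# Levels and cycle budgets as listed for the four-level chain.
CHAIN = (
    FallbackLevel(0, "DIIS", 60),
    FallbackLevel(1, "DIIS+SOSCF", 30),
    FallbackLevel(2, "LS 0.5 + coarse grid", 120),
    FallbackLevel(3, "DAMP+QZVPP", 100),
)


@dataclass(frozen=True)
class ScfRecord:
    converged: bool
    fallback_level: int  # level that converged, or the last level attempted
    method: str


def run_with_fallback(run_scf: Callable[[FallbackLevel], bool]) -> ScfRecord:
    """Try each level in order and record which one produced the result.

    `run_scf` is a stand-in for the real solver call; it returns True on
    convergence. No level is retried and no threshold is relaxed.
    """
    for lvl in CHAIN:
        if run_scf(lvl):
            return ScfRecord(True, lvl.level, lvl.method)
    last = CHAIN[-1]
    return ScfRecord(False, last.level, last.method)


def may_seal(record: ScfRecord) -> bool:
    """An unconverged record must be rejected at sealing time."""
    return record.converged
```

Because the record is returned rather than logged, a reader of the sealed file sees the same level the orchestrator saw.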

The Klamt-purge

Raw σ, no empirical thresholds

The legacy σ-profile workflow ran q_sym through a 0.537 Å r_av spatial average then binned with a 0.0084 e/Ų H-bond threshold — both are empirical fits to small-molecule training sets, both inflate the distribution variance by ~30× over the raw electrostatic signal. v0.91.1+ removed both. The new sigma_moments are pure area-weighted statistics on the raw per-segment σ.

[Figure: σ distributions on a shared axis from −0.03 to +0.03, with σ = 0 and the legacy ±0.0084 threshold marked. Legacy Klamt + r_av: variance ~ 10⁻⁴. v0.91.1+ raw σ: variance ~ 3·10⁻⁶ (30× tighter).]
sigma_moments formula (canonical)
σ_i  = q_sym_i / area_i                # raw, per-segment, no spatial average
w_i  = area_i / Σ area_j               # tile-area weight
mean      = Σ w_i · σ_i
variance  = Σ w_i · (σ_i − mean)²
m_k       = Σ w_i · (σ_i − mean)^k      # k-th central moment, used for k = 3, 4
skewness  = m₃ / variance^(3/2)
kurtosis  = m₄ / variance²  −  3       # excess

Four numbers per molecule. Deterministic. Re-derivable bit-exact from sigma_per_segment + segment_areas stored in every mfsig. Zero empirical parameters.
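The formula above translates directly into a few lines of standard-library Python. This is a sketch for readers who want to recompute the moments themselves: the function name `sigma_moments` and its argument names are chosen for illustration, and the use of `math.fsum` is an assumption. The summation order the platform uses is not stated here, so bit-exact agreement with a stored value is not guaranteed by this sketch.

```python
import math


def sigma_moments(q_sym, areas):
    """Area-weighted mean, variance, skewness and excess kurtosis of raw sigma.

    q_sym: per-segment charges; areas: per-segment areas (same length, > 0).
    """
    if len(q_sym) != len(areas) or len(areas) == 0:
        raise ValueError("q_sym and areas must be non-empty and equal length")

    sigma = [q / a for q, a in zip(q_sym, areas)]      # raw, no spatial average
    total_area = math.fsum(areas)
    weights = [a / total_area for a in areas]          # tile-area weights

    mean = math.fsum(w * s for w, s in zip(weights, sigma))

    def central(k):
        # k-th area-weighted central moment
        return math.fsum(w * (s - mean) ** k for w, s in zip(weights, sigma))

    variance = central(2)
    if variance == 0.0:
        raise ValueError("variance is zero; skewness and kurtosis are undefined")

    return {
        "mean": mean,
        "variance": variance,
        "skewness": central(3) / variance ** 1.5,
        "kurtosis": central(4) / variance ** 2 - 3.0,  # excess kurtosis
    }
```

A uniform σ over all segments has zero variance, so the higher moments are undefined; the sketch raises instead of returning NaN.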

Provenance

Four-leg stamp on every mfsig

Every artefact carries four independent provenance legs (a pipeline version, two SHA-256 hashes, and an image tag) that, together, let a reader twenty years from now reconstruct the exact compute context byte-for-byte. Break any one, and the file is rejected.

leg 1/4
mf_factory_version: 0.92.0
Python pipeline + SCF chain + moment formulas
leg 2/4
vendor_tree_sha256: 461e2aac…
patched gpu4pyscf binary
leg 3/4
registry_recipe_sha256: cd88bda5…
frozen recipe in MF_FACTORY_REGISTRY.json
leg 4/4
image_tag: worker:v29
OS + CUDA + dependency layer

Plus an Ed25519 signature over the canonicalised payload (legs 1–4 are inputs to the SCF; the signature authenticates that the file was actually sealed by molforge's signing key). Verification is offline — no network call required to confirm authenticity, only the public key at /.well-known/mfsig-keys.json.
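A reader-side check of the four legs might look like the sketch below. The field names are the ones used in the provenance block elsewhere on this page; the function `check_provenance` and the shape of the `expected` dictionary are hypothetical. Signature verification is deliberately left out, because it needs an Ed25519 library and the canonicalisation rules, neither of which is specified here.

```python
REQUIRED_LEGS = (
    "mf_factory_version",
    "vendor_tree_sha256",
    "registry_recipe_sha256",
    "image_tag",
)


def check_provenance(provenance: dict, expected: dict) -> list[str]:
    """Compare an mfsig's provenance legs against expected values.

    Returns a list of human-readable problems; an empty list means all four
    legs are present and match. `expected` must define every leg.
    """
    problems = []
    for leg in REQUIRED_LEGS:
        if leg not in provenance:
            problems.append(f"missing leg: {leg}")
        elif provenance[leg] != expected[leg]:
            problems.append(
                f"{leg} mismatch: got {provenance[leg]!r}, expected {expected[leg]!r}"
            )
    return problems
```

Returning every mismatch, rather than stopping at the first, makes it easier to tell a stale image from a changed recipe.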

Cohort gallery

Five object kinds, one schema family

MFFactory.compute() accepts five values for object_kind. Each cohort has its own orchestrator that calls compute() one or more times with the appropriate phase_label + ghost_atoms. The output schema is identical across cohorts — only the suffix changes.

cohort 1

drug atlas (monomer)

SMILES → RDKit ETKDG geometry → MFFactory single-phase SCF → σ-profile, dipole, HOMO–LUMO, Mulliken charges. The base unit of every downstream kernel.

n_anchors: 4,529 (PDBBind LP v2020 + curated drug atlas)
SCFs / unit: 1
object_kind: drug
output: .mfsig.json
scientific signature: compute_drug_mfsig(smiles, inchi_key_14, output_path) → MFFactory.compute(atoms, charge, spin, object_kind="drug", options={phase_label:"single"})
cohort 2

pocket fragment

Single-residue or MFCC-capped fragment from a PDB pocket → MFFactory single-phase SCF. Heavy-atom-only (descriptor H positions added by upstream capper); spin-0 closed-shell.

n_anchors: ~1,000 (curated subset of PDBBind anchors)
SCFs / unit: 1
object_kind: pocket_fragment
output: .mfsig.json (under pocket_fragments_v091_1/)
scientific signature: compute_pocket_fragment_mfsig(fragment_id, z_list, positions_aa, charge, spin, extras) → MFFactory.compute(atoms, ..., object_kind="pocket_fragment", options={phase_label:"single"})
cohort 3

hetero dimer (drug × polymer)

Drug × polymer-segment Boys-Bernardi 5-SCF BSSE-CP for ASD (amorphous solid dispersion) Flory-Huggins χ prediction. Atom order locked: [polymer_atoms..., drug_atoms...].

n_anchors: 90 (Marsac 2006/2009 + extended cohort)
SCFs / unit: 5 (dimer + 2 in-basis + 2 alone)
object_kind: hetero_phase
output: .mfsig.hetero_dimer.json
scientific signature: compute_hetero_bsse_cp(pair_id, drug_atoms, polymer_atoms, ...) → 5× MFFactory.compute(..., object_kind="hetero_phase", options={phase_label, ghost_atoms})
cohort 4

protein × ligand dimer (CL3)

Ligand × pocket fragment Boys-Bernardi 5-SCF BSSE-CP from the PDBBind CL3 benchmark cohort. The load-bearing input for the binding-affinity kernel.

n_anchors: 97 (CL3 PDBBind v030.1 builder)
SCFs / unit: 5 (dimer + 2 in-basis + 2 alone)
object_kind: dimer_phase
output: .mfsig.dimer.json
scientific signature: compute_dimer_bsse_cp(pair_id, frag_atoms, drug_atoms, ...) → 5× MFFactory.compute(..., object_kind="dimer_phase", options={phase_label, ghost_atoms})
cohort 5

self-assembled (homodimer)

Same-molecule pair — Boys-Bernardi 5-SCF BSSE-CP via the dimer orchestrator with frag = monomer_A, drug = monomer_B. Used for melting-point / cohesion-energy prediction. Phases 2–5 are arithmetically redundant but kept for provenance symmetry.

n_anchors: 15 (assembled fixture set)
SCFs / unit: 5 (reuses dimer_bsse_cp)
object_kind: dimer_phase
output: .mfsig.assembled.json
scientific signature: compute_dimer_bsse_cp(pair_id, frag_atoms=mono_a, drug_atoms=mono_b, ...) · identical 5-SCF chain; symmetry is exploited by the reader, not by the writer.
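For readers unfamiliar with the 5-SCF counterpoise scheme used by the three dimer cohorts, the arithmetic that combines the five energies is sketched below. This is the standard Boys-Bernardi bookkeeping, shown under the assumption that both monomers keep their in-dimer geometry. The class and function names are hypothetical, and this page does not state how the platform's own readers combine the stored phases.

```python
from dataclasses import dataclass

HARTREE_TO_KCAL_MOL = 627.5094740631


@dataclass(frozen=True)
class BsseCpEnergies:
    """Total energies in Hartree from the five SCF phases."""
    dimer: float             # A+B in the dimer basis
    a_in_dimer_basis: float  # A with B's atoms as ghosts
    b_in_dimer_basis: float  # B with A's atoms as ghosts
    a_alone: float           # A in its own basis
    b_alone: float           # B in its own basis


def counterpoise(e: BsseCpEnergies) -> dict:
    """Return raw and counterpoise-corrected interaction energies (Hartree)."""
    raw = e.dimer - e.a_alone - e.b_alone
    # Basis-set superposition error: how much each monomer is artificially
    # stabilised by borrowing the partner's basis functions.
    bsse = (e.a_alone - e.a_in_dimer_basis) + (e.b_alone - e.b_in_dimer_basis)
    corrected = e.dimer - e.a_in_dimer_basis - e.b_in_dimer_basis
    return {
        "e_int_raw": raw,
        "bsse": bsse,
        "e_int_cp": corrected,  # equals raw + bsse
        "e_int_cp_kcal_mol": corrected * HARTREE_TO_KCAL_MOL,
    }
```

Storing all five energies, rather than only the corrected number, lets a reader recompute both the raw and the corrected interaction energy and see the size of the correction.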
The output

What's in every .mfsig.json

Five top-level blocks, the same structure on every cohort. The Anatomy page documents each field; this is the executive summary.

block 1/5
provenance

4-way SHA stamp + image_patch_chain + computed_at_utc + schema (mfsig/v0.91.1)

block 2/5
chemistry_and_geometry

elements / atomic_numbers / positions_aa / smiles / inchi_key_14 / total_charge / spin / ghost_mask

block 3/5
scf

converged / method / fallback_level / cycles / wall_s / n_basis · forensic SCF metadata

block 4/5
energies

total_hartree / g_polar_hartree / homo_hartree / lumo_hartree / gap_ev

block 5/5
moments + cavity_and_sigma

dipole_debye / atomic_charges_mulliken / cavity_area_aa2 / n_segments / sigma_per_segment (raw, ~3000–11000) / segment_areas / sigma_moments {mean, variance, skewness, kurtosis}
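A quick structural sanity check on a loaded mfsig could be sketched as follows. Several things here are assumptions: that the block names above are literal top-level JSON keys, that block 5 is stored as two keys (`moments` and `cavity_and_sigma`), and that `gap_ev` is defined as LUMO minus HOMO converted to eV. The helper names are hypothetical; the Anatomy page remains the authority on the actual layout.

```python
HARTREE_TO_EV = 27.211386245988

# Assumed top-level keys, taken from the block titles listed above.
REQUIRED_BLOCKS = (
    "provenance",
    "chemistry_and_geometry",
    "scf",
    "energies",
    "moments",
    "cavity_and_sigma",
)


def missing_blocks(mfsig: dict) -> list[str]:
    """Return the names of expected top-level blocks that are absent."""
    return [name for name in REQUIRED_BLOCKS if name not in mfsig]


def gap_is_consistent(energies: dict, tol_ev: float = 1e-6) -> bool:
    """Check gap_ev against (LUMO - HOMO) converted from Hartree to eV."""
    expected = (energies["lumo_hartree"] - energies["homo_hartree"]) * HARTREE_TO_EV
    return abs(expected - energies["gap_ev"]) <= tol_ev
```

In practice the dictionary would come from `json.load` on a `.mfsig.json` file; the functions themselves never touch the disk, which keeps them easy to test.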

Infrastructure

How the factory actually runs

The pipeline runs on H100 SECURE pods orchestrated through a single REST-API spawner. One generic runner script per cohort, one shared spawn script, one SHA-pinned image.

image
worker:v29-prod-2026-05-27

vendor/gpu4pyscf (patched) + src/molforge (MFFactory + 5 orchestrators) + refs/MF_FACTORY_REGISTRY.json. SHA-256 pinned and stamped into every mfsig.

spawn
infra/v4/spawn_cohort_fleet.py

--cohort {drug|hetero|cl3|pocket|homodimer} --n 5 --gpu H100. Spawns N pods via Runpod REST API (NOT GraphQL — GraphQL podTerminate is silently broken), prints SCP + SSH + launch commands.

teardown
infra/v4/runpod_kill.py

REST DELETE /v1/pods/<id> — only reliable termination path. Verified 404 after each kill. Never leaves zombie pods burning $.

Audit trail

Verification you can run yourself

Every claim on this page is auditable from the bytes — no internal logs required.

# 1. Verify the recipe SHA matches what was used at compute time
sha256sum refs/MF_FACTORY_REGISTRY.json
# expect : cd88bda5… (v0.92.0)

# 2. Verify every mfsig in a cohort matches the registry recipe
python refs/_v091_regen_audit.py
# expect : ALL GREEN — AST + orchestrator alignment + parse + extract/ignore

# 3. Verify a single mfsig's 5-tier quality gates
python refs/_audit_v091_1_n100_quality.py
# tier 1: schema integrity   (5/5 expected)
# tier 2: physics sanity     (5/5)
# tier 3: klamt-purge enforcement
# tier 4: cross-file consistency (uniform recipe SHA)
# tier 5: sigma_moments recomputation (machine-epsilon)

# 4. Verify a downstream prediction comes from the right mfsig
jq '.provenance | {mf_factory_version, vendor_tree_sha256, registry_recipe_sha256, image_tag}' \
   refs/atlas_v031_v091_1/<inchi_key_14>.mfsig.json