mfsig.com
σ-profile generation

Roadmap

Where the .mfsig generation engine, the open format, the viewer, and the standards are heading. Public roadmap, honest status flags, dependencies declared. We ship what we promise. (Downstream prediction kernels are on the separate kernels site.)

SHIPPED · 9 milestones
IN FLIGHT · 3 milestones
NEXT · 5 milestones
PLANNED · 23 milestones
R&D · 4 milestones
Track: Physics fidelity

14 milestones
v0.28.1–v0.28.2 · Apr 2026
SHIPPED
Charge conservation + D3BJ + Klamt radii

Three numerics fixes: charge-conservation renormalisation on the kept segment set, NaN-safe σ-moments, and a segment-area floor. Also adds Grimme D3BJ dispersion and a Klamt-radii experiment.
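The three numerics fixes can be sketched as follows. This is a minimal sketch that assumes each surface segment is an (area in Å², charge in e) pair. The floor value, the area-proportional spreading of the residual charge, and all function names are illustrative assumptions, not the engine's actual code.

```python
import math
from typing import List, Tuple

# Hypothetical segment representation: (area in Å², charge in e).
Segment = Tuple[float, float]

def apply_area_floor(segments: List[Segment], min_area: float = 1e-3) -> List[Segment]:
    """Drop segments below an assumed area floor so sigma = q / a stays finite."""
    return [(a, q) for a, q in segments if a >= min_area]

def renormalise_charge(segments: List[Segment], target_charge: float) -> List[Segment]:
    """Shift charges on the kept segments so they sum exactly to target_charge.

    The residual is spread in proportion to segment area (one plausible
    choice; a multiplicative rescale would be another)."""
    if not segments:
        raise ValueError("no segments left after filtering")
    total_area = sum(a for a, _ in segments)
    residual = target_charge - sum(q for _, q in segments)
    return [(a, q + residual * a / total_area) for a, q in segments]

def sigma_moment(segments: List[Segment], order: int) -> float:
    """Area-weighted moment M_n = sum_i a_i * sigma_i**n, skipping bad segments."""
    total = 0.0
    for a, q in segments:
        if a <= 0.0:
            continue                      # zero/negative area: sigma undefined
        sigma = q / a
        if math.isfinite(sigma):          # NaN-safe: ignore non-finite sigmas
            total += a * sigma ** order
    return total
```

Order 0 returns the kept surface area and order 1 returns the net charge. After renormalisation, M₁ therefore equals the target charge to within floating-point rounding.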

Unlocks: First honest SOTA-position rank
v0.28.3 · May 2026
SHIPPED
D3BJ + Bondi reversion · 84/84 PASS

Reverted from Klamt to Bondi radii after the calibration mismatch dropped the HB-signal r from +0.65 to +0.09. Cohort: 58 drugs + 26 polymers; 84/84 pass all 7 gates. Charge conservation holds to machine ε on every drug.

Unlocks: Production-grade σ-profile bench, FreeSolv LOO MAE 0.85 kcal/mol
v0.28.3 atlas (LIVE) · May 2026
IN FLIGHT
Atlas of 4,078 drugs

Autonomous overnight queue, ordered from smallest to largest heavy-atom count; skip-if-exists and restart-safe. Every entry is held to the v0.28.3 gates. The run also expands the calibration anchor set for tighter ΔG corrections.
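The queue behaviour described above can be sketched as follows. The job tuples, the output naming scheme and the `compute` callback are hypothetical. The sketch illustrates three things: ordering by heavy-atom count, the skip-if-exists check, and a write-then-rename step that prevents a restart from mistaking a half-written file for a finished one.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

def run_queue(
    jobs: Iterable[Tuple[str, int]],        # (molecule id, heavy-atom count)
    out_dir: Path,
    compute: Callable[[str], Dict],         # hypothetical: returns the .mfsig payload
) -> List[str]:
    """Process jobs smallest-first, skip finished ones, and write atomically."""
    out_dir.mkdir(parents=True, exist_ok=True)
    done: List[str] = []
    # Sort by heavy-atom count, breaking ties by id for a deterministic order.
    for mol_id, _heavy in sorted(jobs, key=lambda job: (job[1], job[0])):
        target = out_dir / f"{mol_id}.mfsig.json"
        if target.exists():                 # skip-if-exists: a restart resumes here
            continue
        payload = compute(mol_id)
        # Write to a temp file in the same directory, then rename atomically,
        # so a crash never leaves a partial file under the final name.
        fd, tmp = tempfile.mkstemp(dir=str(out_dir), suffix=".tmp")
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh)
        os.replace(tmp, target)
        done.append(mol_id)
    return done
```

Running the same queue a second time returns an empty list, because every target file already exists.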

Unlocks: Reference cohort for cross-vendor benchmarks
Depends on: D3BJ + Bondi reversion · 84/84 PASS
drug atlas v25 · May 2026
IN FLIGHT
Thousands of drugs × polymers × solvents

Atlas generation with the v0.29-directqm-svp recipe, covering drug-like chemistry beyond the FreeSolv-overlapping validation cohort. Powers the screening workflows on the /atlas page: candidate ranking, solvent screen, patent leads, portfolio risk, CDMO QC. Coverage-driven rather than anchor-aligned; /benchmarks stays on the v0.28 canonical cohort.

Unlocks: Pre-computed pair scores for ASD formulation + solvent selection
Depends on: v0.29 (recipes) · physics_variants block · chem-aware moments
v0.28.4 · Jun 2026
NEXT
def2-TZVPD basis · dipole bias closure

Address the documented +0.62 D systematic dipole bias at B3LYP/def2-SVP. def2-TZVPD adds triple-ζ valence and diffuse functions, which matter most for the aromatic π-systems where the bias is worst (aniline, nitrobenzene, imidazole, caffeine).

Unlocks: Dipole MAE projected <0.25 D · publication-grade
v0.28.5 · Jun 2026
NEXT
Open-shell pipeline (UKS) · close the failure mode

Today ~5% of atlas molecules fail with "electron number X and spin 0 inconsistent": radicals, ammonium cations, organometallics. v0.28.5 adds an automatic UHF/UKS branch when RDKit detects odd-electron or charged species, recovering bretylium, choline, pyridostigmine and the rest.
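Here is a sketch of how charge and spin could be derived before choosing between RKS and UKS. It assumes that atomic numbers and the net formal charge come from RDKit (for example `atom.GetAtomicNum()` and `Chem.GetFormalCharge(mol)`), and it takes the lowest spin state consistent with the electron parity. `spin` follows the PySCF convention 2S = N_alpha − N_beta. The dataclass and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class ScfSetup:
    n_electrons: int
    spin: int      # 2S = N_alpha - N_beta (PySCF convention)
    method: str    # "RKS" for closed shell, "UKS" otherwise

def choose_scf(atomic_numbers: Sequence[int], formal_charge: int) -> ScfSetup:
    """Pick a spin and SCF flavour from the electron count (hypothetical helper)."""
    n_electrons = sum(atomic_numbers) - formal_charge
    if n_electrons <= 0:
        raise ValueError("non-positive electron count")
    # Lowest spin consistent with parity: singlet if even, doublet if odd.
    spin = n_electrons % 2
    return ScfSetup(n_electrons, spin, "UKS" if spin else "RKS")
```

A closed-shell cation such as choline becomes even-electron once its +1 charge is passed correctly. The UKS branch is what genuinely odd-electron species (radicals, many organometallics) need. High-spin metal centres would need an explicit multiplicity rather than this parity rule.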

Unlocks: Full coverage of pharma-relevant chemistry
v0.29 — Heavy-track tier · Jul–Aug 2026
PLANNED
Heavy-Track pipeline · 123+ atoms

HPMCAS_L (123 atoms) currently exceeds the single-GPU VRAM ceiling at the JK-contraction stage. Heavy-Track introduces staged DFT with chunked builders plus a fallback to next-generation GPUs, lifting the size cap to ~200 atoms.

Unlocks: Big-polymer ASD excipients · large peptides
Depends on: def2-TZVPD basis · dipole bias closure
ensemble v1 · Jun–Jul 2026
NEXT
Conformer ensemble · Boltzmann-weighted σ

Pipeline: ETKDG embedding → UFF pre-optimisation → xtb ranking → DFT on the top-k conformers → Boltzmann averaging over the canonical ensemble. Each conformer is its own audit-signed .mfsig; the ensemble carries an additional ensemble hash that binds them.
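A minimal sketch of the Boltzmann-averaging step follows. It assumes conformer energies in kcal/mol and σ-profiles that share a single binning. Which energy the production pipeline weights by (electronic or free energy) is not stated here, so treat the energy input as an assumption. The function names are hypothetical.

```python
import math
from typing import List, Sequence

K_B_KCAL = 0.0019872043  # Boltzmann constant in kcal/(mol·K)

def boltzmann_weights(energies_kcal: Sequence[float], temperature: float = 298.15) -> List[float]:
    """Normalised Boltzmann weights for a canonical ensemble."""
    kt = K_B_KCAL * temperature
    e_min = min(energies_kcal)            # shift by the minimum to avoid overflow
    factors = [math.exp(-(e - e_min) / kt) for e in energies_kcal]
    z = sum(factors)                      # partition function (relative)
    return [f / z for f in factors]

def boltzmann_average(profiles: Sequence[Sequence[float]],
                      energies_kcal: Sequence[float],
                      temperature: float = 298.15) -> List[float]:
    """Weighted average of per-conformer sigma-profiles, bin by bin."""
    weights = boltzmann_weights(energies_kcal, temperature)
    n_bins = len(profiles[0])
    if any(len(p) != n_bins for p in profiles):
        raise ValueError("all sigma-profiles must share the same binning")
    return [sum(w * p[i] for w, p in zip(weights, profiles)) for i in range(n_bins)]
```

Subtracting the minimum energy before exponentiating leaves the normalised weights unchanged and keeps large energy spreads from overflowing.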

Unlocks: Flexible-scaffold drug-likeness without a manual conformer search
Depends on: D3BJ + Bondi reversion · 84/84 PASS
pH-state σ · Jul–Aug 2026
PLANNED
Microspecies σ-profiles

For ionisable drugs (≈45% of a typical pharma library), persist a per-pH microspecies fan: neutral, anion, cation, zwitterion. Each microspecies is a full .mfsig keyed by its pKa source. A user who requests "drug at pH 7.4" gets the weighted blend.
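For a single ionisable site, the microspecies fractions follow from the Henderson–Hasselbalch relation, and the blend is a fraction-weighted sum of the per-species σ-profiles. This sketch covers only that one-site case. Zwitterions and multiprotic drugs need the full set of coupled equilibria. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

def monoprotic_fractions(pka: float, ph: float, kind: str) -> Dict[str, float]:
    """Henderson-Hasselbalch fractions for one ionisable site."""
    if kind == "acid":            # HA <-> A- + H+
        ionised = 1.0 / (1.0 + 10 ** (pka - ph))
        return {"neutral": 1.0 - ionised, "anion": ionised}
    if kind == "base":            # BH+ <-> B + H+ (pka refers to BH+)
        ionised = 1.0 / (1.0 + 10 ** (ph - pka))
        return {"neutral": 1.0 - ionised, "cation": ionised}
    raise ValueError("kind must be 'acid' or 'base'")

def blend(profiles: Dict[str, Sequence[float]], fractions: Dict[str, float]) -> List[float]:
    """Fraction-weighted sum of per-microspecies sigma-profiles."""
    n_bins = len(next(iter(profiles.values())))
    return [sum(fractions[s] * profiles[s][i] for s in fractions) for i in range(n_bins)]
```

At pH = pKa the two species are present 50:50. An acid with pKa 3.5 is more than 99.9% ionised at pH 7.4.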

Unlocks: Realistic biological-pH ADMET inputs
Depends on: ensemble v1
v0.31-gold-svp (sidecar) — quality tier · May 2026 · in flight
IN FLIGHT
wB97M-V / def2-SVP + DF-K — V3 Atlas pilot

Range-separated hybrid meta-GGA wB97M-V with native VV10 dispersion, density fitting on both J and K (def2-svp-jkfit), and C-PCM solvation (ε = 80), on gpu4pyscf 1.7.0. This is a sidecar recipe; the SEAL recipes (v0.29-directqm-svp, v0.28.3) remain canonical. We are currently sweeping an isolated 5,950-molecule shadow atlas (4,533 universal cohort + 74 CompSol + 1,343 Tm polymer fusion). A deep audit at n = 1,013 reports 1,013 / 1,013 pristine (zero NaN / null / dim-mismatch / v0.29 contamination). Kernel refit and True-OOD re-verification are gated behind cohort completion; NOT CERTIFIED until then. ⚠ Slug overlap with the planned v0.31 — relativistic physics entry below; final numbering to be locked before public release.
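The "pristine" criteria in the deep audit can be expressed as a per-record check. The sketch below assumes a flat JSON record with hypothetical `recipe` and `sigma_profile` fields and a fixed bin count. The real .mfsig layout is richer, so treat the field names as placeholders.

```python
import math
from typing import Any, List

def _bad_values(obj: Any, path: str = "$") -> List[str]:
    """Return JSON-style paths of nulls or non-finite floats anywhere in obj."""
    problems: List[str] = []
    if obj is None:
        problems.append(path)
    elif isinstance(obj, float) and not math.isfinite(obj):
        problems.append(path)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            problems.extend(_bad_values(value, f"{path}.{key}"))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            problems.extend(_bad_values(value, f"{path}[{i}]"))
    return problems

def audit_record(record: dict, expected_recipe: str, n_bins: int) -> List[str]:
    """Empty list means pristine: no NaN/null, right dimension, right recipe."""
    issues = [f"null/NaN at {p}" for p in _bad_values(record)]
    profile = record.get("sigma_profile") or []
    if len(profile) != n_bins:
        issues.append(f"dim-mismatch: {len(profile)} bins, expected {n_bins}")
    if record.get("recipe") != expected_recipe:      # cross-recipe contamination
        issues.append(f"contamination: recipe {record.get('recipe')!r}")
    return issues
```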

Unlocks: V3 quality-direction validation before broad rollout (the Q3 hyper-speed sprint depends on this gate)
Depends on: mfsig v2.5-multivariant · Q1c DF-K lock-in
v0.30 — multi-reference · Q4 2026
PLANNED
NEVPT2 fallback for difficult π-systems

Some drugs (porphyrins, polycyclic aromatics with low-lying excitations) are poorly described by single-reference B3LYP. v0.30 adds an automatic NEVPT2 / CASSCF fallback, triggered when the D1 / T1 diagnostics flag multi-reference character.
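A sketch of the fallback trigger follows. It assumes that T1 amplitudes are available from a cheap coupled-cluster pre-screen and that a D1 value is supplied alongside. The 0.02 (T1) and 0.05 (D1) thresholds are values commonly quoted for closed-shell organic molecules, not thresholds confirmed for this pipeline. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence

T1_THRESHOLD = 0.02   # commonly quoted for closed-shell organics (assumed here)
D1_THRESHOLD = 0.05   # commonly quoted companion threshold (assumed here)

def t1_diagnostic(t1_amplitudes: Sequence[float], n_correlated_electrons: int) -> float:
    """Lee–Taylor T1 = ||t1|| / sqrt(N_corr), closed-shell form."""
    norm = math.sqrt(sum(t * t for t in t1_amplitudes))
    return norm / math.sqrt(n_correlated_electrons)

def needs_multireference(t1: float, d1: Optional[float] = None) -> bool:
    """Route to NEVPT2/CASSCF if either diagnostic exceeds its threshold."""
    if t1 > T1_THRESHOLD:
        return True
    return d1 is not None and d1 > D1_THRESHOLD
```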

Unlocks: Trustworthy σ on the hardest 5% of molecules
Depends on: v0.29 — Heavy-track tier
v0.31 — relativistic · Q1 2027
PLANNED
Scalar-relativistic σ for heavy elements

Iodine (e.g. iodofenphos, levothyroxine), bromine (e.g. brompheniramine), and beyond. X2C scalar-relativistic Hamiltonian with appropriate basis sets. Target: dipole / polarisability error below 5% on molecules containing atoms with Z ≥ 35. ⚠ Slug overlap with the in-flight v0.31-gold-svp sidecar above; renumber one of the two before public release.

Unlocks: Halogenated and metal-containing drugs
Depends on: v0.30 — multi-reference
v0.32 — anharmonic ΔG · Q2 2027
PLANNED
Anharmonic vibrational ΔG corrections

Harmonic ZPE + entropy is the standard approximation; for floppy molecules its treatment of low-frequency modes mis-estimates the entropy term by ~1–2 kcal/mol. A vibrational SCF or quasi-harmonic correction targets a further ~30% reduction in ΔG MAE.
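The quasi-harmonic option can be sketched as a harmonic vibrational-entropy sum in which low frequencies are raised to a floor before evaluation. The vibrational SCF route is not shown. The 100 cm⁻¹ floor is a commonly used choice and an assumption here, as is the function name.

```python
import math
from typing import Sequence

H = 6.62607015e-34      # Planck constant, J·s
C_CM = 2.99792458e10    # speed of light, cm/s
K_B = 1.380649e-23      # Boltzmann constant, J/K
R = 8.314462618         # gas constant, J/(mol·K)

def vib_entropy(freqs_cm: Sequence[float], temperature: float = 298.15,
                floor_cm: float = 0.0) -> float:
    """Harmonic vibrational entropy in J/(mol·K).

    With floor_cm > 0, real frequencies below the floor are raised to it
    before evaluation (quasi-harmonic treatment)."""
    s = 0.0
    for nu in freqs_cm:
        if nu <= 0:
            continue                          # skip imaginary / zero modes
        nu = max(nu, floor_cm)
        x = H * C_CM * nu / (K_B * temperature)
        # S/R per mode = x / (e^x - 1) - ln(1 - e^-x)
        s += x / math.expm1(x) - math.log1p(-math.exp(-x))
    return R * s
```

Multiplying the difference between the harmonic and floored entropies by T and dividing by 4184 gives the resulting change in the −TS term in kcal/mol.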

Unlocks: Sub-0.5 kcal/mol ΔG predictions
Depends on: v0.31 — relativistic
v0.30 (D4 + ωB97M-V) · Q3 2026, W7–W8
PLANNED
SOTA dispersion + native VV10 functional · ships as physics_variants

Three new recipes: v0.30-d4 (B3LYP + D4 instead of D3BJ, ~5% cost premium), v0.30-wb97mv (ωB97M-V + native VV10, ~12% cost premium), v0.30-wb97mv-tz (MF4 reference, def2-TZVPD). v0.29 stays the default; v0.30 ships alongside it in physics_variants for customers who need the absolute SOTA claim.

Unlocks: SOTA-physics tier for patent / regulatory submissions

Track: Format & trust

13 milestones
v0.22–v0.27 · Q4 2025 → Q1 2026
SHIPPED
Five-pillar .mfsig schema

Five pillars: audit, chemistry, quantum, AI, and legacy_vault. Every file carries a SHA-256 audit hash. Rosetta Stone I/O covers 7 vendor formats. This is the contract that everything else hangs off.
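A minimal sketch of a self-covering SHA-256 audit hash follows. It assumes a hypothetical top-level `audit.sha256` field that is blanked while hashing. Sorted-key compact JSON is a simple stand-in for a full canonicalisation scheme such as RFC 8785 (JCS), which also pins down number formatting.

```python
import copy
import hashlib
import json

def canonical_bytes(doc: dict) -> bytes:
    """Deterministic JSON: sorted keys, compact separators, UTF-8, no NaN."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False).encode("utf-8")

def _digest_without_hash(doc: dict) -> str:
    probe = copy.deepcopy(doc)
    probe.setdefault("audit", {})["sha256"] = ""     # blank placeholder while hashing
    return hashlib.sha256(canonical_bytes(probe)).hexdigest()

def seal(doc: dict) -> dict:
    """Return a copy whose audit.sha256 covers every other field."""
    sealed = copy.deepcopy(doc)
    sealed.setdefault("audit", {})["sha256"] = _digest_without_hash(sealed)
    return sealed

def verify(doc: dict) -> bool:
    """Recompute the digest and compare it with the stored one."""
    return _digest_without_hash(doc) == doc.get("audit", {}).get("sha256")
```

Any edit to any field after sealing, such as changing a SMILES string, makes `verify` return False.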

Unlocks: Tamper-evident single-file deliverable
mfsig v2.1-apex · May 2026
SHIPPED
Bonds + stereo + rings + connectivity hash

Every .mfsig now persists the full molecular graph: RDKit-derived bond table, stereocenters with CIP labels, formal charges, ring systems, and a separate connectivity-hash sub-SHA. We know of no other format that combines this with σ-profile data.

Unlocks: Stereo-aware viewer · cross-checkable topology
Depends on: Five-pillar .mfsig schema
v0.29 (audit tiers) · May 2026
SHIPPED
Three audit tiers

Standard production tier + Compatibility tier (drop-in for legacy COSMO-RS pipelines) + Premium tier (pure-physics, zero empirical calibration on the benchmark). Each carries its own methodology hash and validation context. Same audit grade across all three.

Unlocks: Customers pick the audit context that matches their downstream pipeline, all on the same signed .mfsig schema
mfsig v2.5-multivariant · May 2026
SHIPPED
physics_variants block · chem-aware moments

v2.5 introduces physics_variants under quantum_and_thermodynamics — one molecule can carry results for multiple recipes in the same file, each independently signed. Chem-aware σ-moments (hb_donor_mass / hb_acceptor_mass / pi_negative_mass / polar_area_aa2 / halogen_mass) use bond-table classification (segment_chemistry v2.2). Backward compatible with v2.4.

Unlocks: Switch downstream pipelines without re-computing the foundation
Depends on: Bonds + stereo + rings + connectivity hash
v2.4 lineage (git-for-σ) · Apr 2026
SHIPPED
SHA-chained derivation history

The audit_and_trust.lineage block holds parent_sha256, genesis_sha256 and history[]. Every derivation event (tier_upgrade, add_variant, add_pair_score) appends a signed entry. This replaces the old fallback semantics and enables incremental upgrades without a rebuild.
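The lineage chain can be sketched as a hash-linked list. The sketch assumes hypothetical `prev_entry_sha256` and `entry_sha256` fields in each history entry. A plain SHA-256 digest stands in for the signature the roadmap describes, so this demonstrates tamper evidence but not signer identity.

```python
import hashlib
import json
from typing import Dict, List

def _digest(obj) -> str:
    return hashlib.sha256(
        json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()).hexdigest()

def append_event(lineage: Dict, event: str, parent_file_sha: str) -> Dict:
    """Append a derivation event linked to the previous entry (or to genesis)."""
    history: List[Dict] = lineage["history"]
    prev = history[-1]["entry_sha256"] if history else lineage["genesis_sha256"]
    entry = {"event": event, "parent_sha256": parent_file_sha, "prev_entry_sha256": prev}
    entry["entry_sha256"] = _digest(entry)   # stand-in for a real signature
    return {"genesis_sha256": lineage["genesis_sha256"],
            "parent_sha256": parent_file_sha,
            "history": history + [entry]}

def verify_chain(lineage: Dict) -> bool:
    """Walk back from genesis; any edited or reordered entry breaks the links."""
    prev = lineage["genesis_sha256"]
    for entry in lineage["history"]:
        body = {k: v for k, v in entry.items() if k != "entry_sha256"}
        if entry["prev_entry_sha256"] != prev or _digest(body) != entry["entry_sha256"]:
            return False
        prev = entry["entry_sha256"]
    if lineage["history"]:
        return lineage["history"][-1]["parent_sha256"] == lineage["parent_sha256"]
    return True
```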

Unlocks: Pro → Platinum → Reference upgrades with audit chain intact
Depends on: Five-pillar .mfsig schema
mfsig v2.2 · Q3 2026
PLANNED
Conformer-ensemble schema

Native multi-conformer .mfsig with per-conformer SHAs plus an ensemble-level hash binding them. An auditor can replay the whole ensemble or verify a single member, and accepts either check.

Unlocks: Ensemble physics under the same audit umbrella
Depends on: ensemble v1 · Bonds + stereo + rings + connectivity hash
mfsig v2.3 · Q4 2026
PLANNED
Mixture & pH-state schemas

Schema extensions for binary/ternary/n-ary mixtures and microspecies fans. Each file is a graph of signed sub-files (each microspecies / each conformer / each partner) with a top-level ensemble Merkle root.
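One way to compute the top-level Merkle root over signed sub-files is the Merkle Tree Hash from RFC 6962 (Certificate Transparency), which uses 0x00/0x01 prefixes to keep leaf and interior hashes apart. This sketch assumes that construction and also sorts the leaves so listing order does not matter; both choices are assumptions.

```python
import hashlib
from typing import List

def _mth(leaves: List[bytes]) -> bytes:
    """RFC 6962 Merkle Tree Hash for a non-empty list of leaves."""
    if len(leaves) == 1:
        return hashlib.sha256(b"\x00" + leaves[0]).digest()   # leaf hash
    k = 1
    while k * 2 < len(leaves):          # largest power of two strictly below n
        k *= 2
    left, right = _mth(leaves[:k]), _mth(leaves[k:])
    return hashlib.sha256(b"\x01" + left + right).digest()    # interior node

def merkle_root(sub_file_sha256_hex: List[str]) -> str:
    """Root over sub-file SHA-256 digests (hex), independent of listing order."""
    if not sub_file_sha256_hex:
        raise ValueError("at least one sub-file digest is required")
    leaves = [bytes.fromhex(h) for h in sorted(sub_file_sha256_hex)]
    return _mth(leaves).hex()
```

Changing any single microspecies, conformer or partner file changes its digest and therefore the root.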

Unlocks: Single-file audit for an entire formulation
Depends on: mfsig v2.2 · σ² · binary
Parquet v1 · Q1 2027
PLANNED
.mfsig.parquet for 10⁸-scale libraries

Columnar Parquet flavour of .mfsig with σ-profile fixed-width vectors + per-row SHA. Vectorised scan over hundred-million-molecule libraries on a laptop. Same audit chain, different storage layout.

Unlocks: Pharma-scale property libraries on commodity HW
Depends on: mfsig v2.3
RFC 3161 timestamps · Q3 2026
PLANNED
Notary-anchored timestamps

Every .mfsig gets an RFC 3161 timestamp token from an accredited TSA (DigiCert, GlobalSign, free.tsa.cz). An auditor can prove the file existed at a specific second, not just that "the writer claims this date". This is a cryptographic complement to the existing SHA-256.

Unlocks: Court-admissible timestamps for IP filings
Depends on: Bonds + stereo + rings + connectivity hash
federated co-signing · Q4 2026
PLANNED
Multi-lab co-signed Reference tier

A Reference-grade .mfsig can carry N independent signatures over its SHA-256 digest, one from each of N labs that recomputed and verified the cohort. Same file, multiple signers. Replaces "vendor trust" with "distributed trust" for the highest tier.

Unlocks: Cross-vendor reference cohorts auditors accept
differential .mfsig · Q1 2027
PLANNED
Audit-chained incremental updates

Recomputed σ for an existing molecule? Instead of re-emitting the whole file, emit a diff record signed by the new computation and parented to the previous SHA. An auditor walks the chain back to the original. Same idea as Git, built for chemistry.

Unlocks: Versioned σ history without storage bloat
Depends on: mfsig v2.3
IPFS content addressing · Q2 2027
R&D
.mfsig over content-addressed storage

CID = ipfs://Qm... derived from the file SHA. Anyone holding the CID re-fetches the bit-identical file from any peer who keeps it. Distributed-by-default reference cohorts; no central registry hostage situation.

Unlocks: Vendor-independent permanence
post-quantum signing · 2028
R&D
PQ-secure audit signatures

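Layer a NIST post-quantum signature (e.g. ML-DSA, formerly CRYSTALS-Dilithium) over the SHA-256 digest once PQC support is practical. SHA-256 is a hash, not a signature, and remains collision-resistant. The PQC layer protects the signatures over that digest against future quantum attackers, who would break classical schemes such as RSA and ECDSA. Forward-compatible spec.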

Unlocks: 30-year archival horizon

Track: Tools & UX

11 milestones
viewer SOTA pack · May 2026
SHIPPED
3D viewer · dipole · σ-filter · share · snapshot

Babylon.js viewer with split-CPK bonds, aromatic hairlines, dipole vector arrow, live σ-tile threshold slider, high-res PNG snapshot, shareable URL state, in-canvas legend. The reference web viewer for σ-profile data.

Unlocks: Free public 3D inspection of every shipped .mfsig
viewer · pharmacophore · Q3 2026
PLANNED
Pharmacophore overlay + measurement tool

Click two atoms = distance. Three = angle. Four = dihedral. Plus auto-pharmacophore markers (HBD/HBA/aromatic/lipophilic) derived directly from σ-surface — most viewers fake these from heuristics; ours come from the actual cavity.
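The viewer itself is built on Babylon.js, but the geometry behind the measurement tool is language-neutral. This Python sketch shows the distance, bond-angle and signed-dihedral formulas on plain (x, y, z) tuples in Å. The function names are illustrative.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]

def _sub(a: Vec, b: Vec) -> List[float]:
    return [a[i] - b[i] for i in range(3)]

def _dot(a: Vec, b: Vec) -> float:
    return sum(a[i] * b[i] for i in range(3))

def _cross(a: Vec, b: Vec) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def distance(p1: Vec, p2: Vec) -> float:
    """Two clicked atoms: Euclidean distance."""
    return math.dist(p1, p2)

def angle(p1: Vec, p2: Vec, p3: Vec) -> float:
    """Three clicked atoms: angle at p2 in degrees."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_t = _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))   # clamp rounding

def dihedral(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> float:
    """Four clicked atoms: signed dihedral p1-p2-p3-p4 in degrees."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

The dihedral follows the IUPAC sign convention (a clockwise rotation of the front bond onto the back bond is positive). Using `atan2` avoids the precision loss that an `acos`-based formula suffers near 0° and 180°.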

Unlocks: On-canvas SAR analysis
viewer · session export · Q4 2026
PLANNED
PyMOL/ChimeraX session export

One click → a .pse or .cxs file that opens in PyMOL/ChimeraX with the same atoms, σ-surface mesh, dipole vector, and saved camera. Pharma teams keep their existing pipeline; we feed it richer data.

Unlocks: Adopt without replacing existing tools
WebXR mode · 2027
R&D
Walk around the cavity in VR

Babylon → WebXR. Stand at the centroid of atorvastatin; see donor / acceptor patches as glowing islands around you. Hand-controller σ-threshold slider. Demo + teaching tool, not a primary workflow.

Unlocks: Outreach + chemistry education
Jupyter widget · Q3 2026
NEXT
%mfsig display in notebooks

anywidget-based Jupyter widget: `mol = read_mfsig('x.mfsig.json'); mol` renders the full 3D viewer inline. Drop into pharma data-science workflows without leaving the notebook.

Unlocks: Notebook-native σ workflows
VSCode extension · Q4 2026
PLANNED
.mfsig preview in VSCode

Click any .mfsig.json file in the explorer; the 3D viewer opens in a custom-editor tab. Tooltip-on-hover shows the audit pack. Same shipping standard as image previewers — no setup required.

Unlocks: Engineering-team adoption with zero friction
Excel add-in · Q1 2027
PLANNED
MFSIG() formula in Excel

=MFSIG("aspirin", "dipole") returns 2.55 D, with functions for every key field. It ships as an Office.js add-in that pharma analysts can sideload in 30 seconds, removing the drudgery for spreadsheet-bound formulators.

Unlocks: Pharma formulators meet σ on their own turf
GraphQL atlas API · Q1–Q2 2027
PLANNED
Atlas query API

GraphQL endpoint over the public atlas: filter by dipole, by HB donor mass, by sigma-similarity to a SMILES, by atom count. Streaming + pagination. Same audit chain via signed response headers.

Unlocks: Programmatic access to the reference cohort
Depends on: v0.28.3 atlas (LIVE)
iPad touch viewer · Q2 2027
PLANNED
First-class touch + Apple-Pencil mode

Pinch-zoom, two-finger orbit, pencil-tap to label atoms, palette button to summon the σ-threshold slider. The boardroom-and-bench demo device, not just a thumbnail-sized scaling of desktop UX.

Unlocks: Tablet-first pharma reviews
real-time co-viewer · Q3 2027
R&D
Multi-cursor σ review

Two chemists open the same viewer URL; each sees the other's cursor, camera state, and slider position in real time. Yjs CRDT over WebRTC. "Slack thread for the molecule."

Unlocks: Live remote chemistry review
/status transparency dashboard · May 2026
SHIPPED
Auto-updating Pure B3LYP inventory page

Server-rendered /status page driven by refs/generate_status_inventory.py: live cohort counts, Zero-Corruption audit summary, 3-gate verification verdict, True-OOD validation results. Customers can verify the integrity of our data lake without an account or API call.

Unlocks: Public-facing audit of the foundation
Depends on: format SEALED · 2026-05-18

Track: Ecosystem & standards

6 milestones
spec public RFC · Q3 2026
NEXT
.mfsig 2.x spec at github.com/molforge/mfsig-spec

Pull the schema out of the code into a versioned, RFC-style document with worked examples + JSON Schema files. Other implementers (academic groups, vendor reimplementations) can target the spec rather than reverse-engineering our writer.

Unlocks: Multiple independent implementations
OpenChem PR · Q4 2026
PLANNED
Upstream .mfsig support in OpenChem / RDKit / OpenBabel

Pull requests adding native .mfsig read / write to the major open chemistry toolkits. We carry the maintenance burden for the first two release cycles. Lowers adoption friction for every research group already on those stacks.

Unlocks: Zero-install support across the open-source chemistry ecosystem
Depends on: spec public RFC
IUPAC liaison · 2027
PLANNED
IUPAC working-group submission

Engage with the IUPAC Computational Chemistry sub-committee on σ-profile interoperability. Goal: IUPAC recognition of .mfsig as the exchange format for σ-profile data specifically, in the way InChI is the IUPAC standard for structure identifiers.

Unlocks: Standards-body legitimacy
Depends on: OpenChem PR
FAIR-compliant DOI · Q1 2027
PLANNED
Zenodo DOI per cohort release

Every cohort release (drugs, polymers, atlas) gets a citable Zenodo DOI with the FAIR-data metadata. Academic users cite a versioned dataset, not a moving target. ResearchGate-grade transparency.

Unlocks: Academic adoption + citation tracking
validation consortium · 2027
PLANNED
Cross-lab validation cohort

Partner with three independent labs to re-compute the same 50-drug subset on different hardware + different DFT codes (PySCF, Q-Chem, ORCA). Publish the cross-lab variance. Empirically establishes the .mfsig audit guarantee.

Unlocks: Multi-lab reproducibility benchmark
Depends on: federated co-signing
open-CLI v1.0 · Q4 2026
PLANNED
Open-source `mfsig` CLI · 1.0 release

Apache-2.0 CLI on PyPI + Homebrew + apt. Read/write/verify/convert. No backend dependency — runs entirely locally. The on-ramp that lets anyone produce + audit .mfsig files without a paid account.

Unlocks: Pure-OSS pathway for academic users

How to read this roadmap

  • SHIPPED: verifiable today in the public spec, the published cohort, or this codebase.
  • IN FLIGHT: the work is running right now on our compute; the deliverable name and version are committed.
  • NEXT: scoped, sized, dated. We start when the predecessors clear.
  • PLANNED: an architectural sketch exists; dependencies are declared above; dates have quarter precision.
  • R&D: speculative; dates have year precision. We publish if it works, and say so if it doesn't.

Dependencies are explicit. A slip in a milestone's "depends on" predecessors can't be hidden: when one slips, the delay propagates downstream and the roadmap reflects it on the next refresh. We publish where we land versus the original date openly in the changelog.