mfsig.com
A σ-profile primer

Glossary

Every term we use, defined plainly. Read it linearly for a tour of COSMO-RS, or jump in for the one term that tripped you up.

the basics

Core

.mfsig

The signed JSON file format in which we deliver σ-profile data.

Five-pillar JSON schema (audit · chemistry · quantum · AI · legacy_vault) sealed with a SHA-256 audit hash. Open spec, no licence needed. Every signed σ-profile we deliver is one of these files.


σ-profile (sigma-profile)

The histogram of charge density on a molecule's solvent-accessible surface.

Put the molecule in an implicit solvent and ask 'how much surface area carries σ = −0.01 e/Å²?' Ask the same for every σ bin and you get one area per bin. Plot them and you get a curve. That curve is the σ-profile, and COSMO-RS predicts solvation thermodynamics from it.
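A minimal sketch of the binning, assuming the tiles arrive as parallel lists of σ (e/Å²) and area (Å²). The 0.001 e/Å² bin width and the ±0.025 e/Å² range are assumptions for illustration, not values taken from the .mfsig spec; out-of-range tiles are clipped into the end bins.

```python
def sigma_profile(sigmas, areas, sigma_max=0.025, width=0.001):
    """Return (bin_centres, area_per_bin) for tiles given as parallel lists.

    Each tile's area goes to the nearest bin centre; tiles with |σ| beyond
    sigma_max are clipped into the first or last bin.
    """
    n_bins = round(2 * sigma_max / width) + 1          # 51 bins with the defaults
    centres = [-sigma_max + i * width for i in range(n_bins)]
    profile = [0.0] * n_bins
    for sigma, area in zip(sigmas, areas):
        i = round((sigma + sigma_max) / width)           # index of nearest centre
        i = min(max(i, 0), n_bins - 1)                   # clip out-of-range σ
        profile[i] += area
    return centres, profile
```

Nearest-bin assignment keeps the sketch short; some COSMO-RS implementations instead split each tile's area between the two neighbouring bins, which gives a smoother curve.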

σ-surface segment

One tile on the molecule's COSMO cavity surface.

The cavity is tessellated into ~1 000–5 000 polygons. Each tile has a position (where it sits on the cavity), an area (how big it is) and a σ (the local screening-charge density), and tiles are the unit of every σ-property calculation.
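One way to model a tile in code, as a sketch: the field names below are hypothetical and not the .mfsig schema. The derived `charge` is simply σ × area, the screening charge the tile carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One COSMO cavity tile (illustrative field names, not the .mfsig schema)."""

    x: float      # position on the cavity, Å
    y: float
    z: float
    area: float   # tile area, Å²
    sigma: float  # screening-charge density, e/Å²

    @property
    def charge(self) -> float:
        """Screening charge on this tile, in e (σ × area)."""
        return self.sigma * self.area
```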

COSMO-RS

COnductor-like Screening MOdel for Real Solvents: the σ-profile theory (Klamt 1995, 2005).

Andreas Klamt's framework: compute the σ-profile from a DFT calculation in an implicit conductor, then derive activity coefficients, vapour pressure, partition coefficients, etc. from interactions between σ-distributions. .mfsig is the open file format for COSMO-RS inputs.
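The "interactions between σ-distributions" are, in the original COSMO-RS formulation, pairwise contact energies between two touching segments: an electrostatic misfit term and a hydrogen-bond term. Below is a sketch of those two textbook expressions. The parameters `alpha_prime`, `c_hb` and `a_eff` are method constants we leave as arguments rather than guess values for, and the self-consistent statistical thermodynamics that turns these energies into activity coefficients is not shown.

```python
def misfit_energy(sigma, sigma_prime, alpha_prime, a_eff):
    """Electrostatic misfit of two contacting segments: a_eff·α′/2·(σ + σ′)²."""
    return a_eff * alpha_prime / 2.0 * (sigma + sigma_prime) ** 2


def hb_energy(sigma, sigma_prime, c_hb, a_eff, sigma_hb=0.0084):
    """Hydrogen-bond energy of a segment pair.

    The more negative σ plays the donor, the more positive σ the acceptor;
    the term is non-zero only when both lie beyond the HB cutoff.
    """
    donor, acceptor = min(sigma, sigma_prime), max(sigma, sigma_prime)
    return a_eff * c_hb * min(0.0, min(0.0, donor + sigma_hb) * max(0.0, acceptor - sigma_hb))
```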

what's inside the cavity

Physics

HB cutoff

|σ| ≥ 0.0084 e/Å²: the threshold above which a surface tile counts as a hydrogen-bond donor or acceptor.

Klamt's 2005 calibration. Below the cutoff: apolar surface. Above it with σ negative: donor (a polar H wanting to be hugged by something electronegative). Above it with σ positive: acceptor (a lone pair waiting for an H).
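The rule above fits in a few lines. This sketch labels a tile from its σ and sums tile areas per class. The function names are hypothetical, and only the 0.0084 e/Å² cutoff and the sign convention come from the entry itself.

```python
SIGMA_HB = 0.0084  # e/Å², the HB cutoff quoted above


def classify_segment(sigma, sigma_hb=SIGMA_HB):
    """Label a tile as 'donor' (σ ≤ −cutoff), 'acceptor' (σ ≥ +cutoff) or 'apolar'."""
    if abs(sigma) < sigma_hb:
        return "apolar"
    return "donor" if sigma < 0 else "acceptor"


def area_by_class(sigmas, areas, sigma_hb=SIGMA_HB):
    """Total surface area (Å²) in each class."""
    totals = {"donor": 0.0, "acceptor": 0.0, "apolar": 0.0}
    for sigma, area in zip(sigmas, areas):
        totals[classify_segment(sigma, sigma_hb)] += area
    return totals
```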

Charge conservation

∫σ·dA = 0 for a neutral molecule.

The integral of σ over the entire cavity surface must equal the negative of the solute's net charge (zero for neutrals); at a finite dielectric constant ε it is further scaled by the COSMO factor f(ε), which is 1 in the conductor limit. Every v0.28.3 .mfsig satisfies this to machine precision. It's our cheapest, strongest gate.
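On a tessellated cavity the integral becomes a sum over tiles, Σ σᵢ·Aᵢ. A sketch of the gate in the conductor limit (f(ε) = 1); the default tolerance is an assumption, since the tolerance we actually apply is not stated here.

```python
import math


def charge_conserved(sigmas, areas, net_charge=0, tol=1e-6):
    """True if Σ σᵢ·Aᵢ equals −net_charge within tol (conductor limit).

    math.fsum keeps round-off small when summing thousands of tiles.
    """
    total = math.fsum(s * a for s, a in zip(sigmas, areas))
    return abs(total + net_charge) <= tol
```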

ΔG-hydration

Free-energy change for transferring a molecule from the gas phase into water.

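One of the numbers that tells you whether a drug is going to dissolve (solubility also depends on how tightly the solid packs, which ΔG-hydration does not capture). We compute it from the σ-profile plus a bias correction calibrated against the Mobley FreeSolv set: leave-one-out (LOO) MAE 0.85 kcal/mol on 28 anchors (v0.28.3).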
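To make "LOO MAE" concrete, here is a sketch under one loud assumption: that the bias correction is a single constant offset. The entry does not say what form the correction takes. Each anchor is predicted with an offset fitted only on the other anchors, so no anchor helps correct itself.

```python
from statistics import mean


def loo_mae_constant_offset(predicted, experimental):
    """Leave-one-out MAE (kcal/mol) of a constant bias correction.

    For anchor i, the offset is the mean residual (experimental − predicted)
    over all *other* anchors; the error is measured on anchor i alone.
    """
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two anchors for leave-one-out")
    residuals = [e - p for p, e in zip(predicted, experimental)]
    errors = []
    for i in range(n):
        offset = mean(residuals[:i] + residuals[i + 1:])
        errors.append(abs(predicted[i] + offset - experimental[i]))
    return mean(errors)
```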

Dipole moment

The molecule's net electric polarity vector, in Debye.

A measure of how polar the molecule is overall. Water: 1.85 D. Atorvastatin: 8.78 D. We persist (μx, μy, μz, |μ|) from the converged DFT density. The viewer can draw it as a 3D arrow.
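The stored dipole comes from the DFT density, but the vector-plus-magnitude layout is easiest to see with a point-charge model, which is a deliberate simplification here: μ = Σ qᵢ·rᵢ, converted from e·Å to Debye (1 e·Å ≈ 4.8032 D). For a neutral set of charges the result does not depend on the coordinate origin.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e·Å ≈ 4.8032 D


def dipole_debye(charges, positions):
    """Return (μx, μy, μz, |μ|) in Debye from point charges (e) at positions (Å)."""
    mu = [
        sum(q * r[k] for q, r in zip(charges, positions)) * E_ANGSTROM_TO_DEBYE
        for k in range(3)
    ]
    return (*mu, math.sqrt(sum(c * c for c in mu)))
```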

D3BJ dispersion

Grimme's empirical correction for the London dispersion (van der Waals) attraction that B3LYP misses.

Standard DFT functionals (including B3LYP) underestimate London dispersion. D3BJ adds an empirical, Becke–Johnson-damped correction: small, but real, and important for accurate ΔG. Shipped in v0.28.2+.
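The two-body part of D3 with Becke–Johnson damping, written for a single atom pair as a sketch. The functional-specific parameters (s6, s8, a1, a2) and the pair coefficients C6, C8 must come from Grimme's published tables; none are assumed here. Full D3 also makes C6 depend on coordination number and can add a three-body term, neither of which is shown.

```python
import math


def d3bj_pair_energy(r, c6, c8, s6, s8, a1, a2):
    """BJ-damped dispersion energy for one atom pair.

    E = −[s6·C6/(r⁶ + f⁶) + s8·C8/(r⁸ + f⁸)],  f = a1·sqrt(C8/C6) + a2.
    Units must be consistent (Grimme's reference code works in bohr and hartree).
    """
    f = a1 * math.sqrt(c8 / c6) + a2
    return -(s6 * c6 / (r**6 + f**6) + s8 * c8 / (r**8 + f**8))
```

The damping keeps the energy finite as r → 0 instead of diverging like plain −C6/r⁶.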

Bondi radii

The atomic-radii table used to build the COSMO cavity.

Bondi (1964): the most widely used vdW radii for cavity construction. We tried Klamt radii in v0.28.2, but they were calibrated against def2-TZVPD/σ_HB; reverting to Bondi restored the HB donor signal (r=+0.65 → +0.49 on 58 drugs in v0.28.3).
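The radii table is just a lookup. The values below are Bondi's 1964 radii (Å) for common drug-like elements. The `scale` argument is a hypothetical knob: COSMO implementations often enlarge vdW radii by a constant factor, and this glossary does not say which factor we use.

```python
# Bondi (1964) van der Waals radii, Å (subset of elements)
BONDI_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "F": 1.47, "S": 1.80, "Cl": 1.75}


def cavity_radius(element: str, scale: float = 1.0) -> float:
    """Sphere radius (Å) for one atom when building the cavity."""
    try:
        return scale * BONDI_RADII[element]
    except KeyError:
        raise ValueError(f"no Bondi radius tabulated here for {element!r}") from None
```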

structure-level fields

Chemistry

SMILES

A line-string notation for a molecule.

CC(=O)Oc1ccccc1C(=O)O = aspirin. Compact, human-readable, RDKit-canonicalisable. We persist the canonical SMILES in every .mfsig as the human-readable identifier.

InChI key (14-char)

A 14-character structural hash you can use as a unique molecule key.

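Cross-database identifier, derived deterministically: the same structure always yields the same key. A full InChIKey is 27 characters in three hyphen-separated blocks. We persist the first 14 characters, which encode the skeleton (connectivity); the later blocks cover stereochemistry, isotopes and protonation/charge.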
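Extracting the persisted block from a full key, as a sketch. The regular expression follows the standard InChIKey layout (14 letters, hyphen, 8 letters plus a standard/non-standard flag and a version letter, hyphen, 1 protonation letter); the function name is ours. Aspirin's key is used as the example.

```python
import re

# 14-char skeleton block - 8-char block + flag (S/N) + version - protonation char
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def skeleton_block(inchikey: str) -> str:
    """Return the first 14 characters (connectivity block) of a well-formed InChIKey."""
    m = INCHIKEY_RE.match(inchikey)
    if m is None:
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return m.group(1)
```

One consequence of keeping only the first block: stereoisomers that differ only in stereochemistry share the same 14-character key.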

Aromatic ring

A planar ring with delocalised π electrons (benzene, pyridine, etc.).

v2.1 .mfsig persists aromatic ring membership as a first-class field. Bond order '4' means aromatic in our schema (matching SDF V2000 convention).
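A cross-check you could run against the stored ring-membership field: collect every atom that touches a bond of order 4. The `(atom_i, atom_j, order)` tuple layout is a hypothetical stand-in for the schema's bond records.

```python
AROMATIC = 4  # bond-order code for aromatic bonds (SDF V2000 convention)


def aromatic_atoms(bonds):
    """Sorted indices of atoms that take part in at least one aromatic bond.

    `bonds` is a list of (atom_i, atom_j, order) tuples (illustrative layout).
    """
    atoms = set()
    for i, j, order in bonds:
        if order == AROMATIC:
            atoms.update((i, j))
    return sorted(atoms)
```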

Stereocenter

An atom whose 3D arrangement of neighbours matters (R vs S).

Often a tetrahedral carbon with four different substituents. v2.1 .mfsig persists stereocenter atom indices + CIP configuration (R, S, or unspecified).

pKa / microspecies

Acid-dissociation constant; microspecies = one ionisation state.

Drugs ionise: at pH 7.4, aspirin is mostly anionic (its carboxylic acid has lost its proton). Microspecies σ-profiles will persist every relevant protonation state, each with its population weight. Roadmap Q3 2026.
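For a single acidic site, the microspecies weights follow from the Henderson–Hasselbalch relation [A⁻]/[HA] = 10^(pH − pKa). A sketch; real drugs are often polyprotic or tautomeric and need a fuller treatment than this.

```python
def monoprotic_acid_weights(pka: float, ph: float) -> dict:
    """Population weights of the neutral (HA) and anionic (A-) microspecies."""
    ratio = 10.0 ** (ph - pka)     # [A-]/[HA]
    anion = ratio / (1.0 + ratio)
    return {"HA": 1.0 - anion, "A-": anion}
```

With aspirin's literature pKa of roughly 3.5, `monoprotic_acid_weights(3.5, 7.4)` puts more than 99.9 % of the population in the anion, matching the "mostly anionic" statement above.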

what the SHA-256 covers

Trust & audit

SHA-256 audit hash

The 64-hex-character integrity signature on every .mfsig.

Computed over a canonical subset (chemistry + methodology + results + solver args). Recompute it: a mismatch means the file was modified. Same idea as a Git commit hash, but for σ-profile files.
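"Canonical" is what makes recomputation possible: the same content must always serialise to the same bytes. A sketch using sorted keys and fixed separators. The key names in `HASHED_KEYS` are hypothetical, and a production scheme would also pin float formatting (for example via a canonical-JSON standard such as RFC 8785).

```python
import hashlib
import json

# Hypothetical names for the hashed subset; the real field list lives in the schema.
HASHED_KEYS = ("chemistry", "methodology", "results", "solver_args")


def audit_hash(record: dict) -> str:
    """SHA-256 (64 hex chars) over a canonical JSON form of the hashed subset."""
    subset = {k: record[k] for k in HASHED_KEYS}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Fields outside the subset can change without breaking the hash; any edit inside it produces a mismatch.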

connectivity_hash

A separate SHA covering just atoms + bonds + formal charges.

Lets an auditor verify the topology hasn't been touched without re-hashing the 600 KB σ-surface payload. Independent of the file SHA — both must match for the file to be considered intact.
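A sketch of a topology-only hash. The serialisation is hypothetical: bonds are normalised to `(min, max, order)` and sorted so that the order in which bonds are listed does not matter, but atom order still does (there is no canonical atom ranking here).

```python
import hashlib
import json


def connectivity_hash(atoms, bonds, formal_charges):
    """SHA-256 over atoms + bonds + formal charges only (illustrative serialisation)."""
    norm_bonds = sorted((min(i, j), max(i, j), order) for i, j, order in bonds)
    payload = json.dumps(
        {"atoms": list(atoms), "bonds": norm_bonds, "charges": list(formal_charges)},
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```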

21 CFR Part 11

FDA regulation for electronic records in pharma.

Requires audit trails, electronic signatures, and tamper-evident storage. .mfsig satisfies the technical requirements: attributable, legible, contemporaneous, original, accurate (ALCOA principles).

Replay bundle

Everything needed to re-run a compute from scratch — bit-identical.

Reference-tier .mfsig ships with engine version, library lock, git commit, hardware target, and the solver call args. On the same hardware + lib versions, recompute matches byte-for-byte.
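Before attempting a byte-identical rerun, it helps to check that the environment actually matches the bundle. A sketch with hypothetical field names; the solver call args are replayed as inputs rather than compared here.

```python
def replay_mismatches(bundle: dict, current: dict) -> list:
    """Names of replay-bundle fields that differ from the current environment.

    A byte-for-byte match is only expected when this list is empty.
    """
    keys = ("engine_version", "library_lock", "git_commit", "hardware_target")
    return [k for k in keys if bundle.get(k) != current.get(k)]
```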

grades · cohorts · atlas

Product

Pro / Platinum / Reference

Our three quality grades.

Pro = standard turnaround. Platinum = production-recommended (wB97M-V / def2-SVP + Bondi + C-PCM, the v0.91.1 grade). Reference = full regulatory audit trail — automated 21 CFR Part 11 ALCOA stamp + DOI-backed Zenodo deposit, no human in the loop. All three deliver the same .mfsig schema.

Cohort

A signed, versioned set of .mfsig files.

v0.28.3 cohort = 58 drugs + 26 polymers, 84/84 gates green. Atlas-v028_3 = ~4 078 drugs, in flight. Every cohort is versioned and, from Q1 2027 onwards, gets a Zenodo DOI.

Atlas

The full ~4 078-molecule reference library we're computing.

Molecules are computed from smallest to largest by heavy-atom count, continuously, on dedicated compute. Each entry is a signed .mfsig delivered into the cohort. Public, and citable once complete.

Want to go deeper?

  • Anatomy of .mfsig: every field in the schema, typed and graded.
  • Benchmarks: where we sit vs. PC-SAFT, COSMOtherm, GNNs, SMD/M06-2X (kernels site).
  • Comparison matrix: .mfsig vs. every existing format and every vendor (kernels site).
  • Roadmap: σ¹ today, σⁿ formulation by 2027.