Glossary
Every term we use, defined plainly. Read it linearly for a tour of COSMO-RS, or jump in for the one term that tripped you up.
Core
.mfsig
The signed JSON file format we write σ-profile data into.
A five-pillar JSON schema (audit · chemistry · quantum · AI · legacy_vault) sealed with a SHA-256 hash. Open spec, no licence needed. Every signed σ-profile we deliver is one of these files.
→ see every field
σ-profile (sigma-profile)
The histogram of charge density on a molecule's solvent-accessible surface.
Put the molecule in an implicit solvent and ask 'how much surface area carries σ = −0.01 e/Ų?', then repeat for every σ bin: each answer is one number. Plot the numbers against σ and you get a curve. That curve is the σ-profile, and it predicts solvation thermodynamics.
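To make the binning concrete, here is a minimal Python sketch that turns a list of surface tiles into a σ-profile. It assumes tiles arrive as (σ, area) pairs, a 0.001 e/Ų bin width over ±0.025 e/Ų, and nearest-bin assignment; production COSMO-RS codes may split each tile's area between neighbouring bins instead, and none of these choices is taken from the .mfsig spec.

```python
def sigma_profile(segments, bin_width=0.001, sigma_max=0.025):
    """Histogram of tile area by charge density σ.

    segments: iterable of (sigma, area) pairs, σ in e/Å^2, area in Å^2.
    Returns a list of (bin_centre, summed_area) from -sigma_max to +sigma_max.
    """
    n_half = round(sigma_max / bin_width)
    areas = [0.0] * (2 * n_half + 1)
    for sigma, area in segments:
        # Nearest-bin assignment (an assumption; some codes split area
        # linearly between the two neighbouring bins).
        k = round(sigma / bin_width) + n_half
        # Out-of-range σ is clamped into the edge bins rather than dropped.
        k = min(max(k, 0), 2 * n_half)
        areas[k] += area
    return [((k - n_half) * bin_width, a) for k, a in enumerate(areas)]
```

Each entry pairs a bin centre with the total area at that σ; plotting the second element against the first gives the curve described above.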
σ-surface segment
One tile on the molecule's COSMO cavity surface.
The cavity is tessellated into ~1 000–5 000 polygons. Each tile has a position (where on the cavity), an area (how big) and a σ (the local charge density); tiles are the unit of every σ-property calculation.
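Because every σ-property is a sum over tiles, a flat list of records is enough to compute one. The sketch below assumes a hypothetical `Segment` type (our field names, not necessarily the .mfsig schema's) and uses an area-weighted σ-moment as the example property: n = 0 gives the total cavity area and n = 1 gives the total screening charge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    position: tuple[float, float, float]  # Å, tile centre on the cavity
    area: float                           # Å^2
    sigma: float                          # e/Å^2, local screening-charge density


def sigma_moment(segments, n):
    """Area-weighted σ-moment: sum of area * sigma**n over all tiles."""
    return sum(s.area * s.sigma ** n for s in segments)
```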
COSMO-RS
Conductor-like screening model for real solvents: the σ-profile theory (Klamt 1995, 2005).
Andreas Klamt's framework: compute the σ-profile from a DFT calculation in an implicit conductor, then derive activity coefficients, vapour pressure, partition coefficients, etc. from interactions between σ-distributions. .mfsig is the open file format for COSMO-RS inputs.
Physics
HB cutoff
|σ| ≥ 0.0084 e/Ų — the threshold above which a surface tile counts as a hydrogen-bond donor or acceptor.
Klamt's 2005 calibration. Below the cutoff: apolar surface. Above and negative: donor (an H wanting to be hugged by something electronegative). Above and positive: acceptor (a lone pair waiting for an H).
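The cutoff rule reduces to a three-way comparison. A minimal sketch, assuming tiles are given as (σ, area) pairs; the function names are ours. It also totals the donor, acceptor and apolar surface area.

```python
SIGMA_HB = 0.0084  # e/Å^2, the HB cutoff quoted above


def classify_segment(sigma, cutoff=SIGMA_HB):
    """Label a tile 'apolar', 'donor' or 'acceptor' from its σ."""
    if abs(sigma) < cutoff:
        return "apolar"
    # Negative σ screens a positive (H-like) region, so it marks a donor;
    # positive σ screens a lone pair, so it marks an acceptor.
    return "donor" if sigma < 0 else "acceptor"


def hb_areas(segments, cutoff=SIGMA_HB):
    """Total surface area (Å^2) in each class."""
    totals = {"apolar": 0.0, "donor": 0.0, "acceptor": 0.0}
    for sigma, area in segments:
        totals[classify_segment(sigma, cutoff)] += area
    return totals
```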
Charge conservation
∮σ dA = 0 for a neutral molecule.
The integral of σ over the entire cavity surface must equal the negative of the solute's net charge (zero for neutrals), scaled by the dielectric factor f(ε), which is 1 in the ideal-conductor limit that COSMO-RS uses. Every v0.28.3 .mfsig satisfies this to machine precision. It's our cheapest, strongest gate.
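On a tessellated cavity the integral becomes a sum of σᵢ·Aᵢ over tiles, so the gate is a short check. This sketch assumes (σ, area) pairs and a caller-chosen tolerance; `cosmo_scaling` implements the original COSMO factor f(ε) = (ε − 1)/(ε + ½), which other codes sometimes vary. None of these names come from the .mfsig tooling.

```python
import math


def cosmo_scaling(eps):
    """Original COSMO dielectric factor f(ε) = (ε - 1) / (ε + 1/2)."""
    return (eps - 1.0) / (eps + 0.5)


def check_charge_conservation(segments, net_charge=0, f_eps=1.0, tol=1e-6):
    """Compare total screening charge Σ σ_i·A_i with -f(ε)·Q.

    Use f_eps=1.0 for the ideal conductor (ε → ∞). Returns (passes, total).
    """
    # fsum keeps round-off low when summing thousands of tiles.
    total = math.fsum(sigma * area for sigma, area in segments)
    target = -f_eps * net_charge
    return abs(total - target) <= tol, total
```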
ΔG-hydration
Free energy change on transferring a molecule from the gas phase into water.
One of the key numbers that tells you whether a drug is going to dissolve. We compute it from the σ-profile plus a bias correction calibrated against the Mobley FreeSolv set: leave-one-out (LOO) mean absolute error of 0.85 kcal/mol on 28 anchors (v0.28.3).
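The glossary doesn't say what form the bias correction takes, so the sketch below assumes the simplest one, a single additive offset, purely to show how a leave-one-out MAE is computed: each anchor is predicted with an offset fitted on the other N − 1 anchors. The function name is hypothetical.

```python
def loo_mae_constant_offset(predicted, experimental):
    """Leave-one-out MAE for a single additive bias correction (kcal/mol).

    For each anchor i, the offset is the mean residual (experimental -
    predicted) over the other N-1 anchors; it is applied to anchor i and
    the absolute error recorded. Returns the mean of those errors.
    """
    n = len(predicted)
    if n < 2 or n != len(experimental):
        raise ValueError("need at least two paired values")
    residuals = [e - p for p, e in zip(predicted, experimental)]
    total = sum(residuals)
    errors = []
    for i in range(n):
        offset = (total - residuals[i]) / (n - 1)  # fit without anchor i
        errors.append(abs(experimental[i] - (predicted[i] + offset)))
    return sum(errors) / n
```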
Dipole moment
The molecule's net electric polarity vector, in Debye.
A measure of how polar the molecule is overall. Water: 1.85 D. Atorvastatin: 8.78 D. We persist (μx, μy, μz, |μ|) from the converged DFT density. The viewer can draw it as a 3D arrow.
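DFT codes often report the dipole in atomic units (e·a₀), so persisting Debye values usually involves a conversion. A small sketch, assuming components arrive either in Debye or in atomic units (1 e·a₀ ≈ 2.541746 D); the function and its `unit` argument are illustrative only.

```python
import math

AU_TO_DEBYE = 2.541746  # 1 e·a0 expressed in Debye


def dipole_debye(mu_x, mu_y, mu_z, unit="debye"):
    """Return (μx, μy, μz, |μ|) in Debye, converting from atomic units if asked."""
    if unit not in ("debye", "au"):
        raise ValueError("unit must be 'debye' or 'au'")
    scale = AU_TO_DEBYE if unit == "au" else 1.0
    comps = (mu_x * scale, mu_y * scale, mu_z * scale)
    return (*comps, math.hypot(*comps))  # |μ| is the Euclidean norm
```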
D3BJ dispersion
Grimme's empirical correction for the London-dispersion (van der Waals) attraction that B3LYP misses.
Standard DFT functionals (including B3LYP) under-count London dispersion forces. D3BJ adds an empirical pairwise correction with Becke–Johnson damping: small, but real, and important for accurate ΔG. Shipped in v0.28.2+.
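For reference, the two-body part of D3 with Becke–Johnson damping has the form below, where s₆, s₈, a₁ and a₂ are functional-specific fit parameters, C₆ and C₈ are pairwise dispersion coefficients and r_AB is the interatomic distance. This glossary doesn't state which parameter set the pipeline uses.

```latex
E_{\mathrm{disp}}^{\mathrm{D3(BJ)}}
  = -\frac{1}{2} \sum_{A \neq B} \sum_{n = 6, 8}
    s_n \, \frac{C_n^{AB}}{r_{AB}^{\,n} + \left( a_1 R_0^{AB} + a_2 \right)^{n}},
\qquad
R_0^{AB} = \sqrt{\frac{C_8^{AB}}{C_6^{AB}}}
```

The damping term (a₁R₀ + a₂)ⁿ keeps the correction finite as r_AB → 0 instead of switching it off entirely at short range.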
Bondi radii
The atomic-radii table used to build the COSMO cavity.
Bondi (1964): the most widely used vdW radii for cavity construction. We tried Klamt radii in v0.28.2, but they were calibrated against def2-TZVPD/σ_HB; reverting to Bondi restored the HB donor signal (r=+0.65 → +0.49 on 58 drugs in v0.28.3).
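A cavity builder starts from a per-element radius table. A minimal sketch with Bondi's (1964) values for common organic elements; the optional `scale` factor is an assumption, since many COSMO implementations enlarge the vdW radii and this glossary doesn't say what factor, if any, is used here.

```python
# Bondi (1964) van der Waals radii in Å for common organic elements.
BONDI_RADII = {
    "H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "F": 1.47,
    "P": 1.80, "S": 1.80, "Cl": 1.75, "Br": 1.85, "I": 1.98,
}


def cavity_radii(elements, scale=1.0):
    """Per-atom sphere radii for cavity construction: Bondi radius × scale."""
    missing = sorted({e for e in elements if e not in BONDI_RADII})
    if missing:
        raise KeyError(f"no Bondi radius for: {', '.join(missing)}")
    return [BONDI_RADII[e] * scale for e in elements]
```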
Chemistry
SMILES
A line-string notation for a molecule.
CC(=O)Oc1ccccc1C(=O)O = aspirin. Compact, human-readable, RDKit-canonicalisable. We persist the canonical SMILES in every .mfsig as the human-readable identifier.
InChI key (14-char)
A 14-character structural hash you can use as a molecule key.
Cross-database identifier, derived deterministically: the same structure always gives the same key. We persist the first 14 chars, which encode the skeleton; the second block is stereochemistry / charge, so stereoisomers share the same 14-char key.
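Pulling the 14-character block out of a full InChIKey is a string operation. A sketch assuming the standard 27-character layout (14 letters, hyphen, 10 letters, hyphen, 1 letter); the function name is ours. Aspirin's key, BSYNRYMUTXBXSQ-UHFFFAOYSA-N, serves as the example.

```python
import re

# Standard InChIKey layout: 14 letters, '-', 10 letters, '-', 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def skeleton_key(inchikey):
    """Return the first (skeleton) block of an InChIKey."""
    m = INCHIKEY_RE.match(inchikey.strip())
    if not m:
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    return m.group(1)
```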
Aromatic ring
A planar ring with delocalised π electrons (benzene, pyridine, etc.).
v2.1 .mfsig persists aromatic ring membership as a first-class field. Bond order '4' means aromatic in our schema (matching SDF V2000 convention).
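With bond order 4 flagging aromatic bonds, finding aromatic atoms is a single pass over the bond list. A sketch assuming bonds are stored as (atom_i, atom_j, order) triples, which is our illustrative layout rather than the confirmed .mfsig one.

```python
AROMATIC = 4  # bond-order code for aromatic bonds, as in SDF V2000


def aromatic_atoms(bonds):
    """Sorted indices of atoms touched by at least one aromatic bond.

    bonds: iterable of (atom_i, atom_j, order) triples (hypothetical layout).
    """
    atoms = set()
    for i, j, order in bonds:
        if order == AROMATIC:
            atoms.update((i, j))
    return sorted(atoms)
```

For toluene, the six ring carbons come back and the methyl carbon, joined by a single bond, does not.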
Stereocenter
An atom whose 3D arrangement of neighbours matters (R vs S).
Often a tetrahedral carbon with four different substituents. v2.1 .mfsig persists stereocenter atom indices + CIP configuration (R, S, or unspecified).
pKa / microspecies
Acid-dissociation constant; microspecies = one ionisation state.
Drugs ionise: at pH 7.4, aspirin is mostly anionic (its carboxylic acid has lost its proton). Microspecies σ-profiles persist all the relevant protonation states with their weights. Roadmap: Q3 2026.
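The weights for a simple monoprotic acid follow from Henderson–Hasselbalch, [A⁻]/[HA] = 10^(pH − pKa). The sketch below assumes a single acidic site and mixes per-microspecies σ-profiles linearly by mole fraction; both are simplifying assumptions, not the planned microspecies method. With aspirin's literature pKa of about 3.5, the anion fraction at pH 7.4 comes out above 99.9 %.

```python
def acid_fractions(pka, ph):
    """(neutral, anion) mole fractions of a monoprotic acid HA at a given pH.

    Henderson–Hasselbalch: [A-]/[HA] = 10**(pH - pKa).
    """
    ratio = 10.0 ** (ph - pka)
    anion = ratio / (1.0 + ratio)
    return 1.0 - anion, anion


def mix_profiles(profiles, weights):
    """Weight-averaged σ-profile from per-microspecies profiles on one σ grid."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [sum(w * p[k] for w, p in zip(weights, profiles))
            for k in range(len(profiles[0]))]
```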
Trust & audit
SHA-256 audit hash
The 64-hex-character integrity signature on every .mfsig.
Computed over a canonical subset (chemistry + methodology + results + solver args). Recompute it; a mismatch means the file was modified. Same idea as a Git commit hash, applied to σ-profile files.
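A minimal sketch of recompute-and-compare in Python. The section names in `HASHED_SECTIONS` and the canonicalisation (sorted keys, compact separators, UTF-8) are assumptions for illustration; the real .mfsig canonical form is defined by its spec, and cross-language tools would need an agreed scheme such as RFC 8785 JSON canonicalisation.

```python
import hashlib
import json

# Hypothetical names for the hashed sections; other top-level keys are ignored.
HASHED_SECTIONS = ("chemistry", "methodology", "results", "solver_args")


def audit_hash(doc):
    """SHA-256 hex digest over a canonical JSON serialisation of the subset."""
    subset = {k: doc[k] for k in HASHED_SECTIONS}  # KeyError if a section is missing
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(doc, stored_hash):
    """True if the recomputed hash matches the stored 64-hex-char value."""
    return audit_hash(doc) == stored_hash.strip().lower()
```

Editing a hashed section changes the digest; editing a key outside the subset does not, which is why the subset has to be fixed by the spec.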
connectivity_hash
A separate SHA covering just atoms + bonds + formal charges.
Lets an auditor verify the topology hasn't been touched without re-hashing the 600 KB σ-surface payload. Independent of the file SHA — both must match for the file to be considered intact.
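The connectivity hash can be much cheaper because it only touches topology. A sketch with a hypothetical text encoding: bonds are normalised (lower index first, then sorted) so that listing order doesn't change the hash, and the σ-surface never enters the input.

```python
import hashlib


def connectivity_hash(atoms, bonds, formal_charges):
    """SHA-256 over atoms + bonds + formal charges only (hypothetical encoding).

    atoms: element symbols in index order; bonds: (i, j, order) triples;
    formal_charges: one int per atom.
    """
    if len(formal_charges) != len(atoms):
        raise ValueError("one formal charge per atom")
    # Normalise so (1, 0, 2) and (0, 1, 2) describe the same bond.
    norm_bonds = sorted((min(i, j), max(i, j), order) for i, j, order in bonds)
    parts = [
        "atoms:" + ",".join(atoms),
        "bonds:" + ";".join(f"{i}-{j}:{o}" for i, j, o in norm_bonds),
        "charges:" + ",".join(str(q) for q in formal_charges),
    ]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```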
21 CFR Part 11
FDA regulation for electronic records in pharma.
Requires audit trails, electronic signatures, and tamper-evident storage. .mfsig satisfies the technical requirements: attributable, legible, contemporaneous, original, accurate (ALCOA principles).
Replay bundle
Everything needed to re-run a compute from scratch — bit-identical.
Reference-tier .mfsig ships with engine version, library lock, git commit, hardware target, and the solver call args. On the same hardware + lib versions, recompute matches byte-for-byte.
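Before diffing a recomputed file byte-for-byte, it's worth checking that the replay preconditions actually hold. A sketch with hypothetical field names mirroring the list above; a field missing from the bundle counts as a mismatch, since it can't be verified.

```python
# Hypothetical field names for the replay bundle.
REPLAY_KEYS = ("engine_version", "library_lock", "git_commit",
               "hardware_target", "solver_args")


def replay_mismatches(bundle, context):
    """Replay fields that are absent from the bundle or differ from the re-run setup."""
    return [k for k in REPLAY_KEYS
            if k not in bundle or bundle[k] != context.get(k)]
```

An empty list means the bundle and the re-run setup agree on every recorded field; only then is a byte-level difference in the output evidence of a real discrepancy.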
Product
Pro / Platinum / Reference
Our three quality grades.
Pro = standard turnaround. Platinum = production-recommended (ωB97M-V / def2-SVP + Bondi + C-PCM, the v0.91.1 grade). Reference = full regulatory audit trail: automated 21 CFR Part 11 ALCOA stamp + DOI-backed Zenodo deposit, no human in the loop. All three deliver the same .mfsig schema.
Cohort
A signed, versioned set of .mfsig files.
v0.28.3 cohort = 58 drugs + 26 polymers, 84/84 gates green. Atlas-v028_3 = ~4 078 drugs, in-flight. Every cohort gets versioned and (Q1 2027 onwards) a Zenodo DOI.
Atlas
The full ~4 078-molecule reference library we're computing.
Sorted small → big by heavy-atom count and run continuously on dedicated compute. Each entry is a signed .mfsig delivered into the cohort. Public and citable when complete.
Want to go deeper?
- Anatomy of .mfsig: every field in the schema, typed and graded.
- Benchmarks: where we sit vs. PC-SAFT, COSMOtherm, GNNs, SMD/M06-2X (kernels site).
- Comparison matrix: .mfsig vs. every existing format + every vendor (kernels site).
- Roadmap: σ¹ today, σⁿ formulation by 2027.