The most-cited academic σ-profile reference set (LVPP) ships only the σ-profiles themselves: no geometries, no cavity construction parameters, no σ-moments, and no reproducibility metadata. Anyone who wants to verify or extend it has to re-derive everything from scratch, which is why third-party measurements report downstream correlation against it no higher than r² ≈ 0.20.
The MolForge Ground-Truth cohort solves this: 100 drug-like molecules, each shipped with the complete data pack in .mfsig.json v2.0-apex. Every entry is SHA-256 audited and bit-reproducible via mol.recompute() on the same hardware. Anyone can verify or extend the cohort with one pip install and one pod run.
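As a rough illustration of what a SHA-256 audit check can look like on the consumer side, the sketch below hashes a downloaded data pack and compares it with a published digest. It assumes the digest is taken over the raw bytes of the .mfsig.json file, which this page does not state. The function names sha256_of_file and matches_published_digest are hypothetical and are not part of any MolForge API.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file's raw bytes.

    Assumption: the audit digest covers the file exactly as shipped,
    with no JSON canonicalisation step beforehand.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large data packs do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Return True if the local file hashes to the published hex digest."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

A mismatch only says that the bytes differ. It does not distinguish a corrupted download from a legitimately re-serialised file, so a real audit would also need to fix the serialisation format.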
Cohort composition
10 small molecules (≤ 8 atoms) + 15 aromatic single-ring + 10 poly-ring + 25 drug scaffolds + 25 top-50 prescribed + 15 COSMO-RS solvent workhorses = 100. Each molecule has < 50 heavy atoms, charge = 0, and is RDKit-parseable.
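The composition and inclusion rules above can be written down as a small self-check. This is a minimal sketch: the dictionary keys, is_eligible, and cohort_size are illustrative names, not MolForge identifiers. The heavy-atom count and formal charge are passed in as plain integers, so the RDKit parsing step is assumed to happen upstream and is not shown.

```python
# Bucket sizes as listed in the cohort composition.
COHORT_BUCKETS = {
    "small molecules (<= 8 atoms)": 10,
    "aromatic single-ring": 15,
    "poly-ring": 10,
    "drug scaffolds": 25,
    "top-50 prescribed": 25,
    "COSMO-RS solvent workhorses": 15,
}

HEAVY_ATOM_LIMIT = 50  # strict upper bound: "< 50 heavy atoms each"


def cohort_size(buckets=COHORT_BUCKETS):
    """Total number of molecules across all buckets."""
    return sum(buckets.values())


def is_eligible(heavy_atoms, formal_charge):
    """Apply the two numeric inclusion rules: size limit and neutrality.

    RDKit-parseability is the third rule; it is assumed to have been
    checked before these numbers were computed.
    """
    return heavy_atoms < HEAVY_ATOM_LIMIT and formal_charge == 0
```

With these definitions, cohort_size() confirms that the six buckets add up to the stated 100 molecules.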
Compute budget
The 100-molecule Ground-Truth cohort is bit-reproducible, signed per molecule, and ships with a Zenodo-style DOI on the Reference tier. It is the first σ-profile reference dataset in the literature where anyone with the same hardware can re-verify every entry.