Rosetta Stone — universal file conversion

Hub-and-spoke architecture. Every reader normalises into the MolForge FullOutput Pydantic schema, and every writer emits from it. Adding a new vendor takes one new reader and, optionally, one new writer.
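The payoff of the hub is combinatorial: with N formats you write N readers and N writers instead of N(N-1) pairwise converters. The sketch below illustrates the pattern only. `Hub` is a hypothetical, heavily simplified stand-in for FullOutput (a plain dataclass, not the real Pydantic schema), and `register`, `convert` and the two `toy_*` formats are illustrative names, not the module's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Hub:
    """Hypothetical, heavily simplified stand-in for MolForge FullOutput."""
    atoms: List[str] = field(default_factory=list)
    areas: List[float] = field(default_factory=list)  # segment areas, Å²


Reader = Callable[[str], Hub]
Writer = Callable[[Hub], str]

READERS: Dict[str, Reader] = {}
WRITERS: Dict[str, Writer] = {}


def register(fmt: str, reader: Reader, writer: Writer) -> None:
    """Each format contributes one spoke in (reader) and one spoke out (writer)."""
    READERS[fmt] = reader
    WRITERS[fmt] = writer


def convert(text: str, src: str, dst: str) -> str:
    """Every conversion passes through the hub: src -> Hub -> dst."""
    return WRITERS[dst](READERS[src](text))


# Two toy spokes (hypothetical formats, for illustration only).
def read_space(text: str) -> Hub:
    hub = Hub()
    for line in text.strip().splitlines():
        atom, area = line.split()
        hub.atoms.append(atom)
        hub.areas.append(float(area))
    return hub


def write_space(hub: Hub) -> str:
    # repr() emits the shortest string that parses back to the same float.
    return "\n".join(f"{a} {x!r}" for a, x in zip(hub.atoms, hub.areas))


def read_csv(text: str) -> Hub:
    hub = Hub()
    for line in text.strip().splitlines()[1:]:  # skip the header row
        atom, area = line.split(",")
        hub.atoms.append(atom)
        hub.areas.append(float(area))
    return hub


def write_csv(hub: Hub) -> str:
    rows = ["atom,area"] + [f"{a},{x!r}" for a, x in zip(hub.atoms, hub.areas)]
    return "\n".join(rows)


register("toy_space", read_space, write_space)
register("toy_csv", read_csv, write_csv)
```

Adding a third format means one more `register` call; no existing spoke has to change.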

Every QM vendor has its own .cosmo dialect: different units, different column orders, different conventions for atom indexing. The historical workaround was per-vendor regex scripts. Every lab maintained its own, and none of them round-tripped losslessly at fp64 precision.
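What "round-tripping to fp64" demands is easy to check: format a double as text, parse it back, and compare bit for bit. The sketch below contrasts a fixed number of decimals (the style of many fixed-width columns) with text forms that are exact for every finite double. It is a generic Python illustration; the format names are arbitrary labels, not anything the module defines.

```python
import random


def round_trips(x: float, fmt, parse=float) -> bool:
    """True if x -> text -> float gives back the identical fp64 value."""
    return parse(fmt(x)) == x


FORMATS = {
    "fixed_6dp": (lambda v: f"{v:.6f}", float),  # fixed decimals: lossy
    "g17": (lambda v: f"{v:.17g}", float),       # 17 significant digits: exact for any finite double
    "repr": (repr, float),                       # shortest string that parses back exactly
    "hex": (float.hex, float.fromhex),           # exact bit-level hexadecimal form
}


def count_failures(values):
    """Number of values that do NOT survive a text round trip, per format."""
    return {
        name: sum(not round_trips(v, fmt, parse) for v in values)
        for name, (fmt, parse) in FORMATS.items()
    }


rng = random.Random(0)
sample = [rng.uniform(-50.0, 50.0) for _ in range(10_000)]
failures = count_failures(sample)
print(failures)
```

Only the fixed-decimal format reports failures; the other three reproduce every sampled value exactly.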

The Rosetta Stone module collapses every vendor format into one hub: the MolForge FullOutput. Support is production-grade for Turbomole, COSMOtherm, xtb and MolForge CSV, and partial for Gaussian, NWChem and ORCA. Auto-detection tells the formats apart by sniffing file content.

Format support matrix

Turbomole and COSMOtherm: production, fp64 round-trip.
xtb native dump: production; disambiguated from Turbomole by the $gfn / # generated by xtb header.
MolForge CSV: production; the native interchange format for ML pipelines.
ORCA cpcm.dat, Gaussian SCRF, NWChem cosmo: partial (best-effort on the most common version layouts).
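Because an xtb dump resembles a Turbomole file, content sniffing has to be ordered: test the more specific xtb markers before the generic Turbomole ones. A minimal sketch, assuming a first-match-wins rule list. The xtb markers ($gfn, # generated by xtb) come from the matrix above; treating the presence of a $cosmo keyword as the Turbomole signal is an assumption, and `sniff`/`RULES` are hypothetical names, not the internals of read_legacy_cosmo.

```python
from typing import Callable, List, Tuple

# (format name, predicate on the file's leading text); first match wins.
# ORDER MATTERS: xtb dumps look Turbomole-like, so xtb is checked first.
RULES: List[Tuple[str, Callable[[str], bool]]] = [
    ("xtb", lambda head: "$gfn" in head or "# generated by xtb" in head),
    # Assumption: a $cosmo keyword block identifies a Turbomole-style file.
    ("turbomole", lambda head: "$cosmo" in head),
]


def sniff(text: str, max_chars: int = 4096) -> str:
    """Guess the dialect from the first max_chars characters of the file."""
    head = text[:max_chars]
    for name, matches in RULES:
        if matches(head):
            return name
    raise ValueError("unrecognised COSMO dialect")
```

Swapping the two rules would misclassify every xtb dump that also carries a $cosmo block as Turbomole, which is why the more specific rule goes first.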

Adding a new vendor

Write _parse_X_blocks(path) → _ParsedLegacy, normalising lengths to Bohr, areas to Å² and atom indices to 0-based. Then add an auto-detect branch to read_legacy_cosmo and, optionally, a write_X_cosmo writer. Expect roughly one day of work per new vendor.