When a dataset disappears, the public loses the ability to hold power to account. Durable archives are our answer: at-risk public data captured in open, persistent form that outlives its original source.
Durability is a discipline, not a backup. We follow the archival community’s reference model, OAIS (ISO 14721), and standard practices for keeping bits trustworthy over time:
- Fixity — checksums recorded at capture and re-verified, so silent corruption or tampering is detectable.
- Packaging with BagIt (RFC 8493), so files travel with their manifests.
- Persistent identifiers — DOIs via DataCite — so a citation still resolves years later.
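- Open formats (CSV, Parquet, JSON) that are non-proprietary and openly documented, so the data stays readable without licensed software. CSV and JSON are plain text; Parquet is a binary columnar format with an open specification.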
- Redundancy across independent repositories, in the spirit of LOCKSS: lots of copies keep stuff safe.
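To make the fixity and packaging practices above concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not our production pipeline: the function names (`sha256_of`, `write_bag_manifest`, `verify_bag_manifest`) are hypothetical, SHA-256 is an assumed choice of algorithm, and the layout covers only the core of RFC 8493 (a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-sha256.txt`). It omits tag manifests and the percent-encoding that the specification requires for unusual file names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_bag_manifest(bag_dir: Path) -> Path:
    """Record a checksum for every payload file under bag_dir/data."""
    # Bag declaration file, as described in RFC 8493.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            # Manifest paths are relative to the bag root, with "/" separators.
            rel = path.relative_to(bag_dir).as_posix()
            lines.append(f"{sha256_of(path)}  {rel}\n")
    manifest = bag_dir / "manifest-sha256.txt"
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest


def verify_bag_manifest(bag_dir: Path) -> list[str]:
    """Return payload paths that are missing or no longer match the manifest."""
    failures = []
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        target = bag_dir / rel
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel)
    return failures
```

In this sketch, `write_bag_manifest` would run once at capture and `verify_bag_manifest` on each later audit; an empty list means every payload file still matches its recorded checksum. The same check can be run against each redundant copy, so that a damaged copy can be identified and repaired from an intact one.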
We don’t work alone. We build on and contribute to the data-rescue ecosystem — the Internet Archive, the Environmental Data & Governance Initiative (EDGI), and DataLumos, ICPSR’s archive for government data.
Every archived dataset is paired with a documentation guide and, where possible, the reproducible pipeline that captured it. This is the public face of our At-Risk Federal Data Archive and data liberation toolkit.
Know of a dataset at risk? Tell the help desk.