What we build

Durable archives

At-risk public datasets captured in open, persistent repositories that outlive their original sources.

When a dataset disappears, the public loses the ability to hold power to account. Durable archives are our answer.

Durability is a discipline, not a backup. We follow the archival community’s reference model, OAIS (ISO 14721), and standard practices for keeping bits trustworthy over time:

  • Fixity — checksums recorded at capture and re-verified, so silent corruption or tampering is detectable.
  • Packaging with BagIt (RFC 8493), so files travel with their manifests.
  • Persistent identifiers (DOIs via DataCite), so a citation still resolves years later.
  • Open, non-proprietary formats (CSV, JSON, Parquet), so the data stays readable without licensed software.
  • Redundancy across independent repositories, in the spirit of LOCKSS: lots of copies keep stuff safe.
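
To make the fixity practice concrete, here is a minimal Python sketch, not our production pipeline. It records a SHA-256 checksum for every file at capture time and re-verifies those checksums later. The function names (sha256_of, write_manifest, verify_manifest) and the file layout are illustrative assumptions. The manifest lines are loosely modeled on BagIt's "checksum, then path" format, but the output is not a complete or valid bag.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Record one '<checksum>  <relative path>' line per file at capture time.

    Assumption: the manifest is stored outside data_dir, so it never lists itself.
    """
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(data_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {rel}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(data_dir: Path, manifest: Path) -> list[str]:
    """Return relative paths that are missing or whose checksum has changed."""
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Split only once so paths containing spaces stay intact.
        expected, rel = line.split(maxsplit=1)
        target = data_dir / rel
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel)
    return failures
```

Calling write_manifest(Path("capture/data"), Path("capture/manifest-sha256.txt")) at capture time and verify_manifest with the same arguments on a schedule would return an empty list while every file is intact, and the affected paths otherwise (the capture/ paths are hypothetical). Running the same check against each redundant copy is one way to confirm that independent repositories still hold identical bits. A real archive would more likely rely on an established BagIt implementation than on hand-rolled code like this.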

We don’t work alone. We build on and contribute to the data-rescue ecosystem — the Internet Archive, the Environmental Data & Governance Initiative (EDGI), and DataLumos, ICPSR’s archive for government data.

Every archived dataset is paired with a documentation guide and, where possible, the reproducible pipeline that captured it. This is the public face of our At-Risk Federal Data Archive and data liberation toolkit.

Know of a dataset at risk? Tell the help desk.
