Two things are failing at once. The public datasets that let citizens see what their government is doing are being removed, quietly altered, and left to rot. The local newsrooms, nonprofits, and watchdogs that once turned that data into accountability are disappearing along with them. The University of Colorado Public Interest Data Science Laboratory (CUPIDS) exists to hold the line on both.
What’s disappearing
Open public data is not permanent. It is hosted, budgeted, and politically contingent — and a growing share of it has gone offline, been edited without notice, or been locked behind paywalls and logins. At the same time, the institutions that scrutinized that data have been gutted: Colorado has lost roughly half of its newspaper journalists since 2005, and the reporters who remain have less time and fewer resources than ever to fight for records.
At risk: USGS streamflow records, NOAA climate projections.
When a dataset vanishes, it rarely comes back. When a newsroom closes, the institutional memory of how to find and read that data closes with it. Preserving both — the data, and the capacity to use it — is the whole point of the lab.
Our mission
CUPIDS is a laboratory for teaching data science in the public interest, a clinic connecting Colorado’s watchdogs to technical capacity, and a network preserving evidence for democratic accountability.
In practice that is three kinds of work, running at once:
- We rescue data. We acquire, clean, document, and archive at-risk public datasets in durable, open repositories before they disappear — and recover the ones that already have.
- We build capacity. Through a pro-bono help desk, we give Colorado’s journalists, lawyers, and nonprofits the technical help they can’t otherwise afford: pulling data out of documents, recovering deleted records, merging messy sources, fact-checking analyses, and making findings legible.
- We train people. Cross-disciplinary student teams do this work for real stakes, building a pipeline from undergraduate engagement through graduate study to professional practice — and shipping original data journalism along the way.
Why this work
The lab grew out of a specific, lived problem. Its director spent eighteen years studying how people collaborate online — on Wikipedia, Reddit, and Twitter — research that depended entirely on open data about social behavior. Over the last decade, platforms enclosed that information commons: APIs were shut off and public data was locked away to be sold and to train AI models. The open foundation a generation of researchers stood on was pulled out from under them.
That same enclosure is now reaching public data — the environmental monitoring, health statistics, labor records, and scientific datasets that democratic oversight runs on. But the methods built to study the information commons can be redirected to defend it. That is the bet CUPIDS makes: that public-interest data science — preservation, infrastructure, and technical capacity, taught to students and given away to watchdogs — is one of the few durable answers to data that disappears.
This is civic infrastructure, not neutral tooling, and we are unapologetic that the work has a purpose: keeping the evidence that lets the public hold power to account.
What we value
- Public interest first. We work for Colorado’s journalists, lawyers, nonprofits, and citizens — not for clients who can pay, and not for platforms or institutions whose interests run the other way.
- Independence. We are a named, accountable lab, and we keep our most sensitive infrastructure deliberately separate from any single institution, our own university included, so that no one institution can expose the people who trust us.
- Openness and auditability. Our methods, our code, and this website are open source, so our claims can be checked rather than taken on faith.
- Source protection. Protecting the people who bring us data is a design requirement, not a footnote — we minimize what we collect, keep sensitive channels off institutional systems, and don’t unmask sources.
- Responsible AI. We treat models as instruments that need calibration: we measure error, document limits, keep humans in the loop, and put the caveats up front.
- Durability. We build for the long term — open repositories, reproducible pipelines, and documentation — so the work outlives any single grant, tool, or news cycle.
From COLUMN to CUPIDS
CUPIDS did not start from nothing. Its direct predecessor is COLUMN — the Colorado Laboratory for Users, Media, and Networks, founded by Brian Keegan in CU Boulder’s Department of Information Science. COLUMN’s work was computational social science: using the digital traces left on Wikipedia, Reddit, and Twitter to understand how people collaborate, how attention moves, and how online communities form and fracture around current events, controversies, and disasters.
That research ran on open data — and as platforms enclosed their data to sell it and to train AI models, the ground shifted under it. CUPIDS is the response to that enclosure: the same computational, human-centered, methodologically interdisciplinary approach, redirected from studying the information commons to preserving the public data that the commons now depends on.
The lab is also a lineage of people. COLUMN’s researchers and alumni — many of whom have gone on to research, data science, and journalism careers of their own — include doctoral researchers Matt Nicholson, Samantha Dalal, Laurie Jones, Alex Newhouse, Ben Emery, and Leo Orozco; master’s researchers Elijah Boykoff, Jack Stein, Jordan Kesner, and Natalie Castro; and alumni C. Estelle Smith, Jordan Wirfs-Brock, Katy Weathington, Nathan Beard, Emily Porter, Andrew Schwartz, Irfanul Alam, Xiaozhe “Arcadia” Zhang, Cade Wilson, Tamer Shahwan, and William Egesdal. The full roster lives at columnlab.github.io.
CUPIDS carries that work forward into a moment that needs it more.
Where you come in
The lab only works because people show up for it — to bring a data problem, to do the work, or to fund it. That’s the whole model, and there’s a way in for every skill set.
Defend the data democracy depends on.
Request help from the desk, join a student team, partner with us, or fund the work — start on the Get Involved page.