Tooling & code

Everything the lab builds is open source — pipelines, notebooks, and small utilities you can fork, audit, and adapt. We would rather ship reusable tools than one-off scripts, so the next newsroom or clinic doesn’t start from scratch.

Our code is on GitHub under the MIT license. The flagship is the data liberation toolkit: an agent skill and project template that scaffold a full acquisition → cleaning → validation → documentation pipeline.
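To make the four stages concrete, here is a minimal sketch of how such a pipeline could be wired together from small functions. It is not the toolkit's actual code or API. The function names, the `id` required field, and the CSV input are all assumptions chosen for illustration.

```python
import csv
import io

# Hypothetical row type for this sketch: one dict per CSV record.
Row = dict[str, str]

def acquire(raw_csv: str) -> list[Row]:
    """Acquisition: parse raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def clean(rows: list[Row]) -> list[Row]:
    """Cleaning: strip stray whitespace from every header and value.

    Assumes well-formed rows (no missing or extra cells).
    """
    return [{k.strip(): v.strip() for k, v in row.items()} for row in rows]

def validate(rows: list[Row]) -> list[Row]:
    """Validation: fail loudly if the (assumed) required field is empty."""
    for i, row in enumerate(rows):
        if not row.get("id"):
            raise ValueError(f"row {i} is missing an id")
    return rows

def document(rows: list[Row]) -> str:
    """Documentation: summarize the row count and column names."""
    columns = sorted(rows[0]) if rows else []
    return f"{len(rows)} rows; columns: {', '.join(columns)}"

def run_pipeline(raw_csv: str) -> tuple[list[Row], str]:
    """Run acquisition, cleaning, and validation, then describe the result."""
    rows = validate(clean(acquire(raw_csv)))
    return rows, document(rows)
```

Calling `run_pipeline("id, name\n1, Ada \n2,Grace\n")` returns the cleaned rows together with a one-line summary. Because each stage is a plain function, any one of them can be swapped or tested without touching the others.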

Our tooling is built on open, widely supported standards so it outlives any one contributor:

Structure follows Cookiecutter Data Science — a predictable layout for data, code, and outputs.
Pipelines are reproducible through workflow tools such as Snakemake, with data and models versioned through DVC and Git.
Environments are pinned (Docker, conda/uv) so a notebook that ran last year runs today.
Analysis lives in literate documents — Jupyter and Quarto — that interleave code, results, and explanation.
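A predictable layout is easy to check mechanically. The sketch below reports which expected directories are missing from a project. The directory list is an assumed subset of the Cookiecutter Data Science layout rather than the full template, and `missing_dirs` is a hypothetical helper, not part of any of the tools named above.

```python
from pathlib import Path

# Assumed subset of the Cookiecutter Data Science layout; adjust to your template.
EXPECTED_DIRS = (
    "data/raw",
    "data/interim",
    "data/processed",
    "notebooks",
    "reports",
)

def missing_dirs(project_root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [d for d in expected if not (root / d).is_dir()]
```

An empty list means the project matches the expected layout. Anything else is a list of paths to create or rename.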

We favor small, composable utilities over monoliths, and we write tests so a fix in one project doesn’t silently break another. Each release states its license, its dependencies, and the data it expects.
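As an illustration of the small-utility-plus-tests pattern, here is a hypothetical helper with a test case written against the standard-library `unittest` module. The `normalize_header` function is invented for this example and is not taken from the lab's repositories.

```python
import re
import unittest

def normalize_header(name: str) -> str:
    """Lowercase a column header and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")

class NormalizeHeaderTest(unittest.TestCase):
    def test_spaces_and_case(self):
        self.assertEqual(normalize_header("  Patient ID "), "patient_id")

    def test_punctuation(self):
        self.assertEqual(normalize_header("Date (UTC)"), "date_utc")
```

Saved as a module, the tests run with `python -m unittest`. If a later fix changes how headers are normalized, a project that depends on the old behavior sees a failing test rather than silently different column names.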

We test before we recommend (see evaluated tooling) and document as we build (see data documentation guides), so a tool arrives with its caveats attached.

Found a bug, or want a pipeline adapted to your data? Open an issue on GitHub or reach the help desk.