Nº15 · Infrastructure
Git
The version control that underpins all reproducible data work.
What is it?
Git is a distributed version control system created by Linus Torvalds in 2005. It records the history of changes to a set of files (code above all) so you can see what changed, when, and why, roll back to any earlier point, and work in parallel without overwriting anyone else's changes.
The distributed part is key: every copy (clone) of the project is a full repository with the entire history. You work offline and sync with a remote (GitHub, GitLab) when you want to share.
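As a rough illustration of the "every clone is a full repository" point, the sketch below builds a small repository and clones it from a local path, so no network is involved. The scratch directory, file name, and author identity are assumptions made only for this example.

```shell
set -eu
# Hypothetical scratch location; nothing here touches a real project.
work="$(mktemp -d)"
git init -q "$work/origin-repo"
cd "$work/origin-repo"
# Local identity so commits work on a machine with no global Git config.
git config user.name "Example Author"
git config user.email "author@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "first version"
echo "v2" > notes.txt
git commit -q -am "second version"

# Clone from a path on disk instead of a GitHub/GitLab URL.
git clone -q "$work/origin-repo" "$work/my-clone"

# The clone carries every commit, so the history is readable offline.
git -C "$work/my-clone" log --oneline
```

Both commits show up in the clone's log even though the original repository is never contacted again.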
What is it for?
- History and rollback. Each commit is a snapshot of the project. If something breaks, you go back to the working version in seconds.
- Branches for parallel work. A branch lets you experiment or build a feature in isolation, then merge it when ready, without blocking the rest of the team.
- Collaboration. On top of Git, platforms like GitHub/GitLab add pull/merge requests, code review, and issues. It is the team-work standard.
- The substrate of "everything as code". Pipelines (dbt, Airflow DAGs), infrastructure (Terraform), and CI/CD live in Git: versioned, reviewable, and reproducible.
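The branch-and-merge flow from the list above can be sketched as follows. This is a minimal example in a throwaway repository; the branch name feature/summary, the file names, and the author identity are hypothetical.

```shell
set -eu
repo="$(mktemp -d)"   # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "print('report v1')" > report.py
git add report.py
git commit -q -m "add report script"
git branch -M main    # make sure the base branch is called main

# Build a feature in isolation on its own branch.
git checkout -q -b feature/summary
echo "print('summary')" > summary.py
git add summary.py
git commit -q -m "add summary script"

# Return to main and bring the finished feature in with a merge commit.
git checkout -q main
git merge -q --no-ff -m "merge feature/summary" feature/summary

git log --oneline --graph
```

The --no-ff flag is a choice made here so the merge stays visible in the graph; a plain git merge would fast-forward in this situation.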
When to use it / when not
Always use it when you write code or run a project: scripts, pipelines, notebooks, configuration, documentation. There is no reasonable alternative — it is a base skill, not optional.
Think twice, or use the right tool, in these cases:
- Large datasets or heavy binaries (multi-GB CSVs, .pkl models, images): Git becomes slow and bloated. Use Git LFS for binaries, or DVC to version data and models while keeping only a reference in Git.
- Secrets (keys, tokens, .env): never commit them to Git. Manage them separately and ignore them with .gitignore.
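One way to keep such files out of history is a .gitignore written before the first git add. The sketch below assumes a throwaway repository; the file names and the token value are placeholders, not real secrets.

```shell
set -eu
repo="$(mktemp -d)"   # hypothetical scratch repository
cd "$repo"
git init -q

# Placeholder files standing in for a secret and a heavy artifact.
echo "API_TOKEN=not-a-real-token" > .env
echo "pretend this is a large model" > model.pkl
echo "print('hello')" > train.py

# Ignore rules: the .env file and any .pkl file.
printf '%s\n' ".env" "*.pkl" > .gitignore

git add .
# Only .gitignore and train.py end up staged.
git status --short
# check-ignore -v reports which rule excludes each path.
git check-ignore -v .env model.pkl
```

Note that .gitignore only affects untracked files: a secret that was already committed stays in history until it is removed and rotated.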
Get started in 1 minute
Create a repository, save your first commit, and look at the history:
git init # initialize the repo in the current folder
echo "# My project" > README.md
git add . # stage the changes
git commit -m "first commit" # save the snapshot to history
git log --oneline # view the history
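The rollback mentioned earlier can be sketched with git revert, which undoes a commit by adding a new one rather than rewriting history. This is one possible approach among several; the repository, file name, and values are hypothetical.

```shell
set -eu
repo="$(mktemp -d)"   # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "threshold = 10" > config.txt
git add config.txt
git commit -q -m "working config"

echo "threshold = -1" > config.txt
git commit -q -am "break the config"

# Reverse the last commit with a new commit; the mistake stays in the log.
git revert --no-edit HEAD

cat config.txt
git log --oneline
```

After the revert, config.txt holds the working value again and the log lists three commits.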
To sync with a remote (GitHub/GitLab):
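git remote add origin <repo-url>
git branch -M main # name the local branch main; Git may default to master
git push -u origin main # push and set origin/main as upstream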
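To try the sync step without a GitHub/GitLab account, a bare repository on disk can stand in for the remote URL. The sketch below is an assumption-laden stand-in, not how a hosted remote is created; the paths and author identity are hypothetical.

```shell
set -eu
work="$(mktemp -d)"   # hypothetical scratch location
# A bare repository plays the role of <repo-url>.
git init -q --bare "$work/remote.git"

git init -q "$work/project"
cd "$work/project"
git config user.name "Example Author"
git config user.email "author@example.com"

echo "# My project" > README.md
git add .
git commit -q -m "first commit"
git branch -M main    # ensure the branch being pushed is named main

git remote add origin "$work/remote.git"
git push -q -u origin main

# The remote now lists main at the same commit as the local branch.
git ls-remote origin
```

With a hosted service, only the URL passed to git remote add changes; the push command is the same.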
Official documentation
The source of truth lives there. Here we orient you; the depth is up to you.
Nº15 · Updated 2026-06-25