Nº22 · Analysis
pandas
The Swiss Army knife for manipulating and analyzing tabular data in Python.
What is it?
pandas is the de facto library for working with tabular data in Python. Its core structure, the DataFrame, is like a programmable spreadsheet: rows, named columns and thousands of operations to clean, transform and summarize data.
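To make the "programmable spreadsheet" idea concrete, here is a minimal sketch of a DataFrame built by hand. The column names and values are invented for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, made up for this sketch.
df = pd.DataFrame(
    {
        "country": ["ES", "FR", "ES", "DE"],
        "amount": [120.0, 80.5, 45.0, 200.0],
    }
)

# Rows and named columns: shape is (number of rows, number of columns).
n_rows, n_cols = df.shape

# Filter rows with a boolean condition, much like a spreadsheet filter.
high = df[df["amount"] > 100]

# Derive a new column with one vectorized expression (the 1.21 factor is arbitrary).
df["amount_with_tax"] = df["amount"] * 1.21

print(df)
print(high)
```

Each column is a pandas Series with its own dtype, so operations such as the multiplication above apply to the whole column at once rather than cell by cell.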
What is it for?
- Reading and writing CSV, Excel, JSON, Parquet or SQL in one line.
- Cleaning real-world data: nulls, types, duplicates, dates, text.
- Grouping, pivoting and joining tables (groupby, merge, pivot_table) to answer business questions.
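The three uses above can be sketched in one short script. Everything here is an assumption made for illustration: the inline CSV stands in for a real file, and the `orders` and `regions` tables and their columns are hypothetical.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real file on disk.
raw = io.StringIO(
    "order_id,country,amount,date\n"
    "1,ES,100,2024-01-05\n"
    "2,FR,,2024-01-06\n"
    "3,ES,50,2024-02-10\n"
    "3,ES,50,2024-02-10\n"
    "4,DE,75,2024-02-11\n"
)

# Reading: one call, with the date column parsed as datetimes.
orders = pd.read_csv(raw, parse_dates=["date"])

# Cleaning: drop the repeated row and replace the missing amount with 0.
orders = orders.drop_duplicates()
orders["amount"] = orders["amount"].fillna(0)

# Joining: attach a region to each order from a small lookup table.
regions = pd.DataFrame(
    {"country": ["ES", "FR", "DE"], "region": ["South", "West", "Central"]}
)
enriched = orders.merge(regions, on="country", how="left")

# Grouping: total amount per region.
by_region = enriched.groupby("region")["amount"].sum()

# Pivoting: regions as rows, months as columns, summed amounts as values.
enriched["month"] = enriched["date"].dt.month
pivot = enriched.pivot_table(
    index="region", columns="month", values="amount", aggfunc="sum", fill_value=0
)

# Writing: with no path given, to_csv returns the CSV as a string.
csv_out = enriched.to_csv(index=False)

print(by_region)
print(pivot)
```

Filling missing amounts with 0 is only one possible choice; depending on the question, dropping those rows or leaving them as missing may be more appropriate.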
When to use it / when not
Use it for almost any exploratory analysis or ETL that fits in memory: it's the standard, has the largest community and integrates with the whole ecosystem (NumPy, matplotlib, scikit-learn).
Think twice when the dataset doesn't fit in RAM or when speed is critical: in those cases Polars or DuckDB are usually faster and more memory-efficient.
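Before switching tools, one possible middle ground within pandas is to read a large CSV in chunks and combine partial results. The sketch below assumes a file with `country` and `amount` columns. The tiny inline CSV and the chunk size of 2 are placeholders chosen only so the example runs anywhere.

```python
import io

import pandas as pd

# Hypothetical data standing in for a CSV too large to load at once.
csv_text = (
    "country,amount\n"
    "ES,10\n"
    "FR,20\n"
    "ES,30\n"
    "DE,40\n"
    "FR,50\n"
)

# With chunksize set, read_csv yields DataFrames of at most that many rows.
partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Aggregate each chunk on its own so only small results are kept in memory.
    partials.append(chunk.groupby("country")["amount"].sum())

# Combine the per-chunk sums into final totals per country.
totals = (
    pd.concat(partials)
    .groupby(level=0)
    .sum()
    .sort_values(ascending=False)
)

print(totals)
```

This pattern only works for aggregations that can be merged from partial results, such as sums and counts; it does not by itself make pandas faster.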
Get started in 1 minute
pip install pandas
import pandas as pd
df = pd.read_csv("sales.csv")
# Total per country, sorted high to low
summary = (
    df.groupby("country")["amount"]
    .sum()
    .sort_values(ascending=False)
)
print(summary.head())
Official documentation
The source of truth lives there. Here we orient you; the depth is up to you.
Open official docs ↗
Nº22 · Updated 2026-06-08