Nº22 · Analysis

pandas

The Swiss Army knife for manipulating and analyzing tabular data in Python.

Library / framework · Intro · Data Scientist, Data Engineer · python

What is it?

pandas is the de facto standard library for working with tabular data in Python. Its core structure, the DataFrame, works like a programmable spreadsheet: rows, named columns, and hundreds of methods to clean, transform and summarize data.
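
To make the spreadsheet analogy concrete, here is a minimal sketch of a DataFrame built by hand. The column names and values are invented for illustration only.

```python
import pandas as pd

# Hypothetical sample data: names and numbers are made up for this sketch.
df = pd.DataFrame(
    {
        "country": ["ES", "FR", "ES", "DE"],
        "amount": [120.0, 80.0, 50.0, 200.0],
    }
)

# Rows can be filtered with a boolean condition on a column.
high = df[df["amount"] > 100]

# New columns come from vectorized arithmetic on existing ones
# (the 1.21 factor is an arbitrary example rate).
df["amount_with_tax"] = df["amount"] * 1.21

print(df.dtypes)  # each column carries its own data type
print(high)
```

Each column holds a single data type, which is what lets operations like the multiplication above run over the whole column at once.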

What is it for?

Reading and writing CSV, Excel, JSON, Parquet or SQL in one line.
Cleaning real-world data: nulls, types, duplicates, dates, text.
Grouping, pivoting and joining tables (groupby, merge, pivot_table) to answer business questions.
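
The three uses above can be sketched end to end on a tiny dataset. Everything here is an assumption made for illustration: the CSV text, the column names and the `customers` table are invented, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Invented CSV content: a duplicate row, a missing amount and untidy text.
orders_csv = io.StringIO(
    "order_id,customer_id,date,country,amount\n"
    "1,10,2024-01-05, es ,100.5\n"
    "2,11,2024-01-06,FR,\n"
    "2,11,2024-01-06,FR,\n"
    "3,10,2024-02-01,ES,40\n"
)

# Read: one call, with the date column parsed to datetimes.
orders = pd.read_csv(orders_csv, parse_dates=["date"])

# Clean: drop exact duplicates, normalize text, fill missing amounts with 0.
orders = orders.drop_duplicates()
orders["country"] = orders["country"].str.strip().str.upper()
orders["amount"] = orders["amount"].fillna(0.0)

# Join: attach a (hypothetical) customer segment to each order.
customers = pd.DataFrame(
    {"customer_id": [10, 11], "segment": ["retail", "wholesale"]}
)
joined = orders.merge(customers, on="customer_id", how="left")

# Group and pivot: totals per country, then country by segment.
totals = joined.groupby("country")["amount"].sum()
pivot = joined.pivot_table(
    index="country", columns="segment", values="amount",
    aggfunc="sum", fill_value=0,
)

# Write: to_csv returns a string when no path is given.
csv_out = joined.to_csv(index=False)

print(totals)
print(pivot)
```

Filling missing amounts with 0 is just one possible policy; depending on the question, dropping those rows or leaving them as nulls may be more appropriate.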

When to use it / when not

Use it for almost any exploratory analysis or ETL that fits in memory: it's the standard, has the largest community and integrates with the whole ecosystem (NumPy, matplotlib, scikit-learn).

Think twice when the dataset doesn't fit in RAM or when speed is critical: in those cases Polars or DuckDB are usually faster and more memory-efficient.
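
A rough way to judge whether data "fits in memory" is to measure a loaded sample. The sketch below uses invented data and shows two pandas features: `memory_usage(deep=True)` to estimate the footprint, and the `chunksize` option of `read_csv` to aggregate a file piece by piece. Chunking is only a stopgap within pandas, not a replacement for the tools mentioned above.

```python
import io

import pandas as pd

# Invented sample data; a real case would point at a large file on disk.
csv_text = "country,amount\n" + "".join(
    f"C{i % 3},{i}\n" for i in range(1000)
)

# Estimate the in-memory size of the loaded table, in bytes.
df = pd.read_csv(io.StringIO(csv_text))
total_bytes = df.memory_usage(deep=True).sum()
print(f"{total_bytes / 1024:.1f} KiB in memory")

# Process the same data in chunks of 250 rows, keeping only running totals.
running = {}
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    partial = chunk.groupby("country")["amount"].sum()
    for country, amount in partial.items():
        running[country] = running.get(country, 0) + amount

chunked_totals = pd.Series(running).sort_index()
print(chunked_totals)
```

Chunked processing works for aggregations that can be combined from partial results, such as sums and counts; operations that need the whole table at once, such as a global sort, do not split this easily.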

Get started in 1 minute

pip install pandas

import pandas as pd

df = pd.read_csv("sales.csv")

# Total per country, sorted high to low
summary = (
    df.groupby("country")["amount"]
      .sum()
      .sort_values(ascending=False)
)

print(summary.head())

Official documentation

The source of truth lives there. Here we orient you; the depth is up to you.

What to learn next

Polars

DataFrames in Rust: fast, parallel and with lazy evaluation.

Intro · python

Nº14 · Analysis

DuckDB

The analytical database that runs inside your process — no server.

Intro · sql

Nº22 · Updated 2026-06-08