Nº16 · Governance
Great Expectations
Quality tests for your data: define expectations and validate every load.
What is it?
Great Expectations (GX) is an open-source framework for data quality and validation in Python. You define "expectations" — verifiable rules like not null, values in range, uniqueness, or format — and validate every dataset or load against them, getting a clear result of what passed and what failed.
The core idea: treat data quality the way you treat code, with automated tests and readable reports instead of manual reviews or production surprises.
What is it for?
- Automated tests in pipelines. You insert a validation between ingestion and consumption: if the data breaks the rules, the pipeline fails or alerts before the problem spreads.
- Catch bad data early. Detect unexpected nulls, out-of-range values, or duplicates before they break a dashboard, a model, or a downstream report.
- Data documentation. GX generates readable quality reports (Data Docs) that show which rules exist and how recent loads behaved.
- Data contract. Expectations act as an explicit contract between whoever produces and whoever consumes a dataset.
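The pipeline-gate idea from the first bullet can be sketched in plain Python. This illustrates the pattern only and is not GX's API: Rule, check_rows, and gate are hypothetical names, and in a real pipeline the pass/fail signal would come from GX validation results rather than these hand-written checks.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-in for an expectation: a name plus a per-row predicate.
@dataclass(frozen=True)
class Rule:
    name: str
    passes: Callable[[dict], bool]


def check_rows(rows: list[dict], rules: list[Rule]) -> dict[str, int]:
    """Return the number of failing rows per rule (0 means the rule passed)."""
    return {rule.name: sum(not rule.passes(row) for row in rows) for rule in rules}


def gate(rows: list[dict], rules: list[Rule]) -> list[dict]:
    """Pass rows through unchanged, or raise so the pipeline stops here."""
    failures = {name: n for name, n in check_rows(rows, rules).items() if n}
    if failures:
        raise ValueError(f"data quality gate failed: {failures}")
    return rows


RULES = [
    Rule("country_not_null", lambda r: r.get("country") is not None),
    Rule("amount_in_range",
         lambda r: r.get("amount") is not None and 0 <= r["amount"] <= 1000),
]

good = [{"country": "PE", "amount": 100}, {"country": "CL", "amount": 80}]
bad = [{"country": None, "amount": 100}, {"country": "AR", "amount": 5000}]

clean = gate(good, RULES)  # returns the rows; consumption can proceed

try:
    gate(bad, RULES)
    gate_blocked = False
except ValueError:
    gate_blocked = True  # the bad load is stopped before it reaches consumers
```

The gate sits between ingestion and consumption: a clean load flows through untouched, while a load that breaks any rule raises and never reaches the dashboard or model downstream.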
When to use it / when not
Use it when data trust matters and pipelines are involved: recurring ingestions, third-party data, tables that feed decisions or models. Its rich expectation library and readable reports shine when quality is critical.
Think twice for:
- Small cases inside dbt. If you already transform with dbt, its built-in tests (dbt test: not_null, unique, accepted_values) usually suffice and live alongside the model. GX pays off when you need richer rules, reports, or validation outside dbt.
- Catalog and lineage. GX does not discover assets or map dependencies between tables; that's the job of OpenMetadata. GX answers "does this data meet the rules?", not "what data exists and where does it come from?".
Get started in 1 minute
Validate a pandas DataFrame against a couple of expectations using GX's modern API (the data_sources and gx.expectations interfaces of GX 1.x).
pip install great_expectations
import pandas as pd
import great_expectations as gx

df = pd.DataFrame({
    "country": ["PE", "CL", "AR"],
    "amount": [100, 80, 50],
})

batch = gx.get_context().data_sources \
    .add_pandas("sales").read_dataframe(df)

# Expectation 1: the 'country' column is never null
print(batch.validate(
    gx.expectations.ExpectColumnValuesToNotBeNull(column="country")
).success)  # True

# Expectation 2: 'amount' is between 0 and 1000
print(batch.validate(
    gx.expectations.ExpectColumnValuesToBeBetween(
        column="amount", min_value=0, max_value=1000)
).success)  # True
Change a value to None or move it outside the range and .success flips to False: that's GX catching the bad data before it moves on.
Official documentation
The source of truth lives there. Here we orient you; the depth is up to you.
Open official docs ↗
Nº16 · Updated 2026-06-26