Open-source curation · Python-first · in Spanish & English

Nº16 · Governance

Great Expectations

Quality tests for your data: define expectations and validate every load.

Library / framework · Intermediate · Data Engineer · python

What is it?

Great Expectations (GX) is an open-source framework for data quality and validation in Python. You define "expectations" — verifiable rules like not null, values in range, uniqueness, or format — and validate every dataset or load against them, getting a clear result of what passed and what failed.

The core idea: treat data quality the way you treat code, with automated tests and readable reports instead of manual reviews or production surprises.

What is it for?

  • Automated tests in pipelines. You insert a validation between ingestion and consumption: if the data breaks the rules, the pipeline fails or alerts before the problem spreads.
  • Catch bad data early. Detect unexpected nulls, out-of-range values, or duplicates before they break a dashboard, a model, or a downstream report.
  • Data documentation. GX generates readable quality reports (Data Docs) that show which rules exist and how recent loads behaved.
  • Data contract. Expectations act as an explicit contract between whoever produces and whoever consumes a dataset.

When to use it / when not

Use it when data trust matters and pipelines are involved: recurring ingestions, third-party data, tables that feed decisions or models. Its rich expectation library and readable reports shine when quality is critical.

Think twice for:

  • Small cases inside dbt. If you already transform with dbt, its built-in tests (dbt test: not_null, unique, accepted_values) usually suffice and live alongside the model. GX pays off when you need richer rules, reports, or validation outside dbt.
  • Catalog and lineage. GX does not discover assets or map dependencies between tables; that is the job of a data catalog such as OpenMetadata. GX answers "does this data meet the rules?", not "what data exists and where does it come from?".

Get started in 1 minute

Validate a pandas DataFrame against a couple of expectations using the GX 1.x API.

pip install great_expectations

import pandas as pd
import great_expectations as gx

df = pd.DataFrame({
    "country": ["PE", "CL", "AR"],
    "amount": [100, 80, 50],
})

batch = gx.get_context().data_sources \
    .add_pandas("sales").read_dataframe(df)

# Expectation 1: the 'country' column is never null
print(batch.validate(
    gx.expectations.ExpectColumnValuesToNotBeNull(column="country")
).success)  # True

# Expectation 2: 'amount' is between 0 and 1000
print(batch.validate(
    gx.expectations.ExpectColumnValuesToBeBetween(
        column="amount", min_value=0, max_value=1000)
).success)  # True

Change a value to None or outside the range and .success flips to False: that's GX catching the bad data before it moves on.

Official documentation

The source of truth lives there. Here we orient you; the depth is up to you.

Nº16 · Updated 2026-06-26