Nº21 · Governance

OpenMetadata

The open catalog to discover and trace the lineage of your data.

Platform—Intermediate—Data Engineer

What is it?

OpenMetadata is an open-source data catalog platform that centralizes discovery and lineage of all data assets in an organization. Think of it as the "map" of your data ecosystem: where each table lives, how data flows through the system, and who owns each asset.

What is it for?

Data discovery: indexes tables, dashboards, pipelines, and ML models from dozens of connectors (Snowflake, BigQuery, dbt, Airflow, Superset…) and surfaces them through semantic search with filters by owner, tag, or domain.
End-to-end lineage: automatically traces the chain source → transformation → consumption across pipelines, tables, and dashboards, making it straightforward to assess the impact of upstream changes.
Data quality: lets you define and run quality tests directly on tables (uniqueness, nulls, value ranges) and assign owners per asset, so you know who to ask when something changes.
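
To make the impact-analysis idea behind lineage concrete, here is a toy sketch that models lineage as a directed graph and walks it downstream. It is not OpenMetadata's implementation or API: the asset names, the `LINEAGE` mapping, and the `downstream_of` helper are all invented for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that read from it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customer_ltv"],
    "marts.revenue": ["dashboard.sales_overview"],
    "marts.customer_ltv": [],
    "dashboard.sales_overview": [],
}


def downstream_of(asset, lineage):
    """Return every asset reachable downstream of `asset`, breadth-first."""
    seen = {asset}  # guards against cycles and excludes the starting asset
    order = []
    queue = deque(lineage.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(lineage.get(node, []))
    return order


if __name__ == "__main__":
    # Which assets might be affected if staging.orders changes?
    print(downstream_of("staging.orders", LINEAGE))
```

A catalog answers the same question from lineage it has collected automatically, so nobody has to maintain a mapping like `LINEAGE` by hand.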

When to use it / when not

Use it when your organization runs multiple tools (several warehouses, orchestrators, BI platforms) and the data team loses time asking "where is that table?" or "who maintains it?". It is the natural fit if you already use dbt, Airflow, or Trino and want automatic lineage without manual instrumentation.

Think twice if your stack is small — a single database and a two-person team — because the deployment and maintenance overhead of OpenMetadata can outweigh the benefit. In that scenario, a well-maintained schema comment convention or DataHub Lite may be enough. If you only need dbt lineage, the built-in dbt docs site is considerably lighter.

Get started in 1 minute

The fastest path is spinning up the full stack with Docker:

git clone https://github.com/open-metadata/OpenMetadata
cd OpenMetadata/docker/development
docker compose up -d

Within a few minutes, the UI will be available at http://localhost:8585. Default credentials: admin / admin (recent releases expect the full email, admin@open-metadata.org, as the username).
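
Because the containers take a while to start, a script that depends on the UI may want to wait for it first. The sketch below polls the URL mentioned above until it answers with HTTP 200. The helper names `wait_until_up` and `_http_ok`, the timeout, and the polling interval are assumptions for illustration, not part of OpenMetadata.

```python
import time
import urllib.error
import urllib.request


def _http_ok(url):
    """Return True if `url` answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_up(url, timeout_s=300.0, interval_s=5.0, probe=None):
    """Poll `url` until the probe succeeds or `timeout_s` seconds elapse."""
    probe = probe or _http_ok
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    if wait_until_up("http://localhost:8585"):
        print("OpenMetadata UI is reachable")
    else:
        print("Timed out waiting for the OpenMetadata UI")
```

The `probe` parameter exists only so the loop can be exercised without a running server.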

From the interface, navigate to Settings → Services → Add Service to connect your first data source. The connectors guide covers each integration in detail.

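# Minimal example: read metadata via SDK
# Install the SDK first (shell command): pip install openmetadata-ingestion

from metadata.generated.schema.entity.data.table import Table
from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (
    AuthProvider,
    OpenMetadataConnection,
)
from metadata.generated.schema.security.client.openMetadataJWTClientConfig import (
    OpenMetadataJWTClientConfig,
)
from metadata.ingestion.ometa.ometa_api import OpenMetadata

# Connection settings; replace <your-jwt> with a token issued by your instance
server_config = OpenMetadataConnection(
    hostPort="http://localhost:8585/api",
    authProvider=AuthProvider.openmetadata,
    securityConfig=OpenMetadataJWTClientConfig(jwtToken="<your-jwt>"),
)
metadata = OpenMetadata(server_config)

# List indexed tables (list_entities returns one page of results, not the whole catalog)
tables = metadata.list_entities(entity=Table)
for table in tables.entities:
    # On SDK releases built on Pydantic v2, use .root instead of .__root__
    print(table.fullyQualifiedName.__root__)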

The full API reference and Python SDK docs are at docs.open-metadata.org.
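
If you would rather call the REST API directly than install the SDK, the same listing can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: it assumes the tables endpoint lives at `/v1/tables` under the `hostPort` used above, accepts a `limit` query parameter and a Bearer token, and returns a JSON body with a `data` array. Check the API reference for your version before relying on it. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_list_tables_request(base_url, jwt_token, limit=10):
    """Build (but do not send) a GET request for one page of tables."""
    query = urllib.parse.urlencode({"limit": limit})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/tables?{query}",
        headers={
            "Authorization": f"Bearer {jwt_token}",
            "Accept": "application/json",
        },
    )


def table_names(payload):
    """Extract fully qualified names from a response body shaped like {"data": [...]}."""
    return [item["fullyQualifiedName"] for item in payload.get("data", [])]


if __name__ == "__main__":
    # Requires a running instance and a valid token.
    request = build_list_tables_request("http://localhost:8585/api", "<your-jwt>")
    with urllib.request.urlopen(request, timeout=10) as response:
        print(table_names(json.load(response)))
```

Keeping request construction separate from the network call makes the URL and headers easy to inspect before anything is sent.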
