What if I already have Databricks or Snowflake?

Great! Bauplan is built to be fully interoperable. All the tables produced with Bauplan are persisted as Iceberg tables in your S3, making them accessible to any engine and catalog that supports Iceberg. Our clients use Bauplan together with Databricks, Snowflake, Trino, AWS Athena, AWS Glue, Kafka, SageMaker, and more.

What does Bauplan replace in my AWS data stack?

Bauplan consolidates pipeline execution and data versioning into one workflow: branch, run, validate, merge. You can keep S3 and your orchestrator while removing much of the cluster management and glue code between them. For example, an Airflow DAG that spins up an EMR cluster, submits Spark steps, then runs an AWS Glue crawler to refresh the Glue Data Catalog before triggering downstream jobs becomes: Airflow triggers a Bauplan run on an isolated branch that writes Iceberg tables directly to S3.
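As a rough sketch of the orchestrator side, the task body can shrink to one function. It reuses only the client calls that appear in the snippets further down this page (`create_branch`, `run`, `merge_branch`, and the `success` and `job_id` fields of the run state). The helper name `run_and_publish` and the idea of calling it from an Airflow task are illustrative assumptions, not a documented Bauplan integration.

```python
def run_and_publish(client, project_dir, branch, into="main"):
    """Run a Bauplan project on an isolated branch; merge only if the run succeeds.

    Hypothetical helper. `client` is assumed to be a bauplan.Client() (or any
    object exposing the same three methods), e.g. created inside an Airflow task.
    """
    # Branch off the target so the run never writes to `into` directly
    dev_branch = client.create_branch(branch=branch, from_ref=into)

    # Execute the project's models against the isolated branch
    job_state = client.run(project_dir, ref=dev_branch)

    if not job_state.success:
        # `into` is left unchanged; the branch stays available for debugging
        raise RuntimeError(f"{job_state.job_id} failed on branch {branch}")

    # Publish the validated tables to the target branch
    return client.merge_branch(source_ref=dev_branch, into_branch=into)
```

Giving each DAG run its own branch name (for example, one derived from the orchestrator's run ID) is one way to keep concurrent runs from colliding; the naming scheme is up to you.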

How do you keep my data secure?

Your data stays in your own S3 bucket at all times. Bauplan processes it securely either through AWS PrivateLink (connecting your S3 to your dedicated single-tenant environment) or entirely within your own VPC using Bring Your Own Cloud (BYOC).

Do I need to learn a new data framework or DSL?

No. Bauplan is just Python (and SQL for queries). That's why your AI assistant can write Bauplan code right away.

What does Git-for-Data mean?

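Git-for-Data means managing data with the same primitives you use for code: Bauplan models the state of your data as branches and commits, so you can branch, run, inspect, and merge data changes instead of editing production tables in place. Learn more at https://docs.bauplanlabs.com/en/latest/concepts/git_for_data/index.html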

Run AI on your production data.
With full control.

Give your AI agents the isolation, transactional guarantees, and rollback they need to build, validate, and ship data pipelines on production data.

AI agents can iterate on code, but not on your data

Code is local and reversible. Data pipelines are not.

Pipelines mutate shared state and failures leave production inconsistent. Traditional data platforms assume slow, manual change.

Bauplan is the execution layer built for fast, AI-generated iteration in production.

The missing execution layer that lets AI work safely on production data

Work with data the same way you work with code

Everything in Bauplan is code, versioned in your repository and executed from your IDE. AI-generated changes run exactly as written, with no hidden state or manual steps.

Bring your AI coding assistant: we provide the safe execution layer.

Git-style safety for AI agents on production data.

Let AI agents work directly on production data without risk. Runs are isolated, publishes are atomic, and failed or bad changes can be rolled back immediately. Tests and expectations can gate publication before anything reaches your production tables.
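To make these guarantees concrete, here is a toy, in-memory model of the idea. It is not Bauplan's implementation: a branch is a pointer to an immutable snapshot, writes only move the pointer of the branch they target, publishing is a single pointer update gated by checks, and rollback moves the pointer back. `ToyCatalog`, its methods, and the `no_null_ids` check are hypothetical names used only for illustration.

```python
from types import MappingProxyType


class ToyCatalog:
    """Toy model of branch/commit semantics; illustrative only, not Bauplan's internals."""

    def __init__(self):
        # commit id -> read-only snapshot {table name: tuple of rows}
        self._commits = {0: MappingProxyType({})}
        # branch name -> history of head commit ids (last item is the current head)
        self._heads = {"main": [0]}
        self._next_id = 1

    def head(self, ref):
        return self._heads[ref][-1]

    def create_branch(self, name, from_ref="main"):
        # Zero-copy: the new branch starts at the same snapshot as from_ref
        self._heads[name] = [self.head(from_ref)]

    def read(self, table, ref="main"):
        return self._commits[self.head(ref)][table]

    def write(self, table, rows, ref):
        # Copy-on-write: build a new snapshot; other branches keep their pointers
        snapshot = dict(self._commits[self.head(ref)])
        snapshot[table] = tuple(rows)
        commit_id = self._next_id
        self._next_id += 1
        self._commits[commit_id] = MappingProxyType(snapshot)
        self._heads[ref].append(commit_id)

    def merge(self, source, into="main", checks=()):
        # Gate publication: every check must pass on the source branch
        if not all(check(self, source) for check in checks):
            return False
        # Publish with a single pointer update on the target branch
        self._heads[into].append(self.head(source))
        return True

    def rollback(self, ref="main"):
        # Undo the last change on ref by restoring its previous head
        if len(self._heads[ref]) > 1:
            self._heads[ref].pop()


def no_null_ids(catalog, ref):
    # Hypothetical expectation: every row in "orders" has an id
    return all(row["id"] is not None for row in catalog.read("orders", ref))


catalog = ToyCatalog()
catalog.write("orders", [{"id": 1}], ref="main")
catalog.create_branch("fritz.dev")
catalog.write("orders", [{"id": 1}, {"id": None}], ref="fritz.dev")
print(catalog.read("orders"))  # ({'id': 1},) since main is unaffected by the branch write
print(catalog.merge("fritz.dev", checks=[no_null_ids]))  # False: the check blocks publication
```

The toy merge simply moves `into` to the source branch's head, which is only safe when `into` has not changed since the branch was created; a real system has to detect or reconcile concurrent changes.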

Use cases

Automate data engineering workflows with simple agent skills, from assessing the feasibility of a request to building and maintaining production pipelines at scale.

Building data pipelines

Agents and engineers build data pipelines like software: write transformations in code, run them in isolation against real data, and publish only validated results.

Safe table ingestion

Ingest new data into an isolated branch, validate it with quality checks, and publish atomically only when it passes.

Debug & fix pipelines

AI agents diagnose pipeline failures, replay runs against the exact state that produced them, and propose fixes in isolated branches you merge when ready.

Data exploration and discovery

Let agents run hundreds of profiling queries, inspect schemas, and sample rows across isolated branches to build a complete picture of your data.

Integrations

Integrate with your storage, warehouses, and developer workflow.
Read more about Bauplan integrations in our docs.

Fivetran

Use the Fivetran Managed Data Lake destination to write Apache Iceberg tables directly to your S3 bucket as a managed ELT workflow.

BigQuery

Connect BigQuery to Bauplan by creating external Iceberg tables in BigQuery that point to your Bauplan tables and S3 storage.

Snowflake

Connect Snowflake to Bauplan through an Iceberg REST catalog so Snowflake can query your Bauplan tables directly in your object store.

Marimo

A reactive Python notebook and app framework that allows developers to build and share interactive tools entirely in Python.

Temporal

A durable execution platform whose Python SDK lets developers build resilient workflows and manage complex asynchronous tasks directly from their code.

Airflow

One of the most widely adopted orchestrators in data engineering, designed to programmatically author, schedule, and monitor workflows.

Estuary

Stream data from any Estuary source into Bauplan Iceberg tables using the Apache Iceberg materialization connector.

Orchestra

A managed orchestration platform that lets data teams build, schedule, and monitor pipelines efficiently within a unified interface.

Metabase

An open-source and enterprise-ready BI tool that enables teams to explore, visualize, and share data insights in real time.

Streamlit

A Python framework that turns scripts into interactive web apps for data exploration, dashboards, and internal tools.

Jupyter

An interactive environment for Python that supports reproducible research, data analysis, and collaborative experimentation.

Dagster

An orchestrator built for data applications, offering type-safe, observable, and testable pipelines for production workloads.

DBOS

A distributed operating system that provides high reliability, scalability, and consistency for workflow-driven applications.

Prefect

A modern workflow orchestration platform built in Python that simplifies task automation, monitoring, and error handling across teams.

A whole data platform in your repo

Branch, inspect and merge data like code

Bauplan models the state of your data as branches and commits. Create branches, run changes, inspect history, and merge only when tests pass.

import bauplan

client = bauplan.Client()

dev_branch = client.create_branch(
    branch="fritz.dev",
    from_ref="main",
)

tables = client.get_tables(ref=dev_branch)
print(tables)

preview = client.query("SELECT * FROM my_table LIMIT 5", ref=dev_branch)
print(preview)

assert client.merge_branch(
    source_ref=dev_branch, 
    into_branch="main"
)

Native Python execution. No infrastructure to manage.

Pipelines are ordinary Python and SQL functions. Declare environments and quality checks in code. Execution is managed by the platform.

import bauplan
from bauplan.standard_expectations import expect_column_no_nulls

@bauplan.model()
@bauplan.python("3.11", pip={"pandas": "2.2.0"})
def clean_data(data=bauplan.Model("source_data")):
    df = data.to_pandas()
    return df.dropna()

@bauplan.expectation()
@bauplan.python("3.11")
def test_clean_data(data=bauplan.Model("clean_data")):
    return expect_column_no_nulls(data, "id")

One control loop for humans and agents

A few predictable primitives for developers and AI agents. Every workflow follows the same loop: branch → run → inspect → merge.

import bauplan

client = bauplan.Client()

dev_branch = client.create_branch(
    branch="fritz.dev",
    from_ref="main",
)

job_state = client.run("./my_project", ref=dev_branch)

if not job_state.success:
    raise Exception(f"{job_state.job_id} failed")

assert client.merge_branch(
    source_ref=dev_branch, 
    into_branch="main"
)

Latest from our blog

AI-first data engineering, Git-for-data semantics, and serverless execution over Iceberg

Introducing Bauplan Skills: Safe automation for AI on your data

Let AI agents ship data safely.

Ciro Greco

Duck Hunt: moving Bauplan from DuckDB to DataFusion

Arrow-Native and Community-Driven: Why DataFusion Won

Jacopo Tagliabue

We solved trust for AI Agents in 1973 (we just forgot)

What SQL databases already know about isolation, concurrency, and trust

Ciro Greco, Jacopo Tagliabue and Federico Bianchi (Together AI)

Bauplan’s MCP Server

The First Step Towards the Agentic Lakehouse

Ciro Greco

Hello Bauplan

Bauplan is a serverless data platform that treats pipelines, models, and tables like software.

Ciro Greco

Data engineer agents

From Prompt to Pipeline: Cloud-Native Agents for Data Transformation and ETL

Ciro Greco, Jacopo Tagliabue and Federico Bianchi
