Turn weeks of data infrastructure into a few lines of Python
Build robust AI and data applications over S3 with Serverless functions and Iceberg tables. No Kubernetes, no Spark, no infrastructure to manage.
Branch
Import
Run
Merge
import bauplan

client = bauplan.Client()

# Create a new branch
branch = client.create_branch(
    branch="dev_branch",
    from_ref="main"
)
print(f'Created branch "{branch.name}"')

# List tables in the branch
for table in client.get_tables(ref=branch):
    print(f"{table.namespace:<12} {table.name:<12} {table.kind}")
Create sandboxes instantly without data duplication
Build an Iceberg Lakehouse in pure Python
Implement a Data Lakehouse architecture over object storage with a Write-Audit-Publish pattern for safe data ingestion.
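As a rough sketch, here is how the Write-Audit-Publish steps can map onto the client calls shown on this page; the branch, table, and S3 path names are placeholders.

import bauplan

client = bauplan.Client()

# WRITE: land the raw data on a temporary branch, never on main
ingest_branch = client.create_branch(branch="wap_ingest", from_ref="main")
client.create_table("raw_orders", "s3://my-bucket/orders/*.parquet", ingest_branch)
client.import_data("raw_orders", "s3://my-bucket/orders/*.parquet", ingest_branch)

# AUDIT: run the pipeline (transformations plus quality checks) on the branch only
client.run("./lakehouse_pipeline", ingest_branch)

# PUBLISH: only audited tables reach the production data lake
client.merge_branch(ingest_branch, into_branch="main")

Because the branch is zero-copy, the audit step runs against production-scale data without duplicating it, and a failed check simply leaves main untouched.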
End-to-end Machine Learning Pipeline
A streamlined pipeline transforming raw data into a predictions table: prepare a training set and train a Linear Regression model.
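A minimal sketch of the training step as a Bauplan model; the upstream table name, the feature and target columns, and the scikit-learn version are illustrative assumptions, not the exact code of the example.

import bauplan

@bauplan.model()
@bauplan.python(pip={'pandas': '2.2.0', 'scikit-learn': '1.4.0'})
def trip_predictions(
    # hypothetical upstream model holding the prepared training set
    training_set=bauplan.Model('training_set'),
):
    from sklearn.linear_model import LinearRegression

    # inputs arrive as Arrow tables; convert to pandas for scikit-learn
    df = training_set.to_pandas()
    features, target = df[['distance_km', 'passenger_count']], df['trip_duration']

    # fit a Linear Regression model and materialize its predictions as a table
    predictions = df.copy()
    predictions['predicted_duration'] = LinearRegression().fit(features, target).predict(features)
    return predictions

Running the pipeline on a development branch first materializes the predictions table there, so it can be inspected before being merged into main.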
Data Augmentation with LLMs
Entity matching across different datasets, using OpenAI APIs. This example uses e-commerce product catalogs.
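A hedged sketch of the idea; the table names, the 'title' column, the prompt, and the OpenAI model choice are illustrative, and an OPENAI_API_KEY is assumed to be set in the environment.

import bauplan

@bauplan.model()
@bauplan.python(pip={'openai': '1.30.0', 'pandas': '2.2.0'})
def matched_products(
    # hypothetical tables holding the two e-commerce catalogs to reconcile
    catalog_a=bauplan.Model('catalog_a'),
    catalog_b=bauplan.Model('catalog_b'),
):
    from openai import OpenAI  # reads OPENAI_API_KEY from the environment

    llm = OpenAI()
    df_a, df_b = catalog_a.to_pandas(), catalog_b.to_pandas()

    def same_product(title_a: str, title_b: str) -> bool:
        reply = llm.chat.completions.create(
            model='gpt-4o-mini',
            messages=[{
                'role': 'user',
                'content': (
                    'Do these two titles describe the same product? Answer YES or NO.\n'
                    f'A: {title_a}\nB: {title_b}'
                ),
            }],
        )
        return reply.choices[0].message.content.strip().upper().startswith('YES')

    # naive row-by-row comparison over the (placeholder) 'title' columns
    df_a['is_match'] = [
        same_product(a, b) for a, b in zip(df_a['title'], df_b['title'])
    ]
    return df_a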
Read more
Built with Bauplan
Forecasting ML Pipeline
A streamlined pipeline transforming raw data into a predictions table: prepare a training set and train a Linear Regression model.
Data Augmentation with OpenAI
Entity matching across different e-commerce product catalogs, leveraging off-the-shelf LLM APIs from OpenAI. The entire project runs on object storage (S3) in open formats (Iceberg), relying solely on vanilla Python to orchestrate the DAG and integrate AI services.
Build a Lakehouse with Apache Iceberg in pure Python
Data Lakehouse architecture over object storage with Iceberg tables and robust Write-Audit-Publish workflows for safe data ingestion. Build a Lakehouse in ~150 lines of Python without needing a Data Warehouse, JVM, or Iceberg expertise.
Data quality and expectations
Define data quality constraints with expectations to enforce standards and monitor pipeline updates using blazing fast vectorized tests.
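A minimal sketch of one such check; the @bauplan.expectation decorator and the table and column names are assumptions based on this description, and the check itself is a plain vectorized operation on the Arrow table.

import bauplan

# the expectation decorator and the table/column names below are assumptions
@bauplan.expectation()
@bauplan.python(pip={'pyarrow': '15.0.0'})
def expect_no_null_order_ids(
    data=bauplan.Model('orders'),
):
    # vectorized null check: the run fails if any order_id is missing
    return data.column('order_id').null_count == 0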
Interactive visualization with Streamlit
Build a data transformation pipeline and visualize results with Streamlit using SQL querying and branching.
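A small sketch of what that can look like; the client.query call and its parameters, along with the table and column names, are assumptions for illustration.

import bauplan
import streamlit as st

client = bauplan.Client()

st.title('Revenue by day')

# let the viewer switch between the production lake and a development branch
branch = st.selectbox('Branch', ['main', 'dev_branch'])

# run SQL against the chosen branch and get the result back as an Arrow table
table = client.query(
    'SELECT order_date, SUM(amount) AS revenue FROM orders GROUP BY order_date',
    ref=branch,
)

st.bar_chart(table.to_pandas().set_index('order_date'))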
Near Real-Time Analytics
Full stack real-time data pipeline for e-commerce analytics. This project features data ingestion, transformations, and live dashboards for key metrics like revenue and engagement, all managed with branch-based workflows and minimal setup.
The optimal workflow for data teams
Branch
Instant Zero-Copy Environments
Quickly spin up development and testing environments without duplicating data.
Version Control for Data
Work with data the way you work with code. Use familiar operations like branching, checkout, and merging.
Safe and Sandboxed Experiments
Keep your production environment safe. Collaborate in fully isolated, sandboxed environments.
import bauplan

client = bauplan.Client()

# Create a new branch
branch = client.create_branch(
    branch="dev_branch",
    from_ref="main"
)
print(f'Created branch "{branch.name}"')

# List tables in the branch
for table in client.get_tables(ref=branch):
    print(f"{table.namespace:<12} {table.name:<30} {table.kind}")
Develop
No Infrastructure to Manage
Define environments entirely in code — never worry about containers and environment management.
Pure Python
Build and test data applications directly in your IDE — no need to learn new frameworks, just code as you normally would.
Serverless Functions
Execute your workloads seamlessly in the cloud, combining serverless functions into pipelines.
import bauplan

@bauplan.model()
# Specify the Python environment with exact package versions
@bauplan.python(pip={'pandas': '2.2.0'})
def clean_data(
    # Input model reference - points to an existing table or model
    data=bauplan.Model('my_data')
):
    import pandas as pd
    # Your data transformation logic here
    cleaned = ...
    return cleaned
Automate
Merge your changes
Deploy by merging new tables into your main data lake. Use our Python SDK, automate your CI/CD pipelines and deployment.
Built-In testing
Incorporate unit tests and expectations directly into your workflows, ensuring your data is always reliable and consistent.
Effortless Integration
Connect to visualization tools and orchestrators with just one line of code.
import bauplan

client = bauplan.Client()

# create a zero-copy branch of your data lake
client.create_branch(dev_branch, from_ref='main')

# create an Iceberg table and import data in it
client.create_table(table_name, data_source, dev_branch)
client.import_data(table_name, data_source, dev_branch)

# run a pipeline end-to-end in a branch
client.run('./my_project_dir', dev_branch)

# merge the new tables into the main data lake
client.merge_branch(dev_branch, into_branch='main')

print('So Long, and Thanks for All the Fish')
Bauplan: Zero-copy, Scale-up FaaS for Data Pipelines
Paper presented at WoSC10 2024. In collaboration with The University of Wisconsin.
by J. Tagliabue, T. Caraza-Harter and C. Greco
Building, shipping, and running containers is too slow for Python.
Making the experience of running data workflows in the cloud indistinguishable from doing it locally.
by N. LeClaire and C. Greco
To serverless or not to serverless
Find the right balance between cost control and fast startup time for your Spark clusters.
by C. Greco