Build the data layer for AI systems

Code-native platform for versioned pipelines on object storage with zero infrastructure management. Simple for developers, robust for systems.

No credit card required.

Branch

Import

Run

Merge

import bauplan

client = bauplan.Client()

# Create a new branch
branch = client.create_branch(
    branch="dev_branch",
    from_ref="main"
)
print(f'Created branch "{branch.name}"')

# List tables in the branch
for table in client.get_tables(ref=branch):
    print(f"{table.namespace:<12} {table.name:<12} {table.kind}")

Create sandboxes instantly without data duplication

Zero-copy data lake branches

Everything you need to build reliable data applications. Nothing you don’t.

Less complexity, more robustness

The simplest way to build AI data apps, using only Python and familiar abstractions over S3.

Branch your Data

Like Git for your data systems

Reproducibility, by design

Know exactly what code produced what data, when, and why with Git-style commits. Everything is versioned, traceable, and auditable by default.

Instant branching for dev and prod

Create isolated branches in seconds—zero copy, zero wait. Power experiments and Write-Audit-Publish workflows in production.
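
To make "zero copy" concrete, here is a toy sketch in plain Python. It is not Bauplan's implementation, and the snapshot ids and file names are hypothetical. It assumes a branch is only a name pointing at an immutable snapshot, so creating one copies a pointer rather than any data files.

```python
# Toy model of zero-copy branching (illustrative only, not Bauplan's implementation).
# A branch is a name pointing at an immutable snapshot of table -> data files.

snapshots = {"s0": {"orders": ("orders-001.parquet",)}}  # hypothetical file names
branches = {"main": "s0"}


def create_branch(name: str, from_ref: str) -> None:
    """Creating a branch copies one pointer, never the underlying files."""
    branches[name] = branches[from_ref]


def append_file(branch: str, table: str, new_file: str) -> None:
    """Writing produces a new snapshot that is visible only on this branch."""
    current = snapshots[branches[branch]]
    updated = {**current, table: current.get(table, ()) + (new_file,)}
    snapshot_id = f"s{len(snapshots)}"
    snapshots[snapshot_id] = updated
    branches[branch] = snapshot_id


create_branch("dev_branch", from_ref="main")
shared_before_write = branches["dev_branch"] == branches["main"]

append_file("dev_branch", "orders", "orders-002.parquet")
main_files = snapshots[branches["main"]]["orders"]
dev_files = snapshots[branches["dev_branch"]]["orders"]
print(shared_before_write, main_files, dev_files)
```

In this sketch the new branch shares the `main` snapshot until its first write, and `main` is untouched afterwards.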

Safe, composable data operations

Run, test, and validate changes in complete isolation. Merge confidently, automate quality gates, and revert at any time.
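
A Write-Audit-Publish quality gate can be sketched in a few lines of plain Python. Everything here is a hypothetical stand-in: the "lake" is an in-memory dict, the audit is a simple null check, and the toy copies rows for simplicity, which a real zero-copy branch would not do.

```python
# Toy Write-Audit-Publish flow (illustrative only; all names are hypothetical).
from typing import Callable, Dict, List

Rows = List[dict]


def no_null_ids(rows: Rows) -> bool:
    """Audit step: every row must carry a non-null 'id'."""
    return all(row.get("id") is not None for row in rows)


def write_audit_publish(
    lake: Dict[str, Rows],
    new_rows: Rows,
    audit: Callable[[Rows], bool],
) -> bool:
    """Stage rows on a dev branch and publish to main only if the audit passes."""
    lake["dev_branch"] = lake["main"] + new_rows  # write in isolation
    passed = audit(lake["dev_branch"])            # audit the staged data
    if passed:
        lake["main"] = lake["dev_branch"]         # publish (merge into main)
    del lake["dev_branch"]                        # discard the sandbox either way
    return passed


lake = {"main": [{"id": 1}]}
ok = write_audit_publish(lake, [{"id": 2}], no_null_ids)
rejected = write_audit_publish(lake, [{"id": None}], no_null_ids)
print(ok, rejected, lake["main"])
```

The second call fails the audit, so the bad row never reaches `main`.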

Launch development environments in seconds without data duplication, saving time and storage.

Use Git-like workflows for your data lake—branch, checkout, and merge seamlessly.

Keep your production environment safe. Collaborate in fully isolated, sandboxed environments.

import bauplan

@bauplan.model()
# Define the Python environment with pinned package versions
@bauplan.python(pip={'pandas': '2.2.0'})
def clean_data(
    # Input model reference
    data=bauplan.Model('my_data')
):
    import pandas as pd

    # Your data transformation logic here
    ...

    # Return the transformed data
    return data

Build applications, not platforms

No infrastructure

Serverless Functions

Bauplan handles the tasks that usually require a platform team—packaging, scaling, execution, and environment isolation—so your developers can focus on writing Python.

Just serverless functions

Write modular functions in plain Python or SQL. Bauplan handles execution, scaling, and table I/O automatically—no config files, containers, or DAGs.

One interface, zero setup

Build and run everything from your IDE with a simple SDK. No Dockerfiles, no local hacks, no divergence between dev and prod.

Run Python workloads in the cloud with automatic scaling—no cluster setup needed.

Build and test in your IDE without new frameworks or DSLs.

Focus on models—Bauplan handles containers, dependencies, and resources.

Develop with no Ops

The code-first platform for the AI era

Code-driven data automation

A programmable data platform

Write modular, testable functions for every step of your data workflow. Version everything—tables, logic, environments—just like software.

Ship with confidence from commit to CI/CD

Validate before merge. Every run is tied to a commit, so results are deterministic, traceable, and rollback-ready.
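
One common way to tie a run to a commit is content addressing: derive the commit id from the code, the input snapshot, and the parent commit. The sketch below illustrates that idea with the standard library only. It is an assumption for illustration, not a description of Bauplan's internals, and `commit_id` is a hypothetical helper.

```python
import hashlib
import json
from typing import Optional


def commit_id(code: str, input_snapshot: str, parent: Optional[str]) -> str:
    """Derive a commit id from the code, the input snapshot, and the parent commit."""
    payload = json.dumps(
        {"code": code, "input": input_snapshot, "parent": parent},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


first = commit_id("SELECT * FROM orders", "s0", parent=None)
same = commit_id("SELECT * FROM orders", "s0", parent=None)
changed = commit_id("SELECT id FROM orders", "s0", parent=None)
print(first == same, first == changed)
```

Identical code over identical inputs yields the same id, while any change to the logic yields a different one, which is what makes a result traceable to what produced it.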

Built for developers. Ready for agents

Stop deploying notebooks with hidden state. Use typed APIs and versioned logic that are easy to script or automate, whether by a team or by a model.

Deploy with confidence—integrate validated data into production seamlessly.

Prevent issues early with embedded data quality checks.

Connect to tools and platforms with a single line of code.

Automate your Workflows

import bauplan

client = bauplan.Client()

# Placeholder names used by the steps below
dev_branch = 'dev_branch'
table_name = 'my_table'

# Create a dev data branch
client.create_branch(dev_branch, from_ref='main')
# Import data into tables
client.import_data(table_name, dev_branch)
# Run a pipeline in dev
client.run('./my_project_dir', dev_branch)
# Merge the new tables into the main data lake
client.merge_branch(dev_branch, into_branch='main')

FAQ

Don't see an answer to your question? Check our docs.

How do you keep my data secure?

What if I already have Databricks or Snowflake?

What does Bauplan replace in my AWS data stack?

Do I need to learn an entirely new data framework?

What does Git for Data mean?

Try bauplan for free

Create your sandbox and start building.
