Serverless data platform.
Strictly for developers.
Build AI and data applications with serverless Python functions and data branches. Turn weeks of infrastructure into a few lines of code.
Branch
Import
Run
Merge
import bauplan

client = bauplan.Client()

# Create a new branch
branch = client.create_branch(
    branch="dev_branch",
    from_ref="main"
)
print(f'Created branch "{branch.name}"')

# List tables in the branch
for table in client.get_tables(ref=branch):
    print(f"{table.namespace:<12} {table.name:<12} {table.kind}")
Create sandboxes instantly without data duplication
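Branching is only the first of the four steps above. A minimal sketch of the Import, Run, and Merge steps, mirroring the client calls in the workflow example further down this page; the table name and project directory are placeholder values:

import bauplan

client = bauplan.Client()

# Import: create an Iceberg table in the branch and load data into it
client.create_table('my_table', 'dev_branch')
client.import_data('my_table', 'dev_branch')

# Run: execute the pipeline in the project directory against the branch
client.run('./my_project_dir', 'dev_branch')

# Merge: promote the new tables into the main data lake
client.merge_branch('dev_branch', into_branch='main')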


See what you can build when infrastructure becomes Python code.




Branch your Data
Git for Data
Data Version Control
Version control your data lake. Work with familiar operations—branch, commit, and merge—to track changes and collaborate confidently.
Instant Zero-Copy
Spin up development environments with your data in seconds. Create branches without duplicating data, saving both time and storage costs.
Safe and Sandboxed Experiments
Test and iterate freely with production data using safe, separate branches, so you can move faster while preserving data integrity.
Launch development environments in seconds without data duplication, saving time and storage.
Use Git-like workflows for your data lake—branch, checkout, and merge seamlessly.
Keep your production environment safe. Collaborate in fully isolated, sandboxed environments.
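As a concrete illustration of a sandboxed experiment, here is a minimal sketch that combines the calls shown on this page: create a zero-copy branch, run a pipeline against it, and inspect the result while main stays untouched. The branch name and project directory are example values.

import bauplan

client = bauplan.Client()

# Work in an isolated, zero-copy branch of the production data lake
branch = client.create_branch(
    branch="experiment_branch",
    from_ref="main"
)

# Run the pipeline against the branch; production tables stay untouched
client.run('./my_project_dir', branch.name)

# Inspect what the run produced before deciding whether to merge
for table in client.get_tables(ref=branch):
    print(f"{table.name:<30} {table.kind}")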
import bauplan

client = bauplan.Client()

# Create a new branch
branch = client.create_branch(
    branch="dev_branch",
    from_ref="main"
)
print(f'Created branch "{branch.name}"')

# List tables in the branch
for table in client.get_tables(ref=branch):
    print(f"{table.name:<30}")
import bauplan

client = bauplan.Client()

# Create a new branch
branch = client.create_branch(
    branch="dev_branch",
    from_ref="main"
)
print(f'Created branch "{branch.name}"')

# List tables in the branch
for table in client.get_tables(ref=branch):
    print(f"{table.name:<30} {table.kind}")
import bauplan

@bauplan.model()
# Define Python env with package versions
@bauplan.python(pip={'pandas': '2.2.0'})
def clean_data(
    # Input model reference
    data=bauplan.Model('my_data')
):
    import pandas as pd
    # Your data transformation logic here
    ...
    return data
Cloud development, simpler than local
Serverless Functions
Eliminate compute management. Run Python workloads seamlessly in the cloud with automatic scaling—no cluster configuration required.
Pure Python
Code in the language you already know. Build and test data applications directly in your IDE without learning specialized frameworks or DSLs.
No Infrastructure
Define environments with simple decorators and let Bauplan handle containers, dependencies, and resource management.
Run Python workloads in the cloud with automatic scaling—no cluster setup needed.
Build and test in your IDE without new frameworks or DSLs.
Focus on models—Bauplan handles containers, dependencies, and resources.
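To show how functions compose into a pipeline, here is a hedged sketch of two chained models using the decorators above; it assumes inputs arrive as Arrow tables with a to_pandas() method, and the table names, column name, and package versions are examples.

import bauplan

@bauplan.model()
@bauplan.python(pip={'pandas': '2.2.0'})
def clean_data(data=bauplan.Model('my_data')):
    # First step: drop incomplete rows from the raw input table
    df = data.to_pandas()
    return df.dropna()

@bauplan.model()
@bauplan.python(pip={'pandas': '2.2.0'})
def daily_metrics(data=bauplan.Model('clean_data')):
    # Second step: aggregate the table produced by clean_data
    df = data.to_pandas()
    # 'event_date' is an example column name
    return df.groupby('event_date').size().reset_index(name='events')

Each function declares its own environment in code, so the platform can build the containers and wire the dependency graph without any separate configuration.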
Develop with no Ops
Code-Driven Data Automation
Robust deployment
Deploy with confidence. Merge validated changes into your data lake and integrate seamlessly with CI/CD pipelines.
Full reproducibility
Track and replicate pipelines deterministically. Know exactly what code produced which data, by whom, and when in one API call.
Effortless Integration
Connect your data ecosystem with one simple SDK. Integrate with visualization and orchestration tools—no special connectors needed.
Deploy with confidence—integrate validated data into production seamlessly.
Prevent issues early with embedded data quality checks.
Connect to tools and platforms with a single line of code.
Automate your Workflows
import bauplan

client = bauplan.Client()

# example names for the development branch and the new table
dev_branch = 'dev_branch'
table_name = 'my_table'

# create a zero-copy branch of your data lake
client.create_branch(dev_branch, from_ref='main')

# create an Iceberg table and import data into it
client.create_table(table_name, dev_branch)
client.import_data(table_name, dev_branch)

# run a pipeline end-to-end in a branch
client.run('./my_project_dir', dev_branch)

# merge the new tables into the main data lake
client.merge_branch(dev_branch, into_branch='main')

print('So Long, and Thanks for All the Fish')
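In a CI/CD job, the same calls can gate deployment: run the project in a throwaway branch and merge into main only if the run succeeds. A minimal sketch, assuming the branch name is derived from a CI-provided commit identifier and that a failed run raises an exception:

import os
import bauplan

client = bauplan.Client()

# example: one branch per change, named from a CI-provided commit id
change_id = os.environ.get('CI_COMMIT_SHA', 'local')[:8]
ci_branch = f'ci_{change_id}'

client.create_branch(ci_branch, from_ref='main')
try:
    # run the project in isolation; assume a failed run raises
    client.run('./my_project_dir', ci_branch)
except Exception as exc:
    print(f'Pipeline failed on {ci_branch}, main left untouched: {exc}')
else:
    # only validated changes reach the production data lake
    client.merge_branch(ci_branch, into_branch='main')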
Python, Go, serverless, data lakes, Iceberg, and more than anything, superb DevEX.

Data as software and AI for the 99%
Build simple, robust data apps with software engineering principles.
Ciro Greco

Optimizing Cloud OLAP with DuckDB & Iceberg
Building a Serverless Lakehouse: Lessons from Spare Parts
Nathan LeClaire

Recommender systems with MongoDB
Full-stack recommender for training & MongoDB Atlas for real-time inference.
Ciro Greco

Zero-copy, Scale-up FaaS for Data Pipelines
Paper presented at WoSC10 2024.
Jacopo Tagliabue, Tyler Caraza-Harter and Ciro Greco

Containers Are Too Slow for Python
Running Cloud Data Workflows as Seamlessly as Loca
Nathan LeClaire and Ciro Greco




FAQ
Don't see an answer to your question? Check our docs.
How do you keep my data secure?
What if I already have Databricks or Snowflake?
What does Bauplan replace in my AWS data stack?
Do I need to learn an entirely new data framework?
What does Git for data mean?