Turn weeks of data infrastructure into a few lines of Python

Build robust AI and data applications over S3 with Serverless functions and Iceberg tables. No Kubernetes, no Spark, no infrastructure to manage.


Branch → Import → Run → Merge

import bauplan

client = bauplan.Client()

# Create a new branch
branch = client.create_branch(
    branch="dev_branch",
    from_ref="main"
)
print(f'Created branch "{branch.name}"')

# List tables in the branch
for table in client.get_tables(ref=branch):
    print(f"{table.namespace:<12} {table.name:<12} {table.kind}")

Create sandboxes instantly without data duplication



Built with Bauplan

Forecasting ML Pipeline

A streamlined pipeline transforming raw data into a predictions table: prepare a training set and train a Linear Regression model.
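A minimal sketch of what those two steps can look like as chained bauplan models, following the decorator pattern shown further down this page. The source table `taxi_rides`, the `distance` and `fare` columns, and the package versions are illustrative assumptions, not the actual project code.

import bauplan

@bauplan.model()
@bauplan.python(pip={'pandas': '2.2.0'})
def training_set(
    # Hypothetical source table of raw rides
    data=bauplan.Model('taxi_rides'),
):
    # Inputs arrive as Arrow tables; convert to pandas for feature prep
    df = data.to_pandas()
    # Illustrative cleaning: keep only complete rows for the two features
    return df.dropna(subset=['distance', 'fare'])

@bauplan.model()
@bauplan.python(pip={'pandas': '2.2.0', 'scikit-learn': '1.4.0'})
def predictions(
    data=bauplan.Model('training_set'),
):
    from sklearn.linear_model import LinearRegression

    df = data.to_pandas()
    # Fit a Linear Regression of fare on distance (assumed columns)
    model = LinearRegression().fit(df[['distance']], df['fare'])
    df['predicted_fare'] = model.predict(df[['distance']])
    # The returned DataFrame is materialized as the predictions table
    return df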

Data Augmentation with OpenAI

Entity matching across different e-commerce product catalogs, leveraging off-the-shelf LLM API from OpenAI. The entire project runs on object storage (S3) in open formats (Iceberg), relying solely on vanilla Python to orchestrate the DAG and integrate AI services.
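A hedged sketch of the core idea: an OpenAI call embedded inside a bauplan model that labels candidate pairs. The `product_pairs` table, its `title_a`/`title_b` columns, and the `gpt-4o-mini` model name are placeholders; the real project's prompt and schema will differ.

import bauplan

@bauplan.model()
@bauplan.python(pip={'openai': '1.30.0', 'pandas': '2.2.0'})
def matched_products(
    # Hypothetical input table holding candidate product pairs
    pairs=bauplan.Model('product_pairs'),
):
    from openai import OpenAI

    llm = OpenAI()  # reads OPENAI_API_KEY from the environment
    df = pairs.to_pandas()

    def is_match(title_a, title_b):
        # Ask the LLM whether two catalog entries describe the same product
        response = llm.chat.completions.create(
            model='gpt-4o-mini',  # placeholder model name
            messages=[{
                'role': 'user',
                'content': (
                    'Do these two listings describe the same product? '
                    f'Answer YES or NO.\nA: {title_a}\nB: {title_b}'
                ),
            }],
        )
        return response.choices[0].message.content.strip().upper().startswith('YES')

    # Label every candidate pair and persist the result as a new table
    df['is_match'] = [is_match(a, b) for a, b in zip(df['title_a'], df['title_b'])]
    return df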


Build a Lakehouse with Apache Iceberg in pure Python

Data Lakehouse architecture over object storage with Iceberg tables and robust Write-Audit-Publish workflows for safe data ingestion. Build a Lakehouse in ~150 lines of Python without needing a Data Warehouse, JVM, or Iceberg expertise.
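A compact sketch of Write-Audit-Publish using the SDK calls shown elsewhere on this page (`create_branch`, `create_table`, `import_data`, `merge_branch`); the `client.query` audit step, the branch name, the table, and the S3 path are illustrative assumptions.

import bauplan

client = bauplan.Client()

# WRITE: land the new data on an isolated, zero-copy branch
ingest_branch = 'ingest_branch'             # placeholder branch name
source = 's3://my-bucket/orders/*.parquet'  # placeholder landing path
client.create_branch(ingest_branch, from_ref='main')
client.create_table('orders', source, ingest_branch)
client.import_data('orders', source, ingest_branch)

# AUDIT: check the branch before anything reaches main
nulls = client.query(
    'SELECT COUNT(*) AS n FROM orders WHERE order_id IS NULL',
    ref=ingest_branch,
).to_pandas()
assert nulls['n'][0] == 0, 'audit failed: null order ids'

# PUBLISH: merge the audited tables into the main data lake
client.merge_branch(ingest_branch, into_branch='main')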


Data quality and expectations

Define data quality constraints with expectations to enforce standards and monitor pipeline updates using blazing fast vectorized tests.
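As a rough sketch, an expectation can be written as a decorated function that receives a table and returns a boolean; the `@bauplan.expectation()` usage, table name, and column name here are all illustrative assumptions.

import bauplan

@bauplan.expectation()
@bauplan.python(pip={'pyarrow': '15.0.0'})
def expect_no_null_order_ids(
    # Hypothetical table and column under test
    data=bauplan.Model('orders'),
):
    # Vectorized check over the Arrow column: fail the run on any null id
    return data['order_id'].null_count == 0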

Interactive visualization with Streamlit

Build a data transformation pipeline and visualize results with Streamlit using SQL querying and branching.
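A minimal sketch of the pattern: a Streamlit app that selects a branch, queries an Iceberg table through the client, and charts the result. The `client.query` call, table, and column names are assumptions for illustration.

import bauplan
import streamlit as st

client = bauplan.Client()

# Let the viewer pick which branch of the data lake to inspect
branch = st.selectbox('Branch', ['main', 'dev_branch'])

# Query the Iceberg table on object storage and get a DataFrame back
df = client.query(
    'SELECT day, revenue FROM daily_metrics ORDER BY day',
    ref=branch,
).to_pandas()

st.title('Daily revenue')
st.line_chart(df, x='day', y='revenue')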


Near Real-Time Analytics

Full stack real-time data pipeline for e-commerce analytics. This project features data ingestion, transformations, and live dashboards for key metrics like revenue and engagement, all managed with branch-based workflows and minimal setup.
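One way such a loop can be wired together, reusing the branch-import-run-merge calls from the workflow section below; the branch name, S3 path, pipeline directory, refresh interval, and the `delete_branch` cleanup call are all assumptions.

import time
import bauplan

client = bauplan.Client()

BRANCH = 'ingest_branch'                    # placeholder branch name
SOURCE = 's3://my-bucket/events/*.parquet'  # placeholder landing zone

while True:
    # Land fresh events on an isolated branch, transform, then publish;
    # assumes the 'events' table already exists on main
    client.create_branch(BRANCH, from_ref='main')
    client.import_data('events', SOURCE, BRANCH)
    client.run('./analytics_pipeline', BRANCH)  # revenue & engagement models
    client.merge_branch(BRANCH, into_branch='main')
    client.delete_branch(BRANCH)                # assumed cleanup call
    time.sleep(60)                              # refresh roughly every minute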


The optimal workflow for data teams

Branch

Instant Zero-Copy Environments

Quickly spin up development and testing environments without duplicating data.

Version Control for Data

Work with data the way you work with code. Use familiar operations like branching, checkout, and merging.

Safe and Sandboxed Experiments

Keep your production environment safe. Collaborate in fully isolated, sandboxed environments.

import bauplan

client = bauplan.Client()

# Create a new branch
branch = client.create_branch(
    branch="dev_branch",
    from_ref="main"
)
print(f'Created branch "{branch.name}"')

# List tables in the branch
for table in client.get_tables(ref=branch):
    print(f"{table.namespace:<12} {table.name:<30} {table.kind}")


Develop

No Infrastructure to Manage

Define environments entirely in code — never worry about containers and environment management.

Pure Python

Build and test data applications directly in your IDE — no need to learn new frameworks, just code as you normally would.

Serverless Functions

Execute your workloads seamlessly in the cloud, combining serverless functions into pipelines.

import bauplan

@bauplan.model()
# Specify the Python environment with exact package versions
@bauplan.python(pip={'pandas': '2.2.0'})
def clean_data(
    # Input model reference - points to an existing table
    data=bauplan.Model('my_data')
):
    import pandas as pd

    # Your data transformation logic here
    ...

    # Return the transformed data; it becomes the 'clean_data' table
    return data

Automate

Merge your changes

Deploy by merging new tables into your main data lake. Use our Python SDK to automate your CI/CD pipelines and deployments.

Built-in Testing

Incorporate unit tests and expectations directly into your workflows, ensuring your data is always reliable and consistent.

Effortless Integration

Connect to visualization tools and orchestrators with just one line of code.

import bauplan

client = bauplan.Client()

# placeholder names for this walkthrough
dev_branch = 'dev_branch'
table_name = 'my_table'
data_source = 's3://my-bucket/my-data/*.parquet'

# create a zero-copy branch of your data lake
client.create_branch(dev_branch, from_ref='main')
# create an Iceberg table and import data into it
client.create_table(table_name, data_source, dev_branch)
client.import_data(table_name, data_source, dev_branch)
# run a pipeline end-to-end in a branch
client.run('./my_project_dir', dev_branch)
# merge the new tables into the main data lake
client.merge_branch(dev_branch, into_branch='main')

print('So Long, and Thanks for All the Fish')


Try bauplan