Research

Dec 2, 2024

Written by Jacopo Tagliabue, Tyler Caraza-Harter and Ciro Greco

Bauplan: Zero-copy, Scale-up FaaS for Data Pipelines

Paper presented at WoSC10 2024, in collaboration with the University of Wisconsin.

Abstract

Chaining functions into longer workloads is a key use case for FaaS platforms in data applications. However, modern data pipelines differ significantly from typical serverless use cases (e.g., webhooks and microservices), and structural constraints make it difficult to retrofit existing frameworks to serve them. In this paper, we describe these limitations in detail and introduce bauplan, a novel FaaS programming model and serverless runtime designed for data practitioners. bauplan lets users declaratively define functional Directed Acyclic Graphs (DAGs) together with their runtime environments, which are then executed efficiently on cloud-based workers. We show that, by trading generality for data-awareness, bauplan achieves both better performance and a superior developer experience for data workloads.
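
To make the idea of a declaratively defined functional DAG more concrete, here is a minimal, illustrative sketch in plain Python. It is not bauplan's API: the `node` decorator, the `REGISTRY` dictionary, the `run_dag` helper, and the sample data are all hypothetical names introduced for this example. Each function declares its parents through its parameter names and its runtime environment through decorator arguments; a small in-process runner then executes the functions in dependency order, handing each output to its children by reference rather than serializing it.

```python
import inspect
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical registry: node name -> (function, parent names, declared environment)
REGISTRY = {}


def node(python="3.11", pip=None):
    """Register a function as a DAG node; its parameter names are its parents."""
    def decorator(fn):
        parents = list(inspect.signature(fn).parameters)
        env = {"python": python, "pip": pip or {}}
        REGISTRY[fn.__name__] = (fn, parents, env)
        return fn
    return decorator


@node()
def raw_trips():
    # Root node: stands in for a scan over a table in a data lake.
    return [
        {"city": "NYC", "miles": 2.0},
        {"city": "NYC", "miles": 5.0},
        {"city": "SF", "miles": 3.0},
    ]


# The environment is only declared here; this sketch does not install anything.
@node(python="3.11", pip={"pandas": "2.2.0"})
def clean_trips(raw_trips):
    return [row for row in raw_trips if row["miles"] > 2.5]


@node()
def miles_by_city(clean_trips):
    totals = {}
    for row in clean_trips:
        totals[row["city"]] = totals.get(row["city"], 0.0) + row["miles"]
    return totals


def run_dag(registry):
    """Run every node after its parents, passing outputs by reference."""
    graph = {name: set(parents) for name, (_, parents, _) in registry.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        fn, parents, _env = registry[name]
        results[name] = fn(*(results[parent] for parent in parents))
    return results


results = run_dag(REGISTRY)
print(results["miles_by_city"])
```

In this toy version the declared environment is only metadata and everything runs in a single process. A real runtime would presumably use such declarations to provision each function's environment on a worker and to decide how intermediate data moves between functions.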


Read the full paper (WoSC10 '24: Proceedings of the 10th International Workshop on Serverless Computing, pages 31-36)


Try bauplan