Apache Airflow has long been one of our project’s trusted tools for orchestrating data pipelines. It efficiently handles scheduled ETL jobs and reporting tasks, especially when everything operates within a single cloud environment and follows a linear workflow.
As our application evolved toward multi-cloud deployments (AWS and GCP), event-driven patterns, and long-running stateful processes (such as model training), Airflow's traditional strengths began to show their limits. In particular, persisting workflow state across failures, orchestrating event-based interactions across cloud boundaries, and the developer overhead of building the necessary resilience ourselves became significant pain points. These challenges prompted us to look for alternatives.
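To make the "developer overhead" concrete, here is a minimal sketch (a hypothetical illustration, not code from our pipelines) of the checkpoint/resume boilerplate that creeps into every long-running task when the orchestrator does not persist workflow state for you. The file name and step logic are invented for the example:

```python
import json
import os

CHECKPOINT_FILE = "train_checkpoint.json"  # hypothetical checkpoint path

def load_checkpoint():
    """Resume from the last completed step, if a checkpoint exists."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)
    return {"step": 0, "partial_result": 0}

def save_checkpoint(state):
    """Persist progress so a crash or restart does not lose work."""
    tmp = CHECKPOINT_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT_FILE)  # atomic rename avoids torn writes

def long_running_job(total_steps=10):
    """A stand-in for model training: do work step by step, checkpointing each one."""
    state = load_checkpoint()
    for step in range(state["step"], total_steps):
        state["partial_result"] += step  # placeholder for real work
        state["step"] = step + 1
        save_checkpoint(state)           # checkpoint after every step
    return state["partial_result"]
```

Every task that needs this kind of resilience repeats the same pattern, which is exactly the overhead that pushed us to look at orchestrators with durable, first-class workflow state.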