The AI-Ready Pipeline: Reimagining Airflow at Veyer® Logistics with Anu Pabla
Innovation in orchestration is redefining how engineers approach both traditional ETL pipelines and emerging AI workloads. Understanding how to harness Airflow’s flexibility and observability is essential for teams navigating today’s evolving data landscape.

In this episode, Anu Pabla, Principal Engineer at The ODP Corporation, joins us to discuss her journey from legacy orchestration patterns to AI-native pipelines and why she sees Airflow as the future of AI workload orchestration.

Key Takeaways:
(03:43) Engaging with external technology communities fosters innovation.
(05:05) Mentoring early-career engineers builds confidence in a complex tech landscape.
(07:51) Orchestration patterns continue to evolve with modern data needs.
(08:41) Managing AI workflows requires structured and flexible orchestration.
(10:35) High-quality, meaningful data remains foundational across use cases.
(15:08) Community-driven open source tools offer lasting value.
(16:59) Self-healing systems support both legacy and AI pipelines.
(20:20) Orchestration platforms can drive future AI-native workloads.

Resources Mentioned:
Anu Pabla: https://www.linkedin.com/in/atomicap/
The ODP Corporation: https://www.linkedin.com/company/the-odp-corporation/
The ODP Corporation | Website: https://www.theodpcorp.com/homepage
Apache Airflow: https://airflow.apache.org/
LlamaIndex: https://www.llamaindex.ai/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning
Streamlining AI and ML Operations at IBM with BJ Adesoji and Ryan Yackel
In this episode, we’re joined by IBM’s Senior Product Manager, BJ Adesoji, and Chief Marketing Officer, Ryan Yackel. We discuss how IBM customers are using Airflow in production, the challenges they face at scale and what the new IBM–Astronomer collaboration unlocks.

Key Takeaways:
(03:09) The growing importance of orchestration tools in enterprise environments.
(04:48) How organizations are expanding orchestration beyond traditional use cases.
(05:24) Common patterns across industries adopting orchestration platforms.
(07:16) Why orchestration is essential for supporting business-critical workloads.
(10:00) The role of orchestration in compliance and regulatory processes.
(13:02) Challenges enterprises face when managing orchestration infrastructure.
(14:58) Opportunities to simplify and centralize orchestration at scale.
(19:11) The value of integrating orchestration with broader data toolchains.
(20:54) How AI is shaping the future of orchestrated data workflows.

Resources Mentioned:
BJ Adesoji: https://www.linkedin.com/in/bj-soji/
Ryan Yackel: https://www.linkedin.com/in/ryanyackel/
IBM | LinkedIn: https://www.linkedin.com/company/databand-ai/
IBM Databand: https://www.ibm.com/products/databand
IBM DataStage: https://www.ibm.com/products/datastage
IBM watsonx.governance: https://www.ibm.com/products/watsonx-governance
IBM Knowledge Catalog: https://www.ibm.com/products/knowledge-catalog
Apache Airflow: https://airflow.apache.org/
watsonx Orchestrate: https://www.ibm.com/products/watsonx-orchestrate
Domino: https://domino.ai/
Astronomer: https://www.astronomer.io/
Snowflake: https://www.snowflake.com/en/
dbt Labs: https://www.getdbt.com/
Amazon SageMaker: https://aws.amazon.com/sagemaker/
Cloudera: https://www.cloudera.com/
MongoDB: https://www.mongodb.com/
Inside the Custom Framework for Managing Airflow Code at Wix with Gil Reich
Efficient orchestration and maintainability are crucial for data engineering at scale. Gil Reich, Data Developer for Data Science at Wix, shares how his team reduced code duplication, standardized pipelines, and improved Airflow task orchestration using a Python-based framework built within the data science team.

In this episode, Gil explains how this internal framework simplifies DAG creation, improves documentation accuracy, and enables consistent task generation for machine learning pipelines. He also shares lessons from complex DAG optimization and maintaining testable code.

Key Takeaways:
(03:23) Code duplication creates long-term problems.
(08:16) Frameworks bring order to complex pipelines.
(09:41) Shared functions cut down repetitive code.
(17:18) Auto-generated docs stay accurate by design.
(22:40) On-demand DAGs support real-time workflows.
(25:08) Task-level sensors improve run efficiency.
(27:40) Combine local runs with automated tests.
(30:09) Clean code helps teams scale faster.

Resources Mentioned:
Gil Reich: https://www.linkedin.com/in/gilreich/
Wix | LinkedIn: https://www.linkedin.com/company/wix-com/
Wix | Website: https://www.wix.com/
DS DAG Framework: https://airflowsummit.org/slides/2024/92-refactoring-dags.pdf
Apache Airflow: https://airflow.apache.org/
Modernizing Legacy Data Systems With Airflow at Procter & Gamble with Adonis Castillo Cordero
Legacy architecture and AI workloads pose unique challenges at scale, especially in a global enterprise with complex data systems. In this episode, we explore strategies to proactively monitor and optimize pipelines while minimizing downstream failures.

Adonis Castillo Cordero, Senior Automation Manager at Procter & Gamble, joins us to share actionable best practices for dependency mapping, anomaly detection and architecture simplification using Apache Airflow.

Key Takeaways:
(03:13) Integrating legacy data systems into modern architecture.
(05:51) Designing workflows for real-time data processing.
(07:57) Mapping dependencies early to avoid pipeline failures.
(09:02) Building automated monitoring into orchestration frameworks.
(12:09) Detecting anomalies to prevent performance bottlenecks.
(15:24) Monitoring data quality to catch silent failures.
(17:02) Prioritizing responses based on impact severity.
(18:55) Simplifying dashboards to highlight critical metrics.

Resources Mentioned:
Adonis Castillo Cordero: https://www.linkedin.com/in/adoniscc/
Procter & Gamble | LinkedIn: https://www.linkedin.com/company/procter-and-gamble/
Procter & Gamble | Website: http://www.pg.com
Apache Airflow: https://airflow.apache.org/
OpenLineage: https://openlineage.io/
Azure Monitor: https://azure.microsoft.com/en-us/products/monitor/
AWS Lookout for Metrics: https://aws.amazon.com/lookout-for-metrics/
Monte Carlo: https://www.montecarlodata.com/
Great Expectations: https://greatexpectations.io/
Building an End-to-End Data Observability System at Netflix with Joseph Machado
Building reliable data pipelines starts with maintaining strong data quality standards and creating efficient systems for auditing, publishing and monitoring. In this episode, we explore the real-world patterns and best practices for ensuring data pipelines stay accurate, scalable and trustworthy.

Joseph Machado, Senior Data Engineer at Netflix, joins us to share practical insights gleaned from supporting Netflix’s Ads business as well as over a decade of experience in the data engineering space. He discusses implementing audit publish patterns, building observability dashboards, defining in-band and separate data quality checks, and optimizing data validation across large-scale systems.

Key Takeaways:
(03:14) Supporting data privacy and engineering efficiency within data systems.
(10:41) Validating outputs with reconciliation checks to catch transformation issues.
(16:06) Applying standardized patterns for auditing, validating and publishing data.
(19:28) Capturing historical check results to monitor system health and improvements.
(21:29) Treating data quality and availability as separate monitoring concerns.
(26:26) Using containerization strategies to streamline pipeline executions.
(29:47) Leveraging orchestration platforms for better visibility and retry capability.
(31:59) Managing business pressure without sacrificing data quality practices.
(35:46) Starting simple with quality checks and evolving toward more complex frameworks.

Resources Mentioned:
Joseph Machado: https://www.linkedin.com/in/josephmachado1991/
Netflix | LinkedIn: https://www.linkedin.com/company/netflix/
Netflix | Website: https://www.netflix.com/browse
Start Data Engineering: https://www.startdataengineering.com/
Apache Airflow: https://airflow.apache.org/
dbt Labs: https://www.getdbt.com/
Great Expectations: https://greatexpectations.io/