The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI

https://feeds.cohostpodcasting.com/yiLVF7xu
Welcome to The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI, the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: ht...

Episode List

From ETL to Airflow: Transforming Data Engineering at Deloitte Digital with Raviteja Tholupunoori

Apr 10th, 2025 4:00 AM

Data orchestration at scale presents unique challenges, especially when aiming for flexibility and efficiency across cloud environments. Choosing the right tools and frameworks can make all the difference. In this episode, Raviteja Tholupunoori, Senior Engineer at Deloitte Digital, joins us to explore how Airflow enhances orchestration, scalability and cost efficiency in enterprise data workflows.

Key Takeaways:
  • (01:45) Early challenges in data orchestration before implementing Airflow.
  • (02:42) Comparing Airflow with ETL tools like Talend and why flexibility matters.
  • (04:24) The role of Airflow in enabling cloud-agnostic data processing.
  • (05:45) Key lessons from managing dynamic DAGs at scale.
  • (13:15) How hybrid executors improve performance and efficiency.
  • (14:13) Best practices for testing and monitoring workflows with Airflow.
  • (15:13) The importance of mocking mechanisms when testing DAGs.
  • (17:57) How Prometheus, Grafana and Loki support Airflow monitoring.
  • (22:03) Cost considerations when running Airflow on self-managed infrastructure.
  • (23:14) Airflow’s latest features, including hybrid executors and dark mode.

Resources Mentioned:
  • Raviteja Tholupunoori: https://www.linkedin.com/in/raviteja0096/?originalSubdomain=in
  • Deloitte Digital: https://www.linkedin.com/company/deloitte-digital/
  • Apache Airflow: https://airflow.apache.org/
  • Grafana: https://grafana.com/solutions/apache-airflow/monitor/
  • Astronomer Presents: Exploring Apache Airflow® 3 Roadshows:
    https://www.astronomer.io/events/roadshow/
    https://www.astronomer.io/events/roadshow/london/
    https://www.astronomer.io/events/roadshow/new-york/
    https://www.astronomer.io/events/roadshow/sydney/
    https://www.astronomer.io/events/roadshow/san-francisco/
    https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

A Deep Dive Into the 2025 State of Airflow Survey Results with Tamara Fingerlin of Astronomer

Apr 3rd, 2025 5:24 AM

The 2025 State of Airflow report sheds light on how global users are adopting, evolving and innovating with Apache Airflow. With over 5,000 responses from 116 countries, the survey reveals critical insights into Airflow’s role in business operations, new use cases and what’s ahead for the community.

In this episode, Tamara Fingerlin, Developer Advocate at Astronomer, walks us through her process of analyzing survey data, key trends from the report and what to expect from Airflow 3.0.

Key Takeaways:
  • (02:14) The State of Airflow report combines anonymized telemetry and survey results.
  • (03:25) The survey received thousands of responses from many countries, showcasing global reach.
  • (04:49) The survey process involves multiple steps, from question selection to report creation.
  • (09:00) Many users expect to increase Airflow usage for revenue-generating or external use cases.
  • (11:04) Experienced users tend to utilize Airflow more for advanced use cases like MLOps.
  • (15:13) UI improvements offer enhanced navigation and error visibility.
  • (18:15) Architectural changes enable new capabilities like remote execution and language support.
  • (19:40) Long-requested features will be available in the new major release.
  • (21:00) Future aspirations include integrating data visualization capabilities into the UI.

Resources Mentioned:
  • Tamara Fingerlin: https://www.linkedin.com/in/tamara-janina-fingerlin/
  • Astronomer | LinkedIn: https://www.linkedin.com/company/astronomer/
  • Astronomer | Website: https://www.astronomer.io
  • Apache Airflow: https://airflow.apache.org/
  • 2025 State of Airflow Webinar: https://www.astronomer.io/airflow/state-of-airflow/
  • Airflow Slack: https://apache-airflow-slack.herokuapp.com/
  • Astronomer Presents: Exploring Apache Airflow® 3 Roadshows:
    https://www.astronomer.io/events/roadshow/
    https://www.astronomer.io/events/roadshow/london/
    https://www.astronomer.io/events/roadshow/new-york/
    https://www.astronomer.io/events/roadshow/sydney/
    https://www.astronomer.io/events/roadshow/san-francisco/
    https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

The Software Risk That Affects Everyone and How To Address It with Michael Winser and Jarek Potiuk

Mar 20th, 2025 4:37 AM

The security of open-source software is a growing concern, especially as dependencies and regulations become more complex, making it essential to understand how to manage software supply chains effectively. In this episode, we sit down with Michael Winser, Co-Founder at Alpha-Omega and Security Strategy Ambassador at Eclipse Foundation, and Jarek Potiuk, Member of the Security Committee at the Apache Software Foundation, to discuss the challenges of securing Airflow’s dependencies, the evolving landscape of open-source security and how contributors can help strengthen the ecosystem.

Key Takeaways:
  • (02:43) Jarek quit his full-time engineer position and uses Airflow as a freelancer.
  • (04:32) Michael finds happiness in having meaningful work with open-source security.
  • (07:01) Software supply chain security focuses on correctness, integrity and availability.
  • (08:44) Airflow’s 790 dependencies present a unique security challenge.
  • (09:43) Airflow’s security team has significantly improved its vulnerability response.
  • (10:22) The transition to Airflow 3 emphasizes enterprise security readiness.
  • (16:20) The ‘Three Fs’ approach: fix it, fork it, or forget it.
  • (18:45) Dependency health is often more critical than fixing known vulnerabilities.
  • (23:32) The ‘Three Fs’ in action.
  • (26:26) Open-source contributors play a key role in supply chain security.

Resources Mentioned:
  • Michael Winser: https://www.linkedin.com/in/michaelw/
  • Jarek Potiuk: https://www.linkedin.com/in/jarekpotiuk/
  • Apache Airflow: https://airflow.apache.org/
  • Apache Software Foundation | LinkedIn: https://www.linkedin.com/company/the-apache-software-foundation/
  • Apache Software Foundation | Website: https://www.apache.org/
  • Eclipse Foundation | LinkedIn: https://www.linkedin.com/company/eclipse-foundation/
  • Eclipse Foundation | Website: https://www.eclipse.org/org/foundation/
  • OpenSSF Working Groups: https://openssf.org/community/openssf-working-groups/
  • Astronomer Roadshow: Exploring Apache Airflow 3 | London: https://www.astronomer.io/events/roadshow/london/
  • Astronomer Roadshow: Exploring Apache Airflow 3 | New York: https://www.astronomer.io/events/roadshow/new-york/
  • Astronomer Roadshow: Exploring Apache Airflow 3 | Sydney: https://www.astronomer.io/events/roadshow/sydney/
  • Astronomer Roadshow: Exploring Apache Airflow 3 | San Francisco: https://www.astronomer.io/events/roadshow/san-francisco/
  • Astronomer Roadshow: Exploring Apache Airflow 3 | Chicago: https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

Building Scalable ML Infrastructure at Outerbounds with Savin Goyal

Mar 13th, 2025 4:37 AM

Machine learning is changing fast, and companies need better tools to handle AI workloads. The right infrastructure helps data scientists focus on solving problems instead of managing complex systems. In this episode, we talk with Savin Goyal, Co-Founder and CTO at Outerbounds, about building ML infrastructure, how orchestration makes workflows easier and how Metaflow and Airflow work together to simplify data science.

Key Takeaways:
  • (02:02) Savin spent years building AI and ML infrastructure, including at Netflix.
  • (04:05) ML engineering was not a defined role a decade ago.
  • (08:17) Modernizing AI and ML requires balancing new tools with existing strengths.
  • (10:28) ML workloads can be long-running or require heavy computation.
  • (15:29) Different teams at Netflix used multiple orchestration systems for specific needs.
  • (20:10) Stable APIs prevent rework and keep projects moving.
  • (21:07) Metaflow simplifies ML workflows by optimizing data and compute interactions.
  • (25:53) Limited local computing power makes running ML workloads challenging.
  • (27:43) Airflow UI monitors pipelines, while Metaflow UI gives ML insights.
  • (33:13) The most successful data professionals focus on business impact, not just technology.

Resources Mentioned:
  • Savin Goyal: https://www.linkedin.com/in/savingoyal/
  • Outerbounds: https://www.linkedin.com/company/outerbounds/
  • Apache Airflow: https://airflow.apache.org/
  • Metaflow: https://metaflow.org/
  • Netflix’s Maestro Orchestration System: https://netflixtechblog.com/maestro-netflixs-workflow-orchestrator-ee13a06f9c78?gi=8e6a067a92e9#:~:text=Maestro%20is%20a%20fully%20managed,data%20between%20different%20storages%2C%20etc.
  • TensorFlow: https://www.tensorflow.org/
  • PyTorch: https://pytorch.org/

Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

Customizing Airflow for Complex Data Environments at Stripe with Nick Bilozerov and Sharadh Krishnamurthy

Mar 6th, 2025 7:20 AM

Keeping data pipelines reliable at scale requires more than just the right tools; it demands constant innovation. In this episode, Nick Bilozerov, Senior Data Engineer at Stripe, and Sharadh Krishnamurthy, Engineering Manager at Stripe, discuss how Stripe customizes Airflow for its needs, the evolution of its data orchestration framework and the transition to Airflow 2. They also share insights on scaling data workflows while maintaining performance, reliability and developer experience.

Key Takeaways:
  • (02:04) Stripe’s mission is to grow the GDP of the internet by supporting businesses with payments and data.
  • (05:08) 80% of Stripe engineers use data orchestration, making scalability critical.
  • (06:06) Airflow powers business reports, regulatory needs and ML workflows.
  • (08:02) Custom task frameworks improve dependencies and validation.
  • (08:50) "User scope mode" enables local testing without production impact.
  • (10:39) Migrating to Airflow 2 improves isolation, safety and scalability.
  • (16:40) Monolithic DAGs caused database issues, prompting a service-based shift.
  • (19:24) Frequent Airflow upgrades ensure stability and access to new features.
  • (21:38) DAG versioning and backfill improvements enhance developer experience.
  • (23:38) Greater UI customization would offer more flexibility.

Resources Mentioned:
  • Nick Bilozerov: https://www.linkedin.com/in/nick-bilozerov/
  • Sharadh Krishnamurthy: https://www.linkedin.com/in/sharadhk/
  • Apache Airflow: https://airflow.apache.org/
  • Stripe | LinkedIn: https://www.linkedin.com/company/stripe/
  • Stripe | Website: https://stripe.com/

Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning
