The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI

https://feeds.cohostpodcasting.com/yiLVF7xu
19 Followers, 74 Episodes
Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: ht...

Episode List

Building an End-to-End Data Observability System at Netflix with Joseph Machado

May 15th, 2025 4:00 AM

Building reliable data pipelines starts with maintaining strong data quality standards and creating efficient systems for auditing, publishing and monitoring. In this episode, we explore the real-world patterns and best practices for ensuring data pipelines stay accurate, scalable and trustworthy.

Joseph Machado, Senior Data Engineer at Netflix, joins us to share practical insights gleaned from supporting Netflix’s Ads business as well as over a decade of experience in the data engineering space. He discusses implementing audit publish patterns, building observability dashboards, defining in-band and separate data quality checks, and optimizing data validation across large-scale systems.

Key Takeaways:

(03:14) Supporting data privacy and engineering efficiency within data systems.
(10:41) Validating outputs with reconciliation checks to catch transformation issues.
(16:06) Applying standardized patterns for auditing, validating and publishing data.
(19:28) Capturing historical check results to monitor system health and improvements.
(21:29) Treating data quality and availability as separate monitoring concerns.
(26:26) Using containerization strategies to streamline pipeline executions.
(29:47) Leveraging orchestration platforms for better visibility and retry capability.
(31:59) Managing business pressure without sacrificing data quality practices.
(35:46) Starting simple with quality checks and evolving toward more complex frameworks.

Resources Mentioned:

Joseph Machado: https://www.linkedin.com/in/josephmachado1991/
Netflix | LinkedIn: https://www.linkedin.com/company/netflix/
Netflix | Website: https://www.netflix.com/browse
Start Data Engineering: https://www.startdataengineering.com/
Apache Airflow: https://airflow.apache.org/
dbt Labs: https://www.getdbt.com/
Great Expectations: https://greatexpectations.io/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

Why Developer Experience Shapes Data Pipeline Standards at Next Insurance with Snir Israeli

May 8th, 2025 5:50 AM

Creating consistency across data pipelines is critical for scaling engineering teams and ensuring long-term maintainability.

In this episode, Snir Israeli, Senior Data Engineer at Next Insurance, shares how enforcing coding standards and investing in developer experience transformed their approach to data engineering. He explains how implementing automated code checks, clear documentation practices and a scoring system helped drive alignment across teams, improve collaboration and reduce technical debt in a fast-growing data environment.

Key Takeaways:

(02:59) Inconsistencies in code style create challenges for collaboration and maintenance.
(04:22) Programmatically enforcing rules helps teams scale their best practices.
(08:55) Performance improvements in data pipelines lead to infrastructure cost savings.
(13:22) Developer experience is essential for driving adoption of internal tools.
(19:44) Dashboards can operationalize standards enforcement and track progress over time.
(22:49) Standardization accelerates onboarding and reduces friction in code reviews.
(25:39) Linting rules require ongoing maintenance as tools and platforms evolve.
(27:47) Starting small and involving the team leads to better adoption and long-term success.

Resources Mentioned:

Snir Israeli: https://www.linkedin.com/in/snir-israeli/
Next Insurance | LinkedIn: https://www.linkedin.com/company/nextinsurance/
Next Insurance | Website: https://www.nextinsurance.com/
Apache Airflow: https://airflow.apache.org/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

Data Quality and Observability at Tekmetric with Ipsa Trivedi

May 1st, 2025 4:00 AM

Airflow’s adaptability is driving Tekmetric’s ability to unify complex data workflows, deliver accurate insights and support both internal operations and customer-facing services, all within a rapidly growing startup environment.

In this episode, Ipsa Trivedi, Lead Data Engineer at Tekmetric, shares how her team is standardizing pipelines while supporting unique customer needs. She explains how Airflow enables end-to-end data services, simplifies orchestration across varied sources and supports scalable customization. Ipsa also highlights early wins with Airflow, its intuitive UI and the team's roadmap toward data quality, observability and a future self-serve data platform.

Key Takeaways:

(02:26) Powering auto shops nationwide with a unified platform.
(05:17) A new data team was formed to centralize and scale insights.
(07:23) Flexible, open source and made to fit: Airflow wins.
(10:42) Pipelines handle anything from email to AWS.
(12:15) Custom DAGs fit every team’s unique needs.
(17:01) Data quality checks are built into the plan.
(18:17) Self-serve data mesh is the end goal.
(19:59) Airflow now fits so well, there's nothing left on the wishlist.

Resources Mentioned:

Ipsa Trivedi: https://www.linkedin.com/in/ipsatrivedi/
Tekmetric | LinkedIn: https://www.linkedin.com/company/tekmetric/
Tekmetric | Website: https://www.tekmetric.com/
Apache Airflow: https://airflow.apache.org/
AWS RDS: https://aws.amazon.com/free/database/?trk=fc551e06-56b0-418c-9ddd-5c9dba18569b&sc_channel=ps&ef_id=CjwKCAjwzMi_BhACEiwAX4YZULS4jV2Xpnpcac_Q3eS9BAg-klKUDyCt6XSdOul8BLHkmWzFFh4NXRoCGhQQAvD_BwE:G:s&s_kwcid=AL!4422!3!548989592596!e!!g!!amazon%20sql%20database!11543056228!112002958549&gclid=CjwKCAjwzMi_BhACEiwAX4YZULS4jV2Xpnpcac_Q3eS9BAg-klKUDyCt6XSdOul8BLHkmWzFFh4NXRoCGhQQAvD_BwE
Astro by Astronomer: https://www.astronomer.io/product/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

Introducing Apache Airflow® 3 with Vikram Koka and Jed Cunningham

Apr 24th, 2025 4:00 AM

The Airflow 3.0 release marks a significant leap forward in modern data orchestration, introducing architectural upgrades that improve scalability, flexibility and long-term maintainability.

In this episode, we welcome Vikram Koka, Chief Strategy Officer at Astronomer, and Jed Cunningham, Principal Software Engineer at Astronomer, to discuss the architectural foundations, new features and future implications of this milestone release. They unpack the rationale behind DAG versioning and task execution interface, explain how Airflow now integrates more seamlessly within broader data ecosystems and share how these changes lay the groundwork for multi-cloud deployments, language-agnostic workflows and stronger enterprise security.

Key Takeaways:

(02:28) Modern orchestration demands new infrastructure approaches.
(05:02) Removing legacy components strengthens system stability.
(06:26) Major releases provide the opportunity to reduce technical debt.
(08:31) Frontend and API modernization enable long-term adaptability.
(09:36) Event-based triggers expand integration possibilities.
(11:54) Version control improves visibility and execution reliability.
(14:57) Centralized access to workflow definitions increases flexibility.
(21:49) Decoupled architecture supports distributed and secure deployments.
(26:17) Community collaboration is essential for sustainable growth.

Resources Mentioned:

Astronomer Website: https://www.astronomer.io
Apache Airflow: https://airflow.apache.org/
Git Bundle: https://git-scm.com/book/en/v2/Git-Tools-Bundling
FastAPI: https://fastapi.tiangolo.com/
React: https://react.dev/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

Airflow in Action: Powering Instacart's Complex Ecosystem

Apr 17th, 2025 4:00 AM

The evolution of data orchestration at Instacart highlights the journey from fragmented systems to robust, standardized infrastructure. This transformation has enabled scalability, reliability and democratization of tools for diverse user personas.

In this episode, we’re joined by Anant Agarwal, Software Engineer at Instacart, who shares insights into Instacart's Airflow journey, from its early adoption in 2019 to the present-day centralized cluster approach. Anant discusses the challenges of managing disparate clusters, the implementation of remote executors, and the strategic standardization of infrastructure and DAG patterns to streamline workflows.

Key Takeaways:

(03:49) The impact of external events on business growth and technological evolution.
(04:31) Challenges of managing decentralized systems across multiple teams.
(06:14) The importance of standardizing infrastructure and processes for scalability.
(09:51) Strategies for implementing efficient and repeatable deployment practices.
(12:17) Addressing diverse user personas with tailored solutions.
(14:47) Leveraging remote execution to enhance flexibility and scalability.
(18:36) Benefits of transitioning to a centralized system for organization-wide use.
(20:57) Maintaining an upgrade cadence to stay aligned with the latest advancements.
(23:35) Anticipation for new features and improvements in upcoming software versions.

Resources Mentioned:

Anant Agarwal: https://www.linkedin.com/in/anantag/
Instacart | LinkedIn: https://www.linkedin.com/company/instacart/
Instacart | Website: https://www.instacart.com
Apache Airflow: https://airflow.apache.org/
AWS Amazon: https://aws.amazon.com/ecs/
Terraform: https://www.terraform.io/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning
