The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI


https://feeds.cohostpodcasting.com/yiLVF7xu
74 Episodes
Welcome to The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI, the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: ht...

Episode List

Harnessing Airflow for Data-Driven Policy Research at CSET with Jennifer Melot

Feb 27th, 2025 6:15 PM

Turning complex datasets into meaningful analysis requires robust data infrastructure and seamless orchestration. In this episode, we’re joined by Jennifer Melot, Technical Lead at the Center for Security and Emerging Technology (CSET) at Georgetown University, to explore how Airflow powers data-driven insights in technology policy research. Jennifer shares how her team automates workflows to support analysts in navigating complex datasets.

Key Takeaways:
(02:04) CSET provides data-driven analysis to inform government decision-makers.
(03:54) ETL pipelines merge multiple data sources for more comprehensive insights.
(04:20) Airflow is central to automating and streamlining large-scale data ingestion.
(05:11) Larger-scale databases create challenges that require scalable solutions.
(07:20) Dynamic DAG generation simplifies Airflow adoption for non-engineers.
(12:13) DAG Factory and dynamic task mapping can improve workflow efficiency.
(15:46) Tracking data lineage helps teams understand dependencies across DAGs.
(16:14) New Airflow features enhance visibility and debugging for complex pipelines.

Resources Mentioned:
Jennifer Melot - https://www.linkedin.com/in/jennifer-melot-aa710144/
Center for Security and Emerging Technology (CSET) - https://www.linkedin.com/company/georgetown-cset/
Apache Airflow - https://airflow.apache.org/
Zenodo - https://zenodo.org/
OpenLineage - https://openlineage.io/
Cloud Dataplex - https://cloud.google.com/dataplex

Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Hybrid Testing Solutions for Autonomous Driving at Bosch with Jens Scheffler and Christian Schilling

Feb 13th, 2025 5:10 AM

Testing autonomous vehicles demands precision, scalability and powerful orchestration tools. Enter Apache Airflow, a key component of Bosch’s cutting-edge testing framework. In this episode, we sit down with Jens Scheffler, Test Execution Cluster Technical Architect, and Christian Schilling, Product Owner Open Loop Testing Automated Driving, both at Bosch, to explore how Bosch harnesses Airflow to streamline complex testing scenarios. They share insights on scaling workflows, integrating hybrid infrastructures and ensuring vehicle safety through rigorous automated testing.

Key Takeaways:
(01:35) Airflow orchestrates millions of test hours for autonomous systems.
(03:15) Jens scales distributed systems with Kubernetes for job orchestration.
(06:02) Airflow runs hundreds of tests simultaneously.
(06:44) Virtual testing reduces costs and on-road trials.
(12:19) Unified APIs and GUIs streamline operations.
(15:05) Self-service setups empower Bosch teams.
(18:00) Physical hardware integration ensures real-world timing.
(20:30) Dynamic task mapping scales workflows efficiently.
(25:22) Open-source contributions improve stability.
(31:06) Edge and Celery executors power Bosch’s hybrid scheduling.

Resources Mentioned:
Jens Scheffler - https://www.linkedin.com/in/jens-scheffler/
Christian Schilling - https://www.linkedin.com/in/christian-schilling-a5078831a/
Bosch - https://www.linkedin.com/company/bosch/
Apache Airflow - https://airflow.apache.org/
Kubernetes - https://kubernetes.io
GitHub - https://github.com
Edge Executor - https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html

Overcoming Airflow Scaling Challenges at Monzo Bank with Jonathan Rainer

Feb 7th, 2025 3:09 AM

Scaling a data orchestration platform to manage thousands of tasks daily demands innovative solutions and strategic problem-solving. In this episode, we explore the complexities of scaling Airflow and the challenges of orchestrating thousands of tasks in dynamic data environments. Jonathan Rainer, Staff Software Engineer at Apollo GraphQL, joins us to share his journey optimizing data pipelines, overcoming UI limitations and ensuring DAG consistency in high-stakes scenarios.

Key Takeaways:
(03:11) Using Airflow to schedule computation in BigQuery.
(07:02) How DAGs with 8,000+ tasks were managed nightly.
(08:18) Ensuring accuracy in regulatory reporting for banking.
(11:35) Handling task inconsistency and DAG failures with automation.
(16:09) Building a service to resolve DAG consistency issues in Airflow.
(25:05) Challenges with scaling the Airflow UI for thousands of tasks.
(27:03) The role of upstream and downstream task management in Airflow.
(37:33) The importance of operational metrics for monitoring Airflow health.
(39:19) Balancing new tools with root cause analysis to address scaling issues.
(41:35) Why scaling solutions require both technical and leadership buy-in.

Resources Mentioned:
Jonathan Rainer - https://www.linkedin.com/in/jonathan-rainer/
Apollo GraphQL - https://www.linkedin.com/company/apollo-graphql/
Apache Airflow - https://airflow.apache.org/
BigQuery - https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/bigquery.html
Kubernetes - https://kubernetes.io/

Orchestrating Analytics and AI Workflows at Telia with Arjun Anandkumar

Jan 30th, 2025 5:00 AM

The future of data engineering lies in seamless orchestration and automation. In this episode, Arjun Anandkumar, Data Engineer at Telia, shares how his team uses Airflow to drive analytics and AI workflows. He highlights the challenges of scaling data platforms and how adopting best practices can simplify complex processes for teams across the organization. Arjun also discusses the transformative role of tools like Cosmos and Terraform in enhancing efficiency and collaboration.

Key Takeaways:
(02:16) Telia operates across the Nordics and Baltics, focusing on telecom and energy services.
(03:45) Airflow runs dbt models seamlessly with Cosmos on AWS MWAA.
(05:47) Cosmos improves visibility and orchestration in Airflow.
(07:00) Medallion Architecture organizes data into bronze, silver and gold layers.
(08:34) Task group challenges highlight the need for adaptable workflows.
(15:04) Scaling managed services requires trial, error and tailored tweaks.
(19:46) Terraform scales infrastructure, while YAML templates manage DAGs efficiently.
(20:00) Templated DAGs and robust testing enhance platform management.
(24:15) Open-source resources drive innovation in Airflow practices.

Resources Mentioned:
Arjun Anandkumar - https://www.linkedin.com/in/arjunanand1/?originalSubdomain=dk
Telia - https://www.linkedin.com/company/teliacompany/
Apache Airflow - https://airflow.apache.org/
Cosmos by Astronomer - https://www.astronomer.io/cosmos/
Terraform - https://www.terraform.io/
Medallion Architecture by Databricks - https://www.databricks.com/glossary/medallion-architecture

The Role of Airflow in Finance Transformation at Etraveli Group with Mihir Samant

Jan 23rd, 2025 10:47 AM

Transforming bottlenecked finance processes into streamlined, automated systems requires the right tools and a forward-thinking approach. In this episode, Mihir Samant, Senior Data Analyst at Etraveli Group, joins us to share how his team leverages Airflow to revolutionize finance automation. With extensive experience in data workflows and a passion for open-source tools, Mihir provides valuable insights into building efficient, scalable systems. We explore the transformative power of Airflow in automating workflows and enhancing data orchestration within the finance domain.

Key Takeaways:
(02:14) Etraveli Group specializes in selling affordable flight tickets and ancillary services.
(03:56) Mihir’s finance automation team uses Airflow to tackle month-end bottlenecks.
(06:00) Airflow's flexibility enables end-to-end automation for finance workflows.
(07:00) Open-source Airflow tools offer cost-effective solutions for new teams.
(08:46) Sensors and dynamic DAGs are pivotal features for optimizing tasks.
(13:30) GitSync simplifies development by syncing environments seamlessly.
(16:27) Plans include integrating Databricks for more advanced data handling.
(17:58) Airflow and Databricks offer multiple flexible methods to trigger workflows and execute SQL queries seamlessly.

Resources Mentioned:
Mihir Samant - https://www.linkedin.com/in/misamant/?originalSubdomain=ca
Etraveli Group - https://www.linkedin.com/company/etraveli-group/
Apache Airflow - https://airflow.apache.org/
Docker - https://www.docker.com/
Databricks - https://www.databricks.com/
