In this episode, Kris interviews Doron Porat, Director of Infrastructure at Yotpo, and Liran Yogev, Director of Engineering at ZipRecruiter (formerly at Yotpo), about their experiences and strategies in dealing with data modeling at scale.
Yotpo has a vast and active data lake, comprising thousands of datasets that are processed by different engines, primarily Apache Spark™. They wanted to provide users with self-service tools for generating and utilizing data with maximum flexibility, but encountered difficulties, including poor standardization, low data reusability, limited data lineage, and unreliable datasets.
The team realized that Yotpo's modeling layer, which defines the structure and relationships of the data, needed to be separated from the execution layer, which defines and processes operations on the data.
This separation would give programmers better visibility into data pipelines across all execution engines, storage methods, and formats, as well as more governance control for exploration and automation.
To address these issues, they developed YODA, an internal tool that brings together dbt, Databricks, Airflow, Looker, and more behind a strong CI/CD and orchestration layer, with a focus on developer experience.
Yotpo is a B2B SaaS e-commerce marketing platform that provides businesses with tools for accurate customer analytics, remarketing, support messaging, and more.
ZipRecruiter is a job site that utilizes AI matching to help businesses find the right candidates for their open roles.
EPISODE LINKS
Apache Kafka 3.5 - Kafka Core, Connect, Streams, & Client Updates
A Special Announcement from Streaming Audio
How to use Data Contracts for Long-Term Schema Management
How to use Python with Apache Kafka
Migrate Your Kafka Cluster with Minimal Downtime
Real-Time Data Transformation and Analytics with dbt Labs
What is the Future of Streaming Data?
What can Apache Kafka Developers learn from Online Gaming?
Apache Kafka 3.4 - New Features & Improvements
How to use OpenTelemetry to Trace and Monitor Apache Kafka Systems
What is Data Democratization and Why is it Important?
Git for Data: Managing Data like Code with lakeFS
Using Kafka-Leader-Election to Improve Scalability and Performance
Real-Time Machine Learning and Smarter AI with Data Streaming
The Present and Future of Stream Processing
Top 6 Worst Apache Kafka JIRA Bugs
Learn How Stream-Processing Works The Simplest Way Possible
Building and Designing Events and Event Streams with Apache Kafka
Rethinking Apache Kafka Security and Account Management