Download - Smooth Scaling and Uninterrupted Processing with Apache Kafka ft. Sophie Blee-Goldman

Discover

Podcast Features
Monetization
Podbean App
- Podcast Studio
  Easy-to-use audio recorder app.
- Podcast App
  The best podcast player & podcast app.

Help and Support
Popular Topics

All Arts Business Comedy Education
Fiction Government Health & Fitness History Kids & Family
Leisure Music News Religion & Spirituality Science
Society & Culture Sports Technology True Crime TV & Film
Live

How to Start a Podcast
How to Start a Live Podcast
How to Monetize a podcast
How to Promote Your Podcast
How to Use Group Recording

Log in
Start your podcast for free

Podcasting
Advertisers
Enterprise
Pricing
Resources
- Help and Support
- Popular Topics
Discover

Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov

Technology

Smooth Scaling and Uninterrupted Processing with Apache Kafka ft. Sophie Blee-Goldman

2021-03-24

Download Right click and do "save link as"

Availability in Kafka Streams is hard, especially in the face of any changes. Any change to topic metadata or group membership triggers a rebalance. But Kafka Streams struggles even after this stop-the-world rebalance has finished. According to Apache Kafka® Committer and Confluent Software Engineer Sophie Blee-Goldman, this is because a Streams app will generally have some state associated with a given partition, and to move this state from one consumer instance to another requires rebuilding this state from a special backing topic called a changelog, the source of truth for a partition’s state.

Restoring the changelog can take hours, and until the state is ready, Streams can’t do any further processing on that partition. Furthermore, it can’t serve any requests for local state until the local state is “caught up” with the changelog. So scaling out your Streams application results in pretty significant downtime—which is a bummer, especially if the reason for scaling out in the first place was to handle a particularly heavy workload.

To solve the stop-the-world rebalance, we have to find a way to safely assign partitions so we can be confident that they’ve been revoked from their previous owner before being given to a new consumer. To solve the scaling out problem in Kafka Streams, we go a step further. When you add a new instance to your Streams application, we won’t immediately assign any stateful partitions to it. Instead, we’ll leave them assigned to their current owner to continue processing and serving queries as usual. During this time, the new instance will start to “warm up” the local state in the background; it starts consuming from the changelog and building up the local state. We then follow a similar pattern as in cooperative rebalancing, and issue a follow-up rebalance.

In KIP-441, we call these probing rebalances. Every so often (i.e., 10 minutes by default), we trigger a rebalance. In the member’s subscription metadata that it sends to the group leader, each member encodes the current status of its local state. We use the changelog lag as a measure of how “caught up” a partition is. During a rebalance, only instances that are completely caught up are allowed to own stateful tasks; everything else must first warm up the state. So long as there is some task still warming up on a node, we will “probe” with rebalances until it’s ready.

EPISODE LINKS