How do you analyze Reddit sentiment with Apache Kafka® and microservices? Shufan Liu, a budding Developer Advocate at Confluent, brings the fresh perspective of someone new to both Kafka and the industry. He discusses two projects he worked on during his summer internship: a Cluster Linking extension to a conceptual data pipeline project, and a microservice-based Reddit sentiment-analysis project. Shufan demonstrates that it's possible to get up to speed quickly with the tools in the Kafka ecosystem and to start building something productive early in your journey.
Shufan's Cluster Linking project extends a demo by Danica Fine (Senior Developer Advocate, Confluent) that uses a Kafka-based data pipeline to automate houseplant watering. He discusses his contribution to the project and shares the details in his blog post, "Data Enrichment in Existing Data Pipelines Using Confluent Cloud."
The second project Shufan presents is a sentiment-analysis system that gathers data from a given subreddit and then assigns that data a sentiment score. He points out that its results would be hard to reproduce manually by simply reading through a subreddit; you really need the assistance of AI. The project consists of four microservices:
Interesting subreddits that Shufan has analyzed for sentiment include gaming forums before and after key releases; crypto and stock trading forums at various meaningful points in time; and sports-related forums both before the season and several games into it.
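To make the gather-then-score flow a little more concrete, here is a minimal sketch of what a single scoring step could look like. It is not Shufan's implementation: the project relies on an AI model, whereas this sketch swaps in a hypothetical toy word lexicon so it runs with the standard library alone. The record shape (`subreddit`, `text`, `sentiment`) and the topic names mentioned in the comments are assumptions for illustration. In a real deployment, `score_message` would sit between a Kafka consumer and producer (for example, from the `confluent-kafka` Python client) instead of being called directly.

```python
import json
from statistics import mean

# Hypothetical toy lexicon: a stand-in for the AI sentiment model the
# project actually uses. Scores range from -1.0 (negative) to 1.0 (positive).
LEXICON = {
    "great": 1.0, "love": 1.0, "bullish": 0.8, "win": 0.6,
    "bad": -0.6, "hate": -1.0, "bearish": -0.8, "lose": -0.6,
}


def score_text(text: str) -> float:
    """Average the lexicon scores of known words; return 0.0 if none match."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return mean(hits) if hits else 0.0


def score_message(raw: bytes) -> bytes:
    """Turn one raw post record (JSON bytes) into a scored record (JSON bytes).

    Assumption: in a Kafka deployment this would be applied to each message
    consumed from a hypothetical 'reddit-posts' topic, and the result produced
    to a hypothetical 'reddit-sentiment' topic.
    """
    post = json.loads(raw)
    post["sentiment"] = score_text(post["text"])
    return json.dumps(post).encode("utf-8")


def subreddit_sentiment(scored: list[bytes]) -> float:
    """Aggregate per-post scores into a single subreddit-level score."""
    return mean(json.loads(m)["sentiment"] for m in scored) if scored else 0.0


if __name__ == "__main__":
    posts = [
        {"subreddit": "gaming", "text": "I love this release, great update!"},
        {"subreddit": "gaming", "text": "Servers are bad, I hate the queue."},
    ]
    scored = [score_message(json.dumps(p).encode("utf-8")) for p in posts]
    print(subreddit_sentiment(scored))
```

Keeping the scoring step as a pure bytes-in, bytes-out function is one way to let it be deployed as its own microservice: the Kafka plumbing can change without touching the scoring logic, and a real model can replace `score_text` later.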