Alternatives to Administering and Running Apache Kafka
In the past couple of episodes, we went over what Apache Kafka is and, along the way, mentioned some of the pains of managing and running Kafka clusters on your own. In this episode, we discuss some of the ways you can offload those responsibilities and focus on writing streaming applications. Along the way, Joe does a mighty fine fill-in for proper noun pronunciation and Allen does a southern auctioneer-style speed talk. View the full show notes here: https://www.codingblocks.net/episode237

Reviews
As always, thank you for leaving us a review – we really do appreciate them!
- From iTunes: Abucr7

Upcoming Events
- Atlanta Dev Con, September 7th, 2024 – https://www.atldevcon.com/
- DevFest Central Florida, September 28th, 2024 – Interested? Submit your talk proposal here: https://sessionize.com/devfest-florida-orlando-2024/

Kafka Compatible and Kafka Functional Alternatives
Why? Because running any type of infrastructure requires time, knowledge, and blood, sweat, and tears.

Confluent – https://www.confluent.io/confluent-cloud/pricing/
- We've personally had good experiences with their Kafka as a service

WarpStream – https://www.warpstream.com/
- "WarpStream is an Apache Kafka® compatible data streaming platform built directly on top of object storage: no inter-AZ bandwidth costs, no disks to manage, and infinitely scalable, all within your VPC"
- ZERO disks to manage
- 10x cheaper than running Kafka
- Agents stream data directly to and from object storage with no buffering on local disks and no data tiering
- Create new serverless "Virtual Clusters" in their control plane instantly
- Support different environments, teams, or projects without managing any dedicated infrastructure
- Things you won't have to do with WarpStream:
  - Upscale a cluster that is about to run out of space
  - Figure out how to restore quorum in a ZooKeeper cluster or Raft consensus group
  - Rebalance partitions in a cluster
- "WarpStream is protocol compatible with Apache Kafka®, so you can keep using all your favorite tools and software. No need to rewrite your application or use a proprietary SDK. Just change the URL in your favorite Kafka client library and start streaming!"
- Never again have to choose between reliability and your budget – WarpStream costs the same whether you run your workloads in a single availability zone or distributed across multiple
- WarpStream's unique cloud-native architecture was designed from the ground up around the cheapest and most durable storage available in the cloud: commodity object storage
- WarpStream agents use object storage as both the storage layer and the network layer, side-stepping interzone bandwidth costs entirely
- Can be run in BYOC (bring your own cloud) or Serverless modes:
  - BYOC – you provide all the compute and storage; the only thing WarpStream provides is the control plane, so data never leaves your environment
  - Serverless – fully managed by WarpStream in AWS and will automatically scale for you, even down to nothing!
- Can run in AWS, GCP, and Azure
- Agents are also S3 compatible, so they can run with S3-compatible storage such as MinIO and others

Redpanda
- Redpanda is a slimmed-down, natively implemented, Kafka-protocol-compliant drop-in replacement for Kafka
- There's even a Redpanda Connect!
- Its main differentiator is performance: it's cheaper and faster

Apache Pulsar
- Similar to Kafka, but changes the abstraction on storage to allow more flexibility on IO
- Has a Kafka-compliant wrapper for interchangeability
- Simple data offload functionality to S3 or GCS
- Multi-tenancy
- Geo-replication

Cloud alternatives
- Google Cloud – Pub/Sub: https://cloud.google.com/pubsub
- Azure – Event Hubs: https://azure.microsoft.com/en-us/products/event-hubs
- AWS – Kinesis: https://aws.amazon.com/kinesis/

Tip of the Week
- Chord AI is an Android/iOS app that uses AI to figure out the chords for a song. This is really useful if you just want to get the quick gist of a song to play along with. The base version is free and has a few different integration options (YouTube, Spotify, Apple Music, and local files), and it uses your phone's microphone and a little AI magic to figure it out. It even shows you how to play the chords on guitar or piano. The free version gets you basic chords, but you can pay $8.99 a month to get more advanced/frequent chords. https://www.chordai.net/
- Pandas is nearly as good as, if not better than, SQL for exploring data: https://pandas.pydata.org/
- Another tip for displaying in Jupyter notebooks – HTML() your dataframes to show the full column data (see the sketch after these notes): https://www.geeksforgeeks.org/how-to-render-pandas-dataframe-as-html-table/
- Take photos or video and convert them into 3D models: https://lumalabs.ai/luma-api
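For the HTML() tip above, here is a minimal sketch, assuming a Jupyter notebook with pandas and IPython available; the sample DataFrame and the display option are just illustrative, not from the episode:

    import pandas as pd
    from IPython.display import HTML

    # Sample frame with a long text column that pandas would normally truncate
    df = pd.DataFrame({
        "id": [1, 2],
        "notes": ["a very long free-text column " * 10, "another long value " * 10],
    })

    # Option 1: tell pandas not to truncate column contents at all
    pd.set_option("display.max_colwidth", None)

    # Option 2: render the DataFrame as an HTML table so the full column data
    # shows up in the notebook output
    HTML(df.to_html())

Setting display.max_colwidth to None keeps pandas from cutting off long text, and wrapping to_html() in HTML() renders the full table as the cell's output.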
Nuts and Bolts of Apache Kafka
Topics, Partitions, and APIs, oh my! This episode we're getting further into how Apache Kafka works and its use cases. Also, Allen is staying dry, Joe goes for broke, and Michael (eventually) gets on the right page. The full show notes are available on the website at https://www.codingblocks.net/episode236

News
- Thanks for the reviews! angingjellies and Nick Brooker
- Please leave us a review! (/review)
- Atlanta Dev Con is coming up, on September 7th, 2024 (www.atldevcon.com)

Kafka Topics
- They are partitioned – this means they are distributed (or can be) across multiple Kafka brokers into "buckets"
- New events written to Kafka are appended to partitions
- The distribution of data across brokers is what allows Kafka to scale so well, as data can be written to and read from many brokers simultaneously
- Events with the same key are always written to the same partition as the original event
- Kafka guarantees that events within a partition are always read in the order they were written
- For fault tolerance and high availability, topics can be replicated…even across regions and data centers
  - NOTE: If you're using a cloud provider, know that this can be very costly, as you pay for inbound and outbound traffic across regions and availability zones
  - Typical replication configuration for production setups is 3 replicas

Kafka APIs
- Admin API – used for managing and inspecting topics, brokers, and other Kafka objects
- Producer API – used to write events to Kafka topics
- Consumer API – used to read data from Kafka topics (a minimal producer/consumer sketch appears after these notes)
- Kafka Streams API – the ability to implement stream processing applications/microservices. Some of the key functionality includes functions for transformations, stateful operations like aggregations, joins, windowing, and more
  - In the Kafka Streams world, these transformations and aggregations are typically written to other topics (in from one topic, out to one or more other topics)
- Kafka Connect API – allows for the use of reusable import and export connectors that usually connect to external systems. These connectors allow you to gather data from an external system (like a database using CDC) and write that data to Kafka. Then you could have another connector push that data to another system, OR it could be used for transforming data in your streams application
  - These connectors are referred to as Sources and Sinks in the connector portfolio (confluent.io)
  - Source – gets data from an external system and writes it to a Kafka topic
  - Sink – pushes data to an external system from a Kafka topic

Use Cases
- Message queue – usually talking about replacing something like ActiveMQ or RabbitMQ
  - Message brokers are often used for responsive types of processing, decoupling systems, etc. – Kafka is usually a great alternative that scales, generally has faster throughput, and offers more functionality
- Website activity tracking – this was one of the very first use cases for Kafka: the ability to rebuild user actions by recording all the user activities as events
  - How and why Kafka was developed (LinkedIn)
  - Typically different activity types would be written to different topics – like web page interactions to one topic and searches to another
- Metrics – aggregating statistics from distributed applications
- Log aggregation – some use Kafka for storage of event logs rather than using something like HDFS or a file server or cloud storage – but why?
  - Because using Kafka for the event storage abstracts the events away from the files
- Stream processing – taking events in, further enriching those events, and publishing them to new topics
- Event sourcing – using Kafka to store state changes from an application that are used to replay the current state of an object or system
- Commit log – using Kafka as an external commit log is a way of synchronizing data between distributed systems, or helping rebuild the state of a failed system

Tip of the Week
- Rémi Gallego is a music producer who makes music under a variety of names like The Algorithm and Boucle Infini, almost all of it instrumental synthwave with a hard-rock edge. They also make a lot of video game music, including 2 of my favorite game soundtracks of all time, "The Last Spell" and "Hell is for Demons" (YouTube)
- Did you know that the Kubernetes-focused TUI we've raved about before can be used to look up information about other things as well, like :helm and :events? Events is particularly useful for figuring out mysteries. You can see all the "resources" available to you with "?". You might be surprised at everything you see (pop-eye, x-ray, and monitoring)
- WarpStream is an S3-backed, API-compatible Kafka alternative. Thanks, MikeRg! (warpstream.com)
- Cloudflare's trillion-message Kafka setup, thanks Mikerg! (blog.bytebytego.com)
- Want the power and flexibility of jq, but for YAML? Try yq! (gitbook.io)
- Zenith is terminal graphical metrics for your *nix system written in Rust, thanks MikeRg! (github.com)
- 8 Big O Notations Every Developer Should Know (medium.com)
- Another Git cheat sheet (wizardzines.com)
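To make the Producer and Consumer APIs (and the same-key/same-partition behavior) above concrete, here is a minimal sketch using the Python confluent-kafka client. The broker address and topic name are placeholders, not from the episode; managed or Kafka-compatible services typically only need the bootstrap servers (and possibly auth settings) changed.

    from confluent_kafka import Producer, Consumer

    BOOTSTRAP = "localhost:9092"   # placeholder; point at your cluster or Kafka-compatible service
    TOPIC = "web-activity"         # placeholder topic name

    # Producer API: events with the same key always land on the same partition,
    # so per-user ordering is preserved.
    producer = Producer({"bootstrap.servers": BOOTSTRAP})
    for i in range(3):
        producer.produce(TOPIC, key="user-42", value=f"page-view-{i}")
    producer.flush()  # block until all buffered events are delivered

    # Consumer API: reading does not delete events; another consumer group (or the
    # same one, starting from the earliest offset) can read them again.
    consumer = Consumer({
        "bootstrap.servers": BOOTSTRAP,
        "group.id": "activity-readers",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe([TOPIC])
    try:
        for _ in range(10):
            msg = consumer.poll(1.0)
            if msg is None or msg.error():
                continue
            print(msg.key(), msg.value(), "partition:", msg.partition())
    finally:
        consumer.close()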
Intro to Apache Kafka
We finally start talking about Apache Kafka! Also, Allen is getting acquainted with Aesop, Outlaw is killing clusters, and Joe was paying attention in drama class. The full show notes are available on the website at https://www.codingblocks.net/episode235

News
- Atlanta Dev Con is coming up, on September 7th, 2024 (www.atldevcon.com)

Intro to Apache Kafka
What is it?
- Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Core capabilities
- High throughput – deliver messages at network-limited throughput using a cluster of machines with latencies as low as 2ms
- Scalable – scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, and hundreds of thousands of partitions. Elastically expand and contract storage and processing
- Permanent storage – store streams of data safely in a distributed, durable, fault-tolerant cluster
- High availability – stretch clusters efficiently over availability zones or connect separate clusters across geographic regions

Ecosystem
- Built-in stream processing – process streams of events with joins, aggregations, filters, transformations, and more, using event-time and exactly-once processing
- Connect to almost anything – Kafka's out-of-the-box Connect interface integrates with hundreds of event sources and event sinks, including Postgres, JMS, Elasticsearch, AWS S3, and more
- Client libraries – read, write, and process streams of events in a vast array of programming languages
- Large ecosystem of open source tools – leverage a vast array of community-driven tooling

Trust and Ease of Use
- Mission critical – support mission-critical use cases with guaranteed ordering, zero message loss, and efficient exactly-once processing
- Trusted by thousands of organizations – thousands of organizations use Kafka, from internet giants to car manufacturers to stock exchanges. More than 5 million unique lifetime downloads
- Vast user community – Kafka is one of the five most active projects of the Apache Software Foundation, with hundreds of meetups around the world

What is event streaming?
- Getting data in real-time from event sources like databases, sensors, mobile devices, cloud services, applications, etc. in the form of streams of events
- Those events are stored durably (in Kafka) for processing, either in real-time or retrospectively, and then routed to various destinations depending on your needs
- It's this continuous flow and processing of data that is known as "streaming data"

How can it be used? (some examples)
- Processing payments and financial transactions in real-time
- Tracking automobiles and shipments in real-time for logistical purposes
- Capturing and analyzing sensor data from IoT devices or other equipment
- Connecting and sharing data from different divisions in a company

Apache Kafka as an event streaming platform
- It contains three key capabilities that make it a complete streaming platform:
  - Can publish and subscribe to streams of events
  - Can store streams of events durably and reliably for as long as necessary (infinitely if you have the storage)
  - Can process streams of events in real-time or retrospectively
- Can be deployed to bare metal, virtual machines, or containers, on-prem or in the cloud
- Can be run self-managed or via various cloud providers as a managed service

How does Kafka work?
- Kafka is a distributed system composed of servers and clients that communicate using a highly performant TCP protocol

Servers
- Kafka runs as a cluster of one or more servers that can span multiple data centers or cloud regions
- Brokers – the portion of the servers that make up the storage layer
- Kafka Connect – servers that constantly import and export data to and from existing systems in your infrastructure, such as relational databases
- Kafka clusters are highly scalable and fault-tolerant

Clients
- Clients allow you to write distributed applications that read, write, and process streams of events in parallel, and that are fault-tolerant and scale
- These clients are available in many programming languages – both the ones provided by the core platform as well as 3rd-party clients

Concepts
Events
- An event is a record of something that happened – also called a "record" in the documentation
- Has a key
- Has a value
- Has an event timestamp
- Can have additional metadata

Producers and Consumers
- Producers – the client applications that publish/write events to Kafka
- Consumers – the client applications that read/subscribe to events from Kafka
- Producers and consumers are completely decoupled from each other

Topics
- Events are stored in topics
- Topics are like folders on a file system – events would be the equivalent of files within that folder
- Topics are multi-producer and multi-subscriber – there can be zero, one, or many producers or subscribers to a topic that write to or read from that topic, respectively
- Unlike many message queuing systems, events can be read as many times as necessary because they are not deleted after being consumed
- Deletion of messages is handled by per-topic configuration that determines how long events are retained
- Kafka's performance is not dependent on the amount of data nor the duration of time data is stored, so storing for longer periods is not a problem
- (A small topic-creation sketch using the Admin client appears after these notes.)

Resources we Like
- Why Strimzi moved away from statefulsets (github.com)

Tip of the Week
- Flipper Zero is a multi-functional interaction device mixed with a Tamagotchi. It has a variety of IO options built in: RFID, NFC, GPIO, Bluetooth, USB, and a variety of low-voltage pins like you'd see on an Arduino. Using the device upgrades the dolphin, encouraging you to try new things…and it's all open-source with a vibrant community behind it. (shop.flipperzero.one)
  - 7 cool and useful things to do with your Flipper Zero
- Kafka TUI?! Kaskade is a cool-looking Kafka TUI that has got to be better than using the scripts in the bin folder that comes with Kafka. (github.com/sauljabin/kaskade)
- Microstudio is a web-based integrated development environment for making simple games, and it's open source! (microstudio.dev)
- Bing Copilot has a number of useful prompts (bing.com): Designer (photos), Vacation Planner, Cooking assistant, Fitness trainer
- Sharing metrics between projects in GCP, Azure, and maybe AWS???
  - GCP (projects): (cloud.google.com)
  - Azure (resource groups or subscriptions): (learn.microsoft.com)
  - AWS (multiple accounts): (docs.aws.amazon.com)
- Checking wifi in your home – Android only (play.google.com)
- Powering PoE without running cables (Amazon)
  - Omada specific – cloud vs local hardware (Amazon)
- How to "shut down" a Kafka cluster in Kubernetes:
  kubectl annotate kafka my-kafka-cluster strimzi.io/pause-reconciliation="true" --context=my-context --namespace=my-namespace
  kubectl delete strimzipodsets my-kafka-cluster --context=my-context --namespace=my-namespace
  Then to "restart" the cluster:
  kubectl annotate kafka my-kafka-cluster strimzi.io/pause-reconciliation- --context=my-context --namespace=my-namespace
  https://github.com/strimzi/proposals/blob/main/031-statefulset-removal.md
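Following up on the topic-creation pointer in the concepts above, here is a minimal sketch using the Python confluent-kafka Admin client; the broker address, topic name, and partition count are placeholders, and replication_factor=3 mirrors the typical production setting mentioned earlier (it requires at least three brokers).

    from confluent_kafka.admin import AdminClient, NewTopic

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder address

    # Six partitions lets a consumer group read in parallel;
    # replication_factor=3 is the typical production setting (needs >= 3 brokers).
    topic = NewTopic("user-activity", num_partitions=6, replication_factor=3)

    # create_topics() returns a dict of {topic_name: future}
    for name, future in admin.create_topics([topic]).items():
        try:
            future.result()  # raises if creation failed (e.g., not enough brokers)
            print(f"created topic {name}")
        except Exception as exc:
            print(f"failed to create {name}: {exc}")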
StackOverflow AI Disagreements, Kotlin Coroutines and More
Joe Zack was on a brief holiday, so Allen and Michael took over the helm for an episode. What would a new episode be without a little something regarding AI, some more love for Kotlin, and a number of excellent tips throughout (as well as at the end of) the episode?

Reviews
- iTunes: ivan.kuchin

News
- Atlanta Dev Con, September 7th, 2024 – https://www.atldevcon.com/

Topics
- People are trying to remove their answers from Stack Overflow so OpenAI can't use them without permission/recognition: https://www.tomshardware.com/tech-industry/artificial-intelligence/stack-overflow-bans-users-en-masse-for-rebelling-against-openai-partnership-users-banned-for-deleting-answers-to-prevent-them-being-used-to-train-chatgpt
- Obfuscate data dumps with PostgreSQL: https://github.com/GreenmaskIO/greenmask/
- Kotlin Coroutines: https://kotlinlang.org/docs/coroutines-overview.html and https://kotlinlang.org/docs/coroutine-context-and-dispatchers.html#dispatchers-and-threads
  - Reminded Outlaw of the Cloudflare Workers we mentioned a while back: https://developers.cloudflare.com/workers/
- Please leave us a review! https://www.codingblocks.net/review
- You can control whether YouTube keeps track of your history (at least the history you can see)
- 100 Things You Didn't Know About Kubernetes: https://www.devopsinside.com/100-things-you-didnt-know-about-kubernetes-part-1/
- Do the IDE AIs really make you more productive?

Random Bits
- Tesla Las Vegas Loop: https://www.lvcva.com/vegas-loop/
- What actually happens when you overfill the oil in a vehicle? https://www.youtube.com/watch?v=VaTbfvzNbxQ
- Fisker Ocean totaled after a $900 door ding…really: https://jalopnik.com/fisker-ocean-totaled-over-910-door-ding-after-insurer-1851451187
- A Ford Mustang painted with the blackest black paint available: https://youtu.be/Ll27OkWuE1g

Tip of the Week
- The Docker Blog is pretty excellent: https://www.docker.com/blog/
- Car research:
  - Car reliability information: https://www.truedelta.com/
  - Actual problems logged with car models by year: https://www.carcomplaints.com/
  - Great search engine for finding cars and more metadata about the listing, like how long the car has been listed: https://caredge.com/
- Get the most out of wood sheet goods by using cut lists: https://www.opticutter.com/cut-list-optimizer
- Docker's chicken-and-egg problem – use a multi-stage Dockerfile where an earlier stage has the tools you need
- Manually dearmor a PGP public key (hint: it's the opposite of https://superuser.com/questions/764465/how-to-ascii-armor-my-public-key-without-installing-gpg)
- Download a file using the server-suggested name (a rough Python analogue appears after these notes):
  - With wget: --content-disposition (https://man7.org/linux/man-pages/man1/wget.1.html)
  - With curl: -JO (-J, --remote-header-name; -O, --remote-name) (https://curl.se/docs/manpage.html#-J)
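The wget/curl flags above are the actual tip; purely as a rough Python analogue (not from the episode), here is a sketch that honors the server-suggested Content-Disposition filename, assuming the requests library is installed and using a placeholder URL.

    import re
    import requests

    url = "https://example.com/download"  # placeholder URL

    resp = requests.get(url, stream=True)
    resp.raise_for_status()

    # Prefer the server-suggested name from Content-Disposition,
    # like wget --content-disposition / curl -JO
    cd = resp.headers.get("Content-Disposition", "")
    match = re.search(r'filename="?([^";]+)"?', cd)
    filename = match.group(1) if match else url.rstrip("/").rsplit("/", 1)[-1] or "download.bin"

    with open(filename, "wb") as fh:
        for chunk in resp.iter_content(chunk_size=8192):
            fh.write(chunk)
    print(f"saved as {filename}")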
Llama 3 is Here, Spending Time on Environmental Setup and More
In this episode, Joe introduces us to more security items you should be aware of in the world of CWEs, Michael bends to the will of Joe and Allen in his favorite portion of the show, and Allen pontificates on the time spent setting up IDEs and environments.

Reviews – Thank You!
- iTunes: Vlad Bezden, Mom in VA, Make1977
- Spotify: chutney3000, Xuraith

Upcoming Events
- Atlanta Dev Con, September 7th, 2024 – https://www.atldevcon.com/

Topics
OpenTelemetry
- The backend matters: https://opentelemetry.io/ecosystem/integrations/
  - Some backends are more fully featured than others
  - Splunk Trace Analyzer: https://docs.splunk.com/observability/en/apm/apm-spans-traces/trace-analyzer.html
  - Google Trace Explorer: https://cloud.google.com/trace/docs/finding-traces
  - Azure OTel guide: https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-enable?tabs=aspnetcore
  - AWS OTel information: https://aws.amazon.com/otel/
- The processor can decouple you: https://opentelemetry.io/docs/collector/configuration/#processors
- (A minimal SDK sketch appears at the end of these notes.)

CNCF – Cloud Native Computing Foundation
- If you're working in a cloud environment, you should know the projects here: https://www.cncf.io/projects/
- Super cool visualization tool for the projects: https://landscape.cncf.io/

Llama 3 – the next version of Meta's AI engine
- "Now available with both 8B and 70B pretrained and instruction-tuned versions to support a wide range of applications": https://llama.meta.com/llama3/
- Environmental concerns over the processing required for AI
  - Power requirements for processing some of the LLMs: https://www.nnlabs.org/power-requirements-of-large-language-models/
  - The Microsoft underwater datacenter: https://news.microsoft.com/source/features/sustainability/project-natick-underwater-datacenter/

Setting up IDEs and environments
- IDE vs old-school debugging
- Setup can require a significant amount of time
- Is it worth it? What if you're just working on a bug?

Security Resources
- What's the difference between CWE and OWASP?
  - CWE (Common Weakness Enumeration) is a community-developed list of common software and hardware weaknesses. It's similar to OWASP, but older (1999 vs 2001) and more general, including non-web apps and (more recently) hardware
  - The infamous "NVD" database links CVEs (Common Vulnerabilities and Exposures) to CWEs: https://nvd.nist.gov/vuln/detail/CVE-2021-44228 and https://cwe.mitre.org/top25/archive/2023/2023_trends.html

Tips
- Pre-warning – probably wouldn't recommend installing this! Saw a cool Windows utility called "Windrecorder" that records video and text from your desktop and lets you rewind and search.
  - Uses ffmpeg to record the screen into small 15-minute fragment files
  - Search by window titles, text keywords, or descriptions of images
  - Everything happens only on your computer
  - Cons: no instant rewind (you have to be out of the window), storage is unencrypted, not much LLM/ML fancy stuff…and security
  - https://tonoko.notion.site/I-made-an-open-source-app-to-rewind-search-everything-happened-on-your-screen-on-Windows-184d1a9d5edb494dba0c2f46d311ec5c and https://github.com/yuka-friends/Windrecorder
- macOS's Spotlight is more powerful than you maybe knew: https://www.intego.com/mac-security-blog/spotlight-secrets-15-ways-to-use-spotlight-on-your-mac/ and https://beebom.com/spotlight-tips-tricks/
- If your grep command isn't working like you thought it should, you might be a victim of content getting kicked out of the buffer: grep --line-buffered
- iOS – get text from images: https://support.apple.com/guide/iphone/use-live-text-iphcf0b71b0e/ios
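The OpenTelemetry notes above boil down to this: application code talks to a vendor-neutral API, and the exporter (or a collector in between, with its processors) decides where the data goes, so swapping backends is a configuration change rather than a rewrite. Here is a minimal sketch using the OpenTelemetry Python SDK with a console exporter; it assumes the opentelemetry-sdk package is installed, the span and attribute names are placeholders, and the collector processors linked in the notes live in the collector's own config, not in this code.

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

    # Wire up a tracer provider; the exporter is the only backend-specific piece.
    # Swapping ConsoleSpanExporter for an OTLP exporter pointed at a collector
    # keeps the application code below unchanged.
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

    tracer = trace.get_tracer("codingblocks.example")

    # Application code only talks to the tracer API, never to Splunk/Google/Azure/AWS directly.
    with tracer.start_as_current_span("handle-request") as span:
        span.set_attribute("request.size", 1234)
        with tracer.start_as_current_span("call-database"):
            pass  # do work here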