Coding Blocks

View the show notes on the web:
https://www.codingblocks.net/episode237

In the past couple of episodes, we'd gone over what Apache Kafka is and along the way we mentioned some of the pains of managing and running Kafka clusters on your own. In this episode, we discuss some of the ways you can offload those responsibilities and focus on writing streaming applications. Along the way, Joe does a mighty fine fill-in for proper noun pronunciation and Allen does a southern auctioneer-style speed talk.

Reviews

As always, thank you for leaving us a review - we really do appreciate them!

From iTunes: Abucr7

Upcoming Events

Atlanta Dev Con
September 7th, 2024
https://www.atldevcon.com/

DevFest Central Florida on September 28th, 2024
Interested? Submit your talk proposal here:
https://sessionize.com/devfest-florida-orlando-2024/

Kafka Compatible and Kafka Functional Alternatives

Why? Because running any type of infrastructure requires time, knowledge, and blood, sweat and tears

Confluent

WarpStream

  • https://www.warpstream.com/
  • "WarpStream is an Apache Kafka® compatible data streaming platform built directly on top of object storage: no inter-AZ bandwidth costs, no disks to manage, and infinitely scalable, all within your VPC"
  • ZERO disks to manage
  • 10x cheaper than running Kafka
  • Agents stream data directly to and from object storage with no buffering on local disks and no data tiering.
  • Create new serverless “Virtual Clusters” in our control plane instantly
  • Support different environments, teams, or projects without managing any dedicated infrastructure
  • Things you won't have to do with WarpStream
    • Upscale a cluster that is about to run out of space
    • Figure out how to restore quorum in a Zookeeper cluster or Raft consensus group
    • Rebalance partitions in a cluster
  • "WarpStream is protocol compatible with Apache Kafka®, so you can keep using all your favorite tools and software. No need to rewrite your application or use a proprietary SDK. Just change the URL in your favorite Kafka client library and start streaming!"
  • Never again have to choose between reliability and your budget. WarpStream costs the same regardless of whether you run your workloads in a single availability zone, or distributed across multiple
  • WarpStream's unique cloud native architecture was designed from the ground up around the cheapest and most durable storage available in the cloud: commodity object storage
  • WarpStream agents use object storage as the storage layer and the network layer, side-stepping interzone bandwidth costs entirely
  • Can be run in BYOC (bring your own cloud) or in Serverless
    • BYOC - you provide all the compute and storage - the only thing that WarpStream provides is the control plane
      • Data never leaves your environment
    • Serverless - fully managed by WarpStream in AWS - will automatically scale for you even down to nothing!
  • Can run in AWS, GCP and Azure
  • Agents are also S3 compatible, so they can run with S3 compatible storage such as MinIO and others
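
A quick sketch of what "just change the URL" might look like in practice. This only builds the configuration a Kafka client would be handed; it doesn't connect to anything. `bootstrap.servers` is the usual setting name for the broker address in Kafka clients, but the host names, the `client.id` value, and the `client_config` helper are made up for illustration.

```python
# Hypothetical endpoints, for illustration only.
SELF_HOSTED_KAFKA = "kafka-1.internal.example:9092"
KAFKA_COMPATIBLE_SERVICE = "agents.streaming.example:9092"


def client_config(bootstrap_servers: str) -> dict:
    """Build a producer config; only the broker address varies."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "client.id": "orders-service",  # hypothetical application name
        "acks": "all",
    }


before = client_config(SELF_HOSTED_KAFKA)
after = client_config(KAFKA_COMPATIBLE_SERVICE)

# Collect every setting whose value differs between the two configs.
changed = sorted(k for k in before if before[k] != after[k])
print(changed)
```

If a service really is protocol compatible, the broker address should be the only setting that shows up in `changed`; the rest of the application code stays the same.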

Redpanda

  • Redpanda is a slimmed-down, native, Kafka-protocol-compliant drop-in replacement for Kafka
  • There's even a Redpanda Connect!
  • Its main differentiator is performance: it's cheaper and faster

Apache Pulsar

  • Similar to Kafka, but changes the abstraction on storage to allow more flexibility on IO
  • Has a Kafka-compliant wrapper for interchangeability
  • Simple data offload functionality to S3 or GCS
  • Multi tenancy
  • Geo replication

Cloud alternatives

Tip of the Week

  • Chord AI is an Android/iOS app that uses AI to figure out the chords for a song. This is really useful if you just want to get the quick gist of a song to play along with. The base version is free and has a few different integration options (YouTube, Spotify, Apple Music, Local Files for me), and it uses your phone's microphone and a little AI magic to figure it out. It even shows you how to play the chords on guitar or piano. The free version gets you basic chords, but you can pay $8.99 a month to get more advanced/frequent chords.
    https://www.chordai.net/
  • Pandas is nearly as good as, if not better than, SQL for exploring data
    https://pandas.pydata.org/
  • Another tip for displaying data in Jupyter notebooks - render your DataFrames with HTML() to show the full column data
    https://www.geeksforgeeks.org/how-to-render-pandas-dataframe-as-html-table/
  • Take photos or video and convert them into 3d models
    https://lumalabs.ai/luma-api

Topics, Partitions, and APIs oh my! This episode we're getting further into how Apache Kafka works and its use cases. Also, Allen is staying dry, Joe goes for broke, and Michael (eventually) gets on the right page.

The full show notes are available on the website at https://www.codingblocks.net/episode236

News

  • Thanks for the reviews! angingjellies and Nick Brooker
    • Please leave us a review! (/review)
  • Atlanta Dev Con is coming up, on September 7th, 2024 (www.atldevcon.com)

Kafka Topics

  • They are partitioned - this means they are distributed (or can be) across multiple Kafka brokers into "buckets"
  • New events written to Kafka are appended to partitions
    • The distribution of data across brokers is what allows Kafka to scale so well as data can be written to and read from many brokers simultaneously
  • Events with the same key are written to the same partition as the original event
    • Kafka guarantees that events within a partition are always read in the order that they were written
  • For fault tolerance and high availability, topics can be replicated…even across regions and data centers
    • NOTE: If you're using a cloud provider, know that this can be very costly as you pay for inbound and outbound traffic across regions and availability zones
    • Typical replication configurations for production setups are 3 replicas
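
To make the "same key, same partition, same order" idea above concrete, here's a minimal sketch in plain Python. It is not Kafka's actual partitioner: CRC32 is swapped in as the hash (Kafka's default partitioner for keyed records uses murmur2), and `pick_partition`, the partition count, and the sample events are all assumptions for illustration.

```python
import zlib


def pick_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically.

    Assumption: CRC32 stands in for the hash here; Kafka's default
    partitioner uses a different hash (murmur2).
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "search"),
    ("user-2", "logout"),
    ("user-1", "logout"),
]

for key, value in events:
    # Appending preserves write order within each partition.
    partitions[pick_partition(key, NUM_PARTITIONS)].append((key, value))

# Every event for user-1 lives in one partition, in write order.
user1_partition = pick_partition("user-1", NUM_PARTITIONS)
user1_events = [v for k, v in partitions[user1_partition] if k == "user-1"]
print(user1_events)
```

Note that ordering only holds per partition: events for user-1 and user-2 may land in different partitions, and nothing orders them relative to each other.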

Kafka APIs

  • Admin API - used for managing and inspecting topics, brokers, and other Kafka objects
  • Producer API - used to write events to Kafka topics
  • Consumer API - used to read data from Kafka topics
  • Kafka Streams API - the ability to implement stream processing applications/microservices. Some of the key functionality includes functions for transformations, stateful operations like aggregations, joins, windowing, and more
    • In the Kafka streams world, these transformations and aggregations are typically written to other topics (in from one topic, out to one or more other topics)
  • Kafka Connect API - allows for the use of reusable import and export connectors that usually connect external systems. These connectors allow you to gather data from an external system (like a database using CDC) and write that data to Kafka. Then you could have another connector that could push that data to another system OR it could be used for transforming data in your streams application
    • These connectors are referred to as Sources and Sinks in the connector portfolio (confluent.io)
    • Source - gets data from an external system and writes it to a Kafka topic
    • Sink - pushes data to an external system from a Kafka topic
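
The source, stream processing, sink flow described above can be sketched with in-memory lists standing in for topics. This is not the Kafka Streams or Kafka Connect API; the topic names, the three functions, and the sample rows are all hypothetical and only show the shape of the data flow.

```python
from collections import Counter, defaultdict

# In-memory stand-in for a Kafka cluster: topic name -> list of events.
topics = defaultdict(list)


def source_connector(rows):
    """Source: read from an external system, write to a topic."""
    for row in rows:
        topics["page-views"].append(row)


def count_views_per_page():
    """Stream processing: read one topic, aggregate, write to another."""
    counts = Counter(event["page"] for event in topics["page-views"])
    for page, views in sorted(counts.items()):
        topics["page-view-counts"].append({"page": page, "views": views})


def sink_connector(external_store):
    """Sink: read from a topic, push to an external system."""
    for event in topics["page-view-counts"]:
        external_store[event["page"]] = event["views"]


# A list of dicts plays the external database; a dict plays the target.
database_rows = [{"page": "/home"}, {"page": "/about"}, {"page": "/home"}]
report = {}

source_connector(database_rows)
count_views_per_page()
sink_connector(report)
print(report)
```

The aggregation step never touches the external systems directly; it only reads one topic and writes another, which is the pattern described for streams applications above.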

Use Cases

  • Message queue - usually talking about replacing something like ActiveMQ or RabbitMQ
    • Message brokers are often used for responsive types of processing, decoupling systems, etc. - Kafka is usually a great alternative that scales, generally has faster throughput, and offers more functionality
  • Website activity tracking - this was one of the very first use cases for Kafka - the ability to rebuild user actions by recording all the user activities as events
    • How and why Kafka was developed (LinkedIn)
    • Typically different activity types would be written to different topics - like web page interactions to one topic and searches to another
  • Metrics - aggregating statistics from distributed applications
  • Log aggregation - some use Kafka for storage of event logs rather than using something like HDFS or a file server or cloud storage - but why? Because using Kafka for the event storage abstracts away the events from the files
  • Stream processing - taking events in and further enriching those events and publishing them to new topics
  • Event sourcing - using Kafka to store state changes from an application so they can be replayed to rebuild the current state of an object or system
  • Commit log - using Kafka as an external commit log is a way to synchronize data between distributed systems, or to help rebuild the state of a failed system
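
For the event sourcing and commit log use cases above, the core idea is that current state is just the result of replaying every stored change in order. Here's a minimal sketch assuming a hypothetical bank-account event log; the event shapes and the `apply_event` and `replay` helpers are illustrative and not part of any Kafka API.

```python
from functools import reduce

# Hypothetical event log for one account, in the order written.
event_log = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]


def apply_event(balance: int, event: dict) -> int:
    """Apply one state change to the current state."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    raise ValueError(f"unknown event type: {event['type']}")


def replay(events, initial: int = 0) -> int:
    """Rebuild state by folding over the events from the beginning."""
    return reduce(apply_event, events, initial)


balance = replay(event_log)
print(balance)
```

Replaying only a prefix of the log, for example `replay(event_log[:2])`, gives the state as of that point, which is what makes this approach handy for rebuilding a failed system.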

https://youtu.be/IuUDRU9-HRk

Tip of the Week

  • Rémi Gallego is a music producer who makes music under a variety of names like The Algorithm and Boucle Infinie. Almost all of it is instrumental synthwave with a hard-rock edge. They also make a lot of video game music, including 2 of my favorite game soundtracks of all time: "The Last Spell" and "Hell is Other Demons" (YouTube)
  • Did you know that the Kubernetes-focused TUI we've raved about before can be used to look up information about other things as well, like :helm and :events? Events is particularly useful for figuring out mysteries. You can see all the "resources" available to you with "?". You might be surprised at everything you see (pop-eye, x-ray, and monitoring)
  • WarpStream is an S3 backed, API compliant Kafka Alternative. Thanks MikeRg! (warpstream.com)
  • Cloudflare's trillion message Kafka setup, thanks MikeRg! (blog.bytebytego.com)
  • Want the power and flexibility of jq, but for yaml? Try yq! (gitbook.io)
  • Zenith is terminal graphical metrics for your *nix system written in Rust, thanks MikeRg! (github.com)
  • 8 Big (O)Notation Every Developer should Know (medium.com)
  • Another Git cheat sheet (wizardzines.com)

Direct download: coding-blocks-episode-236.mp3
Category:Software Development -- posted at: 6:50pm EDT
