Coding Blocks is signing out for now; in this episode, we'll talk about what's happening and why. We have had an amazing run, far better than we ever expected. Also, Joe recommends 50 games, Allen goes for the gold, and Outlaw is totally normal. (And we're not crying, you're crying!)
Thank you for the support over the last 11 (!!!) years. It's been a wild ride, and the last thing we ever expected when starting a tech podcast was getting to meet so many fantastic people.
UFO 50 is an odd collection of 50 pseudo-retro video games made by a small group of game developers, most notably Derek Yu of Spelunky fame. It's a unique and specific experience that reminds me of spending the night at the house of a friend who had some console gaming system you'd only ever heard rumors about. The games seem small and simple at first blush, but there is surprising depth. Favorites so far are Kick Club, Avianos, Attactics, and Mortol. (Steam)
Use JSDoc annotations to make VSCode "understand" your data (jsdoc.app)
Can you change your password without needing current password? (askubuntu.com)
Did you know you can use VS Code for interactive rebasing?
How to enable VS Code Interactive Editor (StackOverflow)
Grab your headphones because it's water cooler time! In this episode we're catching up on feedback, putting our skills to the test, and wondering what we're missing. Plus, Allen's telling it how it is, Outlaw is putting it all together and Joe is minding the gaps!
It's Water Cooler Time! We've got a variety of topics today, and also Outlaw's lawyering up, Allen can read QR codes now, and Joe is looking at second careers.
Naming things is important; it gives them power…but also gives you the power to defeat them!
Don't make any one specific technology your hammer
Client libraries that completely change with server upgrades
What's the most important or relevant thing to learn as a developer now?
Do you research or learn on vacation?
Tip of the Week
Curated, High-Quality Stories, Essays, Editorials, and Podcasts based around Software Engineering. It's more polished and less experimental than PagedOut (GitHub) Also, there's a new Paged Out, complete with downloadable art. It's more avant-garde than GitHub's Readme project, featuring articles on Art, Cryptography, Demoscenes, and Reverse Engineering. (pagedout.institute)
Travel Router - Extensible Authentication Protocol (EAP) is used to pass the authentication information between the supplicant (the Wi-Fi workstation) and the authentication server (Microsoft IAS or other) (Amazon)
Generative AI for beginners - "Learn the fundamentals of building Generative AI applications with our 18-lesson comprehensive course by Microsoft Cloud Advocates."
Microsoft has a course for getting into generative AI! (microsoft.github.io)
In the past couple of episodes, we went over what Apache Kafka is, and along the way we mentioned some of the pains of managing and running Kafka clusters on your own. In this episode, we discuss some of the ways you can offload those responsibilities and focus on writing streaming applications. Along the way, Joe does a mighty fine fill-in for proper noun pronunciation and Allen does a southern auctioneer-style speed talk.
Reviews
As always, thank you for leaving us a review - we really do appreciate them!
"WarpStream is an Apache Kafka® compatible data streaming platform built directly on top of object storage: no inter-AZ bandwidth costs, no disks to manage, and infinitely scalable, all within your VPC"
ZERO disks to manage
10x cheaper than running Kafka
Agents stream data directly to and from object storage with no buffering on local disks and no data tiering.
Create new serverless “Virtual Clusters” in our control plane instantly
Support different environments, teams, or projects without managing any dedicated infrastructure
Things you won't have to do with WarpStream
Upscale a cluster that is about to run out of space
Figure out how to restore quorum in a Zookeeper cluster or Raft consensus group
Rebalance partitions in a cluster
"WarpStream is protocol compatible with Apache Kafka®, so you can keep using all your favorite tools and software. No need to rewrite your application or use a proprietary SDK. Just change the URL in your favorite Kafka client library and start streaming!"
Never again have to choose between reliability and your budget. WarpStream costs the same regardless of whether you run your workloads in a single availability zone, or distributed across multiple
WarpStream's unique cloud native architecture was designed from the ground up around the cheapest and most durable storage available in the cloud: commodity object storage
WarpStream agents use object storage as the storage layer and the network layer, side-stepping interzone bandwidth costs entirely
Can be run in BYOC (bring your own cloud) or in Serverless
BYOC - you provide all the compute and storage - the only thing that WarpStream provides is the control plane
Data never leaves your environment
Serverless - fully managed by WarpStream in AWS - will automatically scale for you even down to nothing!
Can run in AWS, GCP and Azure
Agents are also S3 API compatible, so they can run against S3-compatible storage such as MinIO and others
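To make the "just change the URL" point concrete, here's a minimal sketch of a stock Java Kafka producer pointed at a WarpStream agent instead of a Kafka broker. The agent hostname and topic name are made up for illustration; everything else is the standard Apache Kafka client.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class WarpStreamProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The only change from a vanilla Kafka setup: point the client at a
        // WarpStream agent instead of a Kafka broker (hostname is hypothetical).
        props.put("bootstrap.servers", "warpstream-agent.internal.example.com:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same ProducerRecord API, same topics, no proprietary SDK.
            producer.send(new ProducerRecord<>("orders", "order-123", "{\"total\": 42.50}"));
        }
    }
}
```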
RedPanda
Redpanda is a slimmed-down, natively implemented, Kafka-protocol-compliant drop-in replacement for Kafka
There's even a Redpanda Connect!
Its main differentiator is performance: it's cheaper and faster
Apache Pulsar
Similar to Kafka, but changes the abstraction on storage to allow more flexibility on IO
Has a Kafka-compliant wrapper for interchangeability
Chord AI is an Android/iOS app that uses AI to figure out the chords for a song. This is really useful if you just want to get the quick gist of a song to play along with. The base version is free and has a few different integration options (YouTube, Spotify, Apple Music, local files for me), and it uses your phone's microphone and a little AI magic to figure it out. It even shows you how to play the chords on guitar or piano. The free version gets you basic chords, but you can pay $8.99 a month to get more advanced/frequent chords. https://www.chordai.net/
Topics, Partitions, and APIs oh my! This episode we're getting further into how Apache Kafka works and its use cases. Also, Allen is staying dry, Joe goes for broke, and Michael (eventually) gets on the right page.
Atlanta Dev Con is coming up, on September 7th, 2024 (www.atldevcon.com)
Kafka Topics
They are partitioned - meaning they are (or can be) distributed across multiple Kafka brokers into "buckets"
New events written to Kafka are appended to partitions
The distribution of data across brokers is what allows Kafka to scale so well as data can be written to and read from many brokers simultaneously
Events with the same key are written to the same partition as earlier events with that key (see the producer sketch after this list)
Kafka guarantees reads of events within a partition are always read in the order that they were written
For fault tolerance and high availability, topics can be replicated…even across regions and data centers
NOTE: If you're using a cloud provider, know that this can be very costly as you pay for inbound and outbound traffic across regions and availability zones
A typical replication configuration for production setups is 3 replicas
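Here's a minimal sketch (standard Java client, with made-up topic and key names) showing how keyed events end up together: the default partitioner hashes the key, so both records below land on the same partition and keep their relative order.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Both events share the key "user-42", so Kafka's default partitioner
            // hashes them to the same partition, preserving their relative order.
            for (String action : new String[] {"login", "add-to-cart"}) {
                RecordMetadata meta = producer
                        .send(new ProducerRecord<>("user-activity", "user-42", action))
                        .get();
                System.out.printf("key=user-42 value=%s -> partition %d, offset %d%n",
                        action, meta.partition(), meta.offset());
            }
        }
    }
}
```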
Kafka APIs
Admin API - used for managing and inspecting topics, brokers, and other Kafka objects
Producer API - used to write events to Kafka topics
Consumer API - used to read data from Kafka topics
Kafka Streams API - used to implement stream processing applications/microservices. Key functionality includes transformations, stateful operations like aggregations, joins, windowing, and more (see the sketch after this list)
In the Kafka Streams world, these transformations and aggregations are typically written to other topics (in from one topic, out to one or more other topics)
Kafka Connect API - allows for the use of reusable import and export connectors that usually connect external systems. These connectors allow you to gather data from an external system (like a database using CDC) and write that data to Kafka. Then you could have another connector that could push that data to another system OR it could be used for transforming data in your streams application
These connectors are referred to as Sources and Sinks in the connector portfolio (confluent.io)
Source - gets data from an external system and writes it to a Kafka topic
Sink - pushes data to an external system from a Kafka topic
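As a rough illustration of the Streams API, here's a minimal sketch that reads from one topic, transforms and aggregates the events, and writes the results to another topic. The topic names and application id are hypothetical.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class PageViewCountsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-counts-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // In from one topic, out to another: transform, then aggregate per key.
        KStream<String, String> views = builder.stream("page-views");
        KTable<String, Long> countsByPage = views
                .mapValues(value -> value.toLowerCase())   // stateless transformation
                .groupByKey()                              // key = page URL
                .count();                                  // stateful aggregation

        countsByPage.toStream()
                .to("page-view-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```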
Use Cases
Message queue - usually talking about replacing something like ActiveMQ or RabbitMQ
Message brokers are often used for responsive types of processing, decoupling systems, etc. - Kafka is usually a great alternative that scales, generally has faster throughput, and offers more functionality
Website activity tracking - this was one of the very first use cases for Kafka - the ability to rebuild user actions by recording all the user activities as events
Typically different activity types would be written to different topics - like web page interactions to one topic and searches to another
Metrics - aggregating statistics from distributed applications
Log aggregation - some use Kafka for storage of event logs rather than using something like HDFS or a file server or cloud storage - but why? Because using Kafka for the event storage abstracts away the events from the files
Stream processing - taking events in and further enriching those events and publishing them to new topics
Event sourcing - using Kafka to store state changes from an application that are used to replay the current state of an object or system
Commit log - using Kafka as an external commit log is a way to synchronize data between distributed systems or to help rebuild the state of a failed system
Tip of the Week
Rémi Gallego is a music producer who makes music under a variety of names like The Algorithm and Boucle Infini; almost all of it is instrumental synthwave with a hard-rock edge. They also make a lot of video game music, including 2 of my favorite game soundtracks of all time: "The Last Spell" and "Hell is for Demons" (YouTube)
Did you know that the Kubernetes-focused TUI we've raved about before can be used to look up information about other things as well, like :helm and :events? Events is particularly useful for figuring out mysteries. You can see all the "resources" available to you with "?" - you might be surprised at everything you see (Popeye, XRay, and monitoring)
WarpStream is an S3-backed, Kafka API-compatible alternative to Kafka. Thanks, MikeRg! (warpstream.com)
We finally start talking about Apache Kafka! Also, Allen is getting acquainted with Aesop, Outlaw is killing clusters, and Joe is paying attention in drama class.
Atlanta Dev Con is coming up, on September 7th, 2024 (www.atldevcon.com)
Intro to Apache Kafka
What is it?
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Core capabilities
High throughput - Deliver messages at network-limited throughput using a cluster of machines with latencies as low as 2ms.
Scalable - Scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, and hundreds of thousands of partitions. Elastically expand and contract storage and processing
Permanent storage - Store streams of data safely in a distributed, durable, fault-tolerant cluster.
High availability - Stretch clusters efficiently over availability zones or connect separate clusters across geographic regions.
Ecosystem
Built-in stream processing - Process streams of events with joins, aggregations, filters, transformations, and more, using event-time and exactly-once processing.
Connect to almost anything - Kafka’s out-of-the-box Connect interface integrates with hundreds of event sources and event sinks including Postgres, JMS, Elasticsearch, AWS S3, and more.
Client libraries - Read, write, and process streams of events in a vast array of programming languages
Large ecosystem of open source tools - Large ecosystem of open source tools: Leverage a vast array of community-driven tooling.
Trust and Ease of Use
Mission critical - Support mission-critical use cases with guaranteed ordering, zero message loss, and efficient exactly-once processing.
Trusted by thousands of organizations - Thousands of organizations use Kafka, from internet giants to car manufacturers to stock exchanges. More than 5 million unique lifetime downloads.
Vast user community - Kafka is one of the five most active projects of the Apache Software Foundation, with hundreds of meetups around the world.
What is it?
Getting data in real-time from event sources like databases, sensors, mobile devices, cloud services, applications, etc. in the form of streams of events. Those events are stored "durably" (in Kafka) for processing, either in real-time or retrospectively, and then routed to various destinations depending on your needs. It's this continuous flow and processing of data that is known as "streaming data."
How can it be used? (some examples)
Processing payments and financial transactions in real-time
Tracking automobiles and shipments in real time for logistical purposes
Capture and analyze sensor data from IoT devices or other equipment
To connect and share data from different divisions in a company
Apache Kafka as an event streaming platform?
It provides three key capabilities that make it a complete streaming platform:
Can publish and subscribe to streams of events
Can store streams of events durably and reliably for as long as necessary (infinitely if you have the storage)
Can process streams of events in real-time or retrospectively
Can be deployed to bare metal, virtual machines or to containers on-prem or in the cloud
Can be run self-managed or via various cloud providers as a managed service
How does Kafka work?
A distributed system that's composed of servers and clients that communicate using a highly performant TCP protocol
Servers
Kafka runs as a cluster of one or more servers that can span multiple data centers or cloud regions
Brokers - these are a portion of the servers that are the storage layer
Kafka Connect - these are servers that constantly import and export data from existing systems in your infrastructure such as relational databases
Kafka clusters are highly scalable and fault-tolerant
Clients
Clients allow you to write distributed applications that read, write, and process streams of events in parallel, and that are fault-tolerant and scalable
These clients are available in many programming languages - both the ones provided by the core platform as well as 3rd party clients
Concepts
Events
It's a record of something that happened - also called a "record" in the documentation
Has a key
Has a value
Has an event timestamp
Can have additional metadata
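Putting those pieces together, here's what a single event looks like when constructed with the Java client. The topic, key, value, and timestamp are made-up examples, and the extra metadata rides along as a header.

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventRecordSketch {
    public static void main(String[] args) {
        ProducerRecord<String, String> event = new ProducerRecord<>(
                "orders",                      // topic
                null,                          // partition (null = chosen by key hash)
                System.currentTimeMillis(),    // event timestamp
                "order-123",                   // key
                "{\"status\": \"shipped\"}"    // value
        );

        // Optional metadata travels as headers on the record.
        event.headers().add("source-system", "warehouse-service".getBytes(StandardCharsets.UTF_8));

        System.out.println(event);
    }
}
```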
Producers and Consumers
Producers - these are the client applications that publish/write events to Kafka
Consumers - these are the client applications that read/subscribe to events from Kafka
Producers and consumers are completely decoupled from each other
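And a minimal consumer sketch to go with it, again using the standard Java client with hypothetical topic and group names. Consumers in the same group split a topic's partitions between them.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-processors");    // consumers in the same group share partitions
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");   // start from the beginning if no committed offset

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```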
Topics
Events are stored in topics
Topics are like folders on a file system - events would be the equivalent of files within that folder
Topics are multi-producer and multi-subscriber
There can be zero, one or many producers or subscribers to a topic that write to or read from that topic respectively
Unlike many message queuing systems, these events can be read from as many times as necessary because they are not deleted after being consumed
Deletion of messages is handled via a per-topic configuration that determines how long events are retained (see the sketch below)
Kafka's performance is not dependent on the amount of data nor the duration of time data is stored, so storing for longer periods is not a problem
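For example, here's a hedged sketch of creating a topic with the Java AdminClient, specifying the partition count, replication factor, and per-topic retention. The topic name and values are illustrative only.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, replication factor 3, events retained for 7 days.
            NewTopic topic = new NewTopic("user-activity", 6, (short) 3)
                    .configs(Map.of(TopicConfig.RETENTION_MS_CONFIG,
                            String.valueOf(7L * 24 * 60 * 60 * 1000)));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```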
Tip of the Week
Flipper Zero is a multi-functional interaction device mixed with a Tamagotchi. It has a variety of IO options built in: RFID, NFC, GPIO, Bluetooth, USB, and low-voltage pins like you'd see on an Arduino. Using the device upgrades the dolphin, encouraging you to try new things…and it's all open-source with a vibrant community behind it. (shop.flipperzero.one)
Kafka Tui?! Kaskade is a cool-looking Kafka TUI that has got to be better than using the scripts in the build folder that comes with Kafka. (github.com/sauljabin/kaskade)
Microstudio is a web-based integrated development environment for making simple games and it's open source! (microstudio.dev)
Bing Copilot has a number of useful prompts (bing.com)
Designer (photos)
Vacation Planner
Cooking assistant
Fitness trainer
Sharing metrics between projects in GCP, Azure, and maybe AWS???
Picture, if you will, a nondescript office space, where time seems to stand still as programmers gather around a water cooler. Here, in the twilight of the workday, they exchange eerie tales of programming glitches, security breaches, and asynchronous calls. Welcome to the Programming Zone, where reality blurs and (silent) keystrokes echo in the depths of the unknown. Also, Allen is ready to boom, Outlaw is not happy about these category choices, and Joe takes the easy (but not longest) road.
Silent Key Tester for mechanical keyboards, you can specify a wide variety of switches (thockking.com)
Joe's preferences:
Durock Shrimp Silent T1
Tactile Gazzew Boba U4 Silent
Linear Kailh Silent Brown
Linear Lichicx Lucy Silent
Linear WS Wuque Studio Gray Silent
Tactile WS Wuque Studio White Silent - Linear
Tactile Kailh Silent Pink
Linear Cherry MX Silent Red
Tip of the Week
Feeling nostalgic for the original GameBoy or GameBoy Color? GBStudio is a one-stop shop for making games, it's open-source and fully featured. You can do the art, music, and programming all in one tool, and it's thoughtfully laid out and well-documented. Bonus…your games will work in GameBoy emulators AND you can even produce your own working physical copies. (If you don't want the high-level tools, you can go old skool with "GBDK" too) (gbstudio.dev)
If you're going to do something, why not script it? If you're going to script it, save it for next time!
Dave's Garage is a YouTube channel that does deep dives into Windows internals, cool electronics projects, and everything in between! (YouTube)
This time we are missing the "ocks", but we hope you enjoy this off...ice topic chat about personalizing our workspaces. Also, Joe had to put a quarter in the jar, and Outlaw needs a cookie.
There's a story for Outlaw about this print: https://www.johndyerbaizley.com/product/four-horsemen-full-color-ap
Tip of the Week
If you have a car, you should consider getting a Mirror Dash Cam. It's a front and rear camera system that replaces your rearview mirror with a touchscreen. Impress all your friends with zoom, night vision, parking assistance, GPS, and 24/7 recording and monitoring. (Amazon)
Be careful about exercising after you give blood, else you might end up needing it back! (redcrossblood.org)
We are mixing it up on you again, no Outlaw this week, but we can offer you some talk of exotic databases. Also, Joe pronounces everything correctly and Allen leaves you with a riddle.
Store multiple values to a particular record's attribute
Some RDBMS's can do this as well, BUT it's typically an exception to the rule when you'd store an array on an attribute
In a MultiValue DBMS - that's how you SHOULD do it
Part of the reason it's done this way is these database systems are not optimized for JOINS
Looked at the Adabas and UniData sites - the primary selling points seem to be rapid application development / ease of learning and getting up to speed as well as data modeling that closely mirrors your application data structures
Provides the ability to efficiently store, modify, and query spatial data - data that appears in a geometrical space (maps, polygons, etc)
Generally have custom data types for storing the spatial data
Indices that allow for quick retrieval of spatial data about other spatial data
Also allow for performing spatial-specific operations on data, such as computing distances, merging or intersecting objects or even calculating areas
Geospatial data is a subset of spatial data - they represent places / spatial data on the Earth's surface
Spatio-temporal data is another variation - spatial data combined with timestamps
PostGIS - basically a plugin for PostgreSQL that allows for storing of spatial data
Additionally supports raster data - data for things like weather and elevation
If you want to learn how to use it and understand the data and what's stored (postgis.net)
Spatial data types are: point, line, polygon, and more…basically shapes
Rather than using B-tree indexes that sort data for fast retrieval, spatial indexes use bounding boxes - rectangles that identify what is contained within them
Typically accomplished with R-Tree and Quadtree implementations
Redfin - a real estate competitor to realtor.com and others - uses PostgreSQL / PostGIS
There's quite a bit of software that supports OpenGIS, so it may be a good place to start if you're interested in storing/querying spatial data (see the sketch below)
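To give a feel for what querying spatial data looks like, here's a sketch using plain JDBC against a PostGIS-enabled PostgreSQL database. The places table, geom column, and connection details are all hypothetical; the interesting bits are ST_MakePoint, ST_DWithin, and ST_Distance.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class NearbyPlacesSketch {
    public static void main(String[] args) throws Exception {
        // Assumes a PostGIS-enabled database with a "places" table that has a
        // geometry(Point, 4326) column named "geom" (all names are made up).
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/gisdb", "gis_user", "secret")) {

            String sql = """
                SELECT name,
                       ST_Distance(geom::geography, ST_SetSRID(ST_MakePoint(?, ?), 4326)::geography) AS meters
                FROM places
                WHERE ST_DWithin(geom::geography, ST_SetSRID(ST_MakePoint(?, ?), 4326)::geography, 1000)
                ORDER BY meters
                """;

            double lon = -84.3880, lat = 33.7490;   // Atlanta, as an example point
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setDouble(1, lon);
                stmt.setDouble(2, lat);
                stmt.setDouble(3, lon);
                stmt.setDouble(4, lat);
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.printf("%s is %.0f meters away%n",
                                rs.getString("name"), rs.getDouble("meters"));
                    }
                }
            }
        }
    }
}
```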
Event Stores
Popular examples: EventStoreDB (ranked 178), IBM DB2 Event Store (336), and NEventStore (338)
Used for implementing the concept of Event Sourcing
Event Sourcing - an application/data store pattern where the current state of an object is obtained by "replaying" all the events that got it to its current state (see the sketch at the end of this section)
This contrasts with RDBMSs, which typically store only the current state of an object - historical state CAN be stored, but that's something you have to implement yourself, such as temporal tables in SQL Server or "history tables"
Event stores only support appending new events and reading events back in order
Not allowed to update or delete an event
For performance reasons, many Event Store databases support snapshots for holding materialized states at points in time
Features: guaranteed writes, concurrency model, granular stream and stream APIs
Many client interfaces: .NET, Java, Go, Node, Rust, and Python
Runs on just about all OSes - Windows, Mac, Linux
Highly available - can run in a cluster
Optimistic concurrency checks that will return an error if a check fails
"Projections" allow you to generate new events based off "interesting" occurrences in your existing data
For example, you're looking for how many Twitter users said "happy" within 5 minutes of the phrase "foo coffee shop" and within 2 minutes of saying "London".
Highly performant - 15k writes and 50k reads per second
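To make the event sourcing idea concrete, here's a tiny, store-agnostic Java sketch (the event types and account aggregate are invented for illustration) showing how replaying an ordered stream of events rebuilds the current state, and why a snapshot is just a cached copy of that state at a known position.

```java
import java.util.List;

public class AccountReplaySketch {
    // A few hypothetical event types for a bank account aggregate.
    sealed interface AccountEvent permits Opened, Deposited, Withdrawn {}
    record Opened(String owner) implements AccountEvent {}
    record Deposited(long cents) implements AccountEvent {}
    record Withdrawn(long cents) implements AccountEvent {}

    static final class Account {
        String owner;
        long balanceCents;

        // Current state is derived purely by applying events in order.
        void apply(AccountEvent event) {
            switch (event) {
                case Opened e -> owner = e.owner();
                case Deposited e -> balanceCents += e.cents();
                case Withdrawn e -> balanceCents -= e.cents();
            }
        }
    }

    public static void main(String[] args) {
        List<AccountEvent> stream = List.of(
                new Opened("Allen"),
                new Deposited(10_000),
                new Withdrawn(2_500));

        Account account = new Account();
        stream.forEach(account::apply);   // replaying the stream rebuilds current state

        System.out.printf("%s has $%.2f%n", account.owner, account.balanceCents / 100.0);
        // A snapshot would simply persist this Account state plus the position of the
        // last applied event, so future replays can start there instead of from event #1.
    }
}
```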
If your internet connection is good but your cell phone service is bad, then you might want to consider Ooma. Ooma sells devices that plug into your network or connect wirelessly and provide a phone number and a phone jack so you can hook up an old-school home telephone. We've been using it for about a week now with no problems, and it's been a breeze to set up. The devices range from $99 to $129, and there's a monthly "premier" plan you can buy with nifty features like a secondary phone line, advanced call blocking, and call forwarding. (ooma.com)
Why use "git reset --hard" when you can "git stash -u" instead? Reset is destructive, but stashing keeps your changes just in case you need them. Because sometimes, your "sometimes" is now!
This episode we are talking about keeping the internet interesting and making cool things by looking at PagedOut and Itch.io. Also, Allen won't ever mark you down, Outlaw won't ever give you up, and Joe took a note to say something about Barbie here but he can't remember what it was.
If you subscribe to Audible, don't forget that they have a lot of "free" content available, such as dramatic space operas and the "Great Courses". For example, "How to Listen to and Understand Great Music" is similar to a music appreciation course you might take at uni. The author works through history, talking about the evolution of music and culture. It's 36 hours, and that's just ONE of the music courses available to you for "free" (once you subscribe) (audible.com)
Visualize Git is an excellent tool for seeing what really happens when you run git commands (git-school.github.io)
It's easy to work with checkboxes in Markdown and Obsidian: it's just "- [ ]" - don't forget the dash or the spaces!
Did you know there is a Visual Studio Code plugin for converting Markdown to Jira markup syntax? (Code)
Apple, Google, and the major password manager vendors have ways to set up emergency contacts. It's very important that you have this set up for yourself and your loved ones. When you need it, you really need it. (google.com)