In this episode, we're talking about lessons learned and the lessons we still need to learn. Also, Michael shares some anti-monetization strategies, Allen wins by default, and Joe keeps it real 59/60 days a year!
Unit Testing Principles, Practices, and Patterns: Effective testing styles, patterns, and reliable automation for unit testing, mocking, and integration testing with examples in C# (Amazon)
In this sequence of sound, we compute Joe's unexpected pleasure in commercial-viewing algorithms, Michael's intricate process of slicing up the pizza, and Allen's persistent request for more cheese data augmentation. Will you engage in this data streaming session?
MusicLM lets you create music from descriptive text, similar to how DALL-E 2 creates images. The output is a little strange, but it could still be really useful and inspiring with a little effort. It's in private beta now, as part of the "AI Test Kitchen", but you can sign up to join the waitlist today.
In this episode we talk about several things that have been on our mind. We find that Joe has been taken over by AIs, Michael now understands our love of Kotlin, and Allen wants to know how to escape supporting code you wrote forever.
We're doing a water cooler talk today. Also, Allen can tell you how not to leak secrets, Michael knows how to work a spreadsheet, and Joe has been replaced by an AGI.
Have any experience with Twilio? It's work! (twilio.com)
Resources we like
docker init is a tool (in beta) built into the latest Docker Desktop that you can use to get a leg up on your next project. It makes it easy to create Dockerfiles that follow best practices, as well as a Docker Compose file to get you up and running. (docker.com)
screen is a powerful open-source terminal multiplexer that lets you create, manage, and switch between multiple terminal sessions, enabling seamless multitasking and persistent remote connections in a single window.
The VIVO Universal Treadmill Desk Riser is an adjustable, ergonomic workspace solution designed to fit most treadmills, allowing users to seamlessly combine their work and exercise routines for a healthy, productive lifestyle. (amazon.com)
The LifeSpan Fitness Under Desk Walking Treadmill is a compact, low-profile treadmill designed to fit under standing desks, enabling remote workers to maintain an active lifestyle by seamlessly integrating walking or light jogging into their daily work routine, promoting better health and increased productivity. (amazon.com)
Kubernetes Network Policies are a set of rules that define how pods within a cluster can communicate with each other and with external resources, allowing administrators to enforce fine-grained access control and enhance the security of their containerized applications. (kubernetes.io)
What are lost updates, and what can we do about them? Maybe we don't do anything and accept the write skew? Also, Allen has sharp ears, Outlaw's gort blah spotterfiles, and Joe is just thinking about breakfast.
Last episode we talked about weak isolation, read committed, and snapshot isolation
There is one major problem we didn't discuss called "The Lost Update Problem"
Consider a read-modify-write transaction; now imagine two of them happening at the same time
Even with snapshot isolation, transactions A and B can both read the same value before either one writes; then A writes its update and B writes its own, clobbering A's change
Incrementing/Decrementing values (counters, bank accounts)
Updating complex values (JSON for example)
CMS updates that send the full page as an update
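To make the problem concrete, here's a minimal Python sketch that replays the unlucky interleaving by hand. A plain dict stands in for the database (an assumption for illustration), and stepping through the reads and writes explicitly, rather than with real threads, makes the lost update show up every time.

```python
# A plain dict stands in for the database (illustrative assumption).
counter = {"likes": 41}


def read(key):
    return counter[key]


def write(key, value):
    counter[key] = value


# Transaction A and transaction B both read the current value first...
a_seen = read("likes")  # A reads 41
b_seen = read("likes")  # B reads 41

# ...then each writes back its own incremented copy.
write("likes", a_seen + 1)  # A writes 42
write("likes", b_seen + 1)  # B also writes 42, silently overwriting A's update

print(counter["likes"])  # 42, not the 43 you'd expect after two increments
```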
Solutions:
Atomic Writes - Some databases support atomic updates that effectively combine the read and write
Cursor Stability - locking the read object until the update is performed
Single Threading - Force all atomic operations to happen serially through a single thread
Explicit Locking
The application can be responsible for explicitly locking objects, placing the responsibility in the developers' hands
This makes sense in certain situations. Imagine a multiplayer game where several players can move the same shared object: a single atomic database operation isn't enough, because game logic has to run between the read and the write (e.g. checking that the move is legal, or showing other players that the item is in use), so the application locks the object itself
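As a rough illustration of an atomic write, this sketch uses Python's built-in sqlite3 module with a hypothetical counters table. The UPDATE computes the new value from the current one inside the database, so there's no gap between the read and the write for another client to sneak into. SQLite doesn't support SELECT ... FOR UPDATE, so cursor stability and explicit locking aren't shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, just for this example.
conn.execute("CREATE TABLE counters (key TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO counters VALUES ('likes', 41)")
conn.commit()


def increment_atomically(conn, key):
    # Read and write happen inside one UPDATE statement, so the database
    # increments whatever the current value is, instead of trusting a value
    # the application read earlier.
    with conn:  # commits on success, rolls back on an exception
        conn.execute("UPDATE counters SET value = value + 1 WHERE key = ?", (key,))


increment_atomically(conn, "likes")
increment_atomically(conn, "likes")
print(conn.execute("SELECT value FROM counters WHERE key = 'likes'").fetchone()[0])  # 43
```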
Detecting Lost Updates
Locks can be tricky. What if we reused the snapshot mechanism we discussed before?
We're already keeping a record of the transactionId that last modified our data, and we know our current transactionId. What if we just failed any update where the last write to our data came from a transaction our snapshot couldn't see (i.e., one that committed after our transaction started)?
This allows for naive application code, but also gives you fewer options…retry or give up
Note: MySQL/InnoDB's repeatable read isolation level doesn't detect lost updates, so some argue it doesn't qualify as snapshot isolation
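Here's a minimal, in-memory sketch of that detection idea. VersionedStore and Transaction are hypothetical classes, not any real database's API: each record remembers the id of the transaction that last wrote it, and a commit aborts if that id changed after we read the record. For simplicity, reads see the latest committed value rather than a true snapshot.

```python
class LostUpdateDetected(Exception):
    """Raised when a record we read was overwritten before we committed."""


class VersionedStore:
    """Hypothetical store: key -> (value, txid of the last committed write)."""

    def __init__(self):
        self.data = {}
        self.next_txid = 1

    def begin(self):
        tx = Transaction(self, self.next_txid)
        self.next_txid += 1
        return tx


class Transaction:
    def __init__(self, store, txid):
        self.store, self.txid = store, txid
        self.seen = {}    # key -> writer txid visible when we read the key
        self.writes = {}  # key -> new value, buffered until commit

    def read(self, key):
        value, writer = self.store.data.get(key, (None, 0))
        self.seen[key] = writer
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        for key in self.writes:
            _, writer = self.store.data.get(key, (None, 0))
            # Someone committed a write we never saw: applying ours would lose it.
            if key in self.seen and writer != self.seen[key]:
                raise LostUpdateDetected(f"tx {self.txid} aborted: {key!r} changed")
        for key, value in self.writes.items():
            self.store.data[key] = (value, self.txid)


store = VersionedStore()
setup = store.begin()
setup.write("likes", 41)
setup.commit()

a, b = store.begin(), store.begin()
a_likes, b_likes = a.read("likes"), b.read("likes")  # both read 41
a.write("likes", a_likes + 1)
a.commit()  # likes is now 42
b.write("likes", b_likes + 1)
aborted = False
try:
    b.commit()
except LostUpdateDetected:
    aborted = True  # B must retry (or give up) instead of clobbering A's write
```

As the notes say, the application code stays naive; its only choices when the commit fails are to retry or give up.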
What if you didn't have transactions?
If you didn't have transactions, let alone a snapshot number, you could get similar behavior by doing a compare-and-set
Example: UPDATE account SET balance = 10 WHERE id = 'ABC' AND balance = 9
This works well in databases that support atomic single-object updates, but it may not prevent lost updates under snapshot isolation if the WHERE clause is allowed to read from an old snapshot
Note: it's up to the application code to check that updates were successful - Updating 0 records is not an error
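Here's a small sketch of the compare-and-set example using Python's sqlite3 module; the account table and the compare_and_set_balance helper are hypothetical. Note the rowcount check: the database happily reports success when the WHERE clause matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('ABC', 9)")
conn.commit()


def compare_and_set_balance(conn, account_id, expected, new):
    """Write `new` only if the balance is still `expected`; report whether it applied."""
    with conn:
        cur = conn.execute(
            "UPDATE account SET balance = ? WHERE id = ? AND balance = ?",
            (new, account_id, expected),
        )
    # Matching zero rows is not an error to the database, so check it ourselves.
    return cur.rowcount == 1


first = compare_and_set_balance(conn, "ABC", expected=9, new=10)   # True
second = compare_and_set_balance(conn, "ABC", expected=9, new=10)  # False: balance is now 10
```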
Conflict resolution and replication
We haven't talked much about replicas lately, how do we handle lost updates when we have multiple copies of data on multiple nodes?
Compare-and-Set strategies and locking strategies assume a single up-to-date copy of the data….uh oh
The options are limited here, so the strategy is to accept the writes and have an application process to decide what to do
Merge: Some operations, like incrementing a counter, can be safely merged. Riak has special datatypes for these
Last Write Wins: This is simple but inaccurate, since concurrent updates are silently discarded. It's also the most common solution.
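To show why counters can be merged safely, here's a sketch of a grow-only counter (G-Counter), one of the simplest conflict-free replicated data types (CRDTs). It's a textbook illustration, not Riak's actual implementation: each replica counts only its own increments, and merging keeps the highest count seen per replica, so no increment is lost the way it would be with last write wins.

```python
class GCounter:
    """Grow-only counter: each replica only increments its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments made on that replica

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Taking the per-replica max is commutative, associative and idempotent,
        # so replicas converge no matter how often or in what order they merge.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


node_a, node_b = GCounter("a"), GCounter("b")
node_a.increment()   # concurrent increments on two replicas
node_b.increment(2)
node_a.merge(node_b)
node_b.merge(node_a)
print(node_a.value(), node_b.value())  # 3 3
```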
Write Skew and Phantoms
Write skew - a race condition where two transactions read the same objects and then each update a different object, so that together their writes violate a constraint that each transaction checked on its own
The example given in the book is the on-call doctor rotation: two doctors each check that at least two doctors are on call, and both go off call at the same time
If the transactions had run one after the other, the second would have seen the first doctor's change and the race condition would not have taken place
Write skew is a generalization of the lost update problem
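Here's the doctor example as a minimal Python sketch, assuming a dict of on-call flags stands in for the table and a copy of it stands in for each transaction's snapshot. Each transaction checks the rule and then writes a different key, so neither overwrites the other, yet the rule still ends up broken.

```python
# Two doctors on call; the rule is "at least one doctor must stay on call".
on_call = {"alice": True, "bob": True}


def snapshot():
    # Each transaction reads from its own consistent snapshot taken at start.
    return dict(on_call)


def request_leave(doctor, snap):
    # Check the rule against the snapshot, then write a *different* row
    # from the one the other transaction writes.
    if sum(snap.values()) >= 2:
        on_call[doctor] = False
        return True
    return False


# Both transactions take their snapshots before either commits...
alice_snap, bob_snap = snapshot(), snapshot()
# ...so each sees two doctors on call and approves its own request.
alice_ok = request_leave("alice", alice_snap)
bob_ok = request_leave("bob", bob_snap)

print(alice_ok, bob_ok, on_call)  # True True {'alice': False, 'bob': False}
```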
Preventing write-skew
Atomic single-object locks won't work because there's more than one object being updated
Snapshot isolation doesn't help either: the snapshot isolation implementations in SQL Server, PostgreSQL, Oracle, and MySQL won't automatically prevent write skew
Automatically preventing write skew requires true serializable isolation
Most databases don't allow you to create constraints on multiple objects but you may be able to work around this using triggers or materialized views as your constraint
They mention that if you can't use serializable isolation, your next best option may be to explicitly lock the rows the transaction depends on (e.g. with SELECT FOR UPDATE), so no other transaction can lock or modify them until this one commits
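Continuing the doctor example, this sketch shows the explicit-locking fix. A single threading.Lock is a coarse stand-in (an assumption for illustration) for SELECT ... FOR UPDATE on the on-call rows: the check and the write both happen while holding the lock, so the second transaction sees the first one's change.

```python
import threading

on_call = {"alice": True, "bob": True}
# Stand-in for locking the on-call rows with SELECT ... FOR UPDATE:
# whoever holds the lock can read *and* write those rows; others wait.
on_call_rows_lock = threading.Lock()


def request_leave(doctor):
    with on_call_rows_lock:
        # The check and the write happen while holding the lock, so no other
        # transaction can change the rows between our read and our write.
        if sum(on_call.values()) >= 2:
            on_call[doctor] = False
            return True
        return False


results = {}


def worker(doctor):
    results[doctor] = request_leave(doctor)


threads = [threading.Thread(target=worker, args=(d,)) for d in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.values()))  # [False, True]: exactly one doctor got leave
print(sum(on_call.values()))     # 1
```

A real database would lock only the rows the query matched, rather than serializing everything behind one lock.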
Phantoms causing write skew
Pattern
A SELECT query checks whether some business requirement is satisfied, e.g. there's more than one doctor on call
The application decides what to do based on the results of that query
If the application decides to go ahead, it makes an INSERT, UPDATE, or DELETE that would change the result of the query in the first step, and therefore the decision made in the second
They mention the steps could occur in a different order; for instance, you could do the write first and then check that it didn't violate the business constraint
In the case of checking for records that meet some condition, you could do a SELECT FOR UPDATE and lock those rows
If you're checking for the absence of records that match a condition (e.g. no existing booking for a room at a given time), there's nothing to lock, so SELECT FOR UPDATE won't help and you get a phantom: a write in one transaction changes the result of a search query in another transaction
Snapshot isolation avoids phantoms in read-only queries, but can't stop them in read-write transactions
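The meeting-room case from the book makes the phantom easy to see. In this sketch, a list of tuples stands in for a bookings table and the room and slot names are hypothetical. Both transactions search for clashing bookings, find none, and insert; because the query matched no rows, there was nothing for a SELECT ... FOR UPDATE to lock.

```python
bookings = []  # rows of (room, slot, who); stands in for a bookings table


def clashes(snapshot, room, slot):
    # The "SELECT": existing bookings that overlap the requested room and slot.
    return [row for row in snapshot if row[0] == room and row[1] == slot]


# Both transactions take their snapshots before either one inserts...
ann_snapshot = list(bookings)
raj_snapshot = list(bookings)

# ...so both searches come back empty. There are no rows to lock, and each
# insert becomes a phantom for the other transaction's query.
if not clashes(ann_snapshot, "room-1", "09:00"):
    bookings.append(("room-1", "09:00", "ann"))
if not clashes(raj_snapshot, "room-1", "09:00"):
    bookings.append(("room-1", "09:00", "raj"))

print(len(bookings))  # 2: room-1 is double-booked at 09:00
```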
Materializing conflicts
The problem we mentioned with phantoms is that there's no record/object to lock because it doesn't exist yet
What if you were to have a set of records that could be used for locking to alleviate the phantom writes?
Create records for every possible combination of conflicting events (e.g. every room and time slot) and use those only for locking when doing a write
This is called "materializing conflicts" because you're taking the phantoms and turning them into concrete lock records that prevent those conflicts
Creating all the combinations of locks can be difficult and error-prone, AND it's a nasty leak of the concurrency control mechanism into your application's data model
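Here's a sketch of materializing conflicts for the room-booking case. The rooms, 15-minute slots, and book helper are hypothetical, and a dict of threading.Lock objects stands in for a table of pre-created lock rows that a real database would lock with SELECT ... FOR UPDATE. Because a lock exists for every (room, slot) up front, there's always something to lock, even before any booking exists.

```python
import threading
from itertools import product

ROOMS = ["room-1", "room-2"]
SLOTS = ["09:00", "09:15", "09:30", "09:45"]  # hypothetical 15-minute slots

# Materialized conflicts: one lock "row" per (room, slot), created up front.
slot_locks = {key: threading.Lock() for key in product(ROOMS, SLOTS)}
bookings = []  # rows of (room, slot, who)


def book(room, slots, who):
    keys = sorted((room, s) for s in slots)  # fixed lock order avoids deadlocks
    acquired = []
    try:
        for key in keys:
            slot_locks[key].acquire()
            acquired.append(key)
        # Safe to check-then-insert: nobody else can book these slots right now.
        if any(b[0] == room and b[1] in slots for b in bookings):
            return False
        bookings.extend((room, s, who) for s in slots)
        return True
    finally:
        for key in acquired:
            slot_locks[key].release()


results = {}


def worker(who, slots):
    results[who] = book("room-1", slots, who)


# Both requests include 09:15, so only one of them can win.
threads = [
    threading.Thread(target=worker, args=("ann", ["09:00", "09:15"])),
    threading.Thread(target=worker, args=("raj", ["09:15", "09:30"])),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.values()))  # [False, True]
```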
Docker's BuildKit is its backend builder that replaces the "legacy" builder and adds new functionality that isn't backward compatible. Enabling BuildKit is a little awkward, either passing flags or setting variables as well as enabling the features per Dockerfile, but it's worth it! One of the cool features is the "mount" flag, which you can pass as part of a RUN statement to bring in files that aren't persisted past that layer. This is great for efficiency and security. The "cache" type is great for reusing Docker's cache to save time in future builds. The "bind" type is nice for mounting files you only need temporarily, like source code for a compiled language. The "secret" type is great for temporarily bringing in secrets, such as credentials, without persisting them in the image. The "ssh" type is similar to "secret", but gives the build access to SSH keys through the SSH agent. Finally, "tmpfs" mounts an in-memory file system, which is handy for scratch data that doesn't need to be persisted. (github.com)
Did you know Google has a Google Cloud Architecture diagramming tool? It's free and easy to use so give it a shot! (cloud.google.com)
ChatGPT has an app for Slack. It's designed to deliver instant conversation summaries, research tools, and writing assistance. Is this the end of scrolling through hundreds of messages to catch up on whatever is happening? /chatgpt summarize (salesforce.com)
Have you heard about ephemeral containers? It's a convenient way to spin up temporary containers that let you inspect files in a pod and do other debugging activities. Great for, well, debugging! (kubernetes.io)
There's this thing called ChatGPT you may have heard of. Is it the end for all software developers? Have we reached the epitome of mankind? Also, should you write your own or find a FOSS solution? That and much more as Allen gets redemption, Joe has a beautiful monologue, and Outlaw debates a monitor that is a thumb size larger than his current setup.
This probably isn't the first time and it won't be the last we ask the question - should you write your own version of something if there's a good Free Open Source Software alternative out there?
Typed vs Untyped Languages
Another topic that we've touched on over the years - which is better and why?
Any considerations when working with teams of developers?
What are the pros and cons of each?
Cloud Pricing
If you're spending a good amount of money in the cloud, you should probably talk to a sales rep for your given cloud and try to negotiate rates. You may be surprised how much you can save. And...you never know until you ask!
Outlaw has the itch to get a new monitor
Is it worth upgrading from a 34" ultrawide to a 38" ultrawide?
Did you know that the handy, dandy application jq is great for formatting JSON AND it's also Turing complete? You can do full-on programming inside jq to transform data: conditionals, variables, math, filtering, mapping... it's Turing complete! https://stedolan.github.io/jq/
Want to freshen up your space, but you just don't have the vision? Give interiorai.com a chance, upload a picture of your room and give it a description. It works better than it should.
You can sort your command line output by piping it to sort, e.g. ls -l | sort -k2 -b sorts by the second column while ignoring leading blanks
On macOS you can drag a non-fullscreen window to a fullscreen desktop
When using the ls -l command in a terminal, the first numeric column shows the number of hard links to a file, meaning the number of names (directory entries) that point to that file's inode
Ever wonder how database backups work if new data is coming in while the backup is running? Hang with us while we talk about that, while Allen doesn't stand a chance, Outlaw is in love, and Joe forgets his radio voice.
It’s time we learn about multi-object transactions as we continue our journey into Designing Data-Intensive Applications, while Allen didn’t specifically have that thought, Joe took a marketing class, and Michael promised he wouldn’t cry.
Multi-object transactions need to know which reads and writes are part of the same transaction.
In an RDBMS, this is typically handled by a unique transaction identifier managed by a transaction manager.
All statements between BEGIN TRANSACTION and COMMIT TRANSACTION are part of that transaction.
Many non-relational databases don’t have a way of grouping those statements together.
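To make the BEGIN/COMMIT grouping concrete, here's a sketch using Python's sqlite3 module with a hypothetical accounts table. The module opens a transaction implicitly before the first UPDATE, and using the connection as a context manager commits on success or rolls back on an exception, so a failure in the second write also undoes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    # Both UPDATEs belong to the same transaction: `with conn` commits if the
    # block finishes, or rolls back if anything inside it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))


transfer(conn, "alice", "bob", 30)  # commits both writes
try:
    transfer(conn, "alice", "bob", 500)  # second UPDATE violates the CHECK constraint
except sqlite3.IntegrityError:
    pass  # the credit to bob was rolled back along with the failed debit

print(dict(conn.execute("SELECT id, balance FROM accounts")))  # {'alice': 70, 'bob': 30}
```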
Single object transactions must also be atomic and isolated.
Reading values while in the process of writing updated values would yield really weird results.
It’s for this reason that storage engines almost universally aim to provide atomicity and isolation for single objects.
Atomicity is achievable with a log for crash recovery.
Isolation is achieved by locking the object to be written.
Some databases also provide more complex atomic operations, such as an increment operation, which removes the need for a read-modify-write cycle.
Another such operation is compare-and-set, which only writes a value if it hasn’t changed since it was last read.
These types of operations are useful for ensuring good writes when multiple clients are attempting to write the same object concurrently.
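The same ideas can be sketched in memory. AtomicCell is a hypothetical class, not a library type: a per-object lock provides the isolation, and on top of it the cell offers an atomic increment and a compare-and-set, so callers never need their own read-modify-write cycle.

```python
import threading


class AtomicCell:
    """One object whose operations are atomic and isolated via a per-object lock."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def increment(self, amount=1):
        # Read, modify and write all happen under one lock: no lost updates.
        with self._lock:
            self._value += amount
            return self._value

    def compare_and_set(self, expected, new):
        # Only write if nobody changed the value since the caller read it.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def bump_many(cell, times):
    for _ in range(times):
        cell.increment()


likes = AtomicCell()
workers = [threading.Thread(target=bump_many, args=(likes, 1000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(likes.get())                     # 4000
print(likes.compare_and_set(3999, 0))  # False: the value is 4000, not 3999
```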
Transactions are more typically known for grouping writes to multiple objects into a single operational unit.
Need for multi object transactions
Many distributed databases / datastores don’t have transactions because they are difficult to implement across partitions.
Transactions can also get in the way of high performance or high availability requirements.
But there is no technical reason distributed transactions are not possible.
The author poses the question in the book: “Do we even need transactions?”
The short answer is, yes sometimes, such as:
Relational database systems where rows in tables link to rows in other tables,
In non-relational (document) systems, when data is denormalized (duplicated across several records), those records need to be updated in a single shot, or
Indexes against tables in relational databases need to be updated at the same time as the underlying records in the tables.
These can be handled without database transactions, but error handling on the application side becomes much more difficult.
Lack of isolation can cause concurrency problems.
Handling errors and aborts
ACID transactions that fail are easy to retry, because an aborted transaction’s writes are discarded.
Some systems with leaderless replication work on a “best effort” basis. The database will do what it can, and if something fails in the middle, it leaves anything that was already written in place, meaning it won’t undo anything it already finished.
This puts all the burden on the application to recover from an error or failure.
The book calls out developers saying that we only like to think about the happy path and not worry about what happens when something goes wrong.
The author also points out that a number of ORMs, specifically calling out Rails ActiveRecord and Django, don’t retry aborted transactions; if something goes wrong, the error just bubbles up the stack.
Even ACID transactions aren’t necessarily perfect.
What if a transaction actually succeeded but the notification to the client got interrupted and now the application thinks it needs to try again, and MIGHT actually write a duplicate?
If an error is due to overload, retrying only adds more load to a database that is already struggling, so limit the number of retries and back off exponentially.
Retrying only helps with transient errors, such as a deadlock or a temporary network interruption; if the network problem persists, retries are pointless.
Retrying something that will always yield an error is also pointless, such as a constraint violation.
There may be situations where your transactions trigger other actions, such as emails, SMS messages, etc. and in those situations you wouldn’t want to send new notifications every time you retry a transaction as it might generate a lot of noise.
When dealing with multiple systems such as the previous example, you may want to use something called a two-phase commit.
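One way to act on these caveats is a retry helper like the sketch below. run_with_retries and flaky_txn are hypothetical, and treating sqlite3.OperationalError as transient is an assumption for illustration; a real system would need its own list of retryable errors.

```python
import random
import sqlite3
import time


def run_with_retries(txn, max_attempts=5, base_delay=0.05):
    """Call txn(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except sqlite3.OperationalError:
            # Treated as transient here (e.g. "database is locked"). Anything else,
            # such as sqlite3.IntegrityError for a constraint violation, is permanent
            # and propagates immediately, because retrying it can't succeed.
            if attempt == max_attempts:
                raise
            # Back off exponentially (with jitter) so retries don't pile onto an
            # already overloaded database.
            time.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.0))


calls = {"n": 0}


def flaky_txn():
    # Hypothetical transaction that hits a transient error twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "committed"


result = run_with_retries(flaky_txn, base_delay=0.001)
print(result, calls["n"])  # committed 3
```

This doesn't handle the case where a commit succeeded but the acknowledgment was lost; retrying is only safe there if the transaction is idempotent. Side effects such as emails or SMS messages are also best kept outside the retried function.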
Tip of the Week
Manything is an app that lets you use your old devices as security cameras. You install the app on your old phone or tablet, hit record, and configure motion detection. A much easier and cheaper option than ordering a camera! (apps.apple.com, play.google.com)
The Linux Foundation offers training and certifications. Many great training courses, some free, some paid. There’s a nice Introduction to Kubernetes course you can try, and any money you do spend is going to a good place! (training.linuxfoundation.org)
Kubernetes has recommendations for common labels. The labels are helpful, and standardizing on them makes it easier to write tooling and queries around them. (kubernetes.io)
Markdown Presentation for Visual Studio Code, thanks for the tip Nathan V! Marp lets you create slideshows from markdown in Visual Studio Code and helps you separate your content from the format. It looks great and it’s easy to version and re-use the data! (marketplace.visualstudio.com)