It’s time we learn about multi-object transactions as we continue our journey into Designing Data-Intensive Applications, while Allen didn’t specifically have that thought, Joe took a marketing class, and Michael promised he wouldn’t cry.
Multi-object transactions need to know which reads and writes are part of the same transaction.
In an RDBMS, this is typically handled by a unique transaction identifier managed by a transaction manager.
All statements between the BEGIN TRANSACTION and COMMIT TRANSACTION are part of that transaction.
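As a rough illustration of grouping statements between BEGIN and COMMIT, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the transfer helper, and the balances helper are made up for this example and are not from the book.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements below are the only
# transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")


def transfer(conn, src, dst, amount):
    """Hypothetical helper: both row updates commit, or neither does."""
    conn.execute("BEGIN")
    try:
        # Credit first, then debit, so a failed debit has something to undo.
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        # The CHECK constraint rejected the debit; undo the credit as well.
        conn.execute("ROLLBACK")
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would overdraw the source account fails on the second statement, and the rollback discards the credit that had already been applied to the destination row.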
Many non-relational databases don’t have a way of grouping those statements together.
Single object transactions must also be atomic and isolated.
Reading values while in the process of writing updated values would yield really weird results.
It’s for this reason that nearly all databases aim to support single object atomicity and isolation.
Atomicity is achievable with a log for crash recovery.
Isolation is achieved by locking the object to be written.
Some databases offer more complex atomic operations, such as an increment, eliminating the need for a read, modify, write cycle.
Another such operation is compare and set, which applies a write only if the value has not changed since the client last read it.
These types of operations are useful for ensuring good writes when multiple clients are attempting to write the same object concurrently.
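The two single object operations above can be sketched with Python's built-in sqlite3 module. The counters table and the increment, compare_and_set, and current helpers are hypothetical names for this example; real databases expose these operations through their own APIs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (id TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO counters VALUES ('page_views', 0)")
conn.commit()


def increment(conn, counter_id):
    """Atomic increment: the database computes the new value itself,
    so the client never reads the old value and writes it back."""
    conn.execute(
        "UPDATE counters SET value = value + 1 WHERE id = ?", (counter_id,)
    )
    conn.commit()


def compare_and_set(conn, counter_id, expected, new):
    """Compare and set: the write only takes effect if the stored value
    still equals what the caller last read. Returns True on success."""
    cur = conn.execute(
        "UPDATE counters SET value = ? WHERE id = ? AND value = ?",
        (new, counter_id, expected),
    )
    conn.commit()
    return cur.rowcount == 1


def current(conn, counter_id):
    """Read the stored value for one counter."""
    row = conn.execute(
        "SELECT value FROM counters WHERE id = ?", (counter_id,)
    ).fetchone()
    return row[0]
```

When compare_and_set returns False, another writer got there first, and the caller would typically re-read the value and decide whether to try again.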
Transactions are more typically known for grouping multiple object writes into a single operational unit.
Need for multi object transactions
Many distributed databases / datastores don’t have transactions because they are difficult to implement across partitions.
Transactions can also get in the way when very high performance or availability is needed.
But there is no technical reason distributed transactions are not possible.
The author poses the question in the book: “Do we even need transactions?”
The short answer is, yes sometimes, such as:
Relational database systems where rows in tables link to rows in other tables,
In non-relational systems when data is denormalized, for example because the database has no joins, the copies are spread across several records that need to be updated in a single shot, or
Indexes against tables in relational databases need to be updated at the same time as the underlying records in the tables.
These can be handled without database transactions, but error handling on the application side becomes much more difficult.
Lack of isolation can cause concurrency problems.
Handling errors and aborts
ACID transactions that fail can be aborted and safely retried.
Some systems with leaderless replication work on a “best effort” basis. The database will do what it can, and if something fails in the middle, it’ll leave anything that was written, meaning it won’t undo anything it already finished.
This puts all the burden on the application to recover from an error or failure.
The book calls out developers saying that we only like to think about the happy path and not worry about what happens when something goes wrong.
The author also mentioned there are a number of ORMs that don’t do transactions proud. Rather than building in some retry functionality, if something goes wrong, they’ll just bubble an error up the stack. The book specifically calls out Rails ActiveRecord and Django.
Even ACID transactions aren’t necessarily perfect.
What if a transaction actually succeeded but the notification to the client got interrupted and now the application thinks it needs to try again, and MIGHT actually write a duplicate?
If an error is due to “overload”, retrying makes the problem worse by piling an unnecessary load of retries onto the database. Limiting the number of retries and backing off between attempts helps.
Retrying is pointless while a network outage persists; it only helps with transient errors, such as a deadlock or a brief network interruption.
Retrying something that will always yield an error is also pointless, such as a constraint violation.
There may be situations where your transactions trigger other actions, such as emails, SMS messages, etc. and in those situations you wouldn’t want to send new notifications every time you retry a transaction as it might generate a lot of noise.
When dealing with multiple systems such as the previous example, you may want to use something called a two-phase commit.
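The retry caveats above can be sketched as a small wrapper that retries only transient failures, gives up after a fixed number of attempts, and backs off between them. TransientError, PermanentError, and run_with_retries are hypothetical names invented for this sketch; a real application would map its database driver's exceptions onto the two categories.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical: e.g. a deadlock or a brief network interruption."""


class PermanentError(Exception):
    """Hypothetical: e.g. a constraint violation that will never succeed."""


def run_with_retries(txn, max_attempts=4, base_delay=0.01, sleep=time.sleep):
    """Call txn() until it succeeds, retrying only transient failures.

    The wait doubles after each failed attempt, with random jitter added
    so that many clients do not all retry at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except PermanentError:
            # Retrying cannot help, so surface the error immediately.
            raise
        except TransientError:
            if attempt == max_attempts:
                # Out of attempts; stop adding load to the database.
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

This sketch does nothing about duplicate writes or repeated emails and SMS messages. If txn() has side effects outside the database, those need their own protection, such as deduplication in the application.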
Tip of the Week
Manything is an app that lets you use your old devices as security cameras. You install the app on your old phone or tablet, hit record, and configure motion detection. A much easier and cheaper option than ordering a camera! (apps.apple.com, play.google.com)
The Linux Foundation offers training and certifications. Many great training courses, some free, some paid. There’s a nice Introduction to Kubernetes course you can try, and any money you do spend is going to a good place! (training.linuxfoundation.org)
Kubernetes has recommendations for common-labels. The labels are helpful and standardization makes it easier to write tooling and queries around them. (kubernetes.io)
Markdown Presentation for Visual Studio Code, thanks for the tip Nathan V! Marp lets you create slideshows from markdown in Visual Studio Code and helps you separate your content from the format. It looks great and it’s easy to version and re-use the data! (marketplace.visualstudio.com)