Coding Blocks

What are lost updates, and what can we do about them? Maybe we don't do anything and accept the write skew? Also, Allen has sharp ears, Outlaw's gort blah spotterfiles, and Joe is just thinking about breakfast.

The full show notes for this episode are available at https://www.codingblocks.net/episode206.

News

  • Thank you for the amazing reviews!
    • iTunes: JomilyAnv
  • Want to help us out? Leave us a review.

Designing Data Intensive Applications
Great book!

Preventing Lost Updates

  • Last episode we talked about weak isolation, committed reads, and snapshot isolation
  • There is one major problem we didn't discuss called "The Lost Update Problem"
  • Consider a read-modify-write transaction, now imagine two of them happening at the same time
  • Even with snapshot isolation, both transactions can read the same value before either one writes, and the second write then silently overwrites the first. This comes up when:
    • Incrementing/Decrementing values (counters, bank accounts)
    • Updating complex values (JSON for example)
    • CMS updates that send the full page as an update
  • Solutions:
    • Atomic Writes - Some databases support atomic updates that effectively combine the read and write
      • Cursor Stability - locking the read object until the update is performed
      • Single Threading - Force all atomic operations to happen serially through a single thread
    • Explicit Locking
      • The application can be responsible for explicitly locking objects, placing responsibility in the developers' hands
      • This makes sense in certain situations - imagine a multiplayer game where multiple players can move a shared object. It's not enough to lock the data and then apply both updates in order, since the shared game world can react (e.g., showing that the item is in use)
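
To make the difference between a read-modify-write cycle and an atomic write concrete, here is a minimal sketch using Python's built-in sqlite3 module. The counters table and the 'likes' row are made up for illustration, and the "two clients" are simulated by interleaving the steps by hand on one connection rather than with real concurrency.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a single counter row (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counters (id TEXT PRIMARY KEY, value INTEGER NOT NULL)")
    conn.execute("INSERT INTO counters VALUES ('likes', 0)")
    conn.commit()
    return conn


def read_value(conn):
    return conn.execute("SELECT value FROM counters WHERE id = 'likes'").fetchone()[0]


def lost_update_demo():
    """Both clients read before either writes, so one increment is lost."""
    conn = make_db()
    seen_by_a = read_value(conn)  # client A reads 0
    seen_by_b = read_value(conn)  # client B also reads 0, before A has written
    conn.execute("UPDATE counters SET value = ? WHERE id = 'likes'", (seen_by_a + 1,))
    conn.execute("UPDATE counters SET value = ? WHERE id = 'likes'", (seen_by_b + 1,))
    conn.commit()
    return read_value(conn)


def atomic_update_demo():
    """The database does the read and the write in one statement."""
    conn = make_db()
    for _ in range(2):
        conn.execute("UPDATE counters SET value = value + 1 WHERE id = 'likes'")
    conn.commit()
    return read_value(conn)
```

Calling lost_update_demo() leaves the counter at 1 even though two increments ran, while atomic_update_demo() ends at 2.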

Detecting Lost Updates

  • Locks can be tricky, what if we reused the snapshot mechanism we discussed before?
  • We're already keeping a record of the last transactionId to modify our data, and we know our current transactionId. What if we just failed any updates where our current transaction id was less than the transactionId of the last write to our data?
  • This allows for naive application code, but also gives you fewer options…retry or give up
  • Note: MySQL's InnoDB's Repeatable Read feature does not support this, so some argue it doesn't qualify as snapshot isolation
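
Here is a rough sketch of how that detection could work. It is not how any particular database implements it: the Store and Transaction classes are hypothetical, and instead of comparing raw transaction ids it uses a closely related check, "was this row written after my snapshot was taken?", which also catches the case where the older transaction writes first.

```python
class LostUpdateError(Exception):
    """Raised when the row was written after this transaction's snapshot."""


class Store:
    def __init__(self):
        self.commit_seq = 0  # bumped on every successful write
        self.rows = {}       # key -> (value, commit_seq of the last write)

    def begin(self):
        # The transaction remembers which writes existed when it started.
        return Transaction(self, snapshot_seq=self.commit_seq)


class Transaction:
    def __init__(self, store, snapshot_seq):
        self.store = store
        self.snapshot_seq = snapshot_seq

    def read(self, key):
        # Simplification: a real MVCC store would return the version visible
        # at snapshot_seq; this sketch keeps only the latest version.
        value, _ = self.store.rows.get(key, (0, 0))
        return value

    def write(self, key, value):
        _, written_at = self.store.rows.get(key, (0, 0))
        if written_at > self.snapshot_seq:
            # Someone else wrote after we took our snapshot: abort.
            raise LostUpdateError(key)
        self.store.commit_seq += 1
        self.store.rows[key] = (value, self.store.commit_seq)


def increment_with_retry(store, key, attempts=3):
    """Naive application code: retry the whole read-modify-write, or give up."""
    for _ in range(attempts):
        tx = store.begin()
        try:
            tx.write(key, tx.read(key) + 1)
            return True
        except LostUpdateError:
            continue
    return False
```

If two transactions begin, both read the counter, and the first one writes, the second write raises LostUpdateError instead of overwriting. The only options left to the caller are the two mentioned above: retry or give up.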

What if you didn't have transactions?

  • If you didn't have transactions, let alone a snapshot number, you could get similar behavior by doing a compare-and-set
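  • Example: update account set balance = 10 where balance = 9 and id = 'ABC'
  • This works best in simple databases that support atomic updates, but it may not be safe with snapshot isolation: if the where clause is evaluated against an old snapshot, the check can pass even though another write has already happened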
  • Note: it's up to the application code to check that updates were successful - Updating 0 records is not an error
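
As a small sketch of that last note, here is the compare-and-set example run through Python's sqlite3 module. The account table matches the example above; the helper name compare_and_set_balance is made up. The point is that the application has to look at the affected row count, because a failed compare is not reported as an error.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO account VALUES ('ABC', 9)")
    conn.commit()
    return conn


def compare_and_set_balance(conn, account_id, expected, new):
    """Write `new` only if the balance is still `expected`; report whether it happened."""
    cursor = conn.execute(
        "UPDATE account SET balance = ? WHERE id = ? AND balance = ?",
        (new, account_id, expected),
    )
    conn.commit()
    # Updating 0 rows is not an error, so the caller must check the count.
    return cursor.rowcount == 1


def current_balance(conn, account_id):
    row = conn.execute("SELECT balance FROM account WHERE id = ?", (account_id,)).fetchone()
    return row[0]
```

A first call with expected=9 succeeds; a second caller still holding the stale value 9 gets False back and has to re-read and decide what to do.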

Conflict resolution and replication

  • We haven't talked much about replicas lately, how do we handle lost updates when we have multiple copies of data on multiple nodes?
  • Compare-and-Set strategies and locking strategies assume a single up-to-date copy of the data….uh oh
  • The options are limited here, so the strategy is to accept the writes and have an application process to decide what to do
    • Merge: Some operations, like incrementing a counter, can be safely merged. Riak has special datatypes for these
    • Last Write Wins: It's simple but prone to lost updates, since all but one of the concurrent writes are thrown away. It's also the most common solution.
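
To show the difference between the two strategies, here is a minimal sketch. The merge function is a generic grow-only counter (each node tracks its own count and replicas merge by taking the per-node maximum); it is only meant to illustrate the idea and is not Riak's actual API. The timestamps in the last-write-wins helper are made-up numbers.

```python
def merge_counters(a, b):
    """Merge two replicas of a grow-only counter: take the max count seen per node."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def counter_value(counter):
    """The counter's value is the sum of every node's own increments."""
    return sum(counter.values())


def last_write_wins(a, b):
    """Each write is a (timestamp, value) pair; keep the later one, drop the other."""
    return max(a, b, key=lambda write: write[0])


# Two replicas accepted increments independently.
replica_1 = {"node-1": 2, "node-2": 1}
replica_2 = {"node-2": 3}
merged = merge_counters(replica_1, replica_2)

# Two replicas accepted conflicting writes to the same document.
winner = last_write_wins((100, "draft A"), (101, "draft B"))
```

After the merge, counter_value(merged) is 5, so no increment is lost. With last write wins, "draft A" is simply discarded, which is the inaccuracy mentioned above.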

Write Skew and Phantoms

  • Write skew - a race condition where two transactions read the same data and then write to different records at the same time, leaving the data in a state that violates a constraint
    • The example given in the book is the on-call doctor rotation
    • If one record had been modified after another record's transaction had been completed, the race condition would not have taken place
    • write-skew is a generalization of the lost update problem
  • Preventing write-skew
    • Atomic single-object locks won't work because there's more than one object being updated
    • Snapshot isolation also doesn't work in many implementations - SQL Server, PostgreSQL, Oracle, and MySQL won't prevent write skew
      • Requires true serializable isolation
    • Most databases don't allow you to create constraints on multiple objects but you may be able to work around this using triggers or materialized views as your constraint
    • If you can't use serializable isolation, the book suggests your next best option may be to explicitly lock the rows the transaction depends on, meaning nothing else can modify them while the transaction is open
  • Phantoms causing write skew
    • Pattern
      • A query checks some business requirement - i.e., there's more than one doctor on call
      • The application decides what to do with the results from the query
      • If the application decides to go forward with the change, then an INSERT, UPDATE, or DELETE operation will occur that would change the outcome of the previous step's application decision
        • They mention the steps could occur in different orders, for instance, you could do the write operation first and then check to make sure it didn't violate the business constraint
      • In the case of checking for records that meet some condition, you could do a SELECT FOR UPDATE and lock those rows
      • In the case that you're checking that no matching records exist, there's nothing to lock, so the SELECT FOR UPDATE won't work and you get a phantom - a write in one transaction changes the search result of a query in another transaction
  • Snapshot isolation avoids phantoms in read-only queries, but can't stop them in read-write transactions
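
The on-call doctor example can be simulated in a few lines. This is only a sketch: plain dictionaries stand in for the database, a copied dictionary stands in for a snapshot, and the names alice and bob are placeholders. It follows the query, decide, write pattern above, and then shows the locking workaround using an ordinary threading.Lock in place of locking the rows.

```python
import threading


def request_leave(snapshot, live, doctor):
    """Query, decide, write: go off call only if at least two doctors appear on call."""
    on_call = [name for name, is_on in snapshot.items() if is_on]  # the query
    if len(on_call) >= 2:                                          # the decision
        live[doctor] = False                                       # the write
        return True
    return False


def write_skew_demo():
    live = {"alice": True, "bob": True}
    # Both transactions take their snapshot before either one writes.
    snapshot_a = dict(live)
    snapshot_b = dict(live)
    request_leave(snapshot_a, live, "alice")
    request_leave(snapshot_b, live, "bob")
    return sum(live.values())  # number of doctors still on call


shift_lock = threading.Lock()  # stands in for locking the rows being checked


def locked_demo():
    live = {"alice": True, "bob": True}
    results = []
    for doctor in ("alice", "bob"):
        with shift_lock:
            # Under the lock, each request checks the current state, not a stale copy.
            results.append(request_leave(live, live, doctor))
    return results, sum(live.values())
```

write_skew_demo() ends with nobody on call, even though each transaction on its own looked valid. In locked_demo() the second request is refused and one doctor stays on call.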

Materializing conflicts

  • The problem we mentioned with phantoms is there's no record/object to lock because it doesn't exist
  • What if you were to have a set of records that could be used for locking to alleviate the phantom writes?
    • Create records for every possible combination of conflicting events and only use those to lock when doing a write
      • "materializing conflicts" because you're taking the phantom writes and turning them into lock records that will prevent those conflicts
        • This can be difficult and prone to errors trying to create all the combinations of locks AND this is a nasty leakage of your storage into your application
          • Should be a last resort
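
Here is a minimal sketch of the idea, using a meeting room booking scenario. The room names, the hourly slots, and the book function are all assumptions for illustration, and in-memory threading locks stand in for lock rows in a database table. Every (room, hour) combination gets a lock up front, so there is always something to lock even when no booking exists yet.

```python
import threading
from itertools import product

ROOMS = ["room-1", "room-2"]
HOURS = range(9, 17)  # bookable hours, 9:00 through 16:00

# The "materialized" conflicts: one lock per (room, hour), created ahead of time.
slot_locks = {(room, hour): threading.Lock() for room, hour in product(ROOMS, HOURS)}
bookings = []  # list of (room, start_hour, end_hour, who)


def book(room, start, end, who):
    """Book [start, end) in a room, locking every slot the booking would cover."""
    needed = [slot_locks[(room, hour)] for hour in range(start, end)]
    for lock in needed:  # always acquired in ascending hour order to avoid deadlock
        lock.acquire()
    try:
        clash = any(r == room and s < end and start < e for r, s, e, _ in bookings)
        if clash:
            return False
        bookings.append((room, start, end, who))
        return True
    finally:
        for lock in reversed(needed):
            lock.release()
```

Even this tiny version shows the downside called out above: the application now has to know about the lock table and enumerate every combination correctly.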

Resources We Like

Tip of the Week

  • Docker's BuildKit is the backend builder that replaces the "legacy" builder and adds new functionality that isn't backward compatible. The way you enable BuildKit is a little awkward, either passing flags or setting variables as well as enabling the features per Dockerfile, but it's worth it! One of the cool features is the "mount" flag that you can pass as part of a RUN statement to bring in files that are not persisted past that layer. This is great for efficiency and security. The "cache" type is great for utilizing Docker's cache to save time in future builds. The "bind" type is nice for mounting files you only need temporarily, like source code for a compiled language. The "secret" type is great for temporarily bringing in environment variables without persisting them. The "ssh" type is similar to "secret", but for sharing SSH keys. Finally, "tmpfs" is similar to swap memory, using an in-memory file system that's nice for temporarily storing data in primary memory as a file that doesn't need to be persisted. (github.com)
  • Did you know Google has a Google Cloud Architecture diagramming tool? It's free and easy to use so give it a shot! (cloud.google.com)
  • ChatGPT has an app for Slack. It's designed to deliver instant conversation summaries, research tools, and writing assistance. Is this the end of scrolling through hundreds of messages to catch up on whatever is happening? /chatgpt summarize (salesforce.com)
  • Have you heard about ephemeral containers? It's a convenient way to spin up temporary containers that let you inspect files in a pod and do other debugging activities. Great for, well, debugging! (kubernetes.io)

Direct download: coding-blocks-episode-206.mp3
Category:Software Development -- posted at: 7:55pm EDT

There's this thing called ChatGPT you may have heard of. Is it the end for all software developers? Have we reached the epitome of mankind? Also, should you write your own or find a FOSS solution? That and much more as Allen gets redemption, Joe has a beautiful monologue, and Outlaw debates a monitor that is a thumb size larger than his current setup.

If you're in a podcast player and would prefer to read it on the web, follow this link:
https://www.codingblocks.net/episode205

News

  • Thank you for the amazing reviews!
    • iTunes: MalTheWarlock, Abdullah Nafees, BarnabusNutslap
  • Orlando Code Camp coming up Saturday March 25th

ChatGPT

  • Is this the beginning or the end of software development as we know it?
  • Are you using it for work? Does your work have an AI policy?
  • OpenAI has recently announced a whopping 90% price reduction on their ChatGPT and Whisper API calls
    • $.002 per 1000 ChatGPT tokens
    • $.006 per minute to Whisper
  • You also get $5 in free credit in your first 3 months, so give it a shot!
  • https://openai.com/pricing

Roll Your Own vs FOSS

  • This probably isn't the first time and it won't be the last we ask the question - should you write your own version of something if there's a good Free Open Source Software alternative out there?

Typed vs Untyped Languages

  • Another topic that we've touched on over the years - which is better and why?
  • Any considerations when working with teams of developers?
  • What are the pros and cons of each?

Cloud Pricing

  • If you're spending a good amount of money in the cloud, you should probably talk to a sales rep for your given cloud and try to negotiate rates. You may be surprised how much you can save. And...you never know until you ask!

Outlaw has the Itch to get a new Monitor

Resources from this episode

Tips of the Week

  • Did you know that the handy, dandy application jq is great for formatting JSON AND it's also Turing complete? You can do full-on programming inside jq to make changes: conditionals, variables, math, filtering, mapping, and more.
    https://stedolan.github.io/jq/
  • Want to freshen up your space, but you just don't have the vision? Give interiorai.com a chance, upload a picture of your room and give it a description. It works better than it should.
  • You can sort your command line output when doing something like an ls by piping it through sort
    sort -k2 -b
  • On macOS you can drag a non-fullscreen window to a fullscreen desktop
  • When using the ls -l command in a terminal, that first numeric column shows the number of hard links to a file - meaning the number of names an inode has for that file
  • Argument parser for Python 3 - makes parsing command line arguments a breeze and creates beautiful --help documentation to boot!
    https://docs.python.org/3/library/argparse.html
  • .NET has an equivalent parser we've mentioned in the past
    https://www.nuget.org/packages/NuGet.CommandLine
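
For the argparse tip above, here is a small sketch of what it looks like in practice. The greet program and its options are made up for illustration; ArgumentParser, add_argument, and parse_args are the standard library calls.

```python
import argparse


def build_parser():
    """Describe the command line once; argparse generates --help from it."""
    parser = argparse.ArgumentParser(prog="greet", description="Print a greeting.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("-c", "--count", type=int, default=1, help="how many times to greet")
    parser.add_argument("--shout", action="store_true", help="upper-case the greeting")
    return parser


def greet(argv):
    """Parse an explicit argument list (pass sys.argv[1:] in a real script)."""
    args = build_parser().parse_args(argv)
    line = f"Hello, {args.name}!"
    if args.shout:
        line = line.upper()
    return [line] * args.count
```

Calling greet(["Joe", "--count", "2", "--shout"]) returns the greeting twice in upper case, and build_parser().print_help() prints the generated help text.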

Direct download: coding-blocks-episode-205.mp3
Category:Software Development -- posted at: 12:58am EDT

Ever wonder how database backups work if new data is coming in while the backup is running? Hang with us while we talk about that, while Allen doesn't stand a chance, Outlaw is in love, and Joe forgets his radio voice.

The full show notes for this episode are available at https://www.codingblocks.net/episode204.

Direct download: coding-blocks-episode-204.mp3
Category:Software Development -- posted at: 8:00pm EDT

It’s time we learn about multi-object transactions as we continue our journey into Designing Data-Intensive Applications, while Allen didn’t specifically have that thought, Joe took a marketing class, and Michael promised he wouldn’t cry.

The full show notes for this episode are available at https://www.codingblocks.net/episode203.

News

  • Thanks for the reviews!
    • iTunes: Dom Bell 30, Tontonton2
  • Want some swag? We got swag! (/swag)
  • Orlando Codecamp 2023 is coming up on March 25th, 2023 (orlandocodecamp.com)

Single Object and Multi-Object Operations

Designing Data Intensive Applications
Best book evarr!
  • Multi-object transactions need to know which reads and writes are part of the same transaction.
    • In an RDBMS, this is typically handled by a unique transaction identifier managed by a transaction manager.
    • All statements between the BEGIN TRANSACTION and COMMIT TRANSACTION are part of that transaction.
  • Many non-relational databases don’t have a way of grouping those statements together.
  • Single object transactions must also be atomic and isolated.
  • Reading values while in the process of writing updated values would yield really weird results.
    • It’s for this reason that nearly all databases must support single object atomicity and isolation.
    • Atomicity is achievable with a log for crash recovery.
    • Isolation is achieved by locking the object to be written.
  • Some databases also provide more complex atomic operations, such as an increment operation, eliminating the need for a read, modify, write cycle.
  • Another operation used is a compare and set.
  • These types of operations are useful for ensuring good writes when multiple clients are attempting to write the same object concurrently.
  • Transactions are more typically known for grouping multiple object writes into a single operational unit
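
As a rough illustration of grouping multiple object writes into one unit, here is a sketch using Python's sqlite3 module. The emails and mailboxes tables are assumptions made up for the example. Using the connection as a context manager commits both statements together, or rolls both back if the block raises.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emails (id INTEGER PRIMARY KEY, recipient TEXT, body TEXT)")
    conn.execute("CREATE TABLE mailboxes (recipient TEXT PRIMARY KEY, unread INTEGER NOT NULL)")
    conn.execute("INSERT INTO mailboxes VALUES ('joe', 0)")
    conn.commit()
    return conn


def deliver(conn, recipient, body):
    """Insert the email and bump the unread counter as one transaction."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute("INSERT INTO emails (recipient, body) VALUES (?, ?)", (recipient, body))
        conn.execute(
            "UPDATE mailboxes SET unread = unread + 1 WHERE recipient = ?", (recipient,)
        )


def counts(conn, recipient):
    """Return (number of stored emails, unread counter) for a recipient."""
    stored = conn.execute(
        "SELECT COUNT(*) FROM emails WHERE recipient = ?", (recipient,)
    ).fetchone()[0]
    unread = conn.execute(
        "SELECT unread FROM mailboxes WHERE recipient = ?", (recipient,)
    ).fetchone()[0]
    return stored, unread
```

Because both writes belong to the same transaction, the unread counter and the number of stored emails cannot drift apart after a commit.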

Need for multi object transactions

  • Many distributed databases / datastores don’t have transactions because they are difficult to implement across partitions.
    • This can also cause problems for high performance or availability needs.
    • But there is no technical reason distributed transactions are not possible.
  • The author poses the question in the book: “Do we even need transactions?”
    • The short answer is, yes sometimes, such as:
      • Relational database systems where rows in tables link to rows in other tables,
      • In non-relational systems when data is denormalized for “object” reasons, those records need to be updated in a single shot, or
      • Indexes against tables in relational databases need to be updated at the same time as the underlying records in the tables.
  • These can be handled without database transactions, but error handling on the application side becomes much more difficult.
    • Lack of isolation can cause concurrency problems.

Handling errors and aborts

  • ACID transactions that fail are easily retry-able.
  • Some systems with leaderless replication follow the “best effort” basis. The database will do what it can, and if something fails in the middle, it’ll leave anything that was written, meaning it won’t undo anything it already finished.
    • This puts all the burden on the application to recover from an error or failure.
  • The book calls out developers saying that we only like to think about the happy path and not worry about what happens when something goes wrong.
  • The author also mentions that a number of ORMs, specifically calling out Rails ActiveRecord and Django, don’t do transactions proud: rather than building in some retry functionality, if something goes wrong they just bubble an error up the stack.
  • Even retrying an aborted ACID transaction isn’t necessarily perfect.
    • What if a transaction actually succeeded but the notification to the client got interrupted and now the application thinks it needs to try again, and MIGHT actually write a duplicate?
    • If an error is due to “overload”, basically a condition that will continue to error constantly, this could cause an unnecessary load of retries against the database.
    • Retrying may be pointless if there are network errors occurring.
    • Retrying something that will always yield an error is also pointless, such as a constraint violation.
    • There may be situations where your transactions trigger other actions, such as emails, SMS messages, etc. and in those situations you wouldn’t want to send new notifications every time you retry a transaction as it might generate a lot of noise.
      • When dealing with multiple systems such as the previous example, you may want to use something called a two-phase commit.
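
A few of the retry caveats above can be sketched in code. This is a minimal example under stated assumptions: the TransientError and PermanentError classes are hypothetical stand-ins for however your database driver distinguishes errors, and the attempt limit and delays are arbitrary. It retries only transient failures, backs off exponentially so an overloaded database isn't hammered, and gives up after a fixed number of attempts.

```python
import time


class TransientError(Exception):
    """Hypothetical: a failure worth retrying, such as a deadlock."""


class PermanentError(Exception):
    """Hypothetical: a failure that will never succeed, such as a constraint violation."""


def run_with_retries(operation, max_attempts=4, base_delay=0.01, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            # Wait longer after each failure so retries don't add to an overload.
            sleep(base_delay * 2 ** (attempt - 1))
    # PermanentError is deliberately not caught: retrying it would be pointless.
```

This does not solve the duplicate-write problem when a commit succeeded but the acknowledgement was lost; that still needs some application-level deduplication.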

Tip of the Week

  • Manything is an app that lets you use your old devices as security cameras. You install the app on your old phone or tablet, hit record, and configure motion detection. A much easier and cheaper option than ordering a camera! (apps.apple.com, play.google.com)
  • The Linux Foundation offers training and certifications. Many great training courses, some free, some paid. There’s a nice Introduction to Kubernetes course you can try, and any money you do spend is going to a good place! (training.linuxfoundation.org)
  • Kubernetes has recommendations for common-labels. The labels are helpful and standardization makes it easier to write tooling and queries around them. (kubernetes.io)
  • Markdown Presentation for Visual Studio Code, thanks for the tip Nathan V! Marp lets you create slideshows from markdown in Visual Studio Code and helps you separate your content from the format. It looks great and it’s easy to version and re-use the data! (marketplace.visualstudio.com)
Direct download: coding-blocks-episode-203.mp3
Category:Software Development -- posted at: 8:46pm EDT

We decided to knock the dust off our copies of Designing Data-Intensive Applications to learn about transactions while Michael is full of solutions, Allen isn’t deterred by Cheater McCheaterton, and Joe realizes wurds iz hard.

The full show notes for this episode are available at https://www.codingblocks.net/episode202.

News

  • Thanks for the reviews!
    • iTunes: Jla115, Cuttin’ Corner Barbershop, mirgeee, JackUnver
    • Audible: Mr. William M. Davies
  • Want some swag? We got swag! (/swag)
Designing Data Intensive Applications
It’s baaaaack!

Chapter 7: Transactions

  • Great statement from one of the creators of Google’s Spanner where the general idea is that it’s better to have transactions as an available feature even if it has performance issues and let developers decide if the performance is worth the tradeoff, rather than not having transactions and putting all that complexity on the developer.
  • Number of things that can go wrong during database interactions:
    • DB software or underlying hardware could fail during a write,
    • An application that uses the DB might crash in the middle of a series of operations,
    • Network problems could arise,
    • Multiple writes to the same records from multiple places causing race conditions,
    • Reads could happen to partially updated data which may not make sense, and/or
    • Race conditions between clients could cause weird problems.
  • “Reliable” systems can handle those situations and ensure they don’t cause catastrophic failures, but making a system “reliable” is a lot of work.
  • Transactions are what have been used for decades to address those issues.
    • A transaction is a way to group all related reads and writes into a single operation.
    • Either a transaction as a whole completes successfully as a “commit” or fails as an “abort, rollback”.
      • If the transaction fails, the application can choose what to do, like retry for example.
  • In general, transactions make error handling much simpler for an application.
    • That was their purpose, to make developing against a database much simpler.
  • Not all applications need transactions.
  • In some cases, it makes sense not to use transactions for performance and/or availability reasons.

How do you know if you need a transaction?

  • What are the safety guarantees?
  • What are the costs of using them?

Concepts of a transaction

  • Most relational DBs support transactions and some non-relational DBs support transactions.
  • The general idea of a transaction has been around mostly unchanged for over 40 years, originally introduced in IBM System R, the first SQL database.
  • With the introduction of a lot of the NoSQL (non-relational) databases, transactions were left out.
    • In some NoSQL implementations, they redefined what a transaction meant with a weaker set of guarantees.
      • A popular belief was put out there that transactions meant anti-scalable.
      • Another popular belief was that to have a “serious” database, it had to have transactions.
        • The book calls out both as hyperbole.
        • The reality is there are tradeoffs for both having or not having transactions.
  • ACID is the acronym to describe the safety guarantees of databases and stands for Atomicity, Consistency, Isolation, and Durability.
    • Coined in 1983 by Theo Härder and Andreas Reuter.
    • The reality is that each database’s implementation of ACID may be very different.
      • Lots of ambiguity for what Isolation means.
      • Because ACID doesn’t specify the actual guarantees, it’s basically a marketing term.
  • Systems that don’t support ACID are often referred to as BASE, BAsically available, Soft state, and Eventual consistency.
    • Even more vague than ACID! BASE, more or less, just means anything but ACID.

Atomicity

  • Atomicity refers to something that can not be broken into smaller parts.
    • In terms of multi-threaded programming, this means you can only see the state of something before or after a complete operation and nothing in-between.
    • In the world of databases and ACID, atomicity has nothing to do with concurrency. For instance, if multiple actions are trying to process the same data, that’s covered under Isolation.
      • Instead, atomicity describes what should happen if there is a fault while performing multiple related writes.
        • For example, if a group of related writes are to be performed in an operation and there is some underlying error that occurs before the transaction of writes can be committed, then the operation is aborted and any writes that occurred during that operation must be undone, i.e. rolled back.
  • Without atomicity, it is difficult to know what part of the operation completed and what failed.
  • The benefit of the rollback is you don’t have to have any special logic in your application to figure out how to get back to the original state. You can just simply try again because the transaction took care of the cleanup for you.
    • This ability to get rid of any writes after an abort is basically what the atomicity is all about.
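
Here is a small sketch of that rollback behavior using Python's sqlite3 module. The accounts table, the names, and the fail_midway switch are all made up for the example; the switch simulates a fault that happens after the first write but before the second.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money between accounts; both writes commit together or not at all."""
    with conn:  # rolls back every write in the block if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated fault between the two writes")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

After a failed transfer the debit from the first account is gone too, so the application can simply call transfer again without any cleanup logic.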

Consistency

  • In ACID, consistency just means the database is in a good state.
  • But consistency is a property of the application as it’s what defines the invariants for its operations.
    • This means that you must write your application transactions properly to satisfy the invariants that have been defined.
    • The database can take care of certain invariants, such as foreign key constraints and uniqueness constraints, but otherwise it’s left up to the application to set up the transactions properly.
    • The book suggests that because the consistency is on the application’s shoulders, the C shouldn’t be part of ACID.
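
To show the kind of invariants the database can take care of on its own, here is a sketch with a uniqueness constraint and a foreign key in SQLite via Python's sqlite3 module. The users and orders tables are assumptions for illustration. Note that SQLite only enforces foreign keys when the foreign_keys pragma is switched on for the connection.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "user_id INTEGER NOT NULL REFERENCES users(id))"
    )
    return conn


def try_insert(conn, sql, params):
    """Attempt one insert; return False if the database rejects it as invalid."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

A duplicate email or an order for a user that doesn't exist is rejected by the database itself. Anything beyond rules like these, such as "at least one doctor must be on call", is still up to how the application writes its transactions.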

Isolation

  • Isolation is all about handling concurrency problems and race conditions.
    • The author provided an example of two clients trying to increment a single database counter concurrently, the value should have gone from 3 to 5, but only went to 4 because there was a race condition.
  • Isolation means that the transactions are isolated from each other so the previous example cannot happen.
    • The book doesn’t dive deep on various forms of isolation implementations here as they go deeper in later sections, however one that was brought up was treating every transaction as if it was a serial transaction. The problem with this is there is a rather severe performance hit for forcing everything serially.
      • The section that describes the additional isolation levels is “Weak Isolation Levels”.
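
The counter example can be sketched in Python. The Counter class below is made up for illustration: the unlucky interleaving is replayed by hand so the result is deterministic, and a threading.Lock stands in for the "run every transaction as if it were serial" approach.

```python
import threading


class Counter:
    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def increment_isolated(self):
        # Only one client at a time may run the read-modify-write cycle.
        with self._lock:
            current = self.value
            self.value = current + 1


def interleaved_increments(start):
    """Replay the race by hand: both clients read before either one writes."""
    counter = Counter(start)
    read_by_client_1 = counter.value
    read_by_client_2 = counter.value
    counter.value = read_by_client_1 + 1
    counter.value = read_by_client_2 + 1  # overwrites client 1's increment
    return counter.value


def isolated_increments(start, clients=2):
    """Run each increment on its own thread, serialized by the lock."""
    counter = Counter(start)
    threads = [threading.Thread(target=counter.increment_isolated) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

Starting from 3, the interleaved version ends at 4 while the isolated version ends at 5. The lock is also where the performance hit comes from, since no two increments can overlap.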

Durability

  • Durability just means that once the database has committed a write, the data will not be forgotten, even if a database failure or hardware failure occurs.
    • This notion of durability typically means, in a single node database, that the data has been written to the drive, typically to a write-ahead log or similar implementation.
      • The write-ahead log ensures if there is any data corruption in the database, that it can be rebuilt, if necessary.
  • In a replicated database, durability means that the data has been written to the other nodes successfully.
    • The performance implication here is that for the database to guarantee that it’s durable, it must wait for those distributed writes to complete before committing the transaction.
  • PERFECT DURABILITY DOES NOT EXIST.
    • If all your databases and backups somehow got destroyed at the same time, there’s absolutely nothing you could do.
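
As a very rough sketch of the write-ahead log idea on a single node, here is a toy key-value store in Python. The TinyKV class and its JSON-lines log format are assumptions for illustration and nothing like a production implementation: there are no checksums, no handling of a torn final record, and no log compaction.

```python
import json
import os


class TinyKV:
    """A toy key-value store that logs every write to disk before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        """Rebuild the in-memory state from the log, e.g. after a crash."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # Append to the log and force it to the drive before acknowledging.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # Only now is the write applied to the in-memory state.
        self.data[key] = value
```

Creating a second TinyKV on the same log file simulates a restart: it replays the log and ends up with the same data. The fsync call is where the durability cost shows up, since the write isn't acknowledged until the drive has it.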

Resources we Like

  • Coding Blocks Jam ’23 (itch.io)
  • NewSQL (Wikipedia)
  • Visual Studio (Wikipedia)
  • Chrissy’s Court (IMDb)
  • Tracy Morgan gets in a crash right after buying a $2 million Bugatti (CNN)
  • IBM System R (Wikipedia)
  • Database Schema for Multiple Types of Products (Coding Blocks)
  • Uber’s Big Data Platform: 100+ Petabytes with Minute Latency (Uber)
  • How to store data for 1,000 years (BBC)
  • Longevity of Recordable CDs, DVDs and Blu-rays – Canadian Conservation Institute (CCI) Notes 19/1 (canada.ca)

Tip of the Week

  • The Bad Plus is an instrumental band that makes amazing music that’s perfect for programming. It’s a little wild, and a little strange. Maybe like Radiohead, but a saxophone instead of Thom Yorke? Maybe? (YouTube)
    • Correction, Piano Rock will quickly become your new favorite channel. (YouTube)
  • docker builder is a command prefix that you can use that specifically operates against the builder. For example you can prune the builder’s cache without wiping out your local cache. It can really save your bacon if you’re working with a lot of images. (docs.docker.com)
  • Ever want to convert YAML to JSON so you can see nesting issues easier? There’s a VSCode plugin for that! Search for hilleer.yaml-plus-json or find it on GitHub. (GitHub)
  • Spotify has a great interface, but Apple Music has lossless audio, sounds great, and pays artists more. Give it a shot! If you sign up for Apple One you can get Apple Music, Apple TV+, Apple Arcade, Apple News+ and a lot more for one unified price. (Apple)
Direct download: coding-blocks-episode-202.mp3
Category:Software Development -- posted at: 11:16pm EDT

Michael spends the holidays changing his passwords, Joe forgot to cancel his subscriptions, and Allen’s busy playing Call of Duty: Modern Healthcare as we discuss our 2023 resolutions.

The full show notes for this episode are available at https://www.codingblocks.net/episode201.

News

  • Thanks for the reviews CourageousPotato, Billlhead, [JD]Milo!
    • Want to help us out? Leave us a review.
  • Game Jam is coming up, January 20-23! (itch.io)
  • Thoughts on LastPass?
    • Check out the encrypted fields, as figured out by a developer. (GitHub)
    • LastPass users: Your info and password vault data are now in hackers’ hands (Ars Technica)
Game Jam Time!

Our 2023 Resolutions

Michael’s

  • Learn Kotlin,
  • Go deeper on streaming technologies, such as Kafka, Flink, and/or Kafka Connect, and
  • Learn more music theory and techniques.

JZ’s

  • Of course Joe has categorized his resolutions into the following areas: finances, health, personal development, and career management,
  • Go deeper on Spring and streaming technologies, and
  • Do more game dev and LeetCode.

Q&A Round 1

  • What skills are opposite and which are adjacent that can be picked up this year?
    • Angular unit testing,
    • Front end development,
    • Spring,
    • Big data concepts and technologies
  • Any books, courses, or certifications?
    • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)
    • Certified Kubernetes Application Developer (CKAD) (cncf.io)

Allen’s

  • Spend more time focusing on health and fun,
  • Updating the About Us page with recent info,
  • Go deeper on streaming technologies and concepts,
  • Go deeper on big data concepts such as data lakes, and best practices, etc.,
  • Get back into making content again, such as YouTube, and/or maybe presenting.

Q&A Round 2

  • What do you want to avoid in 2023?
    • Less Jenkins,
    • Avoid piecemeal Spring upgrades,

2023 Predictions

  • Data, privacy … do we need it?,
  • New languages, frameworks,
  • Generated content (DALL-E 2, ChatGPT, Copilot), and
  • AI ethics
    • ChatGPT Wrote My AP English Essay—and I Passed (WSJ)

Resources

Tip of the Week

  • You can pipe directly to Visual Studio Code (in bash anyway), much easier than outputting to a file and opening it in Code … especially if you end up accidentally checking it in!
    • Example: curl https://www.codingblocks.net | code -
  • Is your trackpad not responding on your new(-ish) MacBook? Run a piece of paper around the edge to clean out any gunk. Also maybe avoid dripping BBQ sauce on it.
  • How does the iOS MFA / Verification Code settings work? We want MFA, but we’re tired of the runaround!
  • Jump around – nope, not Kris Kross, great tip from Thiyagarajan – keeps track of your most “frecent” directories to make navigation easier (GitHub)
    • There’s a version for PowerShell too – thank you Brad Knowles! (GitHub)
Direct download: coding-blocks-episode-201.mp3
Category:Software Development -- posted at: 8:01pm EDT
