Coding Blocks

What are lost updates, and what can we do about them? Maybe we don't do anything and accept the write skew? Also, Allen has sharp ears, Outlaw's gort blah spotterfiles, and Joe is just thinking about breakfast.

The full show notes for this episode are available at https://www.codingblocks.net/episode206.

News

  • Thank you for the amazing reviews!
    • iTunes: JomilyAnv
  • Want to help us out? Leave us a review.

Designing Data Intensive Applications
Great book!

Preventing Lost Updates

  • Last episode we talked about weak isolation, read committed, and snapshot isolation
  • There is one major problem we didn't discuss called "The Lost Update Problem"
  • Consider a read-modify-write transaction, now imagine two of them happening at the same time
  • Even with snapshot isolation, both transactions can read the same value before either one writes, so whichever write lands second silently overwrites the first. Common scenarios:
    • Incrementing/Decrementing values (counters, bank accounts)
    • Updating complex values (JSON for example)
    • CMS updates that send the full page as an update
  • Solutions:
    • Atomic Writes - Some databases support atomic updates that effectively combine the read and write
      • Cursor Stability - locking the read object until the update is performed
      • Single Threading - Force all atomic operations to happen serially through a single thread
    • Explicit Locking
      • The application can be responsible for explicitly locking objects, placing responsibility in the devs' hands
      • This makes sense in certain situations - imagine a multiplayer game where multiple players can move a shared object. It's not enough to lock the data and then apply both updates in order since the shared game world can react (e.g., showing that the item is in use).
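
To make the counter case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name (counters), the 'likes' row, and the helper names are made up for illustration, and the "concurrent" interleaving is simulated by hand on a single connection rather than with real parallel transactions.

```python
import sqlite3


def make_db():
    # Hypothetical schema: a single counter row that two clients want to bump.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counters (id TEXT PRIMARY KEY, value INTEGER)")
    conn.execute("INSERT INTO counters VALUES ('likes', 0)")
    conn.commit()
    return conn


def read_value(conn):
    return conn.execute("SELECT value FROM counters WHERE id = 'likes'").fetchone()[0]


def lost_update_demo():
    conn = make_db()
    # Read-modify-write: both clients read before either one writes.
    a_seen = read_value(conn)
    b_seen = read_value(conn)
    conn.execute("UPDATE counters SET value = ? WHERE id = 'likes'", (a_seen + 1,))
    conn.execute("UPDATE counters SET value = ? WHERE id = 'likes'", (b_seen + 1,))
    conn.commit()
    return read_value(conn)  # B's write clobbered A's increment


def atomic_write_demo():
    conn = make_db()
    # Atomic write: the database does the read and the write in one statement.
    for _ in range(2):
        conn.execute("UPDATE counters SET value = value + 1 WHERE id = 'likes'")
    conn.commit()
    return read_value(conn)
```

With two increments, lost_update_demo() ends at 1 while atomic_write_demo() ends at 2, which is the whole point of pushing the read-modify-write into the database.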

Detecting Lost Updates

  • Locks can be tricky, what if we reused the snapshot mechanism we discussed before?
  • We're already keeping a record of the last transactionId to modify our data, and we know our current transactionId. What if we just failed any updates where our current transactionId was less than the transactionId of the last write to our data?
  • This allows for naive application code, but also gives you fewer options…retry or give up
  • Note: MySQL's InnoDB's Repeatable Read feature does not support this, so some argue it doesn't qualify as snapshot isolation
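
A toy sketch of the detection idea, not how any real database implements it. Everything here (SnapshotStore, LostUpdateError, increment_with_retry) is a hypothetical name. One assumption worth calling out: instead of comparing transaction ids directly, the sketch compares the commit time of the last write against the time our snapshot was taken, which also catches the case where the older transaction writes first.

```python
class LostUpdateError(Exception):
    """Raised when a write would clobber a change our snapshot never saw."""


class SnapshotStore:
    def __init__(self):
        self._clock = 0   # logical clock, bumped on every begin and commit
        self._rows = {}   # key -> (value, commit time of the last write)

    def _tick(self):
        self._clock += 1
        return self._clock

    def begin(self):
        # A transaction is its start time plus a frozen copy of the values.
        snapshot = {key: value for key, (value, _) in self._rows.items()}
        return {"start": self._tick(), "snapshot": snapshot}

    def read(self, txn, key):
        return txn["snapshot"].get(key, 0)

    def write_and_commit(self, txn, key, value):
        _, last_commit = self._rows.get(key, (0, 0))
        if last_commit > txn["start"]:
            # Someone committed a write after our snapshot was taken.
            raise LostUpdateError(f"{key!r} changed after our snapshot")
        self._rows[key] = (value, self._tick())


def increment_with_retry(store, key, attempts=3):
    # The application's only options on failure: retry or give up.
    for _ in range(attempts):
        txn = store.begin()
        try:
            store.write_and_commit(txn, key, store.read(txn, key) + 1)
            return True
        except LostUpdateError:
            continue
    return False
```

The application code stays naive (read, add one, write), and the store is what notices the conflict and forces the retry.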

What if you didn't have transactions?

  • If you didn't have transactions, let alone a snapshot number, you could get similar behavior by doing a compare-and-set
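  • Example: update account set balance = 10 where balance = 9 and id = 'ABC'
  • This works best in simple databases that support atomic updates; it is less reliable with snapshot isolation, because the WHERE clause may be evaluated against an old snapshot and miss a concurrent write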
  • Note: it's up to the application code to check that updates were successful - Updating 0 records is not an error
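
Here is a small sketch of that compare-and-set using Python's sqlite3 module. The account table and the 'ABC' id come from the example above; the helper name compare_and_set_balance is made up. The important part is checking rowcount, since an update that matches nothing does not raise an error.

```python
import sqlite3


def compare_and_set_balance(conn, account_id, expected, new):
    # Only write if the balance is still what we read earlier.
    cur = conn.execute(
        "UPDATE account SET balance = ? WHERE id = ? AND balance = ?",
        (new, account_id, expected),
    )
    conn.commit()
    # Zero rows updated is not an error, so the caller has to check.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('ABC', 9)")
conn.commit()

# The first caller wins; the second still expects 9 and updates nothing.
first = compare_and_set_balance(conn, "ABC", expected=9, new=10)
second = compare_and_set_balance(conn, "ABC", expected=9, new=10)
```

When the function returns False, the caller would typically re-read the balance and decide whether to retry.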

Conflict resolution and replication

  • We haven't talked much about replicas lately, how do we handle lost updates when we have multiple copies of data on multiple nodes?
  • Compare-and-Set strategies and locking strategies assume a single up-to-date copy of the data… uh oh
  • The options are limited here, so the strategy is to accept the writes and have an application process to decide what to do
    • Merge: Some operations, like incrementing a counter, can be safely merged. Riak has special datatypes for these
    • Last Write Wins: This is the most common solution. It's simple but inaccurate, since the losing writes are silently dropped.
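
To see the difference between the two strategies, here is a rough sketch comparing a mergeable grow-only counter with last write wins. This is a generic illustration and not Riak's actual datatype implementation; the node names (n1, n2), timestamps, and function names are all assumptions.

```python
def merge_gcounter(a, b):
    # Grow-only counter: each node tracks its own increments,
    # and merging takes the per-node maximum.
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def gcounter_value(counter):
    return sum(counter.values())


def last_write_wins(a, b):
    # Each write is (timestamp, value); keep whichever has the later timestamp.
    return a if a[0] >= b[0] else b


# Replica n1 saw 3 increments and replica n2 saw 2, so the true total is 5.
merged = merge_gcounter({"n1": 3}, {"n2": 2})
merged_total = gcounter_value(merged)

# With last write wins, each replica just stores its own total and one is discarded.
lww_total = last_write_wins((100, 3), (101, 2))[1]
```

The merged counter keeps all five increments, while last write wins keeps only the two from the later write.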

Write Skew and Phantoms

  • Write skew - a race condition where concurrent writes to different records each look valid on their own but together violate a state constraint
    • The example given in the book is the on-call doctor rotation
    • If one record had been modified after another record's transaction had been completed, the race condition would not have taken place
    • write-skew is a generalization of the lost update problem
  • Preventing write-skew
    • Atomic single-object locks won't work because there's more than one object being updated
    • Snapshot isolation also doesn't work in many implementations - SQL Server, PostgreSQL, Oracle, and MySQL won't prevent write skew
      • Requires true serializable isolation
    • Most databases don't allow you to create constraints on multiple objects but you may be able to work around this using triggers or materialized views as your constraint
    • They mention if you can't use serializable isolation, your next best option may be to lock the rows for an update in a transaction meaning nothing else can access them while the transaction is open
  • Phantoms causing write skew
    • Pattern
      • A query checks some business requirement - e.g. that there's more than one doctor on call
      • The application decides what to do with the results from the query
      • If the application decides to go forward with the change, then an INSERT, UPDATE, or DELETE operation will occur that would change the outcome of the application's decision in the previous step
        • They mention the steps could occur in different orders, for instance, you could do the write operation first and then check to make sure it didn't violate the business constraint
      • In the case of checking for records that meet some condition, you could do a SELECT FOR UPDATE and lock those rows
      • If instead you're checking that matching records don't exist, there's nothing to lock, so the SELECT FOR UPDATE won't work and you get a phantom - a write in one transaction changes the search result of a query in another transaction
  • Snapshot isolation avoids phantoms in read-only queries, but can't stop them in read-write transactions
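
The on-call doctor example can be sketched in a few lines. This is a toy simulation, assuming each transaction's snapshot is just a copy of a dict; the doctor names and the request_leave helper are illustrative only.

```python
def request_leave(snapshot, doctor):
    # Decide based only on what this transaction's snapshot shows.
    on_call = [name for name, is_on in snapshot.items() if is_on]
    if len(on_call) >= 2:
        return {doctor: False}  # looks safe: someone else is still on call
    return {}                   # refuse: we'd leave nobody on call


# Write skew: both transactions take their snapshot before either one writes.
db = {"alice": True, "bob": True}
snap_a, snap_b = dict(db), dict(db)
write_a = request_leave(snap_a, "alice")
write_b = request_leave(snap_b, "bob")
db.update(write_a)
db.update(write_b)
skewed_on_call = sum(db.values())

# Serial execution: the second transaction sees the first one's write.
serial_db = {"alice": True, "bob": True}
serial_db.update(request_leave(dict(serial_db), "alice"))
serial_db.update(request_leave(dict(serial_db), "bob"))
serial_on_call = sum(serial_db.values())
```

Each transaction touched a different record, so there is no lost update to detect, yet the skewed run ends with nobody on call while the serial run keeps one doctor.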

Materializing conflicts

  • The problem we mentioned with phantoms is that there's no record/object to lock because it doesn't exist
  • What if you were to have a set of records that could be used for locking to alleviate the phantom writes?
    • Create records for every possible combination of conflicting events and only use those to lock when doing a write
      • "materializing conflicts" because you're taking the phantom writes and turning them into lock records that will prevent those conflicts
        • This can be difficult and prone to errors trying to create all the combinations of locks AND this is a nasty leakage of your storage into your application
          • Should be a last resort
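
A rough sketch of the idea using a room booking scenario. In a database the pre-created rows would be locked with something like SELECT FOR UPDATE; here in-memory threading.Lock objects stand in for those rows. The rooms, hours, and function names are all assumptions made for the example.

```python
import threading
from itertools import product

ROOMS = ("A", "B")
HOURS = range(9, 17)

# Materialize the conflicts: one lock per room/hour combination, created up front,
# so there is always something to lock even when no booking exists yet.
slot_locks = {(room, hour): threading.Lock() for room, hour in product(ROOMS, HOURS)}
bookings = []  # the real data: (room, hour, who)


def book(room, hour, who):
    with slot_locks[(room, hour)]:
        # The check-then-insert is safe while we hold the slot's lock.
        if any(r == room and h == hour for r, h, _ in bookings):
            return False
        bookings.append((room, hour, who))
        return True


def book_concurrently(room, hour, people):
    # Fire one booking attempt per person at the same slot.
    results = []
    threads = [
        threading.Thread(target=lambda who=who: results.append(book(room, hour, who)))
        for who in people
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Notice how the lock table leaks a storage concern (every possible room/hour pair) into the application, which is the reason this should be a last resort.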

Resources We Like

Tip of the Week

  • Docker's BuildKit is their backend builder that replaces the "legacy" builder by adding new non-backward compatible functionality. The way you enable BuildKit is a little awkward, either passing flags or setting variables as well as enabling the features per Dockerfile, but it's worth it! One of the cool features is the "mount" flag that you can pass as part of a RUN statement to bring in files that are not persisted past that layer. This is great for efficiency and security. (github.com)
    • The "cache" type is great for utilizing Docker's cache to save time in future builds
    • The "bind" type is nice for mounting files you only need temporarily, like source code for a compiled language
    • The "secret" type is great for temporarily bringing in environment variables without persisting them
    • The "ssh" type is similar to "secret", but for sharing ssh keys
    • The "tmpfs" type is similar to swap memory, using an in-memory file system that's nice for temporarily storing data in primary memory as a file that doesn't need to be persisted
  • Did you know Google has a Google Cloud Architecture diagramming tool? It's free and easy to use so give it a shot! (cloud.google.com)
  • ChatGPT has an app for Slack. It's designed to deliver instant conversation summaries, research tools, and writing assistance. Is this the end of scrolling through hundreds of messages to catch up on whatever is happening? /chatgpt summarize (salesforce.com)
  • Have you heard about ephemeral containers? It's a convenient way to spin up temporary containers that let you inspect files in a pod and do other debugging activities. Great for, well, debugging! (kubernetes.io)

Direct download: coding-blocks-episode-206.mp3
Category:Software Development -- posted at: 7:55pm EDT

There's this thing called ChatGPT you may have heard of. Is it the end for all software developers? Have we reached the epitome of mankind? Also, should you write your own or find a FOSS solution? That and much more as Allen gets redemption, Joe has a beautiful monologue, and Outlaw debates a monitor that is a thumb size larger than his current setup.

If you're in a podcast player and would prefer to read it on the web, follow this link:
https://www.codingblocks.net/episode205

News

  • Thank you for the amazing reviews!
    • iTunes: MalTheWarlock, Abdullah Nafees, BarnabusNutslap
  • Orlando Code Camp coming up Saturday March 25th

ChatGPT

  • Is this the beginning or the end of software development as we know it?
  • Are you using it for work? Does your work have an AI policy?
  • OpenAI has recently announced a whopping 90% price reduction on their ChatGPT and Whisper API calls
    • $0.002 per 1,000 ChatGPT tokens
    • $0.006 per minute for Whisper
  • You also get $5 in free credit in your first 3 months, so give it a shot!
  • https://openai.com/pricing

Roll Your Own vs FOSS

  • This probably isn't the first time and it won't be the last we ask the question - should you write your own version of something if there's a good Free Open Source Software alternative out there?

Typed vs Untyped Languages

  • Another topic that we've touched on over the years - which is better and why?
  • Any considerations when working with teams of developers?
  • What are the pros and cons of each?

Cloud Pricing

  • If you're spending a good amount of money in the cloud, you should probably talk to a sales rep for your given cloud and try to negotiate rates. You may be surprised how much you can save. And...you never know until you ask!

Outlaw has the Itch to get a new Monitor

Resources from this episode

Tips of the Week

  • Did you know that the handy, dandy application jq is great for formatting JSON AND it's also Turing complete? You can do full-on programming inside jq to make changes - conditionals, variables, math, filtering, mapping...
    https://stedolan.github.io/jq/
  • Want to freshen up your space, but you just don't have the vision? Give interiorai.com a chance, upload a picture of your room and give it a description. It works better than it should.
  • You can sort your command line output when doing something like an ls by piping it through sort (here, sorting on the second column and ignoring leading blanks)
    ls -l | sort -k2 -b
  • On macOS you can drag a non-fullscreen window to a fullscreen desktop
  • When using the ls -l command in a terminal, that first numeric column shows the number of hard links to a file - meaning the number of names an inode has for that file
  • Argument parser for Python 3 - makes parsing command line arguments a breeze and creates beautiful --help documentation to boot!
    https://docs.python.org/3/library/argparse.html
  • .NET has an equivalent parser we've mentioned in the past
    https://www.nuget.org/packages/NuGet.CommandLine
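
For the Python argparse tip above, here is a minimal sketch of what that looks like. The program name (greet), its arguments, and the helper functions are invented for the example; only the argparse calls themselves come from the standard library.

```python
import argparse


def build_parser():
    # Hypothetical CLI: greet NAME [-c COUNT] [--shout]
    parser = argparse.ArgumentParser(prog="greet", description="Print a greeting.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("-c", "--count", type=int, default=1,
                        help="how many times to greet (default: 1)")
    parser.add_argument("--shout", action="store_true",
                        help="print the greeting in upper case")
    return parser


def greet(argv):
    # Passing argv explicitly keeps this easy to test; a real script would
    # usually call parse_args() with no arguments to read sys.argv.
    args = build_parser().parse_args(argv)
    message = f"Hello, {args.name}!"
    if args.shout:
        message = message.upper()
    return [message] * args.count
```

Calling greet(["Joe", "--count", "2", "--shout"]) returns the upper-cased greeting twice, and build_parser().print_help() prints the generated --help text.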

Direct download: coding-blocks-episode-205.mp3
Category:Software Development -- posted at: 12:58am EDT
