Coding Blocks


Sun, 6 June 2021

Designing Data-Intensive Applications – Single Leader Replication

We dive back into Designing Data-Intensive Applications to learn more about replication while Michael thinks cluster is a three syllable word, Allen doesn’t understand how we roll, and Joe isn’t even paying attention.

For those that like to read these show notes via their podcast player, we like to include a handy link to get to the full version of these notes so that you can participate in the conversation at https://www.codingblocks.net/episode160.

Survey Says

News

Thank you to everyone that left us a new review:
- Audible: Ashfisch, Anonymous User (aka András)

“The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.”
Douglas Adams

Book: Designing Data-Intensive Applications

In this episode, we are discussing Data Replication, from chapter 5 of “Designing Data-Intensive Applications”.

Replication in Distributed Systems

When we talk about replication, we are talking about keeping copies of the same data on multiple machines connected by a network
For this episode, we’re talking about data small enough that it can fit on a single machine
Why would you want to replicate data?
- Keeping data close to where it’s used
- Increase availability
- Increase throughput by allowing more access to the data
Data that doesn’t change is easy, you just copy it
3 popular algorithms
- Single Leader
- Multi-Leader
- Leaderless
Well established (1970s!) algorithms for dealing with syncing data, but a lot of data applications haven’t needed replication so the practical applications are still evolving
- Cluster: the group of computers that make up our data system
- Node: each computer in the cluster (whether it has data or not)
- Replica: each node that has a copy of the database
Every write to the database needs to be copied to every replica
The most common approach is “leader-based replication”; two of the algorithms we mentioned (single leader and multi-leader) are leader-based
One of the nodes is designated as the “leader”, all writes must go to the leader
The leader writes the data locally, then sends the data to its followers via a “replication log” or “change stream”
The followers tail this log and apply the changes in the same order as the leader
Reads can be made from any of the replicas
This is a common feature of many databases, such as Postgres and Mongo, and it’s common for queues and some file systems as well
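
The leader/follower flow above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Leader` and `Follower` classes are hypothetical, everything runs in one process, and the "replication log" is a plain list rather than anything a real database would ship over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """A replica that tails the leader's log and applies entries in order."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def catch_up(self, log: list) -> None:
        # Apply only the entries not seen yet, in the leader's order.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


@dataclass
class Leader:
    """The only node that accepts writes."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # stand-in for the replication log

    def write(self, key, value) -> None:
        self.data[key] = value         # 1. write locally
        self.log.append((key, value))  # 2. record the change for followers


leader = Leader()
followers = [Follower(), Follower()]

leader.write("a", 1)
leader.write("a", 2)  # order matters: followers must also end with a == 2
leader.write("b", 3)

for follower in followers:
    follower.catch_up(leader.log)

in_sync = all(f.data == leader.data for f in followers)
print(in_sync)
```

Because the followers replay the log in the same order as the leader, any of them can serve reads once it has caught up.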

Synchronous vs Asynchronous Writes

How does a distributed system determine that a write is complete?
The system could hang on till all replicas are updated, favoring consistency…this is slow, potentially a big problem if one of the replicas is unavailable
The system could confirm receipt to the writer immediately, trusting that replicas will eventually keep up… this favors availability, but your chances for incorrectness increase
You could do a hybrid, wait for x replicas to confirm and call it a quorum
All of this is related to the CAP theorem…you get at most two: Consistency, Availability and Partition Tolerance
- Side Note: Can you ever have Consistent/Available databases? https://codahale.com/you-cant-sacrifice-partition-tolerance/
The book mentions “chain replication” and other variants, but those are still rare
- Example: Chain replication in Mongo: https://docs.mongodb.com/manual/tutorial/manage-chained-replication/
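
One way to picture the synchronous, asynchronous, and hybrid options is as a single knob: how many replica confirmations the system waits for before telling the writer the write is done. The sketch below assumes a hypothetical `write_confirmed` helper and treats "quorum" as a simple majority, which is one common choice rather than the only one.

```python
def write_confirmed(acks, required_acks):
    """Return True once at least `required_acks` replicas confirmed the write."""
    return sum(acks) >= required_acks


# Hypothetical outcome of one write: two replicas confirmed, one is unreachable.
acks = [True, True, False]
replica_count = len(acks)

# Fully synchronous: every replica must confirm, so one outage blocks the write.
synchronous = write_confirmed(acks, required_acks=replica_count)

# Fully asynchronous: confirm immediately and trust replicas to keep up.
asynchronous = write_confirmed(acks, required_acks=0)

# Hybrid: wait for a majority (an assumed definition of "x replicas").
quorum = write_confirmed(acks, required_acks=replica_count // 2 + 1)

print(synchronous, asynchronous, quorum)
```

With one replica down, the synchronous write is stuck while the asynchronous and majority writes go through, which is the consistency versus availability trade-off in miniature.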

Steps for Adding New Followers

Take a consistent snapshot of the leader at some point in time (most databases can do this without any sort of lock)
Copy the snapshot to the new follower
The follower connects to the leader and requests all changes since the snapshot
When the follower is fully caught up, the process is complete
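
The four steps above can be sketched as follows. The `Leader` class and its `snapshot` and `changes_since` methods are hypothetical names for illustration; the key assumption is that a snapshot is tied to an exact position in the replication log so the follower knows where to resume.

```python
import copy


class Leader:
    def __init__(self):
        self.data = {}
        self.log = []  # every write, in order

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def snapshot(self):
        # A consistent copy of the data plus the log position it corresponds to.
        return copy.deepcopy(self.data), len(self.log)

    def changes_since(self, position):
        return self.log[position:]


leader = Leader()
leader.write("x", 1)
leader.write("y", 2)

# Steps 1 and 2: take a snapshot and copy it to the new follower.
follower_data, position = leader.snapshot()

# Writes keep arriving while the snapshot is being copied.
leader.write("x", 10)
leader.write("z", 3)

# Step 3: request everything after the snapshot position and apply it in order.
for key, value in leader.changes_since(position):
    follower_data[key] = value

# Step 4: the follower has caught up.
caught_up = follower_data == leader.data
print(caught_up)
```

The same catch-up loop works for a follower that was offline for a while: it only needs to remember the last log position it applied.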

Handling Outages

Nodes can go down at any given time
What happens if a non-leader goes down?
- What does your db care about? (Availability or Consistency)
- Often Configurable
When the replica becomes available again, it can use the same “catch-up” mechanism we described before when we add a new follower
What happens if you lose the leader?
- Failover: One of the replicas needs to be promoted, clients need to reconfigure for this new leader
Failover can be manual or automatic

Rough Steps for Failover

Determining that the leader has failed (trickier than it sounds! how can a replica know if the leader is down, or if it’s a network partition?)
Choosing a new leader (election algorithms determine the best candidate, which is tricky with multiple nodes; separate systems like Apache ZooKeeper can help coordinate this)
Reconfigure: clients need to be updated (you’ll sometimes see things like “bootstrap” services or ZooKeeper that are responsible for pointing to the “real” leader…think about what this means for client libraries…fire and forget? try/catch?)
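
A rough sketch of the first two steps, assuming a heartbeat timeout for failure detection and "most up-to-date log position wins" as the election rule. Both function names and the replica names are hypothetical, and a real election also needs the replicas to agree on the outcome, which this sketch skips entirely.

```python
def leader_seems_dead(last_heartbeat, now, timeout):
    # A timeout cannot tell a crashed leader from a network partition;
    # it only says "we have not heard from the leader recently".
    return now - last_heartbeat > timeout


def choose_new_leader(replica_positions):
    # Assumed rule: promote the replica that has applied the most of the old
    # leader's log, to minimize lost writes.
    return max(replica_positions, key=replica_positions.get)


# Hypothetical cluster state: replica name -> last applied log position.
replica_positions = {"replica-a": 118, "replica-b": 120, "replica-c": 97}

new_leader = None
if leader_seems_dead(last_heartbeat=100.0, now=131.0, timeout=30.0):
    new_leader = choose_new_leader(replica_positions)

print(new_leader)
```

The timeout value is the hard part: too short and a slow network triggers needless failovers, too long and the system stays unavailable while everyone waits.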

Failover is Hard!

How long do you wait to declare a leader dead?
What if the leader comes back? What if it still thinks it’s leader? Has data the others didn’t know about? Discard those writes?
Split brain – two replicas think they are leaders…imagine this with auto-incrementing keys… Which one do you shut down? What if both shut down?
There are solutions to these problems…but they are complex and are a large source of problems
Node failures, unreliable networks, tradeoffs around consistency, durability, availability, latency are fundamental problems with distributed systems
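
To make the split brain and auto-incrementing key scenario concrete, here is a small simulation. It assumes two hypothetical `Leader` objects that both believe they own the same key counter, which is roughly what happens when an old leader comes back without realizing it was replaced.

```python
class Leader:
    def __init__(self, name, next_id):
        self.name = name
        self.next_id = next_id  # auto-increment counter
        self.rows = {}

    def insert(self, value):
        row_id = self.next_id
        self.next_id += 1
        self.rows[row_id] = value
        return row_id


# Both nodes think they are the leader and both start from the same counter.
old_leader = Leader("old-leader", next_id=101)
new_leader = Leader("new-leader", next_id=101)

id_a = old_leader.insert("order from client A")
id_b = new_leader.insert("order from client B")

# The same primary key now refers to two different rows.
conflicting_ids = sorted(old_leader.rows.keys() & new_leader.rows.keys())
print(conflicting_ids)
```

Neither write is obviously the one to discard, which is why the questions above about which node to shut down have no easy answer.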

Implementation of Replication Logs

3 main strategies for replication, all based around followers replaying the same changes

Statement-Based Replication

Leader logs every Insert, Update, Delete command, and followers execute them
Problems
- Statements like NOW() or RAND() can be different
- Auto-increments and triggers depend on things happening in exactly the same order on every replica…but databases are multi-threaded, and what about multi-step transactions?
- What about LSM databases that do things with delete/compaction phases?
You can work around these, but it’s messy – this approach is no longer popular
Example: older versions of MySQL worked this way
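
The RAND() problem is easy to reproduce. In this sketch, `apply_statement` is a hypothetical stand-in for executing a statement like an UPDATE that calls RAND(), and each node gets its own random number generator, seeded differently to mimic two independent machines.

```python
import random


def apply_statement(table, rng):
    # Stand-in for a statement such as: UPDATE t SET token = RAND()
    table["token"] = rng.random()


leader_table, follower_table = {}, {}

# Each node evaluates the statement itself, with its own source of randomness.
apply_statement(leader_table, random.Random(1))
apply_statement(follower_table, random.Random(2))
diverged = leader_table != follower_table

# One workaround: the leader evaluates the nondeterministic call once and
# ships the resulting value instead of the function call.
follower_table["token"] = leader_table["token"]
repaired = leader_table == follower_table

print(diverged, repaired)
```

Shipping the computed value rather than the statement is essentially where the row-based approach ends up.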

Write Ahead Log Shipping

LSM and B-Tree databases keep an append only WAL containing all writes
Similar to statement-based, but more low level…contains details on which bytes changed in which disk blocks
Tightly coupled to the storage engine, this can mean upgrades require downtime
Examples: Postgres, Oracle
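
As a toy picture of what "bytes in disk blocks" means, the sketch below models a disk as a list of fixed-size byte blocks and a WAL record as a (block, offset, bytes) tuple. The record layout is an assumption made up for this example and does not match the actual WAL format of Postgres, Oracle, or any other database.

```python
BLOCK_SIZE = 8  # tiny blocks, just for illustration


def apply_wal_record(disk, record):
    # Overwrite bytes at an exact position inside an exact block.
    block, offset, payload = record
    disk[block][offset:offset + len(payload)] = payload


leader_disk = [bytearray(BLOCK_SIZE) for _ in range(2)]
follower_disk = [bytearray(BLOCK_SIZE) for _ in range(2)]

# Hypothetical append-only WAL: (block number, byte offset, new bytes).
wal = [(0, 0, b"abc"), (1, 4, b"xy"), (0, 1, b"Z")]

for record in wal:
    apply_wal_record(leader_disk, record)    # the leader applies locally
    apply_wal_record(follower_disk, record)  # the follower replays the same record

identical = leader_disk == follower_disk
print(identical)
```

The follower only stays correct if it lays out blocks exactly like the leader does, which is why this approach is so tightly coupled to the storage engine and its version.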

Row Based Log Replication

Decouples replication from the storage engine
Similar to WAL, but a little higher level – updates contain what changed, deletes are similar to a “tombstone”
Also known as Change Data Capture
Often seen as an optional configuration (SQL Server, for example)
Example: newer versions of MySQL (binlog)
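
A row-based log describes changes in terms of rows and values rather than bytes or SQL text. The sketch below assumes a made-up record shape (a dict with `op`, `key`, and `values`), not the real binlog format, and shows a follower applying inserts, updates, and a delete that acts like a tombstone.

```python
def apply_change(table, change):
    op, key = change["op"], change["key"]
    if op in ("insert", "update"):
        # The record carries the new column values, not the statement that made them.
        table.setdefault(key, {}).update(change["values"])
    elif op == "delete":
        # The record only identifies the row that is gone, like a tombstone.
        table.pop(key, None)


# Hypothetical row-based log, in the order the leader committed the changes.
change_log = [
    {"op": "insert", "key": 1, "values": {"name": "Ada", "visits": 1}},
    {"op": "update", "key": 1, "values": {"visits": 2}},
    {"op": "insert", "key": 2, "values": {"name": "Bob", "visits": 1}},
    {"op": "delete", "key": 2},
]

follower_table = {}
for change in change_log:
    apply_change(follower_table, change)

print(follower_table)
```

Because the records say nothing about disk layout, the follower could run a different storage engine or version, and other systems can consume the same stream.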

Trigger-Based Replication

Application based replication, for example an app can ask for a backup on demand
Doesn’t keep replicas in sync, but can be useful

Resources We Like

Other Episodes on “Designing Data-Intensive Applications”
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)
You Can’t Sacrifice Partition Tolerance (codahale.com)
Manage Chained Replication (docs.mongodb.com)
Doug DeMuro’s YouTube channel (YouTube)
Apache ZooKeeper (Wikipedia, Apache)

Tip of the Week

A collection of CSS generators for grid, gradients, shadows, color palettes etc. from Smashing Magazine.
Learn This One Weird Trick To Debug CSS (freecodecamp.org)
- Previously mentioned in episode 81.
Use tree to see a visualization of a directory structure from the command line. Install it in Ubuntu via apt install tree. (manpages.ubuntu.com)
Initialize a variable in Kotlin with a try-catch expression, like val myvar: String = try { ... } catch (e: Exception) { ... }. (Stack Overflow)
Manage secrets and protect sensitive data (and more) with HashiCorp Vault. (HashiCorp)

Direct download: coding-blocks-episode-160.mp3
Category:Software Development -- posted at: 8:01pm EDT
