Coding Blocks

While we continue to dig into Designing Data-Intensive Applications, we take a step back to discuss data models and relationships as Michael covers all of his bases, Allen has a survey answer just for him, and Joe really didn’t get his tip from Reddit.

This episode’s full show notes can be found at https://www.codingblocks.net/episode124, in case you’re reading this via your podcast player, where you can be a part of the conversation.

Sponsors

  • Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard.
  • Educative.io – Level up your coding skills, quickly and efficiently. Visit educative.io/codingblocks to get 10% off any course or annual subscription.
  • Clubhouse – The fast and enjoyable project management platform that breaks down silos and brings teams together to ship value, not features. Sign up to get two additional free months of Clubhouse on any paid plan by visiting clubhouse.io/codingblocks.

Survey Says

Which keyboard do you use?

Take the survey at: https://www.codingblocks.net/episode124.

News

  • Thank you for the awesome reviews:
    • iTunes: Kampfirez, Ameise776, JozacAlanOutlaw, skmetzger, Napalm684, Dingus the First
  • Get your tickets now for NDC { London }, January 27th – 31st, where Allen will be giving his talk, Big Data Analytics in Near-Real-Time with Apache Kafka Streams, and where you can kick him in the shins. (ndc-london.com)
  • Hurry and sign up for the South Florida Software Developers Conference 2020, February 29th, where Joe will be giving his talk, Streaming Architectures by Example. This is a great opportunity for you to try to kick him in the shins. (fladotnet.com)
  • The CB guys will be at the 15th Annual Orlando Code Camp & Tech Conference, March 28th. Sign up for your chance to kick them all in the shins and grab some swag. (orlandocodecamp.com)

Relationships … It’s complicated

Normalization

  • Relational databases are typically normalized.
    • A quick description of normalization would be associating meaningful data with a key and then relating data by keys rather than storing all of the data together.
  • Normalization reduces redundancy and improves data integrity.
  • Relational normalization has several benefits:
    • Consistent styling and spelling for meaningful values.
    • No ambiguity, even when text values are coincidentally the same, for example, Georgia the state vs Georgia the country.
    • Updating meaningful values is easy since there is only one spot to change.
    • Language localization support can be easier because you can associate different meaningful values with the same key for each supported language.
    • Search for hierarchical relationships can be easier, for example, getting a list of cities for a particular state.
      • This can vary based on how the data is stored. See episode 28 and episode 29 for more detailed discussions related to some strategies.
  • There are legitimate reasons for having denormalized data in a relational database, like faster searches, although there might be better tools for the specific use case.
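
As a rough illustration of the points above, here is a minimal sketch using Python's built-in sqlite3 module. The region and person tables, their columns, and the sample rows are all hypothetical and not from the episode. It shows two places named Georgia kept apart by key, and a single UPDATE changing the displayed value for every row that references that key.

```python
import sqlite3

# Hypothetical schema: each region is stored once and referenced by key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        kind TEXT NOT NULL          -- 'state' or 'country'
    );
    CREATE TABLE person (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        region_id INTEGER NOT NULL REFERENCES region(id)
    );
    INSERT INTO region (id, name, kind) VALUES
        (1, 'Georgia', 'state'),
        (2, 'Georgia', 'country');
    INSERT INTO person (id, name, region_id) VALUES
        (1, 'Ana', 1),
        (2, 'Ben', 1),
        (3, 'Cal', 2);
""")


def people_with_regions(connection):
    """Join each person to their region by key."""
    return connection.execute("""
        SELECT p.name, r.name, r.kind
        FROM person AS p
        JOIN region AS r ON r.id = p.region_id
        ORDER BY p.id
    """).fetchall()


before = people_with_regions(conn)

# One UPDATE changes the value for every person that references key 1.
conn.execute("UPDATE region SET name = 'Georgia (US)' WHERE id = 1")
after = people_with_regions(conn)
```

Because the text "Georgia" appears only in the region table, the two meanings never collide, and a localized name could be attached to the same key in the same way.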

Relationships …

In Document Databases

  • Document databases struggle as relationships get more complicated.
  • Document database designers have to make careful decisions about where data will be stored.
  • A big benefit of document databases is locality, meaning all of the relevant data for an entity is stored in one spot.
    • Fetching an order object is one simple get in a document database, while the relational database might end up being more than one query and will surely join multiple tables.
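
The locality point can be sketched as follows. This is only an illustration under assumptions: a plain Python dict stands in for a document store, and the orders and order_lines tables, keys, and SKUs are made up for the example.

```python
import sqlite3

# Document style: everything about the order lives in one value under one key.
# A plain dict stands in for a document store here (an assumption for the sketch).
document_store = {
    "order:1001": {
        "customer": "Ana",
        "lines": [
            {"sku": "KB-01", "qty": 1},
            {"sku": "MS-02", "qty": 2},
        ],
    }
}


def get_order_document(order_id):
    """One lookup returns the whole order."""
    return document_store[f"order:{order_id}"]


# Relational style: the same order is spread across two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_lines (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        sku      TEXT NOT NULL,
        qty      INTEGER NOT NULL
    );
    INSERT INTO orders VALUES (1001, 'Ana');
    INSERT INTO order_lines VALUES (1001, 'KB-01', 1), (1001, 'MS-02', 2);
""")


def get_order_relational(connection, order_id):
    """Reassemble the order by joining the tables that hold its pieces."""
    rows = connection.execute("""
        SELECT o.customer, l.sku, l.qty
        FROM orders AS o
        JOIN order_lines AS l ON l.order_id = o.id
        WHERE o.id = ?
        ORDER BY l.rowid
    """, (order_id,)).fetchall()
    if not rows:
        raise KeyError(order_id)
    return {
        "customer": rows[0][0],
        "lines": [{"sku": sku, "qty": qty} for _, sku, qty in rows],
    }
```

Both functions return the same shape, but the relational version has to rebuild it from rows, which is the extra work the bullet above describes.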

In Relational Databases

  • There are several benefits of relational database relationships, particularly Many-to-One and Many-to-Many relationships.
    • To illustrate a Many-to-One example, there are many parts associated with one particular computer.
    • To illustrate a Many-to-Many example, a person can be associated with many computers and a computer can be associated with many people.
  • As your product matures, your database (typically) gets more complicated. The relational model holds up really well to these changes over time. The queries get more complicated as you add more relationships, but your flexibility remains.
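
The computer examples above might be modeled like this. It is a sketch with hypothetical table and column names: a foreign key on part expresses Many-to-One, and a join table (person_computer) expresses Many-to-Many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE computer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Many-to-one: each part row points at exactly one computer.
    CREATE TABLE part (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        computer_id INTEGER NOT NULL REFERENCES computer(id)
    );

    -- Many-to-many: a join table pairs people with computers.
    CREATE TABLE person_computer (
        person_id   INTEGER NOT NULL REFERENCES person(id),
        computer_id INTEGER NOT NULL REFERENCES computer(id),
        PRIMARY KEY (person_id, computer_id)
    );

    INSERT INTO computer VALUES (1, 'laptop'), (2, 'desktop');
    INSERT INTO person   VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO part     VALUES (1, 'ssd', 1), (2, 'ram', 1), (3, 'gpu', 2);
    INSERT INTO person_computer VALUES (1, 1), (1, 2), (2, 2);
""")


def parts_of(connection, computer_name):
    """All parts that belong to one computer."""
    rows = connection.execute("""
        SELECT p.name FROM part AS p
        JOIN computer AS c ON c.id = p.computer_id
        WHERE c.name = ? ORDER BY p.name
    """, (computer_name,))
    return [name for (name,) in rows]


def users_of(connection, computer_name):
    """All people associated with one computer."""
    rows = connection.execute("""
        SELECT pe.name FROM person AS pe
        JOIN person_computer AS pc ON pc.person_id = pe.id
        JOIN computer AS c ON c.id = pc.computer_id
        WHERE c.name = ? ORDER BY pe.name
    """, (computer_name,))
    return [name for (name,) in rows]


def computers_of(connection, person_name):
    """All computers associated with one person."""
    rows = connection.execute("""
        SELECT c.name FROM computer AS c
        JOIN person_computer AS pc ON pc.computer_id = c.id
        JOIN person AS pe ON pe.id = pc.person_id
        WHERE pe.name = ? ORDER BY c.name
    """, (person_name,))
    return [name for (name,) in rows]
```

The same join table answers the question from either side, which is part of the flexibility the relational model keeps as relationships are added.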

Query Optimization

  • A query optimizer, a common part of popular RDBMSes, is responsible for deciding which parts of your written query to execute in which order and which indexes to use.
  • The query optimizer has a huge impact on performance and is a big part of the reason why proprietary RDBMSes like Oracle and SQL Server are so popular.
    • Imagine if you, the developer, had to be smarter about the order that you joined your tables and the order of items in your WHERE clause …
      • and then ratios of data in the tables were different in production vs development,
      • and then a new index was added, …
  • The query optimizer uses advanced statistics about your data to make smart choices about how to execute your query.
  • A key insight into the relational model is that the query optimizer only has to be built once and everybody benefits from it.
  • In document databases, the developers and data model designers have to consider their designs and querying constantly.
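
One way to watch an optimizer make these choices is SQLite's EXPLAIN QUERY PLAN, used here through Python's sqlite3 module. SQLite is a stand-in for the RDBMSes named above, and the orders table and index name are hypothetical. The query text never changes; only the plan does once an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def plan(connection):
    """Return SQLite's description of how it would run QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(conn)

# Adding an index changes the plan; the query text stays the same.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(conn)

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should describe a scan of the table and the second should mention idx_orders_customer.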

How to choose Document vs Relational

Document Databases …

  • Better performance in some use cases because of locality.
  • Often scale very well because of the locality.
  • Are flexible in what they can store, often called “schemaless” or “schema on read”, but put another way, this is a lack of enforced integrity.
  • Have poor support for joining because you have to fetch the whole document for a simple lookup.
  • Require extra care when designing because it’s difficult to change the document formats after the fact and because there is no generic query optimizer available.
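
To make "schema on read" concrete, here is a small sketch with made-up user documents whose shape changed over time. Nothing enforced a structure when they were written, so the reading code has to interpret each shape itself.

```python
import json

# Hypothetical user documents written at different times with different shapes.
raw_documents = [
    '{"id": 1, "name": "Ana Lee"}',
    '{"id": 2, "first_name": "Ben", "last_name": "Ortiz"}',
    '{"id": 3}',
]


def display_name(doc):
    """Interpret the document's structure at read time."""
    if "name" in doc:
        return doc["name"]
    if "first_name" in doc and "last_name" in doc:
        return f'{doc["first_name"]} {doc["last_name"]}'
    # Nothing stopped this document from being stored without a name.
    return None


names = [display_name(json.loads(raw)) for raw in raw_documents]
```

The third document is the "lack of enforced integrity" case: it was accepted at write time, and the gap only shows up when someone reads it.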

Relational Databases …

  • Can provide powerful relationships, particularly with highly connected data.
  • However, they don’t scale horizontally very well.

Resources We Like

  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)
  • Grokking the System Design Interview (Educative.io)
  • Generate metrics from your logs to view historical trends and track SLOs (Datadog)
  • Hierarchical Data – Adjacency Lists and Nested Set Models (episode 28)
  • Hierarchical Data cont’d – Path Enumeration and Closure Tables (episode 29)

Tip of the Week

  • Presto – The Distributed SQL Query Engine for Big Data. (prestodb.io)
  • Use the Files app in iOS to proxy files from Box or Google Drive (support.apple.com)
  • Pin tabs in Chrome for all of your must have open tabs. (support.google.com)
  • Use the Microsoft Authenticator to keep all of your one-time passwords in sync across all of your devices. And it requires you authenticate with it to even see the OTPs! (App Store, Google Play)
  • Combine Poker with learning with Varianto:25’s Git playing cards. (varianto25.com)
  • Search your Gmail for unread old emails with queries like before:2019/01/01 is:unread.
  • The new JetBrains Mono font is almost as awesome as the page that describes it. (JetBrains)
Direct download: coding-blocks-episode-124.mp3
Category:Software Development -- posted at: 12:31am EDT

We’re comparing data models as we continue our deep dive into Designing Data-Intensive Applications as Coach Joe is ready to teach some basketball, Michael can’t pronounce 6NF, and Allen measured some geodesic distances just this morning.

For those reading these show notes via a podcast player, this episode’s full show notes can be found at https://www.codingblocks.net/episode123 where you can also join in on the conversation.

Sponsors

  • Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard.
  • Educative.io – Level up your coding skills, quickly and efficiently. Visit educative.io/codingblocks to get 20% off any course or, for a limited time, get 50% off an annual subscription.
  • ABOUT YOU – One of the fastest growing e-commerce companies, headquartered in Hamburg, Germany, and looking for motivated team members like you. Apply now at aboutyou.com/job.

Survey Says

Which data model do you prefer?

Take the survey at: https://www.codingblocks.net/episode123.

News

  • We thank everyone that took a moment to leave us a review:
    • iTunes: BoulderDude333, the pang1, fizch26
  • Hurry up and get your tickets now for NDC { London }, January 27th – 31st, where Allen will be giving his talk, Big Data Analytics in Near-Real-Time with Apache Kafka Streams. This is your chance to kick him in the shins on the other side of the pond. (ndc-london.com)
  • Sign up for your chance to kick Joe in the shins at the South Florida Software Developers Conference 2020, February 29th, where he will be giving his talk, Streaming Architectures by Example. (fladotnet.com)
  • Want a chance to kick all three Coding Blocks hosts in the shins? Sign up for the 15th Annual Orlando Code Camp & Tech Conference, March 28th, for your chance to kick them all in the shins and grab some swag. (orlandocodecamp.com)

Data Models

  • Data models are one of the most important pieces of developing software.
    • They dictate how the software is written.
    • And they dictate how we think about the problems we’re solving.
  • Software is typically written by stacking layers of modeling on top of each other.
    • We write objects and data structures to reflect the real world.
    • These then get translated into some format that will be persisted in JSON, XML, relational tables, graph db’s, etc.
      • The people that built the storage engine had to determine how to model the data on disk and in memory to support things like search, fast access, etc.
        • Even further down, those bits have to be converted to electrical current, pulses of light, magnetic fields and so on.
  • Complex applications commonly have many layers: APIs built on top of APIs.
    • What’s the purpose of these layers? To hide the complexity of the layer below it.
      • The abstractions allow different groups of people (potentially with completely different skillsets) to work together.
  • There are MANY types of data models, all with different usages and needs in mind.
    • It can take a LOT of time and effort to master just a single model.
    • Data models have a HUGE impact on how you write your applications, so it’s important to choose one that makes sense for what you’re trying to accomplish.

Relational Model vs Document Model

  • The best-known data model today is probably the relational model used by SQL.
  • The relational model was proposed by Edgar Codd back in 1970.
  • The relational model organizes data into relations (i.e. tables in SQL) where each relation contains an unordered collection of tuples (i.e. rows in SQL).
    • People originally doubted it would work, but its dominance has lasted since the mid-80’s, which the author points out is basically an eternity in software.
  • Origins were based in business data processing, particularly transaction processing.
  • There have been a number of competing data storage and querying approaches over the years.
    • Network and Hierarchical models in the 70’s and 80’s,
    • Object databases were competitors in the late 80’s and early 90’s,
    • XML databases,
    • Basically a number of competitors over the years, but nobody has dethroned the relational database.
  • Almost everything you see and use today has some sort of relational database working behind it.
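
The idea of a relation as an unordered collection of tuples can be sketched in plain Python. This is only an analogy, not how any database stores data: the Employee type and the select and project helpers are hypothetical names for the illustration.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    id: int
    name: str
    dept: str


# A relation modeled as a set: no duplicates and no inherent order.
employees = {
    Employee(1, "Ana", "eng"),
    Employee(2, "Ben", "ops"),
    Employee(3, "Cal", "eng"),
}


def select(relation, predicate):
    """Keep only the tuples that satisfy the predicate (like WHERE)."""
    return {row for row in relation if predicate(row)}


def project(relation, *fields):
    """Keep only the named fields of each tuple (like SELECT DISTINCT columns)."""
    return {tuple(getattr(row, f) for f in fields) for row in relation}


eng_names = project(select(employees, lambda e: e.dept == "eng"), "name")
```

Because the result is again a set of tuples, operations compose, which is what lets a query language describe what is wanted without saying how to fetch it.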

NoSQL

  • NoSQL is the latest competitor to Relational Databases.
    • It was originally intended as a catchy Twitter hashtag for a meetup about open source, distributed, non-relational databases.
    • It has since been reinterpreted as “Not Only SQL”.
  • What needs does NoSQL aim to address?
    • The need for greater scalability than traditional RDBMS’s can typically achieve, including very large datasets and fast writes.
    • The desire for FOSS (free and open source software), as opposed to very expensive, commercial RDBMS’s.
    • Specialized query operations that are not supported well in the relational model.
    • Shortcomings of relational models – need for more dynamic and/or expressive data models.
  • Different applications (or even different pieces of the same application) have different needs and may require different data models. For that reason, it’s very likely that NoSQL won’t replace SQL, but rather it’ll augment it.
    • This is referred to as polyglot persistence.

Object-Relational Mismatch

  • Most applications today are written in an object oriented programming language.
  • There’s typically a translation layer required to map the relational data models to an object model.
    • The disconnect between models can be referred to as impedance mismatch.
  • Frameworks like ActiveRecord, Hibernate, Entity Framework, etc., can reduce the boilerplate code needed for the translation but typically don’t fully hide the impedance mismatch issues.
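
The translation layer described above might look like this when written by hand. It is a sketch, not how any of the named frameworks work internally: the Order class, the two tables, and the save_order and load_order helpers are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    """The object model: one object that owns a list."""
    id: int
    customer: str
    skus: list = field(default_factory=list)


# The relational model: the same order needs two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_lines (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        sku      TEXT NOT NULL
    );
""")


def save_order(connection, order):
    """Split one object into rows for two tables."""
    connection.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        (order.id, order.customer),
    )
    connection.executemany(
        "INSERT INTO order_lines (order_id, sku) VALUES (?, ?)",
        [(order.id, sku) for sku in order.skus],
    )


def load_order(connection, order_id):
    """Rebuild one object from rows in two tables, or return None."""
    row = connection.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if row is None:
        return None
    lines = connection.execute(
        "SELECT sku FROM order_lines WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    )
    return Order(id=row[0], customer=row[1], skus=[sku for (sku,) in lines])
```

An ORM generates code like save_order and load_order for you, but the mismatch itself (one nested object versus several flat tables) is still there underneath.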

Resources We Like

  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)
  • Grokking the System Design Interview (Educative.io)
  • Monitor Azure DevOps workflows and pipelines with Datadog (Datadog)
  • Monitor Amazon EKS on AWS Fargate with Datadog (Datadog)
  • Best practices for tagging your infrastructure and applications (Datadog)
  • Introducing: Educative Subscriptions (Educative.io)
  • Santosh Hari – Not all data is created equal: NoSQL (YouTube)
  • TIOBE Index (tiobe.com)
  • Database Schema for Multiple Types of Products

Tip of the Week

  • Got data? Use DataGrip. One tool for many databases. (JetBrains)
  • KafkaHQ – A Kafka GUI for topics, data, consumer groups, schema registry and more. (GitHub)
  • Grafka – A GraphQL interface for Apache Kafka (GitHub)
  • Use Google Maps to measure geodesic distances (citylab.com)
  • How to undo (almost) anything with Git (GitHub)
  • Will Save the Galaxy for Food by Yahtzee Croshaw (Amazon)
Direct download: coding-blocks-episode-123.mp3
Category:Software Development -- posted at: 8:39pm EDT
