Coding Blocks

We learn the secrets of a safe deployment practice while continuing to study The DevOps Handbook as Joe is a cartwheeling acrobat, Michael is not, and Allen is hurting, so much.

For those of you reading these show notes via your podcast player, you can find this episode’s full show notes at https://www.codingblocks.net/episode140.

Sponsors

  • Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard.
  • Secure Code Warrior – Start gamifying your organization’s security posture today, score 5,000 points, and get a free Secure Code Warrior t-shirt.

Survey Says

Do you prefer that your laptop keyboard ...

Take the survey at: https://www.codingblocks.net/episode140.

News

No, you click the button …

Enable Feedback to Safely Deploy Code

  • Without a quick feedback loop:
    • Operations doesn’t like deploying developer code.
    • Developers complain about operations not wanting to deploy their code.
    • Given a button for anyone to push to deploy, nobody wants to push it.
  • The solution is to deploy code with quick feedback loops.
    • If there’s a problem, fix it quickly and add new telemetry to track the fix.
    • Puts the information in front of everyone so there are no secrets.
  • This encourages developers to write more tests and better code, and to take more pride in releasing successful deployments.
    • An interesting side effect is developers are willing to check in smaller chunks of code because they know they’re safer to deploy and easier to reason about.
    • This also allows for frequent production releases with constant, tight feedback loops.
  • Automating the deployment process isn’t enough. You must have monitoring of your telemetry integrated into that process for visibility.

Use Telemetry to Make Deployments Safer

  • Always make sure you’re monitoring telemetry when doing a production release.
  • If anything goes wrong, you should see it almost immediately.
    • Nothing is “done” until it is operating as expected in the production environment.
  • Improving the development process (e.g. more unit tests, more telemetry) doesn’t mean there won’t be issues. Having these monitors in place lets you find and fix issues faster, and then add telemetry to help keep that particular issue from happening again.
  • Production deployments are one of the top causes of production issues.
    • This is why it’s so important to overlay those deployments on the metric graphs.
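
One way to picture watching telemetry during a release is a small check that compares an error-rate metric just before and just after a deployment. The sketch below is only an illustration: the function name `deployment_looks_unhealthy`, the sample format, the five-minute window, and the 2x threshold are all assumptions made for this example, not something prescribed by the book.

```python
from datetime import datetime, timedelta
from statistics import mean


def deployment_looks_unhealthy(samples, deployed_at,
                               window=timedelta(minutes=5), max_ratio=2.0):
    """Compare the mean error rate before and after a deployment.

    samples: list of (timestamp, error_rate) tuples (hypothetical format).
    Returns True when the post-deploy mean exceeds max_ratio times the
    pre-deploy mean, and False otherwise or when there is too little data.
    """
    before = [v for t, v in samples if deployed_at - window <= t < deployed_at]
    after = [v for t, v in samples if deployed_at <= t < deployed_at + window]
    if not before or not after:
        return False  # not enough telemetry to judge either way
    baseline = max(mean(before), 1e-9)  # avoid dividing by zero
    return mean(after) / baseline > max_ratio


deploy = datetime(2020, 9, 1, 12, 0)
samples = [(deploy + timedelta(minutes=m), rate)
           for m, rate in [(-4, 0.01), (-2, 0.01), (1, 0.05), (3, 0.07)]]
print(deployment_looks_unhealthy(samples, deploy))
```

In this made-up data the error rate jumps right after the release, so the check flags it. A real setup would likely look at several metrics and feed a dashboard or alert rather than a print statement.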

Pager Duty – Devs and Ops together

  • Problems sometimes can go on for extremely long periods of time.
  • Those problems might be sent off to a team to be worked on, but they get deprioritized in favor of new features.
    • The problems can be a major problem for operations, but not even a blip on the radar of dev.
    • Upstream work centers that optimize for themselves reduce performance for the overall value stream.
      • This means everyone in the value stream should share responsibility for handling operational incidents.
  • When developers were awakened at 2 AM, New Relic found that issues were fixed faster than ever.
  • Business goals are not achieved when features have been marked as “done”, but instead only when they are truly operating properly.

Have Developers Follow Work Downstream

  • Having a developer “watch over the shoulder” of end-users can be very eye-opening.
    • This almost always leads to the developers wanting to improve the quality of life for those users.
  • Developers should have to do the same for the operational side of things.
    • They should endure the pain the Ops team does to get the application running and stable.
    • When developers follow their work downstream, they make better, more informed decisions in their daily work about things such as deployability, manageability, and operability.

Developers Self-Manage Their Production Service

  • Sometimes deployments break in production because we learn about operational problems too late in the cycle.
  • Have developers monitor and manage the service when it first launches before handing over to operations.
    • This is practiced by Google.
    • Ops can act as consultants to assist in the process.
  • Launch guidance:
    • Defect counts and severity
    • Type and frequency of pager alerts
    • Monitoring coverage
    • System architecture
    • Deployment process
    • Production hygiene
  • If these items in the checklist aren’t met, they should be addressed before being deployed and managed in production.
  • Any regulatory compliance necessary? If so, you now have to manage technical AND security / compliance risks.
  • Create a service hand back mechanism. If a production service becomes difficult to manage, operations can hand it back to the developers.
    • Think of it as a pressure release valve.
    • Google still does this, and it shows a mutual respect between development and operations.

Resources We Like

  • The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon)
  • The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon)
  • The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon)
  • Improve mobile user experience with Datadog Mobile Real User Monitoring (Datadog)

Tip of the Week

  • Configure an interpreter using Docker (JetBrains)
    • JetBrains describes how to connect PyCharm to use Docker as the interpreter.
  • BONUS: Why Date-ing is Hard (episode 102)
    • We discuss using the venv Python module to create separate virtual environments, allowing each to have its own versions of dependencies. (docs.python.org)
    • To use venv:
      • Create the virtual environment: python -m venv c:\path\to\myenv
      • Activate the virtual environment: c:\path\to\myenv\Scripts\activate.bat
      • NOTE that the venv module documentation includes the variations for different OSes and shells.
  • Node Anchors in YAML (yaml.org)
  • Tweaks (Visual Studio Marketplace)
    • Install Tweaks to gain features, such as Presentation Mode, for Visual Studio.
  • Angular state inspector (chrome web store)
  • Angular Language Service (Visual Studio Marketplace)
  • Angular Snippets (Version 9) (Visual Studio Marketplace)
    • NOTE that the author has similar plugins available for different Angular versions.
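
The venv tip above can also be driven from Python itself, which may be handy in setup scripts. This is a minimal sketch using the standard library's `venv.create`; the temporary directory and the name `myenv` are placeholders, and `with_pip=False` is chosen here only to keep the example quick.

```python
import tempfile
import venv
from pathlib import Path

# Create a throwaway virtual environment in a temporary directory.
# "myenv" is a placeholder name; use whatever path suits your project.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "myenv"
    venv.create(env_dir, with_pip=False)  # similar in spirit to: python -m venv <path>
    # Every environment created by venv gets a pyvenv.cfg marker file.
    created = (env_dir / "pyvenv.cfg").exists()
    print("environment created:", created)
```

Activation is still a shell concern, so the per-OS activate scripts mentioned in the venv documentation apply as before.
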
Direct download: coding-blocks-episode-140.mp3
Category:Software Development -- posted at: 8:01pm EDT

We’re using telemetry to fill in the gaps and anticipate problems while discussing The DevOps Handbook, while Michael is still weird about LinkedIn, Joe knows who’s your favorite JZ, and Allen might have gone on vacation.

You can find these show notes at https://www.codingblocks.net/episode139, in case you’re reading these within your podcast player.

Sponsors

  • Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard.
  • Secure Code Warrior – Start gamifying your organization’s security posture today, score 5,000 points, and get a free Secure Code Warrior t-shirt.

Survey Says

What's your favorite mobile device?

Joe’s Super Secret Survey

Go or Rust?

Take both surveys at: https://www.codingblocks.net/episode139.

News

  • Thank you to everyone that left us a new review:
    • iTunes: AbhiZambre, Traz3r
    • Stitcher: AndyIsTaken
  • Most important things to do for new developer job seekers?

I Got 99 Problems and DevOps ain’t One

Find and Fill Any Gaps

Once we have telemetry in place, we can identify any gaps in our metrics, especially in the following levels of our application:

  • Business level – These are metrics on business items, such as sales transactions, signups, etc.
  • Application level – This includes metrics such as timing metrics, errors, etc.
  • Infrastructure level – Metrics at this level cover things like databases, OS’s, networking, storage, CPU, etc.
  • Client software level – These metrics include data like errors, crashes, timings, etc.
  • Deployment pipeline level – This level includes metrics for data points like test suite status, deployment lead times, frequencies, etc.

Application and Business Metrics

  • Gather telemetry not just for technical bits, but also for organizational goals, e.g. things like new users, login events, session lengths, active users, abandoned carts, etc.
  • Every business metric should be actionable. Metrics that aren’t actionable are “vanity metrics”.
  • By radiating these metrics, you enable fast feedback with feature teams to identify what’s working and what isn’t within their business unit.

Infrastructure Metrics

  • Need enough telemetry to identify what part of the infrastructure is having problems.
  • Graphing telemetry across infrastructure and application allows you to detect when things are going wrong.
  • Using business metrics along with infrastructure metrics allows development and operations teams to work quickly to resolve problems.
  • Need the same telemetry in pre-production environments so you can catch problems before they make it to production.

Overlaying other Relevant Information onto Our Metrics

  • In addition to graphing business and infrastructure telemetry, you also want to graph your deployments so you can quickly see whether a release caused a deviation from normal.
    • There may even be a “settling period” after a deployment where things spike (good or bad) and then return to normal. This is good information to have to see if deployments are acting as expected.
  • The same goes for maintenance. Graphing when maintenance occurs helps you correlate infrastructure and application issues with the maintenance work performed at that time.
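
To make the overlay idea concrete, here is a rough sketch that tags each metric sample with the most recent deployment or maintenance event, which is roughly the information a graph annotation conveys. The data shapes, the event labels, and the helper name `annotate_with_events` are hypothetical.

```python
from bisect import bisect_right


def annotate_with_events(samples, events):
    """Attach the latest preceding event label to each metric sample.

    samples: list of (minute, value) pairs, sorted by minute.
    events:  list of (minute, label) pairs, sorted by minute.
    Returns a list of (minute, value, label_or_None).
    """
    event_times = [t for t, _ in events]
    annotated = []
    for minute, value in samples:
        idx = bisect_right(event_times, minute)  # number of events at or before this sample
        label = events[idx - 1][1] if idx else None
        annotated.append((minute, value, label))
    return annotated


events = [(10, "deploy v1.2"), (30, "db maintenance")]
samples = [(5, 120), (12, 310), (20, 130), (35, 90)]
for row in annotate_with_events(samples, events):
    print(row)
```

In this invented data, the value spikes just after the deployment and drops back by minute 20, which is the kind of "settling period" that becomes visible once events sit on the same timeline as the metric.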

Resources We Like

  • The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon)
  • The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon)
  • The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon)
  • The ONE Metric More Important Than Sales & Subscribers (YouTube)
  • 2020 Developer Survey – Most Loved, Dreaded, and Wanted Languages (Stack Overflow)
  • Instrument your Python applications with Datadog and OpenTelemetry (Datadog)
  • Why does speed matter? (web.dev)
  • Dash goes virtual! Join us on Tuesday, August 11 (Datadog)

Tip of the Week

  • Google Career Certificates (grow.google)
    • Google Offers 100,000 Scholarships – Here’s How To Get One (Forbes)
    • Grow with Google (grow.google)
  • Hearth Bound (HearthBoundPodcast.com, Twitter)
  • Tsunami (GitHub) is a general purpose network security scanner with an extensible plugin system for detecting high severity vulnerabilities with high confidence.
    • Plugins for Tsunami Security Scanner (GitHub)
Direct download: coding-blocks-episode-139.mp3
Category:Software Development -- posted at: 8:01pm EDT

It’s all about telemetry and feedback as we continue learning from The DevOps Handbook, while Joe knows his versions, Michael might have gone crazy if he didn’t find it, and Allen has more than enough muscles.

For those that use their podcast player to read these show notes, did you know that you can find them at https://www.codingblocks.net/episode138? Well, you can. And now you know, and knowing is half the battle.

Sponsors

  • Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard.
  • Secure Code Warrior – Start gamifying your organization’s security posture today, score 5,000 points, and get a free Secure Code Warrior t-shirt.

Survey Says

Which one?

Take the survey at: https://www.codingblocks.net/episode138.

News

  • We give a heartfelt thank you in our best announcer voice to everyone that left us a new review!
    • iTunes: TomJerry24, Adam Korynta
    • Stitcher: VirtualShinKicker, babbansen, Felixcited
  • Cost of a Data Breach Report 2020 (IBM)
  • Garmin Risks Repeat Attack If It Paid $10 Million Ransom (Forbes)
  • Almost 4,000 databases wiped in ‘Meow’ attacks (WeLiveSecurity.com)

The Second Way: The Principles of Feedback

Implementing the technical practices of the Second Way

  • Provides fast and continuous feedback from operations to development.
  • Allows us to find and fix problems earlier in the software development life cycle.

Create Telemetry to Enable Seeing and Solving Problems

  • Identifying what causes problems can be difficult to pinpoint: was it the code, was it networking, was it something else?
  • Use a disciplined approach to identifying the problems; don’t just reboot servers.
  • The only way to do this effectively is to always be generating telemetry.
    • Needs to be in our applications and deployment pipelines.
    • More metrics provide the confidence to change things.
  • Companies that track telemetry are 168 times faster at resolving incidents than companies that don’t, per the 2015 State of DevOps Report (Puppet).
    • The two things that contributed to this improved MTTR were operations using source control and proactive monitoring (i.e. telemetry).

Create Centralized Telemetry Infrastructure

  • Must create a comprehensive set of telemetry from application metrics to operational metrics so you can see how the system operates as a whole.
    • Data collection at the business logic, application, and environmental layers via events, logs and metrics.
    • Event router that stores events and metrics.
      • This enables visualization, trending, alerting, and anomaly detection.
      • Transforms logs into metrics, grouping by known elements.
    • Need to collect telemetry from our deployment pipelines, for metrics like:
      • How many unit tests failed?
      • How long does it take to build and execute tests?
      • Static code analysis.
  • Telemetry should be easily accessible via APIs.
  • The telemetry data should be usable without the application that produced the logs.
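
As a rough illustration of transforming logs into metrics grouped by known elements, the sketch below counts log lines by service and level. The log line format, the service names, and the function `logs_to_metrics` are invented for this example; a real event router would be considerably more involved.

```python
import re
from collections import Counter

# Assumed (hypothetical) log format: "<timestamp> <LEVEL> <service> <message>"
LOG_LINE = re.compile(r"^\S+ (?P<level>[A-Z]+) (?P<service>\S+) ")


def logs_to_metrics(lines):
    """Count log lines grouped by (service, level); skip unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match["service"], match["level"])] += 1
    return counts


lines = [
    "2020-08-17T12:00:01Z INFO orders saved order 42",
    "2020-08-17T12:00:02Z WARN orders slow database call",
    "2020-08-17T12:00:03Z ERROR billing payment gateway timeout",
    "2020-08-17T12:00:04Z INFO orders saved order 43",
    "not a structured line",
]
metrics = logs_to_metrics(lines)
print(metrics[("orders", "INFO")], metrics[("billing", "ERROR")])
```

Counts like these are what a graphing or alerting tool could then trend over time.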

Create Application Logging Telemetry that Helps Production

  • Dev and Ops need to be creating telemetry as part of their daily work for new and old services.
  • Should at least be familiar with the standard log levels:
    • Debug – extremely verbose, logs just about everything that happens in an application, typically disabled in production unless diagnosing a problem.
    • Info – typically action based logging, either actions initiated by the system or user, such as saving an order.
    • Warn – something you log when it looks like there might be a problem, such as a slow database call.
    • Error – the actual error that occurs in a system.
    • Fatal – logs when something has to exit and why.
  • Using the appropriate log level is more important than you think.
    • Low toner is not an Error. You wouldn’t want to be paged about low toner while sleeping!
  • Examples of some things that should be logged:
    • Authentication events,
    • System and data access,
    • System and app changes,
    • Data operations (CRUD),
    • Invalid input,
    • Resource utilization,
    • Health and availability,
    • Startups and shutdowns,
    • Faults and errors,
    • Circuit breaker trips,
    • Delays,
    • Backup success and failure
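
The log levels above map fairly directly onto Python's standard `logging` module, where Warn is `WARNING` and Fatal corresponds to `CRITICAL`. The sketch below shows why the level matters: only `ERROR` and above reach the handler that would page someone. The `PagerHandler` class and the `printer-service` logger name are hypothetical stand-ins for illustration, not a real paging integration.

```python
import logging


class PagerHandler(logging.Handler):
    """Hypothetical stand-in for a paging integration: records what would page."""

    def __init__(self):
        super().__init__(level=logging.ERROR)  # only ERROR and above page someone
        self.pages = []

    def emit(self, record):
        self.pages.append(record.getMessage())


logger = logging.getLogger("printer-service")  # illustrative logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the example's output self-contained
pager = PagerHandler()
logger.addHandler(pager)

logger.debug("polling printer status")             # Debug: verbose diagnostics
logger.info("print job 17 completed")              # Info: action-based event
logger.warning("toner low")                        # Warn: possible problem, nobody is paged
logger.error("print queue database unreachable")   # Error: an actual failure
logger.critical("out of memory, shutting down")    # Fatal maps to CRITICAL in Python

print(pager.pages)
```

Logging low toner as a warning keeps it out of the pager while still leaving a record that can be graphed or reviewed later.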

Use Telemetry to Guide Problem Solving

  • Lack of telemetry has some negative effects:
    • People use the lack of data to avoid being blamed for problems, which can stem from a political atmosphere and is SUPER counter-productive.
  • Telemetry allows for scientific methods of problem solving to be used.
    • This approach leads to faster MTTR and a much better relationship between Dev and Ops.

Enable Creation of Production Metrics as Part of Daily Work

  • Creating a metric needs to be easy, ideally a one-line implementation.
  • Use data to generate graphs, and then overlay those graphs with production changes to see if anything changed significantly.
    • This gives you the confidence to make changes.
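
To show what a "one-line" metric can look like, here is a minimal sketch of a StatsD-style counter sent over UDP. StatsD appears in the resources for this episode; the `TinyStatsd` class, the metric name `checkout.completed`, and the local stand-in server are assumptions made so the example runs on its own. A real project would more likely use an existing client library.

```python
import socket


class TinyStatsd:
    """Minimal StatsD-style client sketch: fire-and-forget counters over UDP."""

    def __init__(self, host="127.0.0.1", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def increment(self, name, value=1):
        # StatsD counter line format: <name>:<value>|c
        self.sock.sendto(f"{name}:{value}|c".encode("ascii"), self.addr)


# Stand-in for a StatsD server so the example is self-contained:
# a local UDP socket bound to an OS-chosen free port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2)

stats = TinyStatsd(port=server.getsockname()[1])
stats.increment("checkout.completed")  # the "one line" a developer adds

received = server.recvfrom(1024)[0].decode("ascii")
print(received)
server.close()
```

Because UDP is fire-and-forget, the instrumented code does not wait on the metrics backend, which is part of what keeps adding a metric cheap.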

Create Self-Service Access to Telemetry and Information Radiators

  • Make the data available to anyone in the value stream without having to jump through hoops to get it, be they part of Development, Operations, Product Management, or Infosec, etc.
  • Information radiators are displays which are placed in highly visible locations so everyone can see the information quickly.
    • Nothing to hide from visitors OR from the team itself.

Resources We Like

  • The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon)
  • The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon)
  • The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon)
  • 2015 State of DevOps Report (Puppet)
  • StatsD (GitHub)
  • Graphite (graphiteapp.org)
  • Grafana (grafana.com)
  • The Twelve-Factor App (12factor.net)
    • The Twelve-Factor App: Codebase, Dependencies, and Config (episode 32)
    • The Twelve-Factor App: Backing Services, Building and Releasing, Stateless Processes (episode 33)
    • The Twelve-Factor App: Port Binding, Concurrency, and Disposability (episode 35)
    • The Twelve Factor App: Dev/Prod Parity, Logs, and Admin Processes (episode 36)
  • Break Up With IE8 (breakupwithie8.com)

Tip of the Week

  • Bookmarks for VS Code (GitHub, Visual Studio Marketplace)
  • Pwn your zsh! (ohmyz.sh)
    • Companion cheatsheet (GitHub)
  • Use Docker BuildKit’s experimental features to enable and use build caches (GitHub)
  • Disable all of your VS Code extensions and then re-enable just the ones you need using CTRL+SHIFT+P. (code.visualstudio.com)
  • Color code your environments in Datagrip! Right click on the server and select Color Settings. Use green for local and red for everything else to easily differentiate between the two. Can be applied at the server and/or DB levels. For example, color your default local postgres database orange. This color coding will be applied to both the navigation tree and the open file editors (i.e. tabs).
Direct download: coding-blocks-episode-138.mp3
Category:Software Development -- posted at: 8:21pm EDT
