We are committed to continuing our deep dive into Git from the Bottom Up by John Wiegley, while Allen puts too much thought into onions, Michael still doesn’t understand proper nouns, and Joe is out hat shopping.
Ludum Dare is a bi-annual game jam that’s been running for over 20 years now. Jam #51 is coming up Sept 30th to October 3rd. (ldjam.com)
We previously talked about Ludum Dare in episode 146.
Commitment Issues
Commits
A commit can have one or more parents.
Those commits can have one or more parents.
It’s for this reason that commits can be treated like branches, because they know their entire lineage.
You can examine top level referenced commits with the following command: git branch -v.
A branch is just a named reference to a commit!
A branch and a tag both name a commit, with the exception that a tag can have a description, similar to a commit.
Branches are just names that point to a commit.
Tags have descriptions and point to a commit.
Knowing the above two points, you actually don’t technically need branches or tags. You could do everything by pointing to commit hash IDs if you were insane enough to do so.
Here’s a dangerous command:
git reset --hard commitHash – This is dangerous. --hard says to erase all changes in the working tree, whether they were registered for a check-in or not and reset HEAD to point to the commitHash.
Here’s a safer command:
git checkout commitHash – This is a safer option, because files changed in the working tree are preserved. However, adding the -f parameter makes it act similarly to the previous command, except that it doesn’t change the branch’s HEAD, and instead only changes the working tree.
Some simple concepts to grasp:
If a commit has multiple parents, it’s a merge commit.
If a commit has multiple children, it represents the ancestor of a branch.
Simply put, Git is a collection of commits, each of which holds a tree which references other trees and blobs, which store data.
All other things in Git are named concepts but they all boil down to the above statement.
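The two "simple concepts" above can be sketched with a toy history. This is only an illustration: the commit IDs are made-up labels rather than real hashes, and the PARENTS map and helper names are assumptions for this example, not anything Git exposes.

```python
from collections import defaultdict

# Hypothetical toy history: commit ID -> list of parent IDs.
# The IDs are made-up labels, not real Git hashes.
PARENTS = {
    "a": [],          # root commit
    "b": ["a"],
    "c": ["a"],       # second child of "a", so "a" is a branch point
    "d": ["b", "c"],  # two parents, so "d" is a merge commit
}


def children_of(parents):
    """Invert the parent map so each commit lists its children."""
    children = defaultdict(list)
    for commit, commit_parents in parents.items():
        for parent in commit_parents:
            children[parent].append(commit)
    return children


def merge_commits(parents):
    """Commits with more than one parent."""
    return sorted(c for c, ps in parents.items() if len(ps) > 1)


def branch_points(parents):
    """Commits with more than one child."""
    children = children_of(parents)
    return sorted(c for c, kids in children.items() if len(kids) > 1)


print(merge_commits(PARENTS))
print(branch_points(PARENTS))
```

Commits only record their parents, so finding children means walking the history and inverting the links, which is what children_of does here.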
A commit by any other name
The key to knowing Git is to truly understand commits.
Learning to name your commits is the way to mastering Git.
branchname – The name of a branch is an alias to the most recent commit on that branch.
tagname – Similar to the branch name in that the name points to a specific commit but the difference is a tag can never change the commit id it points to.
HEAD – The currently checked out commit. Checking out a specific commit takes you out of a “branch” and you are then in a “detached HEAD” state.
The 40 character hash id – A commit can always be referenced by the full SHA1 hash.
You can refer to a commit by a shorter version of the hash id, enough characters to make it unique, usually 6 or 7 characters is enough.
name^ – Using the caret tells Git to go to the parent of the provided commit. If a commit has more than one parent, the first one is chosen.
name^^ – Carets can be stacked, so doing two carets will give the parent of the parent of the provided commit.
name^2 – If a commit has multiple parents, you can choose which one to retrieve by using the caret followed by the number of the parent to retrieve. This is useful for things like merge commits.
name~10 – Same thing as using the commit plus 10 carets. It refers to the named commit’s 10th generation ancestor.
name:path – Used to reference a specific file in the commit’s content tree, excellent when you need to do things like compare file diffs in a merge, like: git diff HEAD^1:somefile HEAD^2:somefile.
name^{tree} – Reference the tree held by a commit rather than the commit itself.
name1..name2 – Get a range of commits reachable from name2 all the way back to, but not including, name1. Omitting name1 or name2 will substitute HEAD in its place.
name1...name2 – For commands like log, gets the unique commits that are referenced by name1 or name2, but not by both. For commands like diff, the range is between name2 and the common ancestor of name1 and name2.
main.. – Equivalent to main..HEAD and useful when comparing changes made in the current branch to the branch named main.
..main – Equivalent to HEAD..main and useful for comparing changes since the last rebase or merge with the branch main, after fetching it.
--since="2 weeks ago" – All commits since a certain relative date.
--until="1 week ago" – All commits before a certain relative date.
--grep=pattern – All commits where the message matches a certain regex pattern.
--committer=pattern – Find all the commits where the committer matches a regex pattern.
--author=pattern – All commits whose author matches the pattern.
So how’s that different than the committer? “The author of a commit is the one who created the changes it represents. For local development this is always the same as the committer, but when patches are being sent by e-mail, the author and the committer usually differ.”
--no-merges – Only return commits with a single parent, i.e. ignore all merge commits.
Not sure where the history of your branch started from and want an easy button? Check out Allen’s TotW from episode 182.
Need to search the entire history of the repo for some content (text, code, etc.) that’s not part of the current branch? Content, not a commit comment, not a commit ID, but content. Check out Michael’s TotW from episode 31.
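To make the naming rules above a little more concrete, here is a rough sketch of how name^, name^2, name~N, name1..name2, and name1...name2 could be resolved against a toy history. Everything in it (the PARENTS and REFS maps, the commit labels, and the function names) is hypothetical and heavily simplified. It is not how Git implements revision parsing.

```python
import re

# Hypothetical toy history: commit ID -> ordered list of parent IDs.
PARENTS = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],        # tip of "main"
    "f1": ["c2"],
    "f2": ["f1"],        # tip of "feature"
    "m1": ["f2", "c3"],  # merge commit: first parent f2, second parent c3
}
# Hypothetical names pointing at commits.
REFS = {"main": "c3", "feature": "f2", "HEAD": "m1"}


def resolve(spec):
    """Resolve a simplified spec such as 'HEAD^', 'HEAD^2' or 'HEAD~3'."""
    match = re.match(r"[^~^]+", spec)
    name = match.group(0)
    commit = REFS.get(name, name)  # otherwise treat the name as a commit ID
    for op, num in re.findall(r"([~^])(\d*)", spec[match.end():]):
        n = int(num) if num else 1
        if op == "^":
            if n:  # ^0 means the commit itself
                commit = PARENTS[commit][n - 1]  # ^N picks the Nth parent
        else:
            for _ in range(n):  # ~N follows the first parent N times
                commit = PARENTS[commit][0]
    return commit


def ancestors(commit):
    """All commits reachable from commit, including itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def two_dot(name1, name2):
    """name1..name2: reachable from name2 but not from name1."""
    return ancestors(resolve(name2 or "HEAD")) - ancestors(resolve(name1 or "HEAD"))


def three_dot(name1, name2):
    """name1...name2 (log style): reachable from one side but not both."""
    return ancestors(resolve(name1 or "HEAD")) ^ ancestors(resolve(name2 or "HEAD"))


print(resolve("HEAD^"), resolve("HEAD^2"), resolve("HEAD~3"))
print(sorted(two_dot("main", "feature")))
print(sorted(three_dot("main", "feature")))
```

Passing an empty string for either side stands in for HEAD, which mirrors the main.. and ..main shorthands.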
Nobody Likes Onions, a podcast that has been making audiences laugh at the absurd, the obvious, and the wrong, for a very long time. (NobodyLikesOnions.com)
Tip of the Week
Supabase is an open-source alternative to Google’s Firebase that is based on PostgreSQL. The docs are great and it’s really easy to work through the “Getting Started” guide to set up a new project in the top framework of your choice, complete with a (for now) free, hosted PostgreSQL database on Heroku, with authentication (email/password or a myriad of providers). RBAC is controlled via database policies and everything can be administered through the portal. You can query the database with a simple DSL. Joe was able to work through a small project and get it hosted on Netlify (with SSL!) all for free in under 2 hours. (supabase.com)
Obsidian is a really cool way to associate markdown data with your files. (Thanks Simon Barker!) (obsidian.md)
Ever use a “mind map” tool? MindNode is a great, free, mind mapping tool to help you organize your thoughts (Thanks Sean Martz!) (mindnode.com)
Inkdrop is a cool way to organize and search your markdown files (inkdrop.app) (Thanks Lars!)
Tired of git log knocking the rest of your content off screen? You can configure Git to run a custom “core.pager” command with the args you prefer: (serebrov.github.io)
To configure just Git: git config --global --replace-all core.pager "less -iXFR"
Or, to modify how less prints to the screen and commands that rely on it, including Git, edit your ~/.bashrc or ~/.zshrc, etc. and add export LESS=-iXFR to the file.
It’s surprising how little we know about Git as we continue to dive into Git from the Bottom Up, while Michael confuses himself, Joe has low standards, and Allen tells a joke.
Thanks for all the great feedback on the last episode and for sticking with us!
Directory Content Tracking
Put simply, Git just keeps a snapshot of a directory’s contents.
Git represents your file contents in blobs (binary large objects), in a structure similar to a Unix directory, called a tree.
A blob is named by a SHA1 hashing of the size and contents of the file.
This verifies that the blob contents will never change (given the same ID).
The same contents will ALWAYS be represented by the same blob no matter where it appears, be it across commits, repositories, or even the Internet.
If multiple trees reference the same blob, it’s simply a hard link to the blob.
As long as there’s one link to a blob, it will continue to exist in the repository.
A blob stores no metadata about its content.
This is kept in the tree that contains the blob.
Interesting tidbit about this: you could have any number of files that are all named differently but have the same content and size and they’d all point to the same blob.
For example, even if one file were named abc.txt and another was named passwords.bin in separate directories, they’d point to the same blob.
The author creates a file and then calculates the ID of the file using git hash-object filename.
If you were to do the same thing on your system, assuming you used the same content as the author, you’d get the same hash ID, even if you named the file differently than they did.
git cat-file -t hashID will show you the Git type of the object, which should be blob.
git cat-file blob hashID will show you the contents of the file.
The commands above are looking at the data at the blob level, not even taking into account which commit contained it, or which tree it was in.
Git is all about blob management, as the blob is the fundamental data unit in Git.
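If you’re curious why the same content always gets the same ID, the calculation can be approximated outside of Git. The sketch below assumes a repository using SHA-1 object IDs and ignores anything Git may do to a file before hashing it (such as line-ending conversion). The blob_id function name is made up for this example.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the ID a SHA-1 Git repository would give a blob with these bytes."""
    # Git hashes a small header (object type, size in bytes, NUL byte)
    # followed by the raw content. The file name is not part of the input.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Imagine these bytes were read from abc.txt and from passwords.bin.
from_abc = blob_id(b"hello\n")
from_passwords = blob_id(b"hello\n")

print(from_abc == from_passwords)
print(blob_id(b"hello\n"))
```

Because the file name never enters the hash, two differently named files with identical bytes end up as one blob.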
Blobs are Stored in Trees
Remember there’s no metadata in the blobs, and instead the blobs are just about the file’s contents.
Git maintains the structure of the files within the repository in a tree by attaching blobs as leaf nodes within a tree.
git ls-tree HEAD will show the tree of the latest commit in the current directory.
git rev-parse HEAD decodes the HEAD into the commit ID it references.
git cat-file -t HEAD verifies the type for the alias HEAD (should be commit).
git cat-file commit HEAD will show metadata about the commit including the hash ID of the tree, as well as author info, commit message, etc.
To see that Git is maintaining its own set of information about the trees, commits and blobs, etc., use find .git/objects -type f and you’ll see the same IDs that were shown in the output from the previous Git commands.
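Since the names and modes live in the tree rather than in the blob, a tree’s ID depends on both the names and the blob IDs it holds. Here’s a minimal sketch of that idea, assuming SHA-1 object IDs and a flat directory of regular files. Real Git has extra sorting rules for sub-directories that this ignores, and the helper names are hypothetical.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 over a '<type> <size>' header, a NUL byte, and the object body."""
    return hashlib.sha1(kind + b" %d\x00" % len(body) + body).hexdigest()


def blob_id(content: bytes) -> str:
    return object_id(b"blob", content)


def tree_id(entries) -> str:
    """entries: iterable of (mode, name, hex object ID) for files in one directory."""
    body = b""
    # Simplification: sort by name only, which is enough for plain files.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    return object_id(b"tree", body)


greeting = blob_id(b"Hello, world!\n")

# Same blob under two different names: one blob ID, two different tree IDs.
tree_a = tree_id([("100644", "abc.txt", greeting)])
tree_b = tree_id([("100644", "passwords.bin", greeting)])

print(tree_a != tree_b)
print(tree_id([]))  # ID of a tree with no entries
```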
How Trees are Made
There’s a notion of an index, which is what you use to initially create blobs out of files.
If you just do a git add without a commit, assuming you are following along here (jwiegley.github.io), git log will fail because nothing has been committed to the repository.
git ls-files --stage will show your blob being referenced by the index.
At this point the file is not referenced by a tree or a commit, it’s only in the .git/index file.
git write-tree will take the contents of the index and write it to a tree, and the tree will have its own hash ID.
If you followed along with the link above, you’d have the same hash from the write-tree that we get.
A tree containing the same blob and sub-trees will always have the same hash.
The low-level write-tree command is used to take the contents of the index and write them into a new tree in preparation for a commit.
git commit-tree takes a tree’s hash ID and makes a commit that holds it.
If you wanted that commit to reference a parent, you’d have to manually pass in the parent’s commit ID with the -p argument.
This commit ID will be different for everyone because it uses the name of the creator of the commit as well as the date when the commit is created to generate the hash ID.
Now you have to overwrite the contents of .git/refs/heads/master with the latest commit hash ID.
This tells Git that the branch named master should now reference the new commit.
A safer way to do this, if you were doing this low-level stuff, is to use git update-ref refs/heads/master hashID.
git symbolic-ref HEAD refs/heads/master then associates the working tree with the HEAD of master.
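To see why the tree hash matches for everyone while the commit hash doesn’t, here is a rough approximation of what goes into a commit’s ID. It assumes SHA-1 object IDs, uses the same identity for author and committer, and hard-codes a +0000 timezone. The names, email addresses, and timestamps are made up for the example.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 over a '<type> <size>' header, a NUL byte, and the object body."""
    return hashlib.sha1(kind + b" %d\x00" % len(body) + body).hexdigest()


def commit_id(tree_hex, author, timestamp, message, parents=()):
    """Approximate the ID of a commit object (simplified: author == committer)."""
    lines = [f"tree {tree_hex}"]
    lines += [f"parent {p}" for p in parents]  # what commit-tree's -p adds
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return object_id(b"commit", body.encode())


TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries

# Hypothetical people committing the very same tree.
ada = commit_id(TREE, "Ada <ada@example.com>", 1700000000, "Initial commit")
bob = commit_id(TREE, "Bob <bob@example.com>", 1700000000, "Initial commit")
later = commit_id(TREE, "Ada <ada@example.com>", 1700000060, "Initial commit")
child = commit_id(TREE, "Ada <ada@example.com>", 1700000060, "Second", parents=[ada])

print(ada != bob, ada != later)
print(child)
```

The parent IDs are part of the hashed text too, so a commit’s ID effectively covers its entire lineage.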
What Have We Learned?
Blobs are unique!
Blobs are held by Trees, Trees are held by Commits.
HEAD is a pointer to a particular commit.
Commits usually have a parent, i.e. previous, commit.
We’ve got a better understanding of the detached HEAD state.
What a lot of those files mean in the .git directory.
Resources We Like
Things I wish everyone knew about Git (Part 1) (blog.plover.com)
Have you ever heard the tale of … the forbidden files in Windows? Windows has a list of names that you cannot use for files. Twitter user @foone has done the unthinkable and created a repository of these files. What would happen if you checked this repository out on Windows?
Check out this convenient repository in Windows. (GitHub)
When you use mvn dependency:tree, grep is your enemy. If you want to find out who is bringing in a specific dependency, you really need to use the -Dincludes flag.
Thanks to @ttutko for this tip about redirecting output:
kafkacat 2>&1 | grep "". If you’re not familiar with that syntax, it just means redirect STDERR to STDOUT and then pipe the combined output to grep.
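The same idea shows up outside the shell. As a small sketch, Python’s subprocess module can merge a child process’s STDERR into STDOUT, which is the equivalent of 2>&1. The child command here is a made-up stand-in for a chatty tool, not kafkacat itself.

```python
import subprocess
import sys

# Hypothetical stand-in for a chatty tool: one line on stdout, one on stderr.
CHILD = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"


def captured_stdout(merge_stderr: bool) -> str:
    """Run the child and return what a reader of its stdout pipe would see."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        # subprocess.STDOUT plays the role of the shell's 2>&1.
        # Without merging, stderr is discarded here to keep the demo quiet;
        # in a shell it would go straight to the terminal instead.
        stderr=subprocess.STDOUT if merge_stderr else subprocess.DEVNULL,
        text=True,
    )
    return result.stdout


print("to stderr" in captured_stdout(merge_stderr=False))
print("to stderr" in captured_stdout(merge_stderr=True))
```

Without the merge, anything the tool writes to STDERR never reaches the pipe, which is why grep appears to miss it.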
Thanks Volkmar Rigo for this one!
Dangit, Git!? Git is hard: messing up is easy, and figuring out how to fix your mistakes is impossible. This website has some tips to get you out of a jam. (DangitGit.com)
How to vacay … step 1 temporarily disable your work email (and silence Slack, Gchat, whateves).
On iOS, go to Settings -> Mail -> Accounts -> Select your work account -> Turn off the Mail slider.
After working with Git for over a decade, we decide to take a deep dive into how it works, while Michael, Allen, and Joe apparently still don’t understand Git.
This episode was inspired by an article written by Mark Dominus.
Git commits are immutable snapshots of the repository.
Branches are named sequences of commits.
Every object gets a unique id based on its content.
The author is not a fan of how the command set has evolved over time.
With Git, you need to think about what state your repository is in, and what state you would like to be in.
There are likely a number of ways to achieve that desired state.
If you try to understand the commands without understanding the model, you can get lost. For example:
git reset does three different things depending on the flags used,
git checkout even worse (per the author), and
The opposite of git-push is not git-pull, it’s git-fetch.
Possibly the worst part of the above is if you don’t understand the model and what’s happening to the model, you won’t know the right questions to ask to get back into a good state.
Mark said the thing that saved him from frustration with Git is the book Git from the Bottom Up by John Wiegley (jwiegley.github.io)
Mark doesn’t love Git, but he uses it by choice and he uses it effectively. He said that reading Wiegley’s book is what changed everything for him. He could now “see” what was happening with the model even when things went wrong.
It is very hard to permanently lose work. If something seems to have gone wrong, don’t panic. Remain calm and ask an expert.
Mark Dominus
Git from the Bottom Up
A repository – “is a collection of commits, each of which is an archive of what the project’s working tree looked like at a past date, whether on your machine or someone else’s.” It defines HEAD, which identifies the branch or commit the current tree started from, and contains a set of branches or tags that allow you to identify commits by a name.
The index is what will be committed on the next commit. Git does not commit changes from the working tree into the repository directly so instead, the changes are registered into the index, which is also referred to as a staging area, before committing the actual changes.
A working tree is any directory on your system that is associated with a Git repository and typically has a .git folder inside it.
Why typically? Thanks to the git-worktree command, one .git directory can be used to support multiple working trees, as previously discussed in episode 128.
A commit is a snapshot of your working tree at some point in time. “The state of HEAD (see below) at the time your commit is made becomes that commit’s parent. This is what creates the notion of a ‘revision history’.”
A branch is a name for a commit, also called a reference. This stores the history of commits, the lineage, and is typically referred to as the “branch of development”.
A tag is also a name for a commit, except that it always points to the same commit unlike a branch which doesn’t have to follow this rule as new commits can be made to the branch. A tag can also have its own description text.
master was typically, maybe not so much now, the default branch name where development is done in a repository. Any branch name can be configured as the default branch. Currently, popular default branch names include main, trunk, and dev.
HEAD is an alias that lets the repository identify what’s currently checked out. If you check out a branch, HEAD now symbolically points to that branch. If you check out a tag, HEAD now refers only to that commit and this state is referred to as a “detached HEAD”.
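The difference between an attached and a detached HEAD can be modeled in a few lines. This is a toy sketch only: the ToyRepo class, its methods, and the commit labels are all hypothetical, and it tracks nothing but which name points at which commit.

```python
class ToyRepo:
    """Hypothetical model of branches and HEAD; commit IDs are plain labels."""

    def __init__(self):
        self.branches = {}              # branch name -> commit ID
        self.parents = {}               # commit ID -> parent commit ID (or None)
        self.head = ("branch", "main")  # HEAD starts out attached to "main"

    def head_commit(self):
        kind, value = self.head
        return self.branches.get(value) if kind == "branch" else value

    def is_detached(self):
        return self.head[0] == "detached"

    def commit(self, commit_id):
        # The state of HEAD at commit time becomes the new commit's parent.
        self.parents[commit_id] = self.head_commit()
        kind, value = self.head
        if kind == "branch":
            self.branches[value] = commit_id     # attached: the branch advances
        else:
            self.head = ("detached", commit_id)  # detached: only HEAD moves

    def checkout(self, target):
        # A branch name attaches HEAD; anything else (a hash, a tag) detaches it.
        kind = "branch" if target in self.branches else "detached"
        self.head = (kind, target)

    def reset(self, commit_id):
        # Models only the pointer-moving part of git reset.
        kind, value = self.head
        if kind == "branch":
            self.branches[value] = commit_id
        else:
            self.head = ("detached", commit_id)


repo = ToyRepo()
repo.commit("c1")
repo.commit("c2")
repo.checkout("c1")  # not a branch name, so HEAD detaches
repo.commit("x1")    # main still points at c2
print(repo.is_detached(), repo.branches["main"], repo.head_commit())
```

In this model a commit made while detached isn’t reachable from any branch, which is why it’s worth creating a branch before moving on.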
The typical work flow goes something like:
Create a repository,
Do some work in your working tree,
Once you’ve achieved a good “stopping point”, you add your changes to the index via git add, and then
Once your changes are in the state you want them and in your index, you are ready to put your changes into the actual repository, so you commit them using git commit.
Resources We Like
Things I wish everyone knew about Git (Part 1) (blog.plover.com)
Designing Data-Intensive Applications – SSTables and LSM-Trees (episode 128)
Tip of the Week
Celeste is a tough, but forgiving game that is on all major platforms. It was developed by a tiny team, 2 programmers, and it’s a really rewarding and interesting experience. Don’t sleep on this game any longer! (CelesteGame.com)
Enforcer Maven plugin is a tool for unknotting dependency version problems, which can easily get out of control and be a real problem when trying to upgrade!
Maven Enforcer Plugin – The Loving Iron Fist of Maven™ (maven.apache.org)
Tired of sending messages too early in Slack? You can set your Slack preferences to make ENTER just do a new line! Then use CMD + ENTER on macOS or CTRL + ENTER on Windows to send the message! Thanks for the amazing tip from Jim Humelsine! (Slack)
Using Docker Desktop, and want to run a specific version of Kubernetes? Well … you can’t really! You have to pick a version of Docker Desktop that corresponds to your target version of Kubernetes!
Alternatively you can just use Minikube to target a specific Kubernetes version (minikube.sigs.k8s.io)
Save a life, donate blood, platelets, plasma, or marrow (redcrossblood.org)
What if you want to donate bone marrow or cord blood? You need to be matched with a recipient first. Check eligibility on the website at Be The Match. (bethematch.org)
Also, not quite as important, you can disable all of the stupid sounds (bells) in WSL!
Disable beep in WSL terminal on Windows 10 (Stack Overflow)