Articles on Git
Last updated: 2023/01/23
Top deep-dives on Git
I feel like Ash collecting new Pokemon whenever I learn a cool shortcut or neat automation on my computer. Martin Heinz's article on git optimizations is like running through the tall grass; you see a couple of things you know, but then BAM, out pops the thing you never knew you were actually looking for.
Git has become a major part of most software development infrastructure. And it makes sense. It's a fantastic tool for keeping track of work from lots of people (most of the time). In this informative article, Dino Esposito covers Git's history, the differences between distributed and centralized source code control systems, and Git's philosophy, ending with an overview of the different products available for source code control.
I recently took on an apprentice, and boy oh boy did I not miss dealing with a git noob. To be fair, it's understandable that git doesn't come easy to most; it's easy to get lost in all of the jargon, with branches, pushes, pulls, commits, cherry-picks, and merges, just to name the basic few. Well, this article by Tobias Günther dives into the tangly depths of branches, exploring the different types and the interactions between them.
Thomas A Caswell presents a thorough approach to what your thought process should be when using git.
You use git, I use git, your grandmother uses git. Everyone uses git. For good reason, it's very practical. How much do you know about git though? Could you contribute a new feature tomorrow? Hmm? Well, if not, Gary Verhaegen has yet again written an informative article, this time on the internal workings of the git data model, which will help you "build up a correct intuition of what various git commands actually do under the hood".
Some people love it, some people hate it, just about everyone uses it. That's right, I'm talking about money. No, just kidding, it's git. And in this extensive article, Derrick Stolee takes it apart and shows us all of git's bits and pieces, from the basics of using hash IDs for commits to how renames are tracked.
Mark Seemann uses a rock climbing analogy to explain how git can be used as a safety rope when writing code.
Dr. Drang does a deep-dive on git diff and highlights the strangeness of its return status codes.
How do you go about figuring out "the minimal set of commits that the two nodes need to send to each other in order to make their graphs the same" in git? Well, you could write a research paper on it like Martin Kleppmann and Heidi Howard did. In this extensive article, Martin discusses the scope of the problem, introduces the solution in the form of Bloom filters, describes the practical relevance, and explains in detail why Bloom filters are a better fit than the current algorithm git uses.
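For readers who have not met Bloom filters before, here is a minimal sketch of the data structure in Python. It is a generic toy, not the protocol from the paper: the class name `BloomFilter`, the helper `commits_to_send`, the filter size, and the use of salted SHA-256 as the hash family are all assumptions made for illustration. The key property is that a Bloom filter can say "definitely absent" or "possibly present", never "definitely present".

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k hash functions (hypothetical sketch)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item.encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def commits_to_send(local_commits, remote_filter):
    """Return the commits the remote definitely lacks, judged by its filter."""
    return [c for c in local_commits if not remote_filter.might_contain(c)]


# The remote summarizes the commits it has in a small filter...
remote_filter = BloomFilter()
for commit_id in ["a1", "b2"]:
    remote_filter.add(commit_id)

# ...and the local side uses it to decide what to send.
print(commits_to_send(["a1", "b2", "c3"], remote_filter))
```

Because of false positives, a commit the remote lacks can occasionally look "possibly present" and be skipped, so a real synchronization scheme needs a follow-up step to catch those cases. This sketch leaves that out.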
Monorepos have become fairly popular in recent times (again? I feel like this is one of those concepts that were popular in the 70s and have now been rediscovered, but I couldn't find any sources on this theory). But a few (performance) issues arise with having all of your code contained in one place, especially when it comes to version control. Derrick Stolee's extensive article presents the newly implemented git feature "sparse index", which reduces the number of files git has to track in its index when using sparse checkout.
Martin Myrseth highlights some of the more advanced features of git that are not widely known.
- Includes creating empty commits, ranking commits, and octopus merges
- Git can have "conditional" configuration, allowing you to specify rules for certain directories/files
- "I’m a believer that not everything we learn or do has to necessarily have some obvious usefulness in and of itself" -> kind of like the idea behind this newsletter
Derrick Stolee has written a five-part series covering git's packed object store, commit history queries, file history queries, distributed synchronization, and scalability. In this first part, Derrick focuses on how Git stores and accesses packed object data.
- Git objects are stored in the .git/objects directory, which is called the object store
- The object store is like a database table with two columns: the object ID and the object content
- Git has references that allow you to create named pointers to keys in the object database
- To select object contents by object ID, the git cat-file command will do the object lookup and provide the necessary information
- To insert an object into the object store, we can write directly to a blob using git hash-object
- A packfile is a concatenated list of objects that is paired with a pack-index file
- Delta compression is a way to compress object data based on the content of a previous object in the packfile
- Delta chains are created when an offset delta is based on another object that is also an offset delta
- Git minimizes the extra work when parsing delta chains by keeping the delta chains short
- Git commands query the object store in such a way that we are very likely to parse multiple objects in the same delta chain
- Git does not use B-trees because it doesn’t do “live updating” of packfiles and pack-indexes
- Git does not currently have the capability to update a packfile in real time without shutting down concurrent reads from that file
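The "table with two columns" idea from the notes above can be sketched in a few lines of Python. This is a toy model of loose (unpacked) objects only, assuming the SHA-1 object format; it does not touch packfiles, pack-indexes, or delta compression. The function names `hash_object`, `write_object`, and `read_object` are hypothetical stand-ins loosely mirroring what `git hash-object -w` and `git cat-file` do, not Git's actual implementation.

```python
import hashlib
import os
import tempfile
import zlib


def _encode(content: bytes, obj_type: str) -> bytes:
    # Git hashes and stores a header "<type> <size>\0" followed by the content.
    return f"{obj_type} {len(content)}".encode() + b"\0" + content


def hash_object(content: bytes, obj_type: str = "blob") -> str:
    """Compute the object ID (the 'key' column) for some content."""
    return hashlib.sha1(_encode(content, obj_type)).hexdigest()


def write_object(objects_dir: str, content: bytes, obj_type: str = "blob") -> str:
    """Insert a loose object: zlib-compressed, at <first 2 hex>/<remaining 38>."""
    oid = hash_object(content, obj_type)
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(_encode(content, obj_type)))
    return oid


def read_object(objects_dir: str, oid: str):
    """Select by object ID: return (type, content), the 'value' column."""
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


# Use a throwaway directory as a stand-in for .git/objects.
with tempfile.TemporaryDirectory() as objects_dir:
    oid = write_object(objects_dir, b"hello world\n")
    print(oid)
    print(read_object(objects_dir, oid))
```

For the content `hello world\n`, the printed object ID is `3b18e512dba79e4c8300dd08aeb37f8e728b8dad`, the same value `git hash-object` reports for that input in a SHA-1 repository. The same content always maps to the same key, which is why the object store can be treated as a key-value table.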