====================================================================
Published: 30 August 2022
Tags: database, git
Derrick Stolee has written a five-part series on Git's database internals, covering Git's packed object store, commit history queries, file history queries, distributed synchronization, and scalability. In this first part, Derrick focuses on how Git stores and accesses packed object data.
Some highlights:
- Git objects are stored in the .git/objects directory, which is called the object store
- The object store is like a database table with two columns: the object ID and the object content
- Git's references (refs) are named pointers to keys, that is, object IDs, in the object store
- To select object contents by object ID, git cat-file looks up the object and can print its type, size, or contents
- To insert an object into the object store, git hash-object -w writes the given content as a blob and prints its object ID
- A packfile is a concatenated list of objects, paired with a pack-index file that records where each object is stored within the packfile
- Delta compression is a way to compress object data based on the content of a previous object in the packfile
- Delta chains are created when an offset delta is based on another object that is also an offset delta
- Git minimizes the extra work of parsing delta chains by keeping delta chains short
- Git commands query the object store in ways that make it very likely to parse multiple objects in the same delta chain, so Git caches recently reconstructed delta bases
- Git does not use B-trees because it doesn't do "live updating" of packfiles and pack-indexes
- Git cannot currently update a packfile in real time without blocking concurrent reads from that file
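To make the object-store bullets concrete, here is a minimal Python sketch of a loose object store, assuming a repository that uses the default SHA-1 object format. It computes a blob's object ID the way git hash-object does (SHA-1 over a "blob <size>\0" header followed by the content), stores the object zlib-compressed under objects/<first 2 hex chars>/<remaining 38>, and reads it back roughly as git cat-file would. The function names and the in-memory refs dictionary are hypothetical illustrations; real Git also keeps packed objects and on-disk ref files, which this sketch ignores.

```python
import hashlib
import os
import tempfile
import zlib


def blob_header(content: bytes) -> bytes:
    # Every object is hashed and stored with a "<type> <size>\0" header.
    return b"blob " + str(len(content)).encode() + b"\0"


def hash_blob(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    return hashlib.sha1(blob_header(content) + content).hexdigest()


def write_blob(objects_dir: str, content: bytes) -> str:
    """Store a blob as a loose object (hypothetical helper, akin to `git hash-object -w`)."""
    oid = hash_blob(content)
    # Loose objects live at objects/<first 2 hex chars>/<remaining 38 hex chars>.
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):  # Objects are immutable: same ID, same content.
        with open(path, "wb") as f:
            f.write(zlib.compress(blob_header(content) + content))
    return oid


def read_object(objects_dir: str, oid: str) -> tuple:
    """Look up a loose object by ID (hypothetical helper, akin to `git cat-file`)."""
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("corrupt object: size mismatch")
    return obj_type, body


objects_dir = tempfile.mkdtemp()  # Stand-in for .git/objects
refs = {}  # Hypothetical in-memory refs: name -> object ID
oid = write_blob(objects_dir, b"hello\n")
refs["refs/tags/greeting"] = oid
print(oid)  # ce013625030ba8dba906f756967f9e9ca394464a
print(read_object(objects_dir, refs["refs/tags/greeting"]))  # ('blob', b'hello\n')
```

The object ID depends only on the type and content, which is what lets the object store behave like a key-value table keyed by ID.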
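The pack-index bullet can be illustrated with a simplified in-memory model. This sketch assumes a hypothetical PackIndex class that holds sorted object IDs, a 256-entry fanout table (entry b counts the IDs whose first byte is at most b, the same idea Git's .idx files use), and the packfile offset of each object. It does not parse the real .idx binary format, and the offsets in the usage example are made up.

```python
from __future__ import annotations

import bisect
import hashlib
from typing import Optional


class PackIndex:
    """Toy pack-index: sorted object IDs, a fanout table, and packfile offsets."""

    def __init__(self, entries: dict[str, int]):
        # entries maps hex object ID -> byte offset of that object in the packfile.
        self.oids = sorted(entries)
        self.offsets = [entries[oid] for oid in self.oids]
        # fanout[b] = number of IDs whose first byte is <= b (a cumulative count).
        self.fanout = [0] * 256
        for oid in self.oids:
            self.fanout[int(oid[:2], 16)] += 1
        for b in range(1, 256):
            self.fanout[b] += self.fanout[b - 1]

    def find_offset(self, oid: str) -> Optional[int]:
        first = int(oid[:2], 16)
        lo = self.fanout[first - 1] if first > 0 else 0
        hi = self.fanout[first]
        # Binary-search only the slice of IDs that share this first byte.
        i = bisect.bisect_left(self.oids, oid, lo, hi)
        if i < hi and self.oids[i] == oid:
            return self.offsets[i]
        return None


# Hypothetical offsets; a real pack's first object follows its 12-byte header.
ids = [hashlib.sha1(s).hexdigest() for s in (b"a", b"b", b"c")]
index = PackIndex({oid: 12 + 100 * n for n, oid in enumerate(ids)})
print(index.find_offset(ids[1]))  # 112
print(index.find_offset("0" * 40))  # None
```

A sorted, immutable array like this is cheap to search but costly to modify in place, which is acceptable when new objects go into new packfiles rather than being inserted into existing ones.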
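The delta bullets can be sketched with a toy pack in which each entry is either a full object or an offset delta pointing at an earlier entry. The ("copy", start, length) and ("insert", bytes) instruction tuples, the Pack class, and its small LRU cache are hypothetical simplifications: Git's real delta encoding is binary, and its delta base cache (bounded by the core.deltaBaseCacheLimit setting) works differently in detail. The point it shows is that reading the object at the end of a chain reconstructs every base beneath it first.

```python
from collections import OrderedDict


def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild an object from its base using simplified copy/insert instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += base[start:start + length]  # Reuse a range of the base object.
        else:
            out += op[1]  # ("insert", literal bytes) not found in the base.
    return bytes(out)


class Pack:
    """Toy pack: offset -> ("full", data) or ("ofs_delta", base_offset, ops)."""

    def __init__(self, entries, cache_size=4):
        self.entries = entries
        self.cache = OrderedDict()  # Small LRU cache of reconstructed objects.
        self.cache_size = cache_size
        self.delta_applications = 0  # Counts how much delta work was done.

    def read(self, offset: int) -> bytes:
        if offset in self.cache:
            self.cache.move_to_end(offset)
            return self.cache[offset]
        entry = self.entries[offset]
        if entry[0] == "full":
            data = entry[1]
        else:
            _, base_offset, ops = entry
            # A delta whose base is itself a delta forms a delta chain.
            data = apply_delta(self.read(base_offset), ops)
            self.delta_applications += 1
        self.cache[offset] = data
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # Evict the least recently used entry.
        return data


pack = Pack({
    12: ("full", b"hello world\n"),
    40: ("ofs_delta", 12, [("copy", 0, 6), ("insert", b"there\n")]),
    70: ("ofs_delta", 40, [("copy", 0, 12), ("insert", b"bye\n")]),
})
print(pack.read(70))  # b'hello there\nbye\n'
print(pack.read(40))  # b'hello there\n' (served from the cache)
print(pack.delta_applications)  # 2
```

Reading offset 70 applies two deltas, and the later read of offset 40 reuses the cached result instead of walking the chain again. In real Git, the --depth option of git pack-objects caps how long such chains can grow.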