Exploring Git: Staging

Inspired by Slidetocode’s article on Git, I decided to learn about Git’s internal workings. I expect this knowledge to prove useful whenever I run into issues with Git. It should also help me lower the barrier to entry for new Git users by allowing me to create accurate analogies.

In this article, we’ll inspect the contents of a Git repository and use low-level operations to simulate what happens when a file is staged. This article assumes a basic knowledge of Git, the git command, and the UNIX terminal.

The .git directory

Git is a content-addressable file system. This means it is a key-value object store where each object is indexed by the SHA-1 hash of its content (the object’s key). Commits, tags, file contents, and trees are all different types of objects in this store.

To view an example of these objects, we can begin by creating a Git repository. The command git init creates the .git directory, the Git repository, which is where Git stores all the objects it works with. If you want to back up or clone your repository, copying this single directory elsewhere gives you nearly everything you need. You can inspect this directory with the command:

$ ls -l .git
total 24
-rw-r--r--   1 gnerkus  staff   23 Jun 21 09:06 HEAD
drwxr-xr-x   2 gnerkus  staff   68 Jun 21 09:06 branches
-rw-r--r--   1 gnerkus  staff  137 Jun 21 09:06 config
-rw-r--r--   1 gnerkus  staff   73 Jun 21 09:06 description
drwxr-xr-x  11 gnerkus  staff  374 Jun 21 09:06 hooks
drwxr-xr-x   3 gnerkus  staff  102 Jun 21 09:06 info
drwxr-xr-x   4 gnerkus  staff  136 Jun 21 09:06 objects
drwxr-xr-x   4 gnerkus  staff  136 Jun 21 09:06 refs

There are many files and directories to inspect here, but we’ll focus solely on the objects directory and the index file in this article. The index file is not in the listing yet: Git creates it the first time a file is staged.

Git Objects

The objects directory is where Git stores its objects, each object identified by a unique key. Git objects may represent file content or information about the repository. For the purpose of this article, we’ll explore only one type of Git object: blob objects. In subsequent articles, we’ll explore the other types as we work within the repository.

Blob Objects

Blob objects store the contents of the files in the working directory (the files you modify). A blob holds only a file’s content, not its name. To understand this better, we’ll need to add a file to our Git repository.

You can create the file with the command below:

$ echo "A Git Repository" > README.md

We’ll add this file to the repository’s index using the ‘plumbing’ commands available to us.

Plumbing and Porcelain

Git’s ‘plumbing’ is a collection of low-level commands that give access to Git’s true internal representation of a repository. They are less user-friendly than the ‘porcelain’ commands we’re more familiar with.

Porcelain commands are easy to find: they are the ones displayed when you run the git --help command in a UNIX terminal. Plumbing commands require a bit of digging. You can find some of the porcelain and plumbing commands in the Git community documentation. You can also view all the commands in the built-in Git manual: type man git in your terminal and scroll until you find the GIT COMMANDS section.

While there are a lot of plumbing commands, we’ll use only a small subset in our investigation of Git’s objects. We’ll explore the other commands as we make changes to the repository in subsequent articles.

Adding a file to the Git repository

To add a file to the index, we’ll need to follow two key steps:

1. Add the file to the Git repository.
2. Add the file to the index.

We can use the plumbing command git hash-object to insert the README.md file into the Git repository:

$ git hash-object -w README.md
19253f9195d5bb5823c3c663f1b28bd35756318b

The -w flag tells hash-object to store the object; without it, the command only returns the key. We can verify that an object has been added to the Git repository by listing every file under the objects directory:

$ find .git/objects -type f
.git/objects/19/253f9195d5bb5823c3c663f1b28bd35756318b
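To see the effect of the -w flag in isolation, here is a minimal sketch that runs hash-object with and without it. It assumes git is on your PATH and works in a throwaway directory created with mktemp; the variable names are only for illustration.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo "A Git Repository" > README.md

# Without -w: the key is computed and printed, but nothing is stored.
key=$(git hash-object README.md)
before=$(find .git/objects -type f | wc -l | tr -d ' ')

# With -w: the same key is returned, and the object is written to disk.
stored_key=$(git hash-object -w README.md)
after=$(find .git/objects -type f | wc -l | tr -d ' ')

echo "key: $key"
echo "objects before -w: $before, after -w: $after"
```

The key is the same in both runs because it depends only on the content, not on whether the object was stored.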

Git stores this content as a single blob object, named with the SHA-1 checksum of the content plus a header (the object type, the content size in bytes, and a null byte). The subdirectory is named with the first 2 characters of the SHA-1, and the filename is the remaining 38 characters.

We can use the plumbing command git cat-file to view the content of the file:

$ git cat-file -p 19253f9195d5bb5823c3c663f1b28bd35756318b
A Git Repository

The -p flag tells cat-file to work out the object’s type and display its content accordingly.

The cat-file command can also report the object’s type with the -t flag:

$ git cat-file -t 19253f9195d5bb5823c3c663f1b28bd35756318b
blob
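When a repository holds more than one object, checking types one key at a time gets tedious. As a sketch, cat-file can report the key, type, and size of every object in one call. This assumes a reasonably recent Git, since --batch-all-objects is not available in old releases, and it runs in a throwaway repository.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo "A Git Repository" > README.md
git hash-object -w README.md > /dev/null

# One line per stored object: <key> <type> <size in bytes>
listing=$(git cat-file --batch-check --batch-all-objects)
echo "$listing"
```

The size column counts the bytes of the content only (here, the text plus the newline that echo appends), not the header.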

Each version of a file is saved as a different blob object. To demonstrate this, let’s change the content of the README.md file:

$ echo "New version" > README.md
$ git hash-object -w README.md
eb334ded93a1272567e64377e609ea5573a4b30c

Notice that we receive a different hash from the first one. We can inspect both hashes to view their content:

$ git cat-file -p 19253f9195d5bb5823c3c663f1b28bd35756318b
A Git Repository
$ git cat-file -p eb334ded93a1272567e64377e609ea5573a4b30c
New version
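Because both versions stay in the object store, either one can be written back to the working directory. The following sketch, which assumes git is on your PATH and uses a throwaway repository, restores the first version by redirecting cat-file output into the file.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Store two versions of the same file as two separate blobs.
echo "A Git Repository" > README.md
first=$(git hash-object -w README.md)
echo "New version" > README.md
second=$(git hash-object -w README.md)

# Overwrite the working file with the content stored under the first key.
git cat-file -p "$first" > README.md
cat README.md
```

Nothing here involves commits or the index: the object store alone is enough to recover earlier content, as long as you still know its key.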

The last thing we need to do is add the file to the index. The index file is where Git stores the staging area information. This file is updated whenever the command git add is executed. We’ll add this file to the index using the plumbing command git update-index:

$ git update-index --add README.md

We can now use the git ls-files command to view the contents of the index:

$ git ls-files --stage
100644 eb334ded93a1272567e64377e609ea5573a4b30c 0	README.md

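You’ll notice that the index points to the second version of the README.md file (identified by the eb33... hash). Each entry lists the file mode (100644, a regular non-executable file), the blob’s hash, the stage number (0 outside of a merge conflict), and the path.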
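To confirm that the staging information lives in the .git/index file, a small sketch can check for that file before and after the update-index call. It assumes git is on your PATH and runs in a throwaway repository; the before and after variables are only for illustration.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo "A Git Repository" > README.md

# A freshly initialised repository has no index file yet.
if [ -e .git/index ]; then before=present; else before=absent; fi

git update-index --add README.md

# Staging the first file creates .git/index.
if [ -e .git/index ]; then after=present; else after=absent; fi

echo "index before: $before, after: $after"
```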

If you execute the git status command, you’ll see that our file has been staged:

$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   README.md

Summary

In summary, we can equate the git add README.md command to the series of commands:

$ git hash-object -w README.md
$ git update-index --add README.md

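which creates a blob object in the objects directory and updates the index. Strictly speaking, git update-index --add also hashes and stores the file itself when given a path, so the separate git hash-object -w step mainly serves to make the blob creation visible.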
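One way to check this equivalence is to stage the same file in two throwaway repositories, once with git add and once with plumbing only, and compare the resulting index entries. This is a sketch that assumes git is on your PATH. It passes --cacheinfo so that update-index registers the blob already stored by hash-object rather than hashing the file again.

```shell
# Repository A: stage the file with the porcelain command.
a=$(mktemp -d)
cd "$a" || exit 1
git init -q
echo "A Git Repository" > README.md
git add README.md
porcelain=$(git ls-files --stage)

# Repository B: stage the same content with plumbing commands only.
b=$(mktemp -d)
cd "$b" || exit 1
git init -q
echo "A Git Repository" > README.md
key=$(git hash-object -w README.md)
# --cacheinfo takes a mode, an existing object key, and a path.
git update-index --add --cacheinfo 100644 "$key" README.md
plumbing=$(git ls-files --stage)

# Both repositories should now hold an identical index entry.
if [ "$porcelain" = "$plumbing" ]; then
    echo "index entries match"
else
    echo "index entries differ"
fi
```

The mode 100644 is written out by hand here because --cacheinfo does not look at the file on disk; git add derives it from the file's permissions.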

In the next article, we’ll explore tree and commit objects as we commit files to the Git repository. You can learn more about Git’s internal workings from these resources:

Git Community Book: Git Internals

Ifeanyi Oraelosi

Making stuff to facilitate learning and creativity. Also, video game experiments.
