--------------------------------------------------------------------------------
title: I Has(kell) a Git
published: 2017-08-05
tags: programming, haskell, git
--------------------------------------------------------------------------------

I struggled with Git for a long time, and every time I thought I had finally made sense of it, I would accidentally delete a repository or mess up a branch, causing me to question my grasp of what I was doing. I found it very difficult to form a mental model of the tool from the proliferation of seemingly endless command line flags that I had to use to achieve anything meaningful, and the cryptic errors that would inevitably result.

When I finally thought I understood what was going on, I offered to give a talk on it to the local functional programming group, because Git is functional, right? The co-organisers explained that it wouldn't be an interesting or useful talk, but a talk on implementing Git in Haskell would be very welcome.

It turns out that understanding Git from the inside out is far, far easier than whatever I was trying to do earlier, and this blog post is my attempt to share that comfort and understanding with you.

I've chosen to write this as an IHaskell notebook that is available [here](https://github.com/vaibhavsagar/notebooks/tree/master/git-from-scratch), and I've included a `default.nix` to make things easier if you have Nix installed. You should be able to run

```bash
$ nix-build .
$ result/bin/ihaskell-notebook
```

to open a Jupyter notebook environment with all the dependencies you'll need to follow along.

Let's start by picking a Git repository. I picked Ethan Schoonover's [solarized](https://github.com/altercation/solarized/) because it's large, well-known, and hasn't been updated in years, so the hashes here shouldn't go out of date.

```haskell
{-# LANGUAGE OverloadedStrings #-}

:!if [ -d solarized/ ]; then rm -rf solarized; fi
:!git clone /home/vaibhavsagar/repos/solarized
:!cd solarized
:!git show --format=raw -s
```

```
Cloning into 'solarized'...
done.

commit e40cd4130e2a82f9b03ada1ca378b7701b1a9110
tree ecd0e58d6832566540a30dfd4878db518d5451d0
parent ab3c5646b41de1b6d95782371289db585ba8aa85
author Trevor Bramble <inbox@trevorbramble.com> 1372482098 -0700
committer Trevor Bramble <inbox@trevorbramble.com> 1372482214 -0700

    add tmux by @seebi!
```

`git show` displays the latest commit on the current branch, `--format=raw` shows it in raw format, and the `-s` flag suppresses the diff output, which (as we'll see later) isn't part of the commit.

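The first thing we have to address is the fact that Git has two storage formats: loose objects and packfiles. A loose object is a single zlib-compressed file holding exactly one object, and this is how Git typically writes the objects you create locally. A packfile bundles many objects into one file, often storing them as deltas against similar objects. Git uses packfiles to transfer objects over the network, because transferring one large file has less overhead than transferring lots of small files, and it also periodically consolidates loose objects into packfiles on disk, for example during `git gc`. Loose objects are easier to work with, so I'm going to convert the packfiles in our clone into loose objects. The packfiles have to be moved out of `.git/objects/pack` first, because `git unpack-objects` skips any object that already exists in the repository.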

If you'd like to learn more about packfiles, my favourite resource is Aditya Mukerjee's [Unpacking Git packfiles](https://codewords.recurse.com/issues/three/unpacking-git-packfiles).

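As a small taste of the format, a packfile starts with a fixed 12-byte header: the four ASCII bytes `PACK`, a 4-byte big-endian version number (Git generates version 2), and a 4-byte big-endian count of the objects in the pack. Here is a minimal sketch that reads just that header using only `bytestring`. `readPackHeader` and `be32` are hypothetical helpers of my own, not part of any library, and everything after the first 12 bytes is ignored.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- | Read a big-endian 32-bit word from the first four bytes, if present.
be32 :: B.ByteString -> Maybe Word32
be32 bs
  | B.length bs < 4 = Nothing
  | otherwise       = Just (foldl step 0 (B.unpack (B.take 4 bs)))
  where
    step acc byte = (acc `shiftL` 8) .|. fromIntegral byte

-- | Parse the 12-byte packfile header into (version, object count).
-- Returns Nothing if the magic bytes are wrong or the input is too short.
readPackHeader :: B.ByteString -> Maybe (Word32, Word32)
readPackHeader bs
  | B.take 4 bs /= magic = Nothing
  | otherwise = do
      version <- be32 (B.drop 4 bs)
      count   <- be32 (B.drop 8 bs)
      pure (version, count)
  where
    magic = B.pack [0x50, 0x41, 0x43, 0x4B] -- "PACK" in ASCII
```

Applied to one of the `*.pack` files that the next cell moves out of `.git/objects/pack` (for example with `readPackHeader <$> B.readFile path`), it should report the pack version and how many objects `git unpack-objects` is about to write out.
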
```haskell
:!mv .git/objects/pack/* .
:!cat *.pack | git unpack-objects
:!rm -rf pack-*
```

`git show` is an example of a 'porcelain' command meant for users, as opposed to a 'plumbing' command, which is lower-level and meant for scripts and for Git's own use under the hood. The latest commit on the current branch is known as the `HEAD` commit, and we should be able to use `git cat-file -p` to get essentially the same output as before (the `-p` flag means 'pretty-print').

```haskell
:!git cat-file -p HEAD
```

```
tree ecd0e58d6832566540a30dfd4878db518d5451d0
parent ab3c5646b41de1b6d95782371289db585ba8aa85
author Trevor Bramble <inbox@trevorbramble.com> 1372482098 -0700
committer Trevor Bramble <inbox@trevorbramble.com> 1372482214 -0700

add tmux by @seebi!
```

`HEAD` is in fact a file that lives at `.git/HEAD`. Let's view its contents.

```haskell
:!cat .git/HEAD
```

```
ref: refs/heads/master
```

This is essentially a symlink in text. `refs/heads/master` refers to `.git/refs/heads/master`. What are its contents?

```haskell
:!cat .git/refs/heads/master
```

```
e40cd4130e2a82f9b03ada1ca378b7701b1a9110
```

Okay, no more pointers! This is a SHA1 hash representing the commit we want. One last `git cat-file -p`...

```haskell
:!git cat-file -p e40cd4130e2a82f9b03ada1ca378b7701b1a9110
```

```
tree ecd0e58d6832566540a30dfd4878db518d5451d0
parent ab3c5646b41de1b6d95782371289db585ba8aa85
author Trevor Bramble <inbox@trevorbramble.com> 1372482098 -0700
committer Trevor Bramble <inbox@trevorbramble.com> 1372482214 -0700

add tmux by @seebi!
```

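The chain we just followed by hand, from `.git/HEAD` to `.git/refs/heads/master` to a commit hash, can be sketched as a tiny resolver. This is an illustration under simplifying assumptions, not how Git itself does it: `RefTarget`, `parseRefFile` and `resolveRef` are hypothetical names, every ref is assumed to be a loose file under the Git directory (refs that only exist in `.git/packed-refs` aren't handled), and there is no protection against a cycle of symbolic refs.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, isPrefixOf)
import System.FilePath ((</>))

-- | A ref file holds either a symbolic ref ("ref: <path>") or a hash.
data RefTarget = Symbolic FilePath | Direct String
  deriving (Eq, Show)

-- | Interpret the contents of a ref file, ignoring the trailing newline.
parseRefFile :: String -> RefTarget
parseRefFile contents
  | "ref: " `isPrefixOf` line = Symbolic (drop 5 line)
  | otherwise                 = Direct line
  where
    line = dropWhileEnd isSpace contents

-- | Follow symbolic refs until a hash is reached, e.g. resolveRef ".git" "HEAD".
-- Assumes loose ref files only; packed-refs and ref cycles are not handled.
resolveRef :: FilePath -> FilePath -> IO String
resolveRef gitDir name = do
  contents <- readFile (gitDir </> name)
  case parseRefFile contents of
    Symbolic next -> resolveRef gitDir next
    Direct hash   -> pure hash
```

Run inside this clone, `resolveRef ".git" "HEAD"` should take the same two steps as above and return `"e40cd4130e2a82f9b03ada1ca378b7701b1a9110"`.
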
As expected, this is the same output we got from `git cat-file -p HEAD`. On to something different: `e40cd4130e2a82f9b03ada1ca378b7701b1a9110` is a reference to an object stored at `.git/objects/e4/0cd4130e2a82f9b03ada1ca378b7701b1a9110`. The first two characters of the hash are the directory name and the remaining 38 characters are the file name.

This directory structure was chosen as a tradeoff between the number of directories and the number of objects in each one. One approach might have been to put all objects directly under `.git/objects`, but listing that single directory would have gotten unwieldy quickly. Another approach would have been to use the first character of the hash as the directory name, which would give at most 16 directories but still a large number of objects in each. Git settled on the first two characters, which gives us at most 256 directories and a manageable number of objects in each.

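To make the tradeoff concrete: if 100,000 objects were spread evenly over 256 directories, each directory would hold about 390 of them. Here is a small sketch of the grouping using `containers`; `splitRef` and `bucketCounts` are hypothetical helper names for illustration.

```haskell
import qualified Data.Map.Strict as Map

-- | Split a 40-character hex hash into (directory, file name),
-- mirroring the layout .git/objects/<2 chars>/<38 chars>.
splitRef :: String -> (String, String)
splitRef = splitAt 2

-- | Count how many objects land in each of the (at most 256) directories.
bucketCounts :: [String] -> Map.Map String Int
bucketCounts refs = Map.fromListWith (+) [(fst (splitRef r), 1) | r <- refs]
```

Feeding it the hashes of every loose object in a repository would show how evenly they fill the directories; with the three hashes from this post, the two starting with `ec` share a directory.
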
Let's confirm that the file does exist, and then look at its contents.

```haskell
:!ls .git/objects/e4/0cd4130e2a82f9b03ada1ca378b7701b1a9110
:!cat .git/objects/e4/0cd4130e2a82f9b03ada1ca378b7701b1a9110
```

```
.git/objects/e4/0cd4130e2a82f9b03ada1ca378b7701b1a9110

xmj1DûÛ§PÐâ/ÙZ(%ô¹´RéB®SÛ×^ ¿ÃY÷Ö¶1ÓÓèf`«zCÒB)b)='¯©
RÌÔ»w;`I+\$µ E¬SÙ'&fBÇ×ñ¹w8uûñÞ¹ÉÁëvývÓÊ¾¬{{ê<ýBðì«÷nÒùtØÿ1ä?cUíz¹ÃñÛL¶ûÈO
```

Git compresses these files with zlib before storing them, and we'll need to handle this. Fortunately there's a tool called `zlib-flate` (part of the `qpdf` package) that we can use to do this decompression.

```haskell
:!zlib-flate -uncompress < .git/objects/e4/0cd4130e2a82f9b03ada1ca378b7701b1a9110
```

```
commit 248 tree ecd0e58d6832566540a30dfd4878db518d5451d0
parent ab3c5646b41de1b6d95782371289db585ba8aa85
author Trevor Bramble <inbox@trevorbramble.com> 1372482098 -0700
committer Trevor Bramble <inbox@trevorbramble.com> 1372482214 -0700

add tmux by @seebi!
```

This is identical to the output of `git cat-file -p`, except for the `commit 248` at the beginning. That's a header that Git uses to tell different types of objects apart, and `248` is the length in bytes of the content that follows. There's also a null byte after the content length that the shell isn't displaying here, and this will become important when we write code to handle the header in a moment.

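Put together, a decompressed loose object is the object type, a space, the content length in decimal, a null byte, and then the content. As a dependency-free sketch using only `bytestring`, the hypothetical helper `splitHeader` splits an object at the first null byte and also checks that the declared length matches the actual content length, something the attoparsec header parser in this post doesn't check.

```haskell
import qualified Data.ByteString.Char8 as BC

-- | Split a decompressed loose object into (type, declared length, content).
-- Returns Nothing if there is no NUL separator, the length isn't a plain
-- decimal number, or it disagrees with the actual content length.
splitHeader :: BC.ByteString -> Maybe (BC.ByteString, Int, BC.ByteString)
splitHeader obj = do
  let (header, rest) = BC.break (== '\NUL') obj
  (_, content) <- BC.uncons rest                 -- drop the NUL separator
  let (objType, lenField) = BC.break (== ' ') header
  (len, trailing) <- BC.readInt (BC.drop 1 lenField) -- drop the space
  if BC.null trailing && len == BC.length content
    then Just (objType, len, content)
    else Nothing
```

Applied to the decompressed commit above, it should give back the type `commit`, the length `248` and the content; for an object whose header claims the wrong length, it returns `Nothing`.
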
I'm done playing with the shell for now, and I want to write some code. The first thing I'd like to do is import some libraries and define helper functions for compression and decompression. Haskell's `zlib` library works with lazy bytestrings but I'd rather use strict bytestrings in the rest of my code, so I'll define `compress` and `decompress` accordingly.

```haskell
import qualified Codec.Compression.Zlib as Z (compress, decompress)
import           Data.ByteString.Lazy        (fromStrict, toStrict)
import           Data.ByteString             (ByteString)
import qualified Data.ByteString        as B

compress, decompress :: ByteString -> ByteString
compress   = toStrict . Z.compress   . fromStrict
decompress = toStrict . Z.decompress . fromStrict
```

Now to recreate the `zlib-flate` output from earlier, and demonstrate the presence of that null byte in the header:

```haskell
commit <- B.readFile ".git/objects/e4/0cd4130e2a82f9b03ada1ca378b7701b1a9110"
decompress commit
```

```
commit 248 tree ecd0e58d6832566540a30dfd4878db518d5451d0
parent ab3c5646b41de1b6d95782371289db585ba8aa85
author Trevor Bramble <inbox@trevorbramble.com> 1372482098 -0700
committer Trevor Bramble <inbox@trevorbramble.com> 1372482214 -0700

add tmux by @seebi!
```

Next, I want to make sense of this content, and by 'make sense of' I mean 'write a parser for'. Haskell has a couple of great options for this, and I've decided to go with `attoparsec`. Its `parseOnly` function does the right thing and returns an `Either` to account for parsing failure, but I'm pretty confident that my parsers won't fail, so I'll define a helper function that gets rid of the `Either` layer (and calls `error` if I turn out to be wrong).

```haskell
import           Data.Attoparsec.ByteString (Parser)
import qualified Data.Attoparsec.ByteString.Char8 as AC

parsed :: Parser a -> ByteString -> a
parsed parser = either error id . AC.parseOnly parser
```

```haskell
parseHeader :: Parser (ByteString, Int)
parseHeader = do
    objectType <- AC.takeTill AC.isSpace
    AC.space
    len <- AC.decimal
    AC.char '\NUL'
    return (objectType, len)

commit <- decompress <$> B.readFile ".git/objects/e4/0cd4130e2a82f9b03ada1ca378b7701b1a9110"

parsed parseHeader commit
```

```
("commit",248)
```

```haskell
type Ref = ByteString

parseHexRef :: Parser Ref
parseHexRef = AC.take 40
```

```haskell
data Commit = Commit
    { commitTree      :: Ref
    , commitParents   :: [Ref]
    , commitAuthor    :: ByteString
    , commitCommitter :: ByteString
    , commitMessage   :: ByteString
    } deriving (Eq, Show)

-- Only the tree, parent, author and committer headers are handled; a commit
-- with extra headers (such as gpgsig or encoding) would fail to parse.
parseCommit :: Parser Commit
parseCommit = do
    cTree      <-           AC.string "tree"      *> AC.space *> parseHexRef                   <* AC.endOfLine
    cParents   <- AC.many' (AC.string "parent"    *> AC.space *> parseHexRef                   <* AC.endOfLine)
    cAuthor    <-           AC.string "author"    *> AC.space *> AC.takeTill (AC.inClass "\n") <* AC.endOfLine
    cCommitter <-           AC.string "committer" *> AC.space *> AC.takeTill (AC.inClass "\n") <* AC.endOfLine
    AC.endOfLine
    cMessage   <- AC.takeByteString
    return $ Commit cTree cParents cAuthor cCommitter cMessage

parsed (parseHeader *> parseCommit) commit
```

```
Commit {commitTree = "ecd0e58d6832566540a30dfd4878db518d5451d0", commitParents = ["ab3c5646b41de1b6d95782371289db585ba8aa85"], commitAuthor = "Trevor Bramble <inbox@trevorbramble.com> 1372482098 -0700", commitCommitter = "Trevor Bramble <inbox@trevorbramble.com> 1372482214 -0700", commitMessage = "add tmux by @seebi!\n"}
```

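The author and committer fields are kept as opaque bytestrings above, but they have their own structure: a name, an email address in angle brackets, a Unix timestamp in seconds, and a UTC offset. Here is a minimal sketch that splits them apart with `bytestring` alone; `Signature` and `parseSignature` are hypothetical names, and the parser assumes the name itself contains no `<`.

```haskell
import qualified Data.ByteString.Char8 as BC

-- | The pieces of an author/committer/tagger line, e.g.
-- "Trevor Bramble <inbox@trevorbramble.com> 1372482098 -0700".
data Signature = Signature
  { sigName     :: BC.ByteString
  , sigEmail    :: BC.ByteString
  , sigTime     :: Integer        -- seconds since the Unix epoch
  , sigTimezone :: BC.ByteString  -- UTC offset as written, e.g. "-0700"
  } deriving (Eq, Show)

-- | Split a signature line; Nothing if the expected pieces are missing.
parseSignature :: BC.ByteString -> Maybe Signature
parseSignature line = do
  let (nameField, rest1) = BC.break (== '<') line
  (_, rest2) <- BC.uncons rest1                    -- drop '<'
  let (email, rest3) = BC.break (== '>') rest2
  (_, rest4) <- BC.uncons rest3                    -- drop '>'
  case BC.words rest4 of
    [time, tz] -> do
      (secs, leftover) <- BC.readInteger time
      if BC.null leftover
        then Just (Signature (trimEnd nameField) email secs tz)
        else Nothing
    _ -> Nothing
  where
    trimEnd = fst . BC.spanEnd (== ' ')            -- drop spaces before '<'
```

The author timestamp `1372482098` on this commit falls in late June 2013. Tag objects use the same layout for their `tagger` line, so the same function would apply there too.
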
```haskell
import Data.Monoid ((<>), mappend, mconcat)
import Data.Byteable

instance Byteable Commit where
    toBytes (Commit cTree cParents cAuthor cCommitter cMessage) = mconcat
        [                        "tree "      <> cTree      <> "\n"
        , mconcat (map (\cRef -> "parent "    <> cRef       <> "\n") cParents)
        ,                        "author "    <> cAuthor    <> "\n"
        ,                        "committer " <> cCommitter <> "\n"
        ,                                                      "\n"
        ,                                        cMessage
        ]

parsedCommit = parsed (parseHeader *> parseCommit) commit
(parsed parseCommit . toBytes $ parsedCommit) == parsedCommit
```

```
True
```

```haskell
import Data.ByteString.UTF8 (fromString, toString)

withHeader :: ByteString -> ByteString -> ByteString
withHeader oType content = mconcat [oType, " ", fromString . show $ B.length content, "\NUL", content]

withHeader "commit" (toBytes parsedCommit)
```

```
commit 248 tree ecd0e58d6832566540a30dfd4878db518d5451d0
parent ab3c5646b41de1b6d95782371289db585ba8aa85
author Trevor Bramble <inbox@trevorbramble.com> 1372482098 -0700
committer Trevor Bramble <inbox@trevorbramble.com> 1372482214 -0700

add tmux by @seebi!
```

```haskell
import Data.Digest.Pure.SHA

hash :: ByteString -> Ref
hash = fromString . showDigest . sha1 . fromStrict

hash (withHeader "commit" (toBytes parsedCommit))
```

```
e40cd4130e2a82f9b03ada1ca378b7701b1a9110
```

```haskell
:!git cat-file -p ecd0e58d6832566540a30dfd4878db518d5451d0
```

```
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	.gitmodules
100644 blob ec00a76061539cf774614788270214499696f871	CHANGELOG.mkd
100644 blob f95aaf80007d225f00d3109987ee42ef2c2e0c0a	DEVELOPERS.mkd
100644 blob ee08d7e44f15108ef5359550399dad55955b56ca	LICENSE
100644 blob d18ee9450251ea1b9a02ebd4d6fce022df9eb5e4	README.md
040000 tree 1981c76881c6a14e14d067a44247acd1bf6bbc3a	adobe-swatches-solarized
040000 tree 825c732bdd3a62aeb543ca89026a26a2ee0fba26	apple-colorpalette-solarized
040000 tree 7bab2828df5de23262a821cc48fe0ccf8bd2a9ae	emacs-colors-solarized
040000 tree f5fe8c3e20b2577223f617683a52eac31c5c9f30	files
040000 tree 5b60111510dbb3d8560cf58a36a20a99fc175658	gedit
040000 tree 60c9df3d6e1994b76d72c061a02639af3d925655	gimp-palette-solarized
040000 tree 979cf43752e4d698c7b5b47cff665142a274c133	img
040000 tree 3ff6d431303b66cc50e45b6fabd72302f210aebc	intellij-colors-solarized
040000 tree 8f387a531ad08f146c86e4b6007b898064ad4d7f	iterm2-colors-solarized
040000 tree 1e37592e62c85909be4c5
```

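The `.gitmodules` entry points at `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, which is the hash Git assigns to an empty blob, so that file is empty. To check this without the `SHA` package, here is a self-contained sketch: a from-scratch SHA-1 written purely for illustration (slow, and not something to rely on for security), plus a hypothetical `gitBlobHash` that prepends the same `blob <length>\0` header as `withHeader` before hashing.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Bits (complement, rotateL, shiftL, shiftR, xor, (.&.), (.|.))
import Data.List (foldl', zipWith4)
import Data.Word (Word32)
import Numeric (showHex)

type State = (Word32, Word32, Word32, Word32, Word32)

-- | Append the 0x80 marker, zero bytes, and the message length in bits
-- (64-bit big-endian) so the total length is a multiple of 64 bytes.
pad :: B.ByteString -> B.ByteString
pad msg = B.concat [msg, B.singleton 0x80, B.replicate zeros 0, lenBytes]
  where
    len      = B.length msg
    zeros    = (55 - len) `mod` 64
    bitLen   = toInteger len * 8
    lenBytes = B.pack [fromIntegral (bitLen `shiftR` (8 * i)) | i <- [7, 6 .. 0]]

-- | Split the padded message into 64-byte blocks.
blocks :: B.ByteString -> [B.ByteString]
blocks bs
  | B.null bs = []
  | otherwise = let (blk, rest) = B.splitAt 64 bs in blk : blocks rest

-- | Read four bytes as a big-endian 32-bit word.
word32BE :: B.ByteString -> Word32
word32BE = B.foldl' (\acc byte -> (acc `shiftL` 8) .|. fromIntegral byte) 0

-- | The 80-word message schedule: w[i] = rotl1(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16]).
schedule :: B.ByteString -> [Word32]
schedule blk = take 80 ws
  where
    w16 = [word32BE (B.take 4 (B.drop (4 * i) blk)) | i <- [0 .. 15]]
    ws  = w16 ++ zipWith4 (\a b c d -> rotateL (a `xor` b `xor` c `xor` d) 1)
                          (drop 13 ws) (drop 8 ws) (drop 2 ws) ws

-- | Run the 80 rounds over one block and add the result into the state.
compressBlock :: State -> B.ByteString -> State
compressBlock st@(h0, h1, h2, h3, h4) blk = (h0 + a, h1 + b, h2 + c, h3 + d, h4 + e)
  where
    (a, b, c, d, e) = foldl' roundFn st (zip [0 :: Int ..] (schedule blk))
    roundFn (a', b', c', d', e') (i, w) =
      let (f, k) | i < 20    = ((b' .&. c') .|. (complement b' .&. d'), 0x5A827999)
                 | i < 40    = (b' `xor` c' `xor` d', 0x6ED9EBA1)
                 | i < 60    = ((b' .&. c') .|. (b' .&. d') .|. (c' .&. d'), 0x8F1BBCDC)
                 | otherwise = (b' `xor` c' `xor` d', 0xCA62C1D6)
          temp = rotateL a' 5 + f + e' + k + w
      in (temp, a', rotateL b' 30, c', d')

-- | SHA-1 of a strict ByteString, as 40 lowercase hex characters.
sha1Hex :: B.ByteString -> String
sha1Hex msg = concatMap hex8 [a, b, c, d, e]
  where
    (a, b, c, d, e) = foldl' compressBlock initial (blocks (pad msg))
    initial = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    hex8 w = let s = showHex w "" in replicate (8 - length s) '0' ++ s

-- | Git's object hash for a blob: SHA-1 over "blob <length>\0" ++ content.
gitBlobHash :: B.ByteString -> String
gitBlobHash content =
  sha1Hex (B.concat [BC.pack ("blob " ++ show (B.length content)), B.singleton 0, content])
```

`gitBlobHash B.empty` evaluates to `"e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"`, matching the `.gitmodules` entry above.
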
```haskell
decompress <$> B.readFile ".git/objects/ec/d0e58d6832566540a30dfd4878db518d5451d0"
```

```
tree 1282 100644 .gitmodules �⛲��CK�)�wZ���S�100644 CHANGELOG.mkd � �`aS��taG�'I���q100644 DEVELOPERS.mkd �Z�� }"_ ����B�,.
100644 LICENSE ���O��5�P9��U�[V�100644 README.md ю�EQ�������"ߞ��40000 adobe-swatches-solarized ��h�ơN�g�BG�ѿk�:40000 apple-colorpalette-solarized �\s+�:b��Cʉj&���&40000 emacs-colors-solarized {�((�]�2b�!�H�ϋҩ�40000 files ���> �Wr#�h:R��\�040000 gedit [`۳�V�6�
��VX40000 gimp-palette-solarized `��=n��mr�a�&9�=�VU40000 img ���7R�֘ǵ�|�fQB�t�340000 intellij-colors-solarized ?��10;f�P�[o��#���40000 iterm2-colors-solarized �8zSЏl�� {��d�M40000 mutt-colors-solarized 7Y.b�Y	�L^^�t�wvn�"40000 netbeans-colors-solarized �2�p@����:��j�.�D40000 osx-terminal.app-colors-solarized @�e�"���>�'7_��x40000 putty-colors-solarized cߦ�!O�vӟz"��S�
40000 qtcreator E9!�g��^@��s��`�V?>40000 seestyle-colors-solarized ]փ*2A���!��(��|�E�40000 textmate-colors-solarized <�>��}H��)�e���j40000 textwrangler-bbedit-colors-solarized M�R�jG��.w�Sx��K40000 tm
```

```haskell
import Data.ByteString.Base16 (encode)

parseBinRef :: Parser Ref
parseBinRef = encode <$> AC.take 20

data Tree = Tree { treeEntries :: [TreeEntry] } deriving (Eq, Show)

data TreeEntry = TreeEntry
    { treeEntryPerms :: ByteString
    , treeEntryName  :: ByteString
    , treeEntryRef   :: Ref
    } deriving (Eq, Show)

parseTreeEntry :: Parser TreeEntry
parseTreeEntry = do
    perms <- fromString <$> AC.many1' AC.digit
    AC.space
    name  <- AC.takeWhile (/='\NUL')
    AC.char '\NUL'
    ref   <- parseBinRef
    return $ TreeEntry perms name ref

parseTree :: Parser Tree
parseTree = Tree <$> AC.many' parseTreeEntry

tree <- decompress <$> B.readFile ".git/objects/ec/d0e58d6832566540a30dfd4878db518d5451d0"

parsedTree = parsed (parseHeader *> parseTree) tree
parsedTree
```

```
Tree {treeEntries = [TreeEntry {treeEntryPerms = "100644", treeEntryName = ".gitmodules", treeEntryRef = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"},TreeEntry {treeEntryPerms = "100644", treeEntryName = "CHANGELOG.mkd", treeEntryRef = "ec00a76061539cf774614788270214499696f871"},TreeEntry {treeEntryPerms = "100644", treeEntryName = "DEVELOPERS.mkd", treeEntryRef = "f95aaf80007d225f00d3109987ee42ef2c2e0c0a"},TreeEntry {treeEntryPerms = "100644", treeEntryName = "LICENSE", treeEntryRef = "ee08d7e44f15108ef5359550399dad55955b56ca"},TreeEntry {treeEntryPerms = "100644", treeEntryName = "README.md", treeEntryRef = "d18ee9450251ea1b9a02ebd4d6fce022df9eb5e4"},TreeEntry {treeEntryPerms = "40000", treeEntryName = "adobe-swatches-solarized", treeEntryRef = "1981c76881c6a14e14d067a44247acd1bf6bbc3a"},TreeEntry {treeEntryPerms = "40000", treeEntryName = "apple-colorpalette-solarized", treeEntryRef = "825c732bdd3a62aeb543ca89026a26a2ee0fba26"},TreeEntry {treeEntryPerms = "40000", treeEntryName = "em
```

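Notice that the parsed directory entries have mode `40000`, while `git cat-file -p` displayed `040000`: the raw tree stores the mode without the leading zero, and cat-file pads it for display. The modes are kept as plain bytestrings above, so here is a small sketch of what the common ones mean. `EntryKind` and `entryKind` are hypothetical names, and the table only covers the modes I know modern Git writes (regular file, executable, symlink, directory, and `160000` for a submodule commit); anything else comes back as `Nothing`.

```haskell
import qualified Data.ByteString.Char8 as BC

-- | What a tree entry's mode says about the object it points to.
data EntryKind = RegularFile | Executable | Symlink | Subtree | Submodule
  deriving (Eq, Show)

-- | Classify a mode exactly as stored in a raw tree object.
-- Directories are stored as "40000", without the leading zero.
entryKind :: BC.ByteString -> Maybe EntryKind
entryKind mode = lookup mode
  [ (BC.pack "100644", RegularFile)
  , (BC.pack "100755", Executable)
  , (BC.pack "120000", Symlink)
  , (BC.pack "40000",  Subtree)
  , (BC.pack "160000", Submodule)   -- a "gitlink" pointing at a commit
  ]
```
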
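```haskell
import Data.ByteString.Base16 (decode)

-- `decode` here is the pre-1.0 base16-bytestring API, which returns a
-- (decoded, leftover) pair; from version 1.0 it returns
-- Either String ByteString instead, so this line would need adjusting.
instance Byteable TreeEntry where
    toBytes (TreeEntry perms name ref) = mconcat [perms, " ", name, "\NUL", fst $ decode ref]

instance Byteable Tree where
    toBytes (Tree entries) = mconcat (map toBytes entries)

(parsed parseTree . toBytes $ parsedTree) == parsedTree
```

```
True
```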

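Tree entries store each hash as 20 raw bytes, which `parseBinRef` turns into the 40-character hex form with `encode`, and the `Byteable` instance turns back with `decode`. To see what those two functions are doing for us, here is a dependency-free sketch of both directions; `toHex` and `fromHex` are hypothetical names, not part of `base16-bytestring`.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (digitToInt, intToDigit, isHexDigit)

-- | Raw bytes (e.g. a 20-byte hash from a tree) to lowercase hex.
toHex :: B.ByteString -> BC.ByteString
toHex = BC.pack . concatMap byteToHex . B.unpack
  where
    byteToHex w = [ intToDigit (fromIntegral (w `shiftR` 4))
                  , intToDigit (fromIntegral (w .&. 0x0F)) ]

-- | Hex back to raw bytes; Nothing on odd length or non-hex characters.
fromHex :: BC.ByteString -> Maybe B.ByteString
fromHex hex
  | odd (BC.length hex) || not (BC.all isHexDigit hex) = Nothing
  | otherwise = Just (B.pack (pairs (BC.unpack hex)))
  where
    pairs (hi : lo : rest) =
      fromIntegral ((digitToInt hi `shiftL` 4) .|. digitToInt lo) : pairs rest
    pairs _ = []
```
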
```haskell
:!git cat-file -p ec00a76061539cf774614788270214499696f871
```

```
Solarized Changelog

## Current release 1.0.0beta2

1.0.0beta2
----------

### Summary

Switch to the alternative red hue (final and only hue change), included a whole
heap of new ports and updates to the existing Vim colorscheme. The list of all 
currently included ports, highlighted items are new, updates noted:

#### Editors & IDEs

*   \[UPDATED\] **Vim**
*   \[NEW\] ***Emacs***
*   \[NEW\] ***IntelliJ IDEA***
*   \[NEW\] ***NetBeans***
*   \[NEW\] ***SeeStyle theme for Coda & SubEthaEdit***
*   \[NEW\] ***TextMate***
*   \[NEW\] ***Visual Studio***

#### Terminal Emulators

* \[UPDATED\] **iTerm2 colorschemes**
* \[UPDATED\] **OS X Terminal.app colors**
* \[UPDATED\] **Xresources colors**

#### Other Applications

* \[UPDATED\] **Mutt mail client colorschemes**

#### Palettes

* \[UPDATED\] **Adobe Photoshop Swatches**
* \[UPDATED\] **Apple Color Picker Palette**
* \[UPDATED\] **Gimp Palette**


### Critical Changes

These changes may require you to change your configuration.

*  
```

```haskell
blob <- decompress <$> B.readFile ".git/objects/ec/00a76061539cf774614788270214499696f871"
blob
```

```
blob 5549 Solarized Changelog

## Current release 1.0.0beta2

1.0.0beta2
----------

### Summary

Switch to the alternative red hue (final and only hue change), included a whole
heap of new ports and updates to the existing Vim colorscheme. The list of all 
currently included ports, highlighted items are new, updates noted:

#### Editors & IDEs

*   \[UPDATED\] **Vim**
*   \[NEW\] ***Emacs***
*   \[NEW\] ***IntelliJ IDEA***
*   \[NEW\] ***NetBeans***
*   \[NEW\] ***SeeStyle theme for Coda & SubEthaEdit***
*   \[NEW\] ***TextMate***
*   \[NEW\] ***Visual Studio***

#### Terminal Emulators

* \[UPDATED\] **iTerm2 colorschemes**
* \[UPDATED\] **OS X Terminal.app colors**
* \[UPDATED\] **Xresources colors**

#### Other Applications

* \[UPDATED\] **Mutt mail client colorschemes**

#### Palettes

* \[UPDATED\] **Adobe Photoshop Swatches**
* \[UPDATED\] **Apple Color Picker Palette**
* \[UPDATED\] **Gimp Palette**


### Critical Changes

These changes may require you to change your configura
```

```haskell
data Blob = Blob { blobContent :: ByteString } deriving (Eq, Show)

parseBlob :: Parser Blob
parseBlob = Blob <$> AC.takeByteString

parsedBlob = parsed (parseHeader *> parseBlob) blob
parsedBlob
```

```haskell
instance Byteable Blob where
    toBytes (Blob content) = content

(parsed parseBlob . toBytes $ parsedBlob) == parsedBlob
```

```
True
```

```haskell
:!git show-ref --tags
```

```
31ff7f5064824d2231648119feb6dfda1a3c89f5 refs/tags/v1.0.0beta1
a3037b428f29f0c032aeeeedb4758501bc32444d refs/tags/v1.0beta
```

```haskell
:!git cat-file -p 31ff7f5064824d2231648119feb6dfda1a3c89f5
```

```
object 90581c7bfbcd279768580eec595d0ab3c094cc02
type commit
tag v1.0.0beta1
tagger Ethan Schoonover <es@ethanschoonover.com> 1300994142 -0700

Initial public beta release 1.0.0beta1
```

```haskell
tag <- decompress <$> B.readFile ".git/objects/31/ff7f5064824d2231648119feb6dfda1a3c89f5"
tag
```

```
tag 182 object 90581c7bfbcd279768580eec595d0ab3c094cc02
type commit
tag v1.0.0beta1
tagger Ethan Schoonover <es@ethanschoonover.com> 1300994142 -0700

Initial public beta release 1.0.0beta1
```

```haskell
data Tag = Tag
    { tagObject     :: Ref
    , tagType       :: ByteString
    , tagTag        :: ByteString
    , tagTagger     :: ByteString
    , tagAnnotation :: ByteString
    } deriving (Eq, Show)

parseTag :: Parser Tag
parseTag = do
    tObject     <- AC.string "object" *> AC.space *> parseHexRef                                                 <* AC.endOfLine
    tType       <- AC.string "type"   *> AC.space *> AC.choice (map AC.string ["commit", "tree", "blob", "tag"]) <* AC.endOfLine
    tTag        <- AC.string "tag"    *> AC.space *> AC.takeTill (AC.inClass "\n")                               <* AC.endOfLine
    tTagger     <- AC.string "tagger" *> AC.space *> AC.takeTill (AC.inClass "\n")                               <* AC.endOfLine
    AC.endOfLine
    tAnnotation <- AC.takeByteString
    return $ Tag tObject tType tTag tTagger tAnnotation

parsedTag = parsed (parseHeader *> parseTag) tag
parsedTag
```

```
Tag {tagObject = "90581c7bfbcd279768580eec595d0ab3c094cc02", tagType = "commit", tagTag = "v1.0.0beta1", tagTagger = "Ethan Schoonover <es@ethanschoonover.com> 1300994142 -0700", tagAnnotation = "Initial public beta release 1.0.0beta1\n"}
```

```haskell
instance Byteable Tag where
    toBytes (Tag tObject tType tTag tTagger tAnnotation) = mconcat
        [ "object " <> tObject     <> "\n"
        , "type "   <> tType       <> "\n"
        , "tag "    <> tTag        <> "\n"
        , "tagger " <> tTagger     <> "\n"
        ,                             "\n"
        ,              tAnnotation
        ]

(parsed parseTag . toBytes $ parsedTag) == parsedTag
```

```
True
```

```haskell
data GitObject
    = GitCommit Commit
    | GitTree   Tree
    | GitBlob   Blob
    | GitTag    Tag
    deriving (Eq, Show)

parseGitObject :: Parser GitObject
parseGitObject = do
    (objectType, _) <- parseHeader
    case objectType of
        "commit" -> GitCommit <$> parseCommit
        "tree"   -> GitTree   <$> parseTree
        "blob"   -> GitBlob   <$> parseBlob
        "tag"    -> GitTag    <$> parseTag
        -- fail inside the parser so the error flows through parseOnly's Either
        other    -> fail ("not a git object: " ++ show other)

instance Byteable GitObject where
    toBytes obj = case obj of
        GitCommit c -> withHeader "commit" (toBytes c)
        GitTree   t -> withHeader "tree"   (toBytes t)
        GitBlob   b -> withHeader "blob"   (toBytes b)
        GitTag    t -> withHeader "tag"    (toBytes t)

hashObject :: GitObject -> Ref
hashObject = hash . toBytes
```

```haskell
hashObject . parsed parseGitObject <$> (decompress <$> B.readFile ".git/objects/31/ff7f5064824d2231648119feb6dfda1a3c89f5")
```

```
31ff7f5064824d2231648119feb6dfda1a3c89f5
```

```haskell
import System.FilePath ((</>))

refPath :: FilePath -> Ref -> FilePath
refPath gitDir ref = let
   (dir,file) = splitAt 2 (toString ref)
   in gitDir </> "objects" </> dir </> file

refPath ".git" "31ff7f5064824d2231648119feb6dfda1a3c89f5"
```

```
".git/objects/31/ff7f5064824d2231648119feb6dfda1a3c89f5"
```

```haskell
readObject :: FilePath -> Ref -> IO GitObject
readObject gitDir ref = do
    let path = refPath gitDir ref
    content <- decompress <$> B.readFile path
    return $ parsed parseGitObject content

readObject ".git" "31ff7f5064824d2231648119feb6dfda1a3c89f5"
```

```
GitTag (Tag {tagObject = "90581c7bfbcd279768580eec595d0ab3c094cc02", tagType = "commit", tagTag = "v1.0.0beta1", tagTagger = "Ethan Schoonover <es@ethanschoonover.com> 1300994142 -0700", tagAnnotation = "Initial public beta release 1.0.0beta1\n"})
```

```haskell
import System.Directory (doesPathExist, createDirectoryIfMissing)
import System.FilePath  (takeDirectory)
import Control.Monad    (unless)

writeObject :: FilePath -> GitObject -> IO Ref
writeObject gitDir object = do
    let ref  =  hashObject object
    let path =  refPath gitDir ref
    exists   <- doesPathExist path
    unless exists $ do
        let dir = takeDirectory path
        createDirectoryIfMissing True dir
        B.writeFile path . compress $ toBytes object
    return ref
```

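`writeObject` writes the compressed bytes straight to their final path. As far as I know, Git itself is more careful: it writes a new loose object to a temporary file and then moves it into place, so an interrupted write can't leave a truncated object under a valid name (it also makes object files read-only, which this sketch doesn't do). Here is a minimal sketch of that temporary-file-and-rename pattern; `writeFileAtomically` and `writeIfMissing` are hypothetical names, and the temporary file is created in the destination directory because a rename is only atomic within a single filesystem.

```haskell
import qualified Data.ByteString as B
import System.Directory (createDirectoryIfMissing, doesFileExist, renameFile)
import System.FilePath (takeDirectory)
import System.IO (hClose, openBinaryTempFile)

-- | Write to a temporary file next to the destination, then rename it into
-- place, so readers never observe a half-written file. (If the write throws,
-- the temporary file is left behind; a fuller version would clean it up.)
writeFileAtomically :: FilePath -> B.ByteString -> IO ()
writeFileAtomically path bytes = do
  let dir = takeDirectory path
  createDirectoryIfMissing True dir
  (tmpPath, h) <- openBinaryTempFile dir "tmp_obj"
  B.hPut h bytes
  hClose h
  renameFile tmpPath path

-- | Skip the write when the file already exists, as writeObject does; for
-- content-addressed objects, the same name means the same content.
-- Returns True if something was written.
writeIfMissing :: FilePath -> B.ByteString -> IO Bool
writeIfMissing path bytes = do
  exists <- doesFileExist path
  if exists
    then pure False
    else writeFileAtomically path bytes >> pure True
```

Note that, just like `writeObject`'s `doesPathExist` check, `writeIfMissing` skips objects that are already present. That means the round-trip test in the next cell writes nothing: every object already exists, so it is really checking that parsing, re-serialising and re-hashing each object reproduces its original hash.
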
```haskell
import Control.Monad    (forM)
import System.Directory (listDirectory)

allRefs <- do
    prefixes <- filter (\d -> length d == 2) <$> listDirectory ".git/objects/"
    concat <$> forM prefixes (\p ->
        map (fromString . (p++)) <$> listDirectory (".git/objects" </> p))

test <- forM allRefs $ \ref -> do
    obj  <- readObject  ".git" ref
    ref' <- writeObject ".git" obj
    return $ ref == ref'

and test
```

```
True
```