MDLog:/sysadmin

The Journal Of A Linux Sysadmin

HowTo Completely Remove a File From Git History


I just started working on a new project, and as you would expect, one of the first things I did was clone its git repository from GitHub. It contained just some scripts and should have been very small (~5M), but the clone took about an hour because the full repo folder was 1.5G, with most of that under .git/objects/pack. Crazy! What was in the repository history that would cause something like this? I assumed that at some point the repository had been much bigger (probably because of files that no longer exist), but how could I find out which files those were? And more importantly, how could I remove them from the history?

If you came here from a Google search on “how to remove a file from git history”, you probably know there are plenty of docs and howtos on this, but in my experience none of them really worked. This is why I decided to document the steps needed to identify the file in the git history that is using all that space, remove it completely, and bring the repository back to a manageable size.

First we need to identify the file that is causing the issue. To do this we verify all the packed objects and look for the biggest ones (the third column of the output is the object size in bytes, so we sort on it):

git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -5

Grab the object IDs (the first column) of the biggest entries, then find the names of the files those objects belong to:

git rev-list --objects --all | grep <object_id>
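The two lookups above can also be combined into a single pass. The following is only a minimal sketch, not the method used in this post: it swaps git verify-pack for git cat-file --batch-check, which also sees loose (not yet packed) objects. The function name largest_blobs is made up for illustration.

```shell
# Hypothetical helper: list the N largest blobs reachable from any ref.
# Output format, one blob per line, biggest last:
#   <size in bytes> <object id> <path>
largest_blobs() {
    count="${1:-5}"
    # rev-list prints "<object id> <path>"; cat-file fills in type and size
    # and passes the path through unchanged via %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        grep '^blob ' |
        cut -d' ' -f2- |
        sort -n |
        tail -n "$count"
}
```

Running largest_blobs 5 inside the repository should print the five biggest blobs together with the paths they were committed under.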

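Next, remove the file from all revisions. As written, the first command rewrites only the current branch; append -- --all to it to rewrite every branch and tag. The second command deletes the backup refs that filter-branch leaves under refs/original/: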

git filter-branch --index-filter 'git rm --cached --ignore-unmatch <filename>'
rm -rf .git/refs/original/
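Before moving on, it can help to check which refs can still reach the file, since any such ref keeps the big objects alive. A rough sketch of such a check (the helper name refs_containing_path is hypothetical):

```shell
# Hypothetical helper: print every ref whose history still touches <path>.
refs_containing_path() {
    path="$1"
    git for-each-ref --format='%(refname)' |
    while read -r ref; do
        # rev-list prints a commit id only if some commit reachable
        # from $ref added, changed, or removed the given path.
        if [ -n "$(git rev-list -n 1 "$ref" -- "$path" 2>/dev/null)" ]; then
            echo "$ref"
        fi
    done
}
```

An empty result from refs_containing_path <filename> means no ref reaches the file any more. Anything listed under refs/remotes/ is what the packed-refs step below deals with.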

Edit .git/packed-refs and remove or comment out any remote-tracking refs. These still point at the old history, so without this step the cleanup might not work. In my case I had refs/remotes/origin/master and some other branches.

vim .git/packed-refs
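A possible alternative to hand-editing the file is to delete the remote-tracking refs through git itself. This is a sketch under the assumption that the refs to drop all live under refs/remotes/; it is not the procedure used above, and drop_remote_tracking_refs is a hypothetical helper name. The next git fetch will recreate these refs.

```shell
# Hypothetical helper: delete every remote-tracking ref (loose or packed).
drop_remote_tracking_refs() {
    git for-each-ref --format='%(refname)' refs/remotes/ |
    while read -r ref; do
        # --no-deref removes a symbolic ref such as refs/remotes/origin/HEAD
        # itself instead of following it to its target.
        git update-ref --no-deref -d "$ref"
    done
}
```

Local branches and tags are left untouched, since only names under refs/remotes/ are listed.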

Finally, expire the reflog, repack, and prune the now-unreachable objects:

git reflog expire --all --expire-unreachable=0
git repack -A -d
git prune
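To see whether the cleanup had any effect, the size of the object store can be compared before and after. A small sketch based on git count-objects -v (the helper name repo_object_kib is hypothetical); it adds up the loose and packed sizes, both of which git reports in KiB:

```shell
# Hypothetical helper: total size of loose + packed objects, in KiB.
repo_object_kib() {
    git count-objects -v |
        awk -F': ' '$1 == "size" || $1 == "size-pack" { total += $2 }
                    END { print total + 0 }'
}
```

Calling repo_object_kib once before the filter-branch step and once after git prune gives two numbers to compare.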

Hopefully these steps will help you completely remove those unwanted files from your git history. Let me know if you run into any problems after following them.

Note: if you want to test these steps here is how to quickly create a test repo:

# Make a small repo
mkdir test
cd test
git init
echo hi > there
git add there
git commit -m 'Small repo'
# Add a random 10M binary file
dd if=/dev/urandom of=testme.txt count=10 bs=1M
git add testme.txt
git commit -m 'Add big binary file'
# Remove the 10M binary file
git rm testme.txt
git commit -m 'Remove big binary file'
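For a quick end-to-end trial, the test repo and the cleanup steps can be strung together in one script. This is a sketch with several assumptions: it works in a throwaway directory from mktemp, sets a dummy commit identity, and runs git gc first, because a freshly created repo holds only loose objects and has no pack files for verify-pack to inspect. It also sets FILTER_BRANCH_SQUELCH_WARNING to skip the warning pause in newer git versions, and uses --expire=now (the spelling from the git-filter-branch documentation) for the reflog step.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

# Build the test repo in a scratch directory
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Test User"
git config user.email "test@example.com"
echo hi > there
git add there
git commit -q -m 'Small repo'
dd if=/dev/urandom of=testme.txt count=10 bs=1M 2>/dev/null
git add testme.txt
git commit -q -m 'Add big binary file'
git rm -q testme.txt
git commit -q -m 'Remove big binary file'

# Pack everything, then record the size of .git in KiB
git gc -q
before=$(du -sk .git | cut -f1)

# Rewrite history without the file and drop the backup refs
git filter-branch -f --index-filter \
    'git rm --cached --ignore-unmatch testme.txt' >/dev/null
rm -rf .git/refs/original/

# Expire the reflog, repack, and prune the unreachable objects
git reflog expire --expire=now --all
git repack -q -A -d
git prune
after=$(du -sk .git | cut -f1)

echo "before=${before}K after=${after}K"
```

The test repo has no remote, so the packed-refs step is not needed here. The second number should come out far smaller than the first.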
