I just started working on a new project and as you would expect one of the first things I did was to download its git repository from github. These were just some scripts and should have been very small ~5M, but the clone from gitbhub took about one hour as the full repo folder was 1.5G… (with the biggest size under .git/objects/pack) Crazy… What was in the git repository history that would cause something like this? I assumed that at some point in time the repository was much bigger (probably from some file/s that don’t exist anymore), but how could I find out what were those files? And more important howto remove them from history? Well if you came here from a google search on “how to remove a file from git history” then you probably know there are plenty of docs and howtos on how to achieve this but from my experience none of them really worked. This is why I decided to document the steps needed to identify the file from the git repo history that is using all that space and to have it removed fully and bring the repository to a manageable size.
First we need to identify the file that is causing this issue; and for this we will verify all the packed objects and look for the biggest ones:
(and grab the revisions with the biggest files). Then find the name of the files in those revisions:
Next, remove the file from all revisions:
Edit .git/packed-refs and remove/comment any external pack-refs. Without this the cleanup might not work. I my case I had refs/remotes/origin/master and some others branches.
Finally repack and cleanup and remove those objects:
1 2 3
Hopefully these steps will help you completely remove those un-wanted files from your git history. Let me know if you have any problems after following these simple steps.
Note: if you want to test these steps here is how to quickly create a test repo:
1 2 3 4 5 6 7 8 9 10 11 12 13 14