Remove files from Git history using git-filter-repo
Marco Franssen
Many of you have probably been in a situation where you committed a file to your repository that you shouldn't have. For example, a file with credentials, or a huge file that made cloning your repository very slow. There are plenty of blogs and guides available on how to get such files completely removed, usually involving git filter-branch or BFG sorcery. In this blog I'm going to show you the new recommended way of doing this using git-filter-repo, which simplifies the process a lot.
Recently I had to rewrite a repository that slowed down the CI pipeline because some huge movies had been committed. In later commits those movies were removed again, but they still exist in the Git history and make the repository big and slow. In the past I have also committed a credential that I had to remove. Trust me, we have all been there. To get the files removed, I started with the git filter-branch approach I have used for years. Then I noticed a warning from Git recommending git filter-repo instead.
Although the Git CLI gives us this warning, not much has been written on git-filter-repo yet. The documentation on GitHub also still refers to filter-branch. Reason enough for me to write about this tool and raise awareness of it.
After reading up on git-filter-repo I figured it is a great fit for straightforward rewriting use cases like removing a file entirely from history. You get a simpler command at your fingertips, and you won't have to run things like BFG for the final cleanups on the remote. For other use cases git filter-branch is still available, but for simple rewrites git-filter-repo is the recommended way of handling these kinds of things today.
So how do we get started? We first have to install git-filter-repo. This is as simple as running your package manager's install command, as described in the documentation. I installed it on my MacBook using Homebrew.
With the tooling installed, I started the process of removing the files from the repository all over again. But before showing that specific step, I want to guide you through the process from the beginning.
I'm not responsible for any mistakes or loss of data when you perform this process on your own. To become a master at any skill, it takes the total effort of your heart, mind, and soul.
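A minimal sketch of the Homebrew install. The formula name git-filter-repo is the one Homebrew ships; the surrounding checks are my own addition so the snippet does nothing harmful on a machine where the tool is already present or Homebrew is missing.

```shell
# Install git-filter-repo via Homebrew unless it is already available.
if git filter-repo --version >/dev/null 2>&1; then
  echo "git-filter-repo is already installed"
elif command -v brew >/dev/null 2>&1; then
  brew install git-filter-repo
else
  # No Homebrew on this machine: fall back to your platform's package manager.
  echo "Homebrew not found; use your platform's package manager instead" >&2
fi
```

On other platforms the package manager differs, so check the git-filter-repo installation documentation for the right command.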
We start by cloning the repository using the --mirror option, which takes care of pulling all refs (branches, tags, and so on) locally.
Cloning with --mirror gives you a bare repository, so you do not get a working tree to edit files in locally.
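To try a mirror clone safely, here is a small sketch that clones a throwaway local repository instead of a real remote. The directory names (source, source.git) are made up for illustration; in practice you would pass the URL of your own repository.

```shell
# Build a throwaway source repository so the example is self-contained.
workdir="$(mktemp -d)"
git init --quiet "$workdir/source"
git -C "$workdir/source" -c user.name=demo -c user.email=demo@example.com \
  commit --quiet --allow-empty -m "Initial commit"

# --mirror copies every ref into a bare repository.
git clone --quiet --mirror "$workdir/source" "$workdir/source.git"

# A mirror clone is bare: this prints "true".
git -C "$workdir/source.git" rev-parse --is-bare-repository
```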
Next up, I want to find the files that have been deleted in the past, as I don't remember all of them. Running git log --diff-filter=D --summary shows which files have been removed in previous commits.
From the output I could see that I had removed a .env file that apparently contained some credentials, and that I had removed some processed videos from the repository. There was also a renamed file, but that is fine. Now that we know which files to remove, we can start using git filter-repo.
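The sketch below runs that command against a throwaway repository, so you can see what the output looks like without touching a real project. The file content and commit messages are invented for the demo; this is not the output from my own repository.

```shell
# Throwaway repository with one file that gets added and then deleted.
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"
git config user.name demo
git config user.email demo@example.com

echo "SECRET=example" > .env
git add .env
git commit --quiet -m "Add settings"
git rm --quiet .env
git commit --quiet -m "Remove settings"

# List every commit that deleted a file; each deleted file shows up
# on a "delete mode" line in the summary.
git log --diff-filter=D --summary
```

Only the "Remove settings" commit is listed, because --diff-filter=D restricts the log to commits that deleted at least one file.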
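Here is a hedged sketch of the rewrite itself, again on a throwaway repository. The paths (.env, *.mp4) are examples only, so substitute the ones you found in your own history. The --invert-paths, --path and --path-glob options are part of git-filter-repo; the guard around the call is my own addition so the script still runs when the tool is not installed.

```shell
# Throwaway demo: a small repository containing a file we regret.
demo="$(mktemp -d)"
git init --quiet "$demo/source"
cd "$demo/source"
git config user.name demo
git config user.email demo@example.com
echo "SECRET=example" > .env
echo "hello" > README.md
git add .env README.md
git commit --quiet -m "Add project files"

# Work on a fresh mirror clone, which is what filter-repo expects.
# --no-local makes a clone from a local path behave like one over the network.
git clone --quiet --mirror --no-local "$demo/source" "$demo/mirror.git"
cd "$demo/mirror.git"

# Only rewrite when git-filter-repo is actually installed.
if git filter-repo --version >/dev/null 2>&1; then
  # --invert-paths keeps everything EXCEPT the listed paths.
  git filter-repo --invert-paths --path .env --path-glob '*.mp4'
fi

# Lists commits that still touch .env; prints nothing after a successful rewrite.
git log --all --oneline -- .env
```

You can repeat --path and --path-glob as often as needed to remove several files in a single run.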
Now we have fully rewritten our repository and got rid of the files we accidentally committed. There are two steps left: pushing the repository and informing your team members to make a fresh clone of the repository.
Yes, this time it will be fast, as the big files are gone!
You can ignore the "deny updating a hidden ref" messages. Those are related to GitHub pull requests. If you check the repository on GitHub you will see that the new history is there, and when cloning you will notice it is much faster because the huge files have been removed entirely.
To make things a bit easier, I have also added an alias to my .gitconfig that lets me find removed files in my Git history with a simpler command that is easier to remember.
This allows me to type git deleted as opposed to git log --diff-filter=D --summary.
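One way to define such an alias from the command line is sketched below. To keep the demo harmless it stores the alias in a throwaway repository; adding --global to the git config call would write it to ~/.gitconfig instead.

```shell
# Throwaway repository to hold the alias for this demo.
repo="$(mktemp -d)"
git init --quiet "$repo"

# Repo-local alias; add --global to store it in ~/.gitconfig instead.
git -C "$repo" config alias.deleted "log --diff-filter=D --summary"

# Show what the alias expands to.
git -C "$repo" config --get alias.deleted
```

With the alias in place, git deleted behaves exactly like the longer command.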
You can find my entire .gitconfig, containing a whole bunch of aliases, in my dotfiles GitHub repository. These aliases save me a lot of typing and speed up my development process.