Tidy rewritten histories with Git

I imported some of my old projects from CVS to Git. I had the CVS repository of an old student project as a tarball. That one repository contained the sources of two programs: the main project and one small utility. I was able to import them into two separate Git repositories and also rewrite their version history so that it would seem as if the utility program had always been a separate project and had always used Maven (neither of which was true).

Importing the CVS repository to Git did not succeed with git cvsimport (it failed with "fatal error - cmalloc would have returned NULL"), but cvs2git worked, and it was also orders of magnitude faster. I had to edit the example options file provided with cvs2git to configure the CVS repository path and the author names. If some of the authors have non-ASCII characters in their names, it's best to save the options file in UTF-8 format and write the author names as Unicode string literals, for example u'Námè'. See cvs2git's usage instructions for details on how to do the conversion.

Now that I had a Git repository with the history of both of the programs, it was time to separate the utility program's version history with git filter-branch (the main project's history did not need to be modified). It's best to take a temporary clone of the original repository before messing with filter-branch. That way it's easier to revert all changes and try again by just deleting and recreating the temporary repository.
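
The throwaway-clone workflow can be sketched roughly like this. It is only an illustration: the directory names original and scratch and the file contents are made up, and the tiny repository stands in for the real import.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the imported repository (hypothetical content).
git init -q original
cd original
git config user.name "Example Author"
git config user.email "author@example.com"
echo "hello" > README
git add README
git commit -q -m "Initial import"
cd ..

# Do all history rewriting in a disposable clone, never in the original.
git clone -q original scratch

# If a rewrite goes wrong, delete the clone and start over from the original.
rm -rf scratch
git clone -q original scratch

scratch_commits=$(git -C scratch rev-list --count HEAD)
echo "scratch has $scratch_commits commit(s)"
```

The original repository is never touched, so recreating the clone always gets you back to a known good state.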

I made a clone of the repository, and in that clone I used --subdirectory-filter to remove everything except the source code of the utility program:

git filter-branch --subdirectory-filter src/hourparser -- --all
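
To see what the subdirectory filter does, here is a small self-contained sketch on a throwaway repository. The src/mainproject directory and the file contents are assumptions made up for the demo; only src/hourparser comes from the real project.

```shell
#!/bin/sh
set -e
# Newer Git versions print a deprecation warning and pause before filter-branch runs;
# this variable skips that pause.
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical layout: the utility lives in src/hourparser, the main project elsewhere.
mkdir -p src/hourparser src/mainproject
echo "class HourParser {}" > src/hourparser/HourParser.java
echo "class Main {}" > src/mainproject/Main.java
git add .
git commit -q -m "Import both programs"

# Keep only the history of src/hourparser and make it the repository root.
git filter-branch --subdirectory-filter src/hourparser -- --all >/dev/null 2>&1

tracked_files=$(git ls-files)
echo "$tracked_files"
```

After the filter has run, HourParser.java sits at the top level of the repository and the other program no longer appears in the history.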

Originally the utility did not use Maven, but I wanted the history to look as if it always had. So next I used --tree-filter to move all the source files into the right directory structure. I also removed the manifest file, because Maven will generate it automatically. When removing files, it's best to use --prune-empty, or you may run into problems later, for example during rebasing (I learned that the hard way). Also make sure that the last command in the filter always exits successfully with exit status 0, or the whole filtering process will fail.

git filter-branch --prune-empty --tree-filter '
mkdir -p src/main/java/hourparser
mv *.java src/main/java/hourparser
rm -rf META-INF
' -- --all
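
A more defensive variant of the same filter is sketched below on a throwaway repository. The guard around mv and the trailing true are my additions for illustration, not part of the command above, and the README and file contents are made up. The demo includes a commit without any .java files to show that the filter still succeeds there.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer Git versions

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

# The first commit has no .java files, so mv would have nothing to move there.
echo "notes" > README
git add README
git commit -q -m "Add README"

echo "class HourParser {}" > HourParser.java
mkdir META-INF
echo "Manifest-Version: 1.0" > META-INF/MANIFEST.MF
git add .
git commit -q -m "Add parser and manifest"

git filter-branch --prune-empty --tree-filter '
mkdir -p src/main/java/hourparser
# Move sources only when this commit has some at the top level.
if ls *.java >/dev/null 2>&1; then
    mv *.java src/main/java/hourparser
fi
rm -rf META-INF
# End the filter with a command that always succeeds.
true
' -- --all >/dev/null 2>&1

tracked_files=$(git ls-files | sort)
echo "$tracked_files"
```

Git does not track empty directories, so the mkdir in commits without sources leaves no trace in the rewritten history.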

After that was done, I had to insert pom.xml and the other Maven files into the version history. I did that by first making multiple commits at the end of the history, containing the initial project files and all the version number increments (the version number in pom.xml needs to be changed when a release is made). Then I used git rebase to reorder the commits, so that the changes to pom.xml would be in the right places in the history. Changing the initial commit was more complicated, but I was able to do it by creating a new repository with that initial commit, and then rebasing the rest of the history from the other repository on top of it.
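
The trick for replacing the initial commit can be sketched roughly as follows. This is one possible way to do it, not necessarily the exact commands used for the real project: the repository names old and new, the branch name imported, and the file contents are all made up for illustration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the already filtered history (hypothetical content).
git init -q old
cd old
git config user.name "Example Author"
git config user.email "author@example.com"
echo "class HourParser {}" > HourParser.java
git add HourParser.java
git commit -q -m "Add parser"
echo "// second version" >> HourParser.java
git commit -q -am "Improve parser"
cd ..

# New repository whose only commit is the desired initial commit.
git init -q new
cd new
git config user.name "Example Author"
git config user.email "author@example.com"
echo "<project/>" > pom.xml
git add pom.xml
git commit -q -m "Initial Maven project files"
new_root=$(git rev-parse HEAD)

# Bring the old history in as a local branch, then replay all of it,
# including its root commit, on top of the new initial commit.
git fetch -q ../old HEAD:refs/heads/imported
git rebase -q --root --onto "$new_root" imported

first_subject=$(git log --format=%s --max-parents=0 HEAD)
echo "history now starts with: $first_subject"
```

The rebase keeps the author dates of the replayed commits, which is why the dates still need attention afterwards.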

After this I had the right commits in place, but their dates were not consistent. The commits for the Maven files were dated 2009, but everything else was dated 2005. I fixed that by exporting the repository into patches, editing the authors and author dates in the patches with a text editor, and finally importing the patches into a blank repository. Temporary patches are a powerful tool for editing history.

git format-patch -M -C -k --root master
[edit the patches and move them to a new directory]
git init
git am -k 00*
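
I edited the patches by hand, but the same round trip can also be scripted. The sketch below is only an illustration on a throwaway repository: the directory names source, patches and target, the file contents, and the replacement date are all made up. The sed expression touches only the mail header at the top of each patch, so a line starting with "Date:" inside a commit message is left alone.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in repository with one commit dated today (hypothetical content).
git init -q source
cd source
git config user.name "Example Author"
git config user.email "author@example.com"
echo "<project/>" > pom.xml
git add pom.xml
git commit -q -m "Add Maven project file"

# Export the whole history, starting from the root commit, as patches.
git format-patch -q -M -C -k --root -o "$work/patches" HEAD

# Rewrite the Date header in the mail header block (everything before the first blank line).
for patch in "$work"/patches/*.patch; do
    sed '1,/^$/s/^Date: .*/Date: Sat, 1 Jan 2005 12:00:00 +0200/' "$patch" > "$patch.tmp"
    mv "$patch.tmp" "$patch"
done

# Import the edited patches into a blank repository.
cd "$work"
git init -q target
cd target
git config user.name "Example Author"
git config user.email "author@example.com"
git am -q -k "$work"/patches/*.patch

author_date=$(git log -1 --format=%ad --date=short)
echo "author date is now $author_date"
```

Only the author date comes from the patch. The commit date is set to the time when git am runs.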

After all this the authors and author dates were fine, but the committer and commit date information still needed fixing. I was able to change the committers to be the same as the author with the following command:

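git filter-branch -f --env-filter '
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"
' -- --all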
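
One way to check which commits still have differing author and committer information is sketched below. The throwaway repository and its single commit are made up for the demo, and the check assumes that names and email addresses do not contain the | character, which is used as a field separator.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

# A commit whose author date is in 2005 while its commit date is today.
echo "class HourParser {}" > HourParser.java
git add HourParser.java
GIT_AUTHOR_DATE="2005-03-01T12:00:00+0200" git commit -q -m "Add parser"

# Print the hash of every commit whose committer name, email or timestamp
# differs from the author's.
mismatched=$(git log --format='%H|%an <%ae> %at|%cn <%ce> %ct' |
    awk -F'|' '$2 != $3 { print $1 }')
mismatch_count=$(printf '%s\n' "$mismatched" | grep -c . || true)
echo "$mismatch_count commit(s) with differing author and committer information"
```

When the list is empty, the author and committer fields agree for every commit on the current branch.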

After this I could publish Git repositories of the main project and the utility project with nice clean histories.
