2009-05-09

Converting confused SVN repositories into Git repositories

I've been spending this evening converting the repositories of my old projects from SVN to Git. I used to have the repositories hosted on my home server, but now I've moved them to GitHub (see my GitHub profile). Here I have outlined the procedures that I used to convert my source code repositories.

Preparations

First I installed svn2git, because it handles tags and branches much better than the basic git svn clone command. I run Git under Cygwin, so first I had to install the ruby package using Cygwin Setup. And since Cygwin's Ruby does not come with RubyGems, I downloaded and installed it manually using these instructions.

When RubyGems was installed, I was able to type the following commands to finally install svn2git from GitHub:

gem sources -a http://gems.github.com
gem install nirvdrum-svn2git

Most of my SVN repositories were already running on my server, so accessing them was easy. But for some projects I had just a tarballed version of the repository. For those it was best to run svnserve locally, because git-svn is was not able to connect to a SVN repository through the file system. So I unpacked the repository tarballs into a directory X (so that the individual repositories are subdirectories of X), after which I started svnserve with the command "svnserve --daemon --foreground --root X". Then I could access the repositories through "svn://localhost/name-of-repo" URLs.

You will also need to write an authors file which lists all usernames in the SVN repositories and what their corresponding Git author names should be. The format is as follows, one user per line:

loginname = Joe User <user@example.com>

I placed the authors.txt file into my working directory, where I could easily point to it when doing the conversions.

Simple conversions

When the SVN repository uses the standard layout and its version history does not have anything weird happening, then the following commands could be used to convert the repository.

First make an empty directory and use svn2git to clone the SVN repository:

mkdir name-of-repo
cd name-of-repo
svn2git svn://localhost/name-of-repo --authors ../authors.txt --verbose

When that is finished, check that all branches, tags and version history were imported correctly:

git branch
git tag
gitk --all

You will probably want to publish the repository, so create a new repository (in this example I use GitHub) and push your repository there. Remember to include all branches and tags:

git remote add origin git@github.com:username/git-repo-name.git
git push --all
git push --tags

After that you better clone the published repository from the central server, the way you normally do (cd /my/projects ; git clone git@github.com:username/git-repo-name.git), and delete the original repository which was used when importing from SVN, to get rid of all the SVN related files in the .git directory.

You might also want to add .gitignore file into your project. For my projects I use the following to keep Maven's build artifacts and IntelliJ IDEA's workspace file out of version control:

/*.iws
/target/
/*/target/

Complex conversions

I had one SVN repository where the repository layout had been changed in the middle of the project. At first all project files had been in the root of the repository ("/"), after which they had been moved into /trunk. This caused that when I imported the SVN repository using the standard layout options, the history stopped where that move was made, because before that point in history there was no /trunk. I wanted to import a clean history, so that this mess would not be reflected in the resulting Git repository's history.

What I did, was that first I imported the latter part of the history which used the standard layout:

mkdir messy-repo.2
cd messy-repo.2
svn2git svn://localhost/messy-repo/trunk --rootistrunk --authors ../authors.txt --verbose

Then I imported the first part of the history which used the trunkless layout. This also includes the latter part of the history, but with all files moved under a /trunk directory:

mkdir messy-repo.1
cd messy-repo.1
svn2git svn://localhost/messy-repo --rootistrunk --authors ../authors.txt --verbose

Then I created a new repository where I would be combining the history from those two repositories. I cloned it from the repository with the first part of the clean history.

git clone file:///tmp/svn2git/messy-repo.1/.git messy-repo.combined
cd messy-repo.combined

Then I would start a branch "old_master" from the current master, just to be sure not to lose it. I would also make a tag "after_mess" for the commit that changed the SVN repository layout, and a tag "before_mess" for the commit just before that, where all project files were still cleanly in the repository root.

Did I mention, that the layout changing commit did also add one file, in addition to just changing the repository layout? So I had to recover that change from the otherwise pointless commit. First I had do get a patch with the desirable changes. So I hand-copied from SVN the desired file, checked out the version in Git just before the mess, made the desired change to the working copy, committed it and tagged it so that it would not be lost.

cd messy-repo.combined
git checkout before_mess
git add path/to/the/DesiredFile.java
git commit -m "Recovered the desired file from the mess"
git tag desired_changes

Then I would make a patch with just that once change:

git format-patch -M -C -k -1 desired_changes

Which then created the file 0001-desired-changes.patch.

I needed also clean patches for the latter part of the version history. So I created patches for all changes in the  messy-repo.2 repository.

cd messy-repo.2
git format-patch -M -C -k --root master

Then I would hand-edit the 0001-desired-changes.patch file to contain the same date and time as the original commit that messed up the repo. I would also remove the patch for that commit from the patches produced by messy-repo.2.

Then it was time to merge the patches into the first part of the history:

cd messy-repo.combined
git checkout before_mess
git am -k 0001-desired-changes.patch
git am -k patches-from-repo-2/00*
git branch fixed_master
git checkout fixed_master

That way all the history was saved, even the author dates were unchanged (commit dates did however change to current time when using patches - it's possible to rewrite the commit dates using git filter-branch). After that I could just clean up the branches and push it to the central repository as normally.

1 comment: