Moving from one SVN repo to 415 Git repos

At Old Town Media, we spent the last 9 or so years using a single Subversion repo with a folder for each site that we worked on. This changed two months ago when we finally switched to Git hosted on Bitbucket, with a single repository for each and every one of our sites. This has been a huge boon for our workflow and opens up a lot of opportunity for a git-based local environment workflow and automated testing using a service like Jenkins or dploy.io.

Why We Switched

Our old system worked. It was also easy for new people to step into using the very simple Versions app. It also sucked. Versions is an extremely buggy app that throws a fit almost every time you so much as move a file around. Using a single repository for all of our sites meant we couldn’t track our changes effectively or tag them using our project management software. It left our entire system vulnerable to one accidental deletion. On top of that, the SVN repo was hosted in our office, so there was too little physical separation between our checked-out copies and the repo itself: one disaster could take out both.

On top of this, converting to git and having a single repository for each site opened up a world of possibilities for our workflow. We could more easily tag changes, we could (finally) run branches for different development stages (something Versions is incapable of), and we could push our repos through an automated testing service and even deploy using these same services. It would make local development easier too, because most deployment services offer git integrations but nothing for SVN, and we could push much more easily from our local copy.

In other words, it was a no-brainer to switch to git and actually host our repos correctly.

Who?

The hard part came when I started looking into which service to use and how to switch from SVN to git, hopefully with our full commit histories. There are a LOT of git hosting services, but we needed private repos because almost everything that we would host on this service would be client sites, and it’s not kosher to broadcast that code out publicly. We played with a self-hosted service, but getting permissions set up properly ended up being a disaster and we really have little desire to play sysadmin to our repos. In the end it came down to the 2 biggest services out there: GitHub and Bitbucket. GitHub has a fantastic UI and a decent Mac app and was definitely my 1st choice until I saw the pricing for private repos. GitHub ain’t cheap, kids. So, to Bitbucket we went. The UI is very clunky but their support and API are fantastic and the pricing is extremely reasonable. It’s based on number of users instead of number of repos, so it’s basically built for a small agency that pumps out a lot of sites.

How?

Once we finally settled on a git hosting service for the project, we had to figure out how to break out 415 folders in a single SVN repository into 415 individual repositories in git without losing our commit history. There are a lot of libraries to accomplish a straight one-to-one conversion but not a single one to convert a single mega-repo into proper individual ones. In the end, we settled on keeping the SVN repo alive in case something went wrong with the move, and converting each folder into its own repo individually, accepting that the new system would have no earlier history to fall back on.

We had 415 folders, almost all of which were named as URLs. The total size was around 40GB without the SVN files, spread across roughly 286,000 individual files: everything from PSD files to images to readmes for libraries.

Step 1: Download the entire SVN repo

This took a crazy long time – around 3 hours of just chugging through and downloading every possible file into a different folder on my desktop. The good thing about the way we organized the repos in the past is that all of the main folders were at the same hierarchy in the repository – meaning we could simply loop through all of the folder names and run our tasks based on that in the next steps.

Step 2: Make an Automator action to spit out an array of all of the folder names

I needed an array that I could use in PHP (it also could have been bash, JS, etc.) to loop through using Bitbucket’s fantastic API to initially create the repos. Miles built me an Automator action to loop through the folder names and spit out a text file with all of the URLs and non-URLs, which we compared against each other to clean up the folder names before the import. In the screenshot below you can see the action.

[Screenshot: the Automator action]

When I actually ran my repo creation import in the next step, I clicked on “Results” in the “Get Folder Contents” section which allows you to interrupt the flow and pull the results out as an array.
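For anyone without Automator handy, the same list of folder names can be produced in the shell. This is only a sketch of an alternative, not the action we actually used; `list_sites` is a made-up helper name and the checkout path is whatever you pass in.

```shell
# Hypothetical helper: print the name of every top-level folder in a checkout.
# The "*/" glob skips plain files and hidden folders such as .svn.
list_sites() {
  local dir
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue   # nothing matched, so the pattern came back as-is
    basename "$dir"
  done
}
```

Running `list_sites ~/Desktop/svn-checkout > sites.txt` (the path is an example) gives one folder name per line, which can then be cleaned up by hand and pasted into a `sites=( ... )` array.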

Step 3: Create Bitbucket repos via the API

Next, we need to create all of our repos using the array that we just got from Automator.

For each of the array items we call the API, create our repository and print out the returned result for verification. Now, I made a mistake at this stage because I assumed that Bitbucket wouldn’t parse the URL for the ‘website’ field on entry and would simply accept anything I put in there. Well, they were smarter than me and they did validate it, so I had to go back through and create about 10-20 repos again without the website field.
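A rough idea of what one such call could look like from bash with curl. This is not the PHP used for the actual import. It assumes Bitbucket's 2.0 REST endpoint for creating a repository and basic auth, and `WORKSPACE`, `BB_USER`, `BB_APP_PASSWORD` and `DRY_RUN` are all placeholder names invented for this sketch. The ‘website’ field is left out on purpose.

```shell
# Hypothetical sketch: create one private git repo on Bitbucket per site name.
# WORKSPACE, BB_USER, BB_APP_PASSWORD and DRY_RUN are assumed names, not from the post.
WORKSPACE="${WORKSPACE:-oldtownmedia}"
API="https://api.bitbucket.org/2.0/repositories"

create_repo() {
  local slug="$1"
  local url="$API/$WORKSPACE/$slug"
  local body='{"scm": "git", "is_private": true}'
  if [ -n "${DRY_RUN:-}" ]; then
    # Print what would be sent instead of calling the API
    echo "POST $url $body"
    return 0
  fi
  # Send the request and print the returned JSON for verification
  curl -sS -X POST -u "$BB_USER:$BB_APP_PASSWORD" \
    -H "Content-Type: application/json" \
    -d "$body" "$url"
  echo
}
```

Looping it over the array is then `for i in "${sites[@]}"; do create_repo "$i"; done`. Setting `DRY_RUN=1` first is a cheap way to eyeball all 415 requests before any of them are sent.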

Step 4: Clone all of the new BitBucket repos to my computer

This is where the really nerdy cool work comes in. In bash, I looped through a slightly modified array of names (taken from the Automator step) and cloned the git repo for every single one of them into a sibling folder next to my SVN checkout.

# Clone every new (still empty) Bitbucket repo into the current folder
for i in "${sites[@]}"
do
  git clone "git@bitbucket.org:oldtownmedia/$i.git"
done

Step 5: diff the SVN & Git folder

At this stage I had two folders on my computer: one with 415 full folders and SVN files in it, and one with 415 empty folders and git files in it. Now is when I deleted all of the old SVN files (the “.svn” folders) with “rm -rf”. This shaved about 20GB off the total folder size.
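Depending on the SVN version, the metadata lives either in one “.svn” folder at the top of the checkout or in one per directory. A sketch that handles both cases by removing every “.svn” folder it finds; `strip_svn` is a made-up name and the path is whatever you pass in.

```shell
# Hypothetical helper: delete every .svn directory under the given root
# (defaults to the current folder). -prune stops find from descending into
# a .svn folder that is about to be removed.
strip_svn() {
  find "${1:-.}" -type d -name .svn -prune -exec rm -rf {} +
}
```

This is destructive, so it is worth running `find . -type d -name .svn` on its own first to see what would go.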

To copy files from one folder into another that already exists without wiping out what’s already there, Mac provides a simple command called “ditto”. You feed it the path of the folder that you want to copy from and the path of the folder that you want to copy to, and it merges the source into the destination, so the git files already sitting in each destination folder are left alone. It’s damn quick, too. For example:

ditto ~/Documents/MyFolder ~/Desktop/MyNewFolder

Now, you have two identical sets of folders, with the exception that one has git files for each repo and is ready to stage to push.
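With 415 folders, the copy is easiest to run as a loop over the same site names. A sketch, assuming a layout where the SVN checkout and the git clones sit in two sibling folders; `copy_site`, `copy_all` and both folder arguments are invented for illustration. The `cp -R` branch is only a fallback for machines without ditto.

```shell
# Hypothetical helper: merge one site's files into its matching git clone.
copy_site() {
  local src="$1" dst="$2"
  if command -v ditto >/dev/null 2>&1; then
    ditto "$src" "$dst"        # macOS: merges src into the existing dst
  else
    mkdir -p "$dst"
    cp -R "$src/." "$dst/"     # elsewhere: copy the contents of src into dst
  fi
}

# Usage: copy_all SVN_DIR GIT_DIR site1 site2 ...
copy_all() {
  local svn_dir="$1" git_dir="$2" i
  shift 2
  for i in "$@"; do
    copy_site "$svn_dir/$i" "$git_dir/$i"
  done
}
```

Called as `copy_all ~/Desktop/svn-checkout ~/Desktop/git-repos "${sites[@]}"` (example paths), it leaves each clone's “.git” folder in place and drops the site files in next to it.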

Step 6: Push it. Push it real good.

Just like I looped through the site array to pull down all of my repos, I looped through the individual folders and committed/pushed everything at once. This took about 5 hours so I set my computer down for the night and took a look through everything in the morning.

for i in "${sites[@]}"
do
  # Skip a missing folder instead of committing from the wrong place
  cd "$i" || continue
  git add .
  git commit -m "Initial git commit"
  git push origin master
  cd ..
done
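For the morning-after check, a small sketch that lists any folder still holding uncommitted or untracked files, or that is not a git working copy at all. `unclean_sites` is a made-up name, and it only looks at local state, so it won't catch a push that failed after a successful commit.

```shell
# Hypothetical helper: print each folder that is not a clean git working copy.
# A folder is reported if "git status" fails there or lists any changes.
unclean_sites() {
  local i out
  for i in "$@"; do
    if ! out=$(git -C "$i" status --porcelain 2>/dev/null) || [ -n "$out" ]; then
      echo "$i"
    fi
  done
}
```

Run from the folder holding the clones as `unclean_sites "${sites[@]}"`. No output means every folder was fully committed.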

And just like that, we had 415 individual, properly named git repos to use. Now, let’s add up the total time it took to run everything:

Not bad for a move to a completely new versioning system and breaking up a mega-repo into individual usable ones!