Much ado about scripting, Linux & Eclipse: card subject to change

Showing posts with label dvcs. Show all posts

2011-01-27

HOWTO: partially clone an SVN repo to Git, and work with branches

Skip to the code

I've blogged a few times now about Git (which I pronounce with a hard 'g' a la "get", as it's supposed to be named for Linus Torvalds, a self-described git, but which I've also heard pronounced with a soft 'g' like "jet"). Either way, I'm finding it far more efficient and less painful than either CVS or SVN.

So, to continue this series ([1], [2], [3]), here is how (and why) to pull an SVN repo down as a Git repo, but with the omission of old (irrelevant) revisions and branches.

Using SVN for SVN repos

In days of yore when working with the JBoss Tools and JBoss Developer Studio SVN repos, I would keep a copy of everything in trunk on disk, plus the current active branch (most recent milestone or stable branch maintenance). With all the SVN metadata, this would eat up substantial amounts of disk space but still require network access to pull any old history of files. The two repos were about 2G of space on disk, for each branch. Sure, there's tooling to be able to diff and merge between branches w/o having both branches physically checked out, but nothing beats the ability to place two folders side by side OFFLINE for deep comparisons. So, at times, I would burn as much as 6-8G of disk simply to have a few branches of source for comparison and merging. With my painfully slow IDE drive, this would grind my machine to a halt, especially when doing any SVN operation or counting files / disk usage.

Using Git for SVN repos naively

Recently, I started using git-svn to pull the whole JBDS repo into a local Git repo, but it was slow to create and still unwieldy. And the JBoss Tools repo was too large to even create as a Git repo - the operation would run out of memory while processing old revisions of code to play forward.

At this point, I was stuck having individual Git repos for each JBoss Tools component (major source folder) in SVN: archives, as, birt, bpel, build, etc. It worked, but replicating it when I needed to create a matching repo-collection for a branch was painful and time-consuming. As well, all the old revision information was eating even more disk than before:

  • jbosstools' trunk as multiple git-svn clones: 6.1G
  • devstudio's trunk as single git-svn clone: 1.3G

So, now, instead of a couple of GB per branch, I was at nearly 4x as much disk usage. But at least I could work offline and not deal w/ network-intense activity just to check history or commit a change. Still, far from ideal.

Cloning SVN with standard layout & partial history

This past week, I discovered two ways to make the git-svn experience at least an order of magnitude better:

  1. Standard layout (-s) - this allows your generated Git repo to contain the usual trunk, branches/* and tags/* layout that's present in the source SVN repo. This is a win because it means your repo will contain the branch information so you can easily switch between branches within the same repo on disk. No more remote network access needed!
  2. Revision filter (-r) - this allows your generated Git repo to start from a known revision number instead of starting at its birth. Now instead of taking hours to generate, you can get a repo in minutes by excluding irrelevant (ancient) revisions.
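
For reference, here is a minimal sketch of how the two options combine on one command line. As far as I know, -s is just shorthand for naming the three standard folders explicitly. The helper name below is made up for illustration, and it only prints the command, so nothing touches the network.

```shell
# Hypothetical helper: prints (rather than runs) a git svn clone command that
# spells out what -s stands for and shows where the -r range goes.
print_partial_clone_cmd() {
  repo_url=$1    # SVN repository root, the parent of trunk/ branches/ tags/
  first_rev=$2   # oldest SVN revision to import
  target_dir=$3  # folder for the new Git repo
  echo "git svn clone --trunk=trunk --branches=branches --tags=tags" \
       "-r${first_rev}:HEAD ${repo_url} ${target_dir}"
}

# Usage:
#   print_partial_clone_cmd http://svn.jboss.org/repos/jbosstools 28571 jbosstools_GIT
```

Swapping the three long options for -s gives the shorter form of the same command.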

So, why is this cool? Because now, instead of having 2G of source+metadata to copy when I want to do a local comparison between branches, the size on disk is merely:

  • jbosstools' trunk as single git-svn clone w/ trunk and single branch: 1.3G
  • devstudio's trunk as single git-svn clone w/ trunk and single branch: 0.13G

So, not only is the footprint smaller, but the performance is better and I need never do a full clone (or svn checkout) again - instead, I can just copy the existing Git repo, and rebase it to a different branch. Instead of hours, this operation takes seconds (or minutes) and happens without the need for a network connection.
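
A rough sketch of that copy-and-switch step, assuming "rebase it to a different branch" means checking out a local branch from the other remote ref. The function name and the jbosstools_GIT_branch folder are placeholders of mine, not anything from the JBoss repos.

```shell
# Hypothetical helper: duplicate an existing git-svn repo on disk, then point
# the copy at another SVN branch. No network access is involved.
copy_repo_for_branch() {
  src_repo=$1   # existing clone, e.g. jbosstools_GIT
  dst_repo=$2   # new folder for the copy (should not exist yet)
  branch=$3     # remote (SVN) branch name as listed by `git branch -a`
  cp -R "$src_repo" "$dst_repo" || return 1
  # Run the checkout in a subshell so the caller's working directory is kept.
  ( cd "$dst_repo" && git checkout -q -b "local/$branch" "$branch" )
}

# Usage:
#   copy_repo_for_branch jbosstools_GIT jbosstools_GIT_branch jbosstools-3.2.x
```

Any uncommitted edits in the source folder get copied too, so it's probably best to start from a clean working tree.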


Okay, enough blather. Show me the code!

Check out the repo, including only the trunk & most recent branch

# Find the revision in which the branch was created: with --stop-on-copy,
# the last log entry is the branch's initial copy. For r28571, this
# yields -r28571:HEAD
rev=$(svn log --stop-on-copy \
  http://svn.jboss.org/repos/jbosstools/branches/jbosstools-3.2.x \
  | grep -E "^r[0-9]+ \|" | tail -1 | sed -E "s#^(r[0-9]+).*#-\1:HEAD#")

# now, fetch repo starting from the branch's initial commit
git svn clone -s $rev http://svn.jboss.org/repos/jbosstools jbosstools_GIT
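
If you want to see what the revision lookup produces w/o hitting the SVN server, the same pipeline can be wrapped in a function and fed saved log text. This is only a sketch: the function name is invented, and it assumes svn log header lines look like "r28571 | user | date | 1 line".

```shell
# Hypothetical helper: reads `svn log --stop-on-copy` output on stdin and
# prints a revision range such as -r28571:HEAD for use with git svn clone.
branch_start_range() {
  # Keep only the header lines ("r<number> | ..."), so commit messages that
  # merely mention a revision are skipped. svn log lists newest first, so
  # the last header is the oldest revision: the branch's initial copy.
  grep -E '^r[0-9]+ \|' | tail -n 1 | sed -E 's/^r([0-9]+).*/-r\1:HEAD/'
}

# Usage (the first line needs network access):
#   svn log --stop-on-copy "$branch_url" > branch.log
#   rev=$(branch_start_range < branch.log)
```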

Now you have a repo which contains trunk & a single branch

git branch -a # list local (Git) and remote (SVN) branches

  * master
    remotes/jbosstools-3.2.x
    remotes/trunk

Switch to the branch

git checkout -b local/jbosstools-3.2.x jbosstools-3.2.x # connect a new local branch to remote one

  Checking out files: 100% (609/609), done.
  Switched to a new branch 'local/jbosstools-3.2.x'

git svn info # verify now working in branch

  URL: http://svn.jboss.org/repos/jbosstools/branches/jbosstools-3.2.x
  Repository Root: http://svn.jboss.org/repos/jbosstools

Switch back to trunk

git checkout -b local/trunk trunk # connect a new local branch to remote trunk

  Switched to a new branch 'local/trunk'

git svn info # verify now working in branch

  URL: http://svn.jboss.org/repos/jbosstools/trunk
  Repository Root: http://svn.jboss.org/repos/jbosstools

Rewind your changes, pull updates from SVN repo, apply your changes; won't work if you have local uncommitted changes

git svn rebase
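
Since git svn rebase wants a clean working tree, one way around that is to stash first. A minimal sketch, assuming you're fine with git stash holding your edits for a moment; both function names are invented for this example.

```shell
# True (exit status 0) when tracked files have staged or unstaged edits.
has_uncommitted_changes() {
  if git diff --quiet && git diff --cached --quiet; then
    return 1
  fi
  return 0
}

# Stash local edits if there are any, rebase onto the latest SVN revisions,
# then restore the edits. If the rebase fails or the pop conflicts, the
# stash entry is still there to recover by hand.
svn_rebase_with_stash() {
  if has_uncommitted_changes; then
    git stash && git svn rebase && git stash pop
  else
    git svn rebase
  fi
}
```

The check matters because git stash on a clean tree saves nothing, and a blind git stash pop afterwards could restore some older, unrelated stash.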

Fetch updates from SVN repo (updates the remote branches only; local branches and uncommitted changes are left alone)

git svn fetch

Create a new branch (remotely with SVN)

svn copy \
  http://svn.jboss.org/repos/jbosstools/branches/jbosstools-3.2.x \
  http://svn.jboss.org/repos/jbosstools/branches/some-new-branch

2009-05-04

Git 'Er Done

The discussion in bug 257706 (Host a git repository on Eclipse Foundation servers, support git as the repository of Eclipse projects) rages on, wind blowing in both directions.

Let's look at the objections to Git @ Eclipse:

Implementing a common build infrastructure would also be complicated by additional code repositories as well.

Not true; the Athena system already supports CVS and SVN, plus a "build from local sources" mode which works w/ a cvs/svn tree dump, a workspace w/ checked out projects, or (TBD, we haven't tested this yet) with a git repo. And we have an open bug to make repo tree structure irrelevant to the local checkout mode. Party on. There is even a Git plugin for Hudson so you can use Hudson to watch your repo for changes, like it does with CVS and SVN.

Can't use unapproved or non-EPL code at Eclipse.org

Not true; from discussions w/ [email protected], I've been told at least twice that as long as you're not SHIPPING code that falls under a non-EPL or non-approved-CQ you're entirely fine to USE that code as server-based infrastructure. Rock on.

Cannot include tooling in a release train or host its project at Eclipse.org

Not true; since eGit is EPL and jGit is BSD, I don't see a problem with distributing the tooling that would connect to a Git repo hosted at Eclipse.org. We worked around the license woes for SVN tooling support. We can do it again. (Of course IANAL, TINLA.)

Conclusion:

No legal concerns with use of Git as hosted server infrastructure. Dash Athena (Common Builder) will support Git. Tooling is safe for inclusion in release trains (either fully like CVS is or partially like SVN is).

Only remaining issue is therefore to get it installed and allocate resources to support/manage it. With all the erosion going on lately, this need should not be trivialized.

Image
Before the thaw this spring, this tree was on top of the bluff. With nothing to support it, it was dropped like an unchampioned feature request.

However, in the spirit of open source, several people on the above bug have offered to help w/ setup, testing, support, etc. So the burden here will be shared, like many things at Eclipse (e.g., Babel, Hudson). Erosion continues, but we can all help to shore up the loss.

As most people prolly already know, SourceForge supports the whole spectrum of VCS and DVCS options. If we don't want people to host projects there, Eclipse has to at least offer something from the DVCS world to encourage participation here. Keep the barrier to entry high, and people will go elsewhere. Lower the barrier, and people will come here to party instead.

With everyone feeling the economic- and time-pinch these days, can we really afford to discourage contributions at Eclipse simply because, as the silverbacks say, "why, back in my day, we only had CVS, vi, and notepad, and dangit, that was good enough!" ?

After all, the new world is inevitable.

2009-04-01

Dash Athena: Epic Fail

Damn airport scanners.

On Sunday while I was flying home from EclipseCon, I decided to try out Git, and migrated the whole Dash CVS tree into a local Git repo. Then, I deleted the CVS stuff in dev.eclipse.org because Git is so much better and DVCS is the Way of the Future.

Unfortunately, somewhere between Calgary and Toronto, my hard drive died, taking with it the whole repo.

Since we now have to start over completely from scratch, Andrew and I have decided to take this opportunity to re-architect a few things:

  1. Instead of Ant scripts which wrap & simplify PDE, we're going to use Perl scripts which wrap ant4eclipse. No one really cares about OSGi anymore anyway.

  2. Instead of being able to run a build in Eclipse, we're going to support running on MIDP devices only (smartphones) and older hardware (bug 260000). This will make builds much smaller and more portable for everyone.

  3. We're dropping the built-in support for running JUnit tests as part of an automated build, because the Agile way is to do Build Breakage Driven Development (BBDD) rather than TDD.

  4. Because the Hudson instance on build.eclipse.org is only accessible to an elite few (bug 270633), we're dropping that too, in favour of Cruise Control, much lauded by the WTP and Orbit dev teams for its friendly user interface and many extensible configuration options.

Should you have any concerns with this new plan, don't hesitate to post your comments here. We value the community's input, and will take your thoughts under advisement. Then ignore them and do what we want.