Rambling around foo: vcs


Wednesday, 24 July 2013

HOWTO: git - change branch without touching working copy (at all)

Have you ever needed, in a git repository, to switch to another branch without altering the working copy AT ALL, and wondered how that's done?

Typical use cases: you made some changes to the working copy thinking you were on another branch, or you double-track in git a directory which is also tracked by another VCS (e.g. ClearCase).

What you actually need is to update the index without touching the working copy (git checkout would also rewrite the files). The command that does that is:

git read-tree otherbranch

If you also want to commit the state of your working tree to otherbranch, you need to tell git to point HEAD at the branch you just switched to (git read-tree alone leaves HEAD on the old branch):

git symbolic-ref HEAD refs/heads/otherbranch
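The two commands can be wrapped into one shell function. This is a minimal sketch, assuming a POSIX shell and an existing local branch; the name switch_keep_worktree is made up for this example. It refuses unknown branch names, because git symbolic-ref would otherwise happily point HEAD at a branch that does not exist. Keep in mind that git read-tree replaces the index, so anything you had staged but not committed is dropped from the index (the files on disk stay as they are).

```shell
#!/bin/sh
# switch_keep_worktree BRANCH (hypothetical helper name)
# Move HEAD and the index to BRANCH, leaving every file in the
# working copy exactly as it is.
switch_keep_worktree () {
    branch=$1
    # Refuse to continue if the branch does not exist locally.
    git show-ref --verify --quiet "refs/heads/$branch" || {
        echo "no such branch: $branch" >&2
        return 1
    }
    # Load the branch's tree into the index only; files on disk are not touched.
    git read-tree "$branch" &&
    # Make HEAD a symbolic ref to the branch, so the next commit lands there.
    git symbolic-ref HEAD "refs/heads/$branch"
}
```

Typical use: switch_keep_worktree otherbranch && git status, after which git status shows how the untouched working copy differs from otherbranch.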

I use this approach at my workplace* to develop and experiment with possible code improvements on my machine before proposing them for merging into the official code.

* The preferred VCS there is (Base) ClearCase, and I keep a git repository over the relevant part of the project inside the ClearCase dynamic view. When synchronising, ClearCase updates the files in that working copy, so from time to time I have to resync my git branch that follows the latest official code (clearcaseint). I can then pull the clearcaseint branch into the git repository on my local disk and merge it with the experimental changes in my git feature branches.

If people are curious about how I work with ClearCase and git, I can expand on this in another post.

Thursday, 16 February 2012

HOWTO: Git - reauthor/fix author and committer email and author name after a git cvsimport

At some point you might find that your git repository imported from CVS does not have the correct author names and email addresses for all the commits that once lived in CVS and are now part of your project history. Or you might have done a cvsimport that missed a few authors.

Let's suppose you first imported the CVS repo into git, but then realised you had missed some authors.

Before you can run git cvsimport, you need a checkout of the module or CVS subdirectory that you want to turn into its own git repo.

For ease of use I defined CVSCMD as:

CVSCMD='cvs -z9 -d :pserver:my_cvs_id@cvs.server.com:/root_dir'

You will need to replace the placeholders according to your situation; more exactly, you need to define 'my_cvs_id', 'cvs.server.com' and 'root_dir'. If your access method to the server is not pserver, change that accordingly. This information should be available from your project admin or the project pages.

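Check out the desired module, or even a subdirectory of a module:

$CVSCMD checkout -d localdirname MODULE/path/to/subdir

Then run the import from inside the fresh checkout (the relative paths assume the authors file sits next to localdirname):

cd localdirname
git cvsimport -A ../authors -m -z 600 -C ../new-git-repo -R

Here -A points to the authors file mapping CVS ids to names and emails, -m tries to detect merges from the commit messages, -z 600 sets the timestamp fuzz (in seconds) used when grouping file revisions into commits, -C names the target git repository, and -R writes a $GIT_DIR/cvs-revisions file mapping CVS revisions to git commits.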

How to find the commits that need rewriting

To limit yourself to the commits that had no CVS-to-git author mapping at git cvsimport time, use a filter like this:

git log -E --author='^[^@]*$' --pretty=format:%h

This tells git log to print only the abbreviated hashes (%h) of the commits that have NO '@' sign in the 'Author:' field, which happens when no mapping from the CVS user id to a git author name and email was provided in the authors file at git cvsimport time.

We will later use this command's output to tell git filter-branch which commits need rewriting.*
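To see the filter at work without risking a real import, here is a small sketch that builds a throw-away repository with two empty commits: one whose identity has no '@' (jdoe <jdoe>, my assumption of roughly what git cvsimport records for an id missing from the authors file) and one with a proper, made-up email. Only the first one should be listed.

```shell
#!/bin/sh
# Scratch repository: one commit with an "unmapped" identity, one mapped.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1

# Assumed shape of an unmapped CVS id after cvsimport: no '@' anywhere.
git -c user.name=jdoe -c user.email=jdoe \
    commit -q --allow-empty -m "imported from CVS, unmapped"
# A properly mapped author (made-up address).
git -c user.name="Jane Doe" -c user.email=jane@example.invalid \
    commit -q --allow-empty -m "imported from CVS, mapped"

# Abbreviated hashes of commits whose "Name <email>" contains no '@'...
git log -E --author='^[^@]*$' --pretty=format:%h
echo
# ...and the distinct author names behind them (prints: jdoe).
git log -E --author='^[^@]*$' --pretty=format:%an | sort -u
```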

But before that...

How do we find out whether our authors file is complete?

For this task we'll use a slightly modified form of the previous command and some shell magic.

git log -E --author='^[^@]*$' --pretty=format:%an | sort -u > all-leftout-cvs-authors

And now in all-leftout-cvs-authors we'll have a sorted list of all CVS ids which were not handled in the original git cvsimport. In my case there are only 19 such ids:

$ wc -l all-leftout-cvs-authors
19 all-leftout-cvs-authors

Nice, that will be easy to fix. Now edit your all-leftout-cvs-authors file to add the relevant information in a format similar to this (no spaces around the '=', since the filter script below looks up lines starting with 'cvsid='):

john=John van Code <[email protected]>
jimmy=Jimmy O'Document <[email protected]>

If you can't build a complete CVS-user-to-name-and-email map, you might want to use stubs of the following form, so that such commits are easy to identify later (or you could leave them unaltered altogether ;-):

cvsid=cvsid <[email protected]>
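If you go the stub route, the stubs can be generated from all-leftout-cvs-authors instead of typed by hand. A minimal sketch, assuming one CVS id per line; the function name make_author_stubs and the default domain unknown.invalid are placeholders made up for this example (pick a domain that is easy to grep for later):

```shell
#!/bin/sh
# make_author_stubs IDFILE [DOMAIN]
# Print one "cvsid=cvsid <cvsid@DOMAIN>" stub per non-blank line of IDFILE.
# DOMAIN defaults to the made-up placeholder unknown.invalid.
make_author_stubs () {
    awk -v domain="${2:-unknown.invalid}" \
        'NF { printf "%s=%s <%s@%s>\n", $1, $1, $1, domain }' "$1"
}
```

For example, make_author_stubs all-leftout-cvs-authors >> new-authors, then replace the stubs you can resolve with real names and addresses.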

How to actually do the filtering to fix history (using git-filter-branch and a script)

After this is done, we need just one more piece: the command that does the actual rewriting. It reads as follows (note that my final authors file is called new-authors, and that I placed this in a script so I could easily run it without having to escape all the spaces and such madness):

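#!/bin/sh
# Authors file with lines like "cvsid=Full Name <email>" (no spaces around '=').
# Exported so that the env-filter, which git evaluates in a subshell, can see it.
export authors_file="${authors_file:-$HOME/new-authors}"

#git filter-branch -f --remap-cvs --env-filter '
git filter-branch -f --env-filter '

# Print the "Full Name" part of the authors-file line for CVS id $1.
get_name () {
    grep "^$1=" "$authors_file" | sed "s/^[^=]*=\(.*\) <.*>$/\1/"
}

# Print the email address (without angle brackets) for CVS id $1.
get_email () {
    grep "^$1=" "$authors_file" | sed "s/^.* <\(.*\)>$/\1/"
}

# Only rewrite commits whose committer name is a CVS id listed in the file
# (the trailing = avoids prefix matches, e.g. jo matching john).
if grep -q "^$GIT_COMMITTER_NAME=" "$authors_file" ; then
    GIT_AUTHOR_NAME=$(get_name "$GIT_COMMITTER_NAME") &&
    GIT_AUTHOR_EMAIL=$(get_email "$GIT_COMMITTER_NAME") &&
    GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" &&
    GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL" &&
    export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL &&
    export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
fi
' -- --all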
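Since filter-branch can take a long time on a big history, it may be worth checking the authors file first. The sketch below (check_authors_file is a hypothetical helper, not part of git) prints, with line numbers, every non-blank line that does not look like cvsid=Full Name <email> with no spaces around the '=', which is the shape the get_name and get_email functions expect.

```shell
#!/bin/sh
# check_authors_file FILE
# Print (as "lineno:line") every non-blank line that is not of the form
#   cvsid=Full Name <email>
# Returns 0 if the file looks fine, 1 if bad lines were printed, 2 if unreadable.
check_authors_file () {
    [ -r "$1" ] || return 2
    # -v selects lines matching NEITHER pattern: not well-formed and not blank.
    if grep -nvE -e '^[^=[:space:]]+=[^<>=]*[^<>=[:space:]] <[^<>[:space:]]+>$' \
                 -e '^$' "$1"; then
        return 1
    fi
    return 0
}
```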

You might wonder about the commented-out git filter-branch line with the --remap-cvs option. That option does not exist in the stock git-filter-branch script (/usr/lib/git-core/git-filter-branch), so the script will NOT work if you use that line without patching git-filter-branch first. With the patch, the option rewrites the cvs-revisions file (created by git cvsimport -R) so that it maps CVS revisions to the new commit ids instead of the old ones. If you want that too, apply this patch to git-filter-branch:

diff --git a/git-filter-branch b/git-filter-branch
old mode 100644
new mode 100755
index ae602e3..d1f7ef6
--- a/git-filter-branch
+++ b/git-filter-branch
@@ -149,6 +149,11 @@ do
   prune_empty=t
   continue
   ;;
+ --remap-cvs)
+  shift
+  remap_cvs=t
+  continue
+  ;;
  -*)
   ;;
  *)
@@ -368,6 +373,33 @@ while read commit parents; do
    die "could not write rewritten commit"
 done <../revs
 
+# Rewrite the cvs-revisions file, if requested and the file exists
+
+ORIG_CVS_REVS_FILE="${GIT_DIR}/cvs-revisions"
+if [ -f "$ORIG_CVS_REVS_FILE" ]; then
+ if [ "$remap_cvs" ]; then
+  printf "CVS remapping requested\n"
+
+  CVS_REVS_FILE="$tempdir/cvs-revisions"
+  cp "$ORIG_CVS_REVS_FILE" "$CVS_REVS_FILE"
+  printf "\nFound $ORIG_CVS_REVS_FILE; will copy and alter it as $CVS_REVS_FILE\n"
+  cvs_remap__commit_count=0
+  newcommits="$(ls ../map/ | wc -l)"
+  for commit in ../map/* ; do
+   cvs_remap__commit_count=$(($cvs_remap__commit_count+1))
+   printf "\rRemap CVS commit $commit ($cvs_remap__commit_count/$newcommits)"
+
+   oldsha1="$(basename $commit)"
+   read newsha1 < $commit
+   sed -i "s@$oldsha1\$@$newsha1@" "$CVS_REVS_FILE"
+  done
+ else
+  warn "\nNo CVS remapping requested, but cvs-revisions file found. All CVS mappings will be lost.\n"
+ fi
+elif [ "$remap_cvs" ]; then
+ warn "\nWARNING: CVS remap was ignored, since no original cvs-revisions file was found\n"
+fi
+
 # If we are filtering for paths, as in the case of a subdirectory
 # filter, it is possible that a specified head is not in the set of
 # rewritten commits, because it was pruned by the revision walker.
@@ -491,6 +523,11 @@ if [ "$filter_tag_name" ]; then
  done
 fi
 
+if [ "$remap_cvs" -a -f "$CVS_REVS_FILE" ]; then
+ mv "$ORIG_CVS_REVS_FILE" "$ORIG_CVS_REVS_FILE.original"
+ cp "$CVS_REVS_FILE" "$ORIG_CVS_REVS_FILE"
+fi
+
 cd ../..
 rm -rf "$tempdir"
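The remapping loop in the patch runs one sed -i over the whole cvs-revisions file for every rewritten commit, which gets slow on long histories. An alternative (not what the patch does, and not tested against it) is a single awk pass. The sketch assumes the git cvsimport -R format, one "path revision sha1" entry per line, and a map file with "old-sha1 new-sha1" pairs; both function names are hypothetical, and map_dir_to_file flattens a filter-branch style map directory (one file per old sha1, containing the new one) into that format.

```shell
#!/bin/sh
# remap_cvs_revisions MAPFILE REVSFILE
#   MAPFILE : "old-sha1 new-sha1" per line
#   REVSFILE: "path revision sha1" per line (git cvsimport -R format)
# Prints REVSFILE with each known old sha1 in the last field replaced.
remap_cvs_revisions () {
    # With an empty map the NR==FNR trick below would misfire; just copy.
    [ -s "$1" ] || { cat "$2"; return; }
    awk 'NR == FNR { map[$1] = $2; next }         # first file: build the map
         ($NF in map) { sub(/[^ ]+$/, map[$NF]) } # rewrite only the last field
         { print }' "$1" "$2"
}

# map_dir_to_file DIR
# Flatten a filter-branch style map directory (one file per old sha1,
# containing the new sha1) into MAPFILE format.
map_dir_to_file () {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        read -r newsha < "$f"
        printf '%s %s\n' "${f##*/}" "$newsha"
    done
}
```

Inside the patched script, the for loop could then become something like map_dir_to_file ../map > "$tempdir/sha-map" followed by remap_cvs_revisions "$tempdir/sha-map" "$ORIG_CVS_REVS_FILE" > "$CVS_REVS_FILE" (sha-map being a made-up file name). Reading from the original file and writing to the copy avoids truncating the input.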

Then, after running the filter-branch script (let's call it filter), you should have a repository whose history carries the appropriate author names and emails.
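To check the result, the same git log filter can be reused. A minimal sketch (count_unmapped is a made-up name) that counts commits whose author or committer identity still has no '@'. It looks only at --branches and --tags rather than --all, because git filter-branch keeps the pre-rewrite history under refs/original/, which would otherwise be counted too. A result of 0 means every commit got a mapped identity (or a stub).

```shell
#!/bin/sh
# count_unmapped: run inside the rewritten repository.
# Prints how many commits on branches and tags still have an author or
# committer identity without any '@'. refs/original/ (filter-branch's
# backup of the old history) is deliberately left out.
count_unmapped () {
    {
        git log --branches --tags -E --author='^[^@]*$' --format=%H
        git log --branches --tags -E --committer='^[^@]*$' --format=%H
    } | sort -u | grep -c .
}
```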

P.S.: I started writing this post some time ago but stopped just before the last part, the one with the filter script. I might be missing something in the explanation, so if you have problems, please comment and I'll help you fix them.

P.P.S.: * At some point I wanted the filter script to do something like:

for R in $(git log -E --author='^[^@]*$' --pretty=format:%H | head -n 2) ; do
[the same git filter branch command above but ending in ...]
' $R
done

But, as far as I remember, $R didn't work on the whole history, only on that revision, or something weird of that sort. In the end I didn't filter those revisions explicitly, but the entire history. I hope this helps.

Monday, 9 January 2012

Another Windows tip - How to store cvspass login for CVSNT

Since I am currently working on a Windows machine at work, I am looking for ways to make it work in a sane way. The latest insane thing: I wasn't able to log in to a CVS server at work from WinCVS (which uses CVSNT) with my regular credentials, while the cached password in Cygwin did work with the Cygwin CVS.

So the obvious fix was to copy the .cvspass file from Cygwin to wherever CVSNT keeps its cvspass file. Well, it isn't that easy, since CVSNT keeps such passwords in the Windows registry. And since I had no previous logins with CVSNT, I didn't know what to put in the registry.

I quickly found that the key is HKEY_CURRENT_USER\Software\cvsnt\cvspass, but how do I store the password there? Looking at my Cygwin .cvspass, I saw the line had this format:

/1 :pserver:[email protected]:/u S()meh4s'h00

I finally found out that I have to create a string value named ":pserver:[email protected]:/u" whose data is the scrambled password "S()meh4s'h00", and simply ignore the first field (/1).
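If you have several entries to move, a small script can turn the Cygwin .cvspass into a .reg file for regedit. This is a sketch based on the key layout described above; the function name cvspass_to_reg is made up, and it assumes regedit accepts a plain ASCII file with CRLF line endings (double-click the result, or import it with regedit /s). Backslashes and double quotes are escaped as the .reg string syntax requires, and lines in the older format without the /1 prefix are handled too.

```shell
#!/bin/sh
# cvspass_to_reg [CVSPASS]
# Print a .reg file storing every entry of a Cygwin-style .cvspass
# (default: ~/.cvspass) under HKEY_CURRENT_USER\Software\cvsnt\cvspass.
cvspass_to_reg () {
    printf 'Windows Registry Editor Version 5.00\r\n\r\n'
    printf '[HKEY_CURRENT_USER\\Software\\cvsnt\\cvspass]\r\n'
    while read -r first second third || [ -n "$first" ]; do
        if [ "$first" = "/1" ]; then
            root=$second pass=$third     # "/1 CVSROOT PASSWORD"
        else
            root=$first pass=$second     # older "CVSROOT PASSWORD"
        fi
        [ -n "$root" ] || continue
        # In .reg strings, backslash and double quote must be escaped.
        root=$(printf '%s' "$root" | sed 's/[\\"]/\\&/g')
        pass=$(printf '%s' "$pass" | sed 's/[\\"]/\\&/g')
        printf '"%s"="%s"\r\n' "$root" "$pass"
    done < "${1:-$HOME/.cvspass}"
}
```

For example, cvspass_to_reg > cvspass.reg, then import cvspass.reg on the Windows side.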

Stay tuned. The next article will be about what Windows 7 and GNOME 3 / gnome-shell have in common, since I upgraded my home laptop to wheezy (I really wanted to use pitivi 0.15) and my desktop at work to Windows 7.

Monday, 13 September 2010

Fixing a bookmark corruption in Iceweasel/Firefox

At some point in the past my bookmarks broke in such a way that I could no longer change them in any way: deletions, additions, reorganisation, nothing worked.

Long before this event I had set up a git repo in my .mozilla directory and committed everything in it (except obvious cache files and such). I didn't make a cron job for this, since I couldn't think of a proper way to make sure the browser wasn't running (otherwise the data might be inconsistent at commit time), and since the browser is running most of the time, it seemed rather useless to automate commits for the very small windows when the data would be in a consistent state. As a consequence, I only made occasional manual commits.

When the bookmarks issue appeared I realised that my last git commit was so old that restoring it made no sense, since I would have lost bookmarks, passwords and plugins. So I made a checkpoint commit of the broken profile and resorted to all sorts of manual attempts at fixing the issue, but all my fiddling with the bookmarks.* files proved useless.

I concluded that Iceweasel/Firefox 3.5.x was using some other mechanism and decided to dig into the issue later, when time allowed me to analyse and fix it. Today was that day.

I began by backing up the bookmarks via the Bookmarks Manager export function. After a hiccup, it managed to save a proper bookmarks.html file. I tried to import that file, but the breakage remained.

Then I decided to prepare for the worst: I installed an add-on which allowed me to save all stored passwords, and created a new profile (by starting firefox from the command line with 'firefox -ProfileManager'), intending to transplant files from the old profile into the new one. Shortly after that I realised it would be smarter to commit the new profile too, import the bookmarks from the backup file into it, and copy the touched files back into the old profile.

After the import and closing the browser, git informed me that the modified files were:

XPC.mfasl
XUL.mfasl
cookies.sqlite
localstore.rdf
places.sqlite
pluginreg.dat
urlclassifier3.sqlite

The modified binary .mfasl files looked very ugly to me, but I had a hunch that the only file relevant to my problem was places.sqlite, so I checked its contents with sqlitebrowser. After also inspecting urlclassifier3.sqlite (whose contents were unreadable), I decided to simply copy places.sqlite into the old profile and see whether that fixed the problem. After all, I had the whole profile's state in git, so any breakage could easily be undone.

I started firefox using the old profile and the problem was fixed. Yay!
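For the record, the transplant step can be scripted so that it only runs when the old profile is fully committed, which keeps the git safety net intact. A minimal sketch, assuming the browser is closed and the old profile directory is the git working copy; transplant_places is a hypothetical name:

```shell
#!/bin/sh
# transplant_places NEWPROFILE OLDPROFILE
# Copy places.sqlite from a freshly created profile into the broken one,
# but only if the old profile (a git working copy) has no uncommitted
# changes, so the copy can always be undone from git.
transplant_places () {
    new=$1 old=$2
    if ! git -C "$old" diff --quiet HEAD --; then
        echo "uncommitted changes in $old; commit a checkpoint first" >&2
        return 1
    fi
    cp "$new/places.sqlite" "$old/places.sqlite"
}
```

If the browser misbehaves afterwards, git -C OLDPROFILE checkout -- places.sqlite brings the previous file back.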

Now I have my bookmarks back, and a suggestion for the Mozilla people: please don't keep useless files like bookmarks.html in the profile once they become outdated; they are misleading.

P.S.: I still have to think of a sane way to automate commits. Maybe I'll try writing an Iceweasel/Firefox add-on; that could be a nice idea, since I'd learn something new, too.
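Until then, one possible stop-gap for the cron problem mentioned earlier is to commit only when no browser process is running. This is only a sketch: it assumes pgrep (from procps) is available and that the process names you pass (for example iceweasel or firefox; they vary between distributions) identify the browser. commit_profile_if_idle is a made-up name.

```shell
#!/bin/sh
# commit_profile_if_idle PROFILEDIR PROCESSNAME...
# Snapshot a browser profile kept in git, but only while none of the given
# browser processes is running (otherwise the files may be inconsistent).
commit_profile_if_idle () {
    profile=$1
    shift
    # Without pgrep we cannot tell whether the browser runs: do nothing.
    command -v pgrep >/dev/null 2>&1 || return 1
    for name in "$@"; do
        # Exact process-name match; a running browser means "try later".
        if pgrep -x "$name" >/dev/null 2>&1; then
            return 0
        fi
    done
    git -C "$profile" add -A . || return 1
    # Nothing changed since the last snapshot: nothing to commit.
    git -C "$profile" diff --cached --quiet && return 0
    git -C "$profile" commit -q -m "automatic profile snapshot $(date '+%Y-%m-%d %H:%M')"
}
```

A cron job could then call commit_profile_if_idle ~/.mozilla iceweasel firefox from a small script.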

Thursday, 6 November 2008

Coordinating localization via git

Note: this is quite long and might not be interesting for people not involved in any of: Debian i18n, Debian l10n, git, shell scripting.

For some time now I have been the de facto coordinator of the Romanian localization team. During this time I repeatedly faced problems related to the motivation of team members, setting goals for a release, coordinating changes inside the team, integrating new translators into the team, losing team members, my own lack of time on some occasions, and other problems.

I thought a lot about how to improve on those points and I was never satisfied with the answers I got. I think the worst problem in the team was the inability to set clear goals, especially during lenny's development cycle.

During sarge's and etch's development cycles we were in our infancy, and setting the goal of a fully translated installation process was really enough to keep people motivated. For sarge we missed it by a small margin and the translations were of poor quality, while for etch we worked more on improving the translations.

But during lenny it was bad, really bad. My free time had been shrinking since etch's release, and we were unable to set a clear goal, since for etch we had already managed to fully translate the installation process and a few other things. There was no way for us to reach 100% translations during lenny's development cycle, so setting that as a goal was unrealistic. Percentages by themselves don't mean anything to people, and as long as there's no substance behind those numbers, there's no motivation to reach an arbitrary percentage.

I tried to set as a goal translating the packages installed by default in a new installation, but that hit the eternal question "How do we easily find out which packages those are?", which sometimes remained unanswered or got an unsatisfying answer. There was also the goal of having correct Romanian diacritics in lenny, including an aspell-ro that uses the correct diacritics.

I even got to a point where I myself lost my motivation, and set myself a personal goal of overtaking the translation level of the language just above Romanian in the po-debconf l10n statistics (the ranking between languages). This was a nice way to keep myself motivated, but I was reluctant to make it public for fear of being misinterpreted: by a strange coincidence, that language was Hungarian, and historically there has been some friction between Hungary and Romania, while there are still some tensions with the Hungarian minority in Romania, in the areas where they are the local majority.

I managed to reach my personal goal, but this wasn't addressing the big picture.

So, sometime around the start of this year I started thinking about ways to coordinate the Romanian localization team in order to have:

a clear goal at any given time
a way to always be able to change that goal as we go
a way to sync with possible calls for translations, or with the current sid translations
stats immediately available
automated checks for spelling, correct diacritics usage and other checks that might be useful (e.g. translation completeness)
an easy way to assign somebody else as a language coordinator (I would appreciate some help or I might even consider stepping down)
easy integration of new translators (by providing immediate answer to the question "What can I do to help with the translations?")

In short, a tool that would allow the team to work more efficiently while making it possible to set clear goals that keep people motivated.

So after some pondering, I thought that creating a repository with the translations and helper tools that do the funky sync, checks and stats would be the best way to achieve that. I started hacking on it around July-August and published the result, but without much publicity, since it is still incomplete.

Some of the technical details are still in a haze, but I have a general idea and I got some basic functionality.

Today I decided that I should announce this semi-officially through my blog; maybe I'll get some input, ideas, or even contributions (I really should write a TODO).

I give you the Debian L10n Romanian coordination repository.

This is a git repository that contains some tools to facilitate translation coordination, plus the team's translations that are current in the distro.

It can be cloned with:

git clone git://git.debian.org/git/users/eddyp-guest/debian-ro-repo.git

or, if you're behind a restrictive firewall:

git clone http://git.debian.org/git/users/eddyp-guest/debian-ro-repo.git

Currently the work flow for updating a translation is as follows:

source _bin/polibs (. _bin/polibs)
cd foo
po_refresh
complete the translation
po_rearrange "ro.po"
git add "ro.po" && git commit -m "updated translation for foo"
send the translation ("git format-patch origin" and send the patches by mail, or, alternatively, just "git push")

Features:

provides a po_refresh function that can import material directly from http://i18n.debian.net/material, but also allows manual imports (e.g. a template.pot from a call for translations)

for a new translation: source _bin/polibs (. _bin/polibs), make a directory with the name of the source package, cd into it, and run po_refresh

po_rearrange - beautify and unify the layout of PO files (facilitating compact and sane diff-ing for PO files)
po_merge uses a compendium, if present

Planned features:

sync translations/templates from package VCS-es (Vcs-* headers and debcheckout should be the means to the end)
po_rearrange should be called as a pre-commit hook; it should either reject the commit if the po file was not rearranged, or rearrange it automatically before the commit
generate stats
add commands for "what's outdated", "what needs review", "submit translation", and maybe "reserve translation for offline use"
conflict merges should be done via po_merge (.gitattributes is key here)
support other file types (?) - does this make any sense?
periodic and automated sync with sid for all translations
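For the .gitattributes item, git already provides the needed hook: a custom merge driver. A minimal sketch, assuming a wrapper script _bin/po-merge-driver (hypothetical; it does not exist in the repository yet) that calls po_merge, writes the merged result into the file passed as %A and exits non-zero when conflicts remain; git passes the ancestor, current and other versions as %O, %A and %B. The relative path assumes the driver is run from the top of the work tree.

```shell
#!/bin/sh
# setup_po_merge_driver: run in the top directory of the repository.
# Registers a merge driver named "po" and routes *.po files through it.
setup_po_merge_driver () {
    git config merge.po.name "merge PO files with po_merge"
    # Hypothetical wrapper: gets ancestor (%O), ours (%A) and theirs (%B),
    # must leave the result in %A and exit non-zero on conflicts.
    git config merge.po.driver "_bin/po-merge-driver %O %A %B"
    # Add the attribute line only once.
    grep -qx '\*\.po merge=po' .gitattributes 2>/dev/null ||
        printf '%s\n' '*.po merge=po' >> .gitattributes
}
```

The .gitattributes file can be committed, but the git config part is not transferred by a clone, so every translator would have to run the setup once; that looks like a natural job for the _bin tools.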

Problems:

security - running tests automatically from files within the repo doesn't seem too wise, but looks like the only way to get automated testing on any translator machine; maybe keeping the code in a submodule might address this issue?
entry level translators still have a hard time - UI sucks now; there should be a wrapper command that should use the library functions and should provide a useful help
is git too difficult? - maybe the git backend should be hidden?
still in development/alpha stage - I still haven't figured out some of the issues
central repo or really distributed - should there be a central git repo to which the coordinator(s) push? A central repo, with a small team of pushers handling contributions from new translators (who can't commit directly), might actually facilitate interaction between experienced and new translators and help bring the rookies up to speed

I was hoping that the release notes for lenny would benefit from this infrastructure, but unfortunately I have lately been really inactive wrt Debian.

Questions, suggestions, ideas are welcome.

Monday, 18 February 2008

[VCS/SCM] same language, different lingo

When people say "lightweight branches":

In git lingo, most likely they will think about the fact that you don't need a different directory to switch to a different branch.
In subversion lingo, they will probably mean that creating a branch in the repository is not an expensive operation
In bzr lingo, it seems they mean a repository with short history (aka shallow copy, for most other people)
it seems that mercurial calls git's meaning "named branches" (I'm not sure about this); I don't know whether the mercurial documentation calls anything else "lightweight branches"
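To make the git meaning concrete, here is a small demonstration in a throw-away repository: creating a branch only adds a named pointer to an existing commit, and switching happens in the same directory.

```shell
#!/bin/sh
# Throw-away repository with a single commit.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1
echo hello > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m initial

git branch experiment            # just a new name for the current commit
git rev-parse HEAD experiment    # prints the same commit id twice
git checkout -q experiment       # switch in place, no second directory
git symbolic-ref --short HEAD    # prints: experiment
```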

Export means:

for subversion: to create a working copy without any meta information
for mercurial/hg: "The act of exporting a changeset generates an augmented patch file that describes the change."
for git, it seems to be a useless 'cp -r' operation
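For what it's worth, the closest git counterpart of the subversion meaning is probably git archive, which writes the files of a given revision without any repository metadata. A minimal sketch; export_tree is a made-up name:

```shell
#!/bin/sh
# export_tree REV DEST
# Write the files of revision REV into DEST with no .git metadata,
# roughly what "svn export" produces. Run inside a git repository.
export_tree () {
    mkdir -p "$2" &&
    git archive --format=tar "$1" | tar -xf - -C "$2"
}
```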

There are several other differences in the parallel uses of VCS/SCM lingo, if you look closely, but these are enough for now...

No wonder we have communication problems and people don't grasp the power of a new tool when it changes the meaning of previously used terms.
