The Microsoft Biology Foundation

As most of you may know, Microsoft, and in particular the folks at Microsoft (External) Research, have started to make major inroads into developing tools for scientists. In scholarly communication there is a repository offering as well as an ontology plugin for Word, and in chemistry there is Chem4Word, which is currently being developed by Joe Townsend, Jim Downing and Peter Murray-Rust here at Cambridge together with the team at Microsoft External Research.

Now they have also announced a first version of the Microsoft Biology Foundation. From the announcement:

“The Microsoft Biology Foundation (MBF) is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework. Currently it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological Web services such as NCBI BLAST. MBF is available under an open source license, and executables, source code, demo applications, and documentation are freely downloadable […]”

Now every time Microsoft gets involved in something like this, it is bound to generate discussion and debate, as happened around Chem4Word (see here and the links contained therein). I, for one, am happy about every constructive and open contribution to the canon of scientific tools available to the community and welcome the news.


D’oh solved…..

So the mystery didn’t last that long and Peter solved the puzzle quickly (see the comments on the last post). Let me talk through it in my own words: what I expected, what I got and why I got it (as I say, I am a Java newbie and it took some reading around to figure it out).

The programme seems to be easy enough. First, a new set of shorts is initialised and then the programme iterates from i = 0 to i = 9 (as i < 10). During each iteration, i gets added to the set. So first time round, i = 0 is added. With the next instruction, 0-1 equals -1, and as the set doesn't contain -1, nothing is removed. Now the programme goes round the loop again, adding 1 and removing 1-1 = 0, which leaves 1 in the set. Third time round, 2 is added and 1 is removed, leaving 2. So only the most recent element is ever left in the set and, at the end of the loop, the set should contain only 9. As the size of the set gets printed, and there is only one element, I expected 1 to be printed.

Imagine my surprise then, when, instead of 1, I got 10. I was a bit flummoxed by this, but the answer is quite simple in the end: when elements get added to the set, they are added as type short, but when stuff gets removed, it is of type int.

In essence, the expression i-1 produces a result that is of type int……and Java autoboxes this into an Integer object (remember that sets cannot hold primitive values but only object references?). And furthermore, Short and Integer objects, even if they do contain the same value, do not compare as equal. And so if you have a set where shorts are being added and integers removed, well then nothing happens.
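Here is a minimal sketch of the programme as described above. The original listing was posted as an image, so the class name ShortSetPuzzle and the helper sizeAfterLoop() are assumptions made for illustration; only the loop itself follows the description in the text.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical reconstruction of the puzzle; names are assumptions.
class ShortSetPuzzle {

    // Runs the loop described in the text and returns the final set size.
    static int sizeAfterLoop() {
        Set<Short> set = new HashSet<>();
        for (short i = 0; i < 10; i++) {
            set.add(i);        // i is a short, so it is autoboxed to a Short
            set.remove(i - 1); // i - 1 is an int, so it is autoboxed to an Integer
        }
        return set.size();
    }

    public static void main(String[] args) {
        // The Integer arguments never equal any Short in the set,
        // so nothing is ever removed.
        System.out.println(sizeAfterLoop()); // prints 10

        // Same numeric value, different wrapper types: not equal.
        System.out.println(Short.valueOf((short) 1).equals(Integer.valueOf(1))); // prints false
    }
}
```

The second print statement isolates the root cause: Short.equals() only returns true when the other object is also a Short.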

And why doesn't the compiler complain when I remove an integer from a set of shorts? A look at the javadoc for the Set interface solves that one: the add() method takes the set's element type (add(E e)), so only shorts can be added to a set of shorts. The remove() method, however, is declared as remove(Object o), so the compiler lets you try to remove anything from a set.
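A small sketch of that asymmetry, assuming nothing beyond java.util.Set and java.util.HashSet. The class name RemoveAcceptsAnything and the method tryRemovals() are made up for this example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; names are assumptions.
class RemoveAcceptsAnything {

    // Returns the results of three remove() calls on a set holding Short 1.
    static boolean[] tryRemovals() {
        Set<Short> set = new HashSet<>();
        set.add((short) 1);
        // set.add(1); // would not compile: add(E) wants a Short here, and an int is not boxed to Short

        boolean removedInteger = set.remove(1);         // boxed to Integer: compiles, removes nothing
        boolean removedString = set.remove("1");        // a String: also compiles, removes nothing
        boolean removedShort = set.remove((short) 1);   // boxed to Short: removes the element

        return new boolean[] {removedInteger, removedString, removedShort};
    }

    public static void main(String[] args) {
        boolean[] results = tryRemovals();
        System.out.println(results[0] + " " + results[1] + " " + results[2]); // prints false false true
    }
}
```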

So one way of avoiding this, as Peter has pointed out, is to use int rather than short, or, I guess, you could also cast back to short in the programme.
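Both fixes can be sketched as follows. The class and method names (ShortSetFixes, sizeWithCast(), sizeWithInt()) are assumptions for illustration, and the loop is the one described earlier.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class showing the two fixes; names are assumptions.
class ShortSetFixes {

    // Fix 1: keep the set of shorts, but cast i - 1 back to short before removing.
    static int sizeWithCast() {
        Set<Short> set = new HashSet<>();
        for (short i = 0; i < 10; i++) {
            set.add(i);
            set.remove((short) (i - 1)); // now boxed to a Short, so it matches
        }
        return set.size();
    }

    // Fix 2: use int throughout, so both add and remove box to Integer.
    static int sizeWithInt() {
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            set.add(i);
            set.remove(i - 1);
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithCast()); // prints 1
        System.out.println(sizeWithInt());  // prints 1
    }
}
```

Either way, the element type going into remove() matches the element type in the set, so only the last value (9) is left.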

D’oh…..

Over on his blog, Peter sometimes likes to tease people with the odd puzzle or so. Over the Easter holidays, I have been playing with Java Sets a bit and promptly fell into a trap. Now as is usual with these things, particularly if you are somewhat of a Java “noob” like me, it took me a while to figure out what went wrong. And in the spirit of sharing and joint pain, I thought I should do a puzzle too and blog this one……it might keep some of you amused. Here is the little toy programme I was playing with at the time:

code.jpg

(Apologies for the code being in a pic – WordPress messes with the angle brackets) Looks easy enough, right? And I promise you it compiles and runs. Now, without running it, what do you think it prints? Now run it. What does it print? Did you expect what it printed (if so, congratulations…..:-)). If not, what could be the reason?