Semantic Web Tools and Applications for Life Sciences 2009 – A Personal Summary

So another SWAT4LS is behind us, this time wonderfully organised by Andrea Splendiani, Scott Marshall, Albert Burger, Adrian Paschke and Paolo Romano.

I have been back home in Cambridge for a couple of days now and have been asking myself whether there was an overall conclusion from the day – some overarching bottom line that one could take away and against which one could measure the talks at SWAT4LS2010 to see whether there has been progress or not. The programme consisted of a great mixture of longer keynotes, papers, “highlight posters” and highlight demonstrations illustrating a wide range of activities at the interface of semantic web technology, computer science and biomedical research.

Topics at the workshop covered areas as diverse as the analysis of the relationship between HLA structure variation and disease, applications for maintaining patient records in clinical information systems, patient classification on the basis of semantic image annotations, the use of semantics in chemo- and proteoinformatics, and the prediction of drug-target interactions on the basis of sophisticated text mining, as well as games such as Onto-Frogger (though I must confess that I somehow missed the point of what that was all about).

So what were the take-home messages of the day? Here are a few points that stood out to me:

  • During his keynote, Alan Ruttenberg coined the dictum of “far too many smart people doing data integration”, which was subsequently taken up by a lot of the other speakers – an indication that most people seemed to agree with the notion that we still spend far too much time dealing with the “mechanics” of data – mashing it up and integrating it, rather than analysing and interpreting it.
  • During last year’s conference, it already became evident that a lot of scientific data is now coming online in a semantic form. The data avalanche has certainly continued, and the sense that more and more data is becoming available, at least in the biosciences, has intensified. While chemistry has been lagging behind, data is becoming available here too. On the one hand, there are Egon’s sterling efforts with openmolecules.net and the data solubility project; on the other, there are big commercial entities like the RSC and ChemSpider. During the meeting, Barend Mons also announced that he had struck an agreement with the RSC/ChemSpider to integrate the content of ChemSpider into his Concept Wiki system. I will reserve judgement as to the usefulness and openness of this until it is further along. In any case, data is trickling out – even in chemistry.
  • Another thing that stood out to me – and I could be quite wrong in this interpretation, given that this was very much a research conference – was the fact that there were many proof-of-principle applications and demonstrators on show, but very few production systems that made use of semantic technologies at scale. A notable exception to this was the GoPubMed (and related) system demonstrated by Michael Schroeder, who showed how sophisticated text mining can be used not only to find links between seemingly unrelated concepts in the literature, but also to assist in ontology creation and the prediction of drug-target interactions.

Overall, many good ideas, but, as seems to be the case with all of the semantic web, no killer application as yet – and at every semweb conference I go to we seem to be scrabbling around for one of those. I wonder if there will be one and what it will be.

Thanks to everybody for a good day. It was nice to see some old friends again and make some new ones. Duncan Hull has also written up some notes on the day – so go and read his perspective. I, for one, am looking forward to SWAT4LS2010.


An appetite for open data…

…is what I have encountered here in Antwerp already. I am currently at the annual meeting of the Dutch Polymer Institute, with which I have been associated in various forms for the best part of five years now. We are the guests of Borealis here in Antwerp and as such, it promises to be an interesting meeting. The morning will be taken up with the “Golden Thesis Awards”. The DPI evaluates all the PhD theses it funds by scientific merit, and the best PhD students in a year are given an award. This is followed by an excursion to Borealis, and in the afternoon there will be thematic sessions: “Polymers and Water” and “Polymers and Time”. The former is self-explanatory and the latter mainly concerns molecular simulations of polymers at short and long time scales. This is followed by poster sessions and a Borealis-hosted dinner in the evening. Tomorrow we will have several further talks on bio-based polymers, sustainability and solar cells, and in the evening a brainstorming session: “What could polymers mean for the bottom of the pyramid?” I like DPI meetings – they are extremely young: most of the participants are PhD students and post-docs and are always brimming with energy.

In that spirit, I arrived at my hotel last night and sat down for dinner. It didn’t take long before I was surrounded by old and some new acquaintances, and we spent the time catching up and discussing what we have been doing. And inevitably the conversation turned to polymer informatics and open data. There were many questions: “Will extraction of data from a manuscript cause problems with publication later?”, “Why should I trust you and give you my manuscript or thesis to datamine?”, “How does copyright work out?”, “What happens to the publishers – why should they not sell my data?” etc. However, all the minds were open. They see the argument for open data and open knowledge and they agree with it in principle, but there is great uncertainty as to the politics and technicalities associated with open data. The moral of the story is: much more talking needs to be done, and much more education. Open access and open data evangelists should put together an FAQ for “mere mortals”, i.e. researchers who do not think about this all the time and who should not have to think subtly about the differences between “gold OA”, “green OA”, “libre OA” and what have you. We need to do much more talking to the science community. Let’s start now. And let’s not weaken our position by OA sophistry. I will try and blog some more as the meeting goes on and hopefully also provide some photos.

PS: You will see some new and unusual tags at the bottom of this blog post (UPDATE: no tags, apparently) and links in the text. I have installed Zemanta to try and make this blog semantically a little richer. The tags and links are autogenerated and I hope the result is worthwhile.
