Sources of Polymer Information

I am currently working on a paper in which I try to outline our informatics strategy for the polymer science community and, in particular, for the polymer pharmaceuticals community. As part of the paper, I am reviewing sources of polymer information and how accessible they are in terms of open access. And, as will come as no surprise, the situation is depressing.

So what are the sources of polymer information? Well, there are mainly three: papers, theses and data compilations. Let’s look at these in turn.

Papers. Well, Peter continues to blog about this extensively. Papers are copyrighted by the publisher, all rights reserved. Open access publishing is practically non-existent in polymer science: papers submitted to the standard canon of polymer journals (including supplementary information where available) are fully copyrighted by the publisher. There is one tiny spark of light: e-Polymers. Here is the journal description, copied from the e-Polymers website:

“e-Polymers is a peer-reviewed internet journal under the auspices of the European Polymer Federation (EPF). In the area of polymer science and engineering, it makes novel scientific and technological results available both in academia and industry, and basically free of charge.

Furthermore, e-Polymers is a forum dedicated to the free and fast exchange of information. Therefore, it will comprise

  • original publications on basic polymer science and engineering,
  • reviews on trends in science and technology, in academia and industry,
  • reports on educational topics,
  • information about joint programmes, e.g. of the EU,
  • job advertisements and appointments of new chairs etc.,
  • business reports (abstracts),
  • commercial links and advertisements.

Thus, e-Polymers is the answer to the strange situation that many institutions cannot afford to subscribe to journals which – at the same time – they strongly support by submission of high-quality papers, refereeing etc.”

So far so good. Or not, as the case may be. On first reading, one could get the impression that words and phrases such as “free” and “e-Polymers is the answer to the strange situation that many institutions cannot afford to subscribe to journals which – at the same time – they strongly support by submission of high-quality papers, refereeing etc.” point to an open access publication, or at least to the spirit of open culture. However, this is not so. On closer inspection of the author instructions, one finds that, upon acceptance of a paper submitted to e-Polymers, the author transfers all copyright to the journal. The journal is therefore merely free to view (which is a first step) but NOT open access according to the Budapest Open Access Initiative Declaration. Digging around the website further reveals even more obstacles: the journal is a priori only free to view for participating institutions and their members. Non-members can gain free access, but this has to be requested via an institutional library. So it is free to view with obstacles, which takes us even further from true open access. Nevertheless, the European Polymer Federation seems to have correctly diagnosed the current problem in scientific publishing and has taken a baby step towards addressing the issue of access. This is encouraging and may be something to build on in moving to full open access in the future.

Theses. The problems associated with theses in polymer science are the same as those encountered for theses in general: availability/accessibility and the lack of a clear licence. Although the situation is improving, a significant number of institutions only require the submission of one or several paper copies of a doctoral thesis. In that case, of course, the contents of the thesis are lost in terms of machine processing and information extraction. But even when a thesis is available in an institutional repository, there is usually no clear licence, or the contents are again copyrighted by the institution itself and therefore cannot be freely accessed and used in the sense of the Budapest Open Access definition.

Data Compilations. A number of compilations of polymer data exist and are used extensively by scientists. The most important ones are the Polymer Handbook (Eds. Brandrup, Immergut), the Wiley Database of Polymer Properties, Polymers – A Property Database and the PoLyInfo database. Let’s look at these in turn:

The Polymer Handbook. Published by Wiley. Non-digital, contents copyrighted and all rights reserved by Wiley, commercial.

The Wiley Database of Polymer Properties. Published by Wiley and essentially an HTML version of the Polymer Handbook. Digital, subscription basis, log-in required, contents copyrighted and all rights reserved by Wiley, commercial.

Polymers – A Property Database. Published by Taylor and Francis. Digital, subscription basis, log-in required, contents copyrighted and all rights reserved by Taylor and Francis, commercial.

PoLyInfo Database. Developed by the National Institute for Materials Science of Japan. Digital, log-in required, contents copyrighted and all rights reserved by NIMS, non-commercial, free to view.

So overall, the situation is even worse than for small-molecule chemistry, where open access resources are starting to make a real impact and are increasingly liberating chemistry data (see, for example, PubChem and CrystalEye). At the moment, there is nothing even remotely comparable for polymers.

So, how could we change the situation? Clearly, there needs to be a multipronged approach:

Community building: There are a number of existing or emerging polymer communities that might be open to the idea of open data. A collaboration with just one of them could be enough to act as a demonstrator and might prove catalytic.

Continued advocacy: Open culture advocates are becoming increasingly vocal, and the efforts of Peter, Peter Suber, Creative Commons and Science Commons, the Open Knowledge Foundation (Rufus Pollock) and many, many others are invaluable. Education must be part of this: students who are not already aware of these ideas ought to encounter them as undergraduates, and nowhere more so than in chemistry.

Continued technology development: The more I understand about the technologies that we are currently developing and that are already changing the face of the internet, the more I am convinced that these technologies in themselves will force a radical change in the business model that is driving scientific publishing. The current one is becoming increasingly untenable and the aggressive behaviour currently shown by some publishers only indicates an attempt to defend a dying business.

Continued pressure from funding bodies: Funding bodies need to be convinced to “vote with their wallets” and require researchers to deposit manuscripts and data in OA archives/repositories as a condition of funding. The Wellcome Trust is exemplary in this context.

Maybe in this way we will be able to remove the enclosure that is currently choking “the intangible commons of the mind”[1] and impeding scientific progress.

[1] Willinsky, J., The Access Principle: The Case for Open Access to Research and Scholarship, MIT Press, Cambridge, Massachusetts (2006), and references therein. (Available under a Creative Commons licence from the MIT Press website.)

The Cambridge Polymer Builder

The first “proof of concept” product of the Cambridge Polymer Informatics Group is up on the web. It is a demo application of a polymer builder, which uses Chemical Markup Language (CML), Polymer Markup Language and Jumbo to build various types of polymers.

The polymers are constructed from small fragments, such as CH, CH2 and CO groups, with the associated connection tables defined in CML. Polymer Markup Language (PML) then contains a set of instructions describing how these fragments are “glued” together and how torsions (in 3D representations) are dealt with. It can also handle distributions (of torsion angles just as much as of molecular weights) and probabilities (e.g. for random copolymerizations). (More details in a forthcoming paper.)
The polymer builder then takes the fragments and the corresponding PML document as input and enumerates a full connection table for the macromolecule in CML. (We have not implemented ensemble building in this demonstrator.)
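The fragment-gluing idea can be made concrete with a minimal Python sketch. This is not the Jumbo/PML implementation. The fragment library, the `Fragment` class and the `build_chain` and `to_cml` functions are all hypothetical, and the sketch makes two simplifying assumptions:

- Each fragment carries only heavy atoms with implicit hydrogen counts.
- Every junction is a single bond from one fragment's "tail" atom to the next fragment's "head" atom.

The output borrows CML's atomArray/bondArray layout but has not been validated against the CML schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical fragment: heavy atoms with implicit H counts, internal bonds,
    and the atoms that bond to the previous (head) and next (tail) fragment."""
    elements: list
    h_counts: list
    bonds: list  # (i, j, order) using local atom indices
    head: int
    tail: int


# Toy fragment library (an assumption, not the group's CML fragments).
FRAGMENTS = {
    "CH2": Fragment(["C"], [2], [], head=0, tail=0),
    "CO": Fragment(["C", "O"], [0, 0], [(0, 1, 2)], head=0, tail=0),
    "O": Fragment(["O"], [0], [], head=0, tail=0),
}


def build_chain(sequence, fragments=FRAGMENTS):
    """Glue fragments in order; return (atoms, bonds) with global atom indices."""
    atoms, bonds, prev_tail = [], [], None
    for name in sequence:
        frag = fragments[name]
        offset = len(atoms)
        atoms.extend(zip(frag.elements, frag.h_counts))
        # copy the fragment's internal bonds, shifted to global indices
        bonds.extend((i + offset, j + offset, order) for i, j, order in frag.bonds)
        if prev_tail is not None:
            # single bond from the previous fragment's tail to this fragment's head
            bonds.append((prev_tail, frag.head + offset, 1))
        prev_tail = frag.tail + offset
    return atoms, bonds


def to_cml(atoms, bonds):
    """Serialise the connection table as CML-style atomArray/bondArray XML."""
    mol = ET.Element("molecule", xmlns="http://www.xml-cml.org/schema")
    atom_array = ET.SubElement(mol, "atomArray")
    for k, (element, h) in enumerate(atoms, start=1):
        ET.SubElement(atom_array, "atom", id=f"a{k}", elementType=element,
                      hydrogenCount=str(h))
    bond_array = ET.SubElement(mol, "bondArray")
    for i, j, order in bonds:
        ET.SubElement(bond_array, "bond", atomRefs2=f"a{i + 1} a{j + 1}",
                      order=str(order))
    return ET.tostring(mol, encoding="unicode")


# Three repeat units of a poly(ethylene oxide) backbone, chain ends left open.
atoms, bonds = build_chain(["CH2", "CH2", "O"] * 3)
print(to_cml(atoms, bonds))
```

The chain ends are left with open valences. A fuller builder would cap them with end groups and, for 3D work, apply whatever torsion rules the PML document specifies.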

[Figure: the Cambridge Polymer Builder]

Right now we can build most structural motifs, such as homopolymers, block and random copolymers, dendrimers and branched systems. In this demonstrator, we have not yet implemented bump checking and a number of other controls, but we are working on them as I write. The demonstrator app is available as a web service here. Please go out and take it for a spin. And, by the way, we are grateful for feedback, so let me have your thoughts and comments via the comments function on the blog.
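For linear chains, the structural motifs listed above differ mainly in the sequence of repeat units the builder is asked to glue together. The following Python sketch generates such sequences; the function names are hypothetical. The random copolymer is modelled as an ideal Bernoulli process, in which each position is drawn independently with fixed probabilities. That is only one possible statistical model (terminal models based on reactivity ratios are another), not necessarily the one PML uses.

```python
import random


def homopolymer(unit, n):
    """n repeats of a single repeat unit, e.g. -A-A-A-."""
    return [unit] * n


def block_copolymer(blocks):
    """Concatenate blocks given as (unit, length) pairs, e.g. [("A", 5), ("B", 3)]."""
    sequence = []
    for unit, length in blocks:
        sequence.extend([unit] * length)
    return sequence


def alternating_copolymer(unit_a, unit_b, n_pairs):
    """Strictly alternating -A-B-A-B- sequence of n_pairs pairs."""
    return [unit_a, unit_b] * n_pairs


def random_copolymer(units, probabilities, n, rng=None):
    """Ideal (Bernoulli) random copolymer: each position is drawn independently
    with the given probabilities; pass a seeded random.Random for reproducibility."""
    rng = rng or random.Random()
    return rng.choices(units, weights=probabilities, k=n)


print("".join(block_copolymer([("A", 5), ("B", 3)])))  # AAAAABBB
print("".join(alternating_copolymer("A", "B", 3)))     # ABABAB
seq = random_copolymer(["A", "B"], [0.7, 0.3], 20, rng=random.Random(1))
print("".join(seq), seq.count("A") / len(seq))
```

Each sequence could then be mapped onto fragments and glued into a connection table. Branched systems and dendrimers need a tree rather than a flat list, which this sketch does not attempt.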

Something exciting, catalytic and quite delightful…

…has happened today.

I recently blogged about attending the first ESF summer school in Nanomedicine in Wales and speaking about our efforts in polymer informatics there.

After my talk, I was approached by an undergraduate, Hosea Handoyo, who wanted to know more about our work and who, amongst many other things, is currently a neuroscience student in the Netherlands. When I asked him why he was interested, he said that he was attending the summer school as a “student journalist”. Apparently, Hosea is part of a group of Indonesian students who attend research conferences and try to find out what is currently going on in various areas of science. They then write this up as “popular science” articles, which are published on the web.

Now, if I remember our conversation correctly, there are several points to this. Firstly, it is intended to inform the Indonesian public, in simple terms, about what is going on at the cutting edge of research. Secondly, it also serves as a landmark for students in Indonesia, showing what research is going on where and which institutions or research groups they might consider joining in the future.

Now this morning, when I looked over my blog, I saw an incoming link from a website, netsains.com (as annoying as the WordPress software may sometimes be when you want to publish code in angle brackets, it is fantastic for all the housekeeping features it offers). It looked a bit odd, but I could make out the terms “polimer informatika” in the link and so investigated further. And indeed, the link led to an article that Hosea had written about our work here in Cambridge. The article is written entirely in Indonesian, so I had no idea what most of it said. I could, however, make out some words: “polimer informatika”, “kanker” (cancer; a lot of work in polymer pharmaceuticals is done in the area of anti-cancer drugs) and mentions of the Unilever Centre, Polymer Markup Language (PML), some of the databases I had discussed and Peter Corbett’s OSCAR (which wows people every time it is demonstrated). I have since found out that his article has also appeared on the pages of the Indonesian Chemistry Forum.

Furthermore, there were links to all of the Unilever Centre blogs, my blog, OSCAR 3 on SourceForge and even the video of a lecture on polymer informatics that is up on Google Video. I then got in touch with Hosea via email to make sure that I had remembered the details of our conversation correctly, and I also asked him about the purpose of the netsains.com website. He told me that the site is supported by the Indonesian Minister for Research and Technology and is modelled on the Dutch Kennislink site. Kennislink was set up by the Dutch Ministry for Education, contains over 5000 popular science articles across all disciplines and is the most prominent Dutch-language popular science site.

Now in his email (quoted with permission), Hosea said:

“All of these websites are aiming to bridge the gap between Indonesian scientists (and students) abroad and the ones in Indonesia. ICT especially internet is very limited in Indonesia (though the gadgets are quite sophisticated) so it is troublesome for people just simply browsing for information. By providing them the hottest issues from Europe, Japan, US, China, and many other countries, we share the information of research and development of scientific world with them. We could provide them the information of technology and in return, Indonesian communities abroad get updates of what happens in Indonesia and the link to translate their research/latest technology to what public in Indonesia needs. Simply like an open source idea but this is more to information sharing and empowering public awareness in scientific field.”

I found this really heartwarming and delightful for a number of reasons:

  • A genuine interest in science. It is fantastic to see undergraduates going out to conferences with an interest in science and a desire to find out what is going on. In the past, I have worked in institutions where even conference attendance by PhD students and post-docs was considered a “waste of time and money”. Personally, I think that it is never too early to expose someone who is genuinely interested in science to the cutting edge of what is going on in the world.
  • The idea of sharing and openness. It is an often-quoted mantra, but one that is hardly ever practised. We tend to lock up science and access to data in closed access journals, books or other resources. Often enough, that already breaks our backs at well-resourced and well-funded institutions like Cambridge and makes scientific progress difficult. In other parts of the world, this is an absolutely insurmountable barrier. However, the more people like Hosea write about science on websites like Kennislink or netsains.com, the more people blog about their own and other people’s science (the chemical blogosphere is exemplary in this) and the more students write their theses in the open, the more we can start to break these barriers down. And the internet, blogs, wikis etc. are the disruptive technologies that will make this possible. Furthermore, there is a social dimension here: those with access to resources (IT, conferences, literature etc.) enable access for those with fewer resources, in the most efficient way, through filtering and feedback.
  • The ability to set an agenda. Undergraduates turn into research students, post-docs, academics and decision makers. As research students, they have (always assuming the presence of an enlightened supervisor) the ability to determine what they work on (through their choice of research group) and therefore perhaps also a choice over the culture in which science is done and in which they want to do science. As post-docs and academics, they have the opportunity (together with their colleagues) to fundamentally change the way science is done and communicated. And as decision makers, they might just hold the purse strings, which enables them to tell academics how and where to publish (some funding bodies, for example, mandate that the research they fund is published in open access journals or deposited in repositories).

I think that Hosea and people like him are the catalysts for positive change, which we need to move forward.

A harmless drudge…..

An awful lot of our work in polymer informatics is concerned with the development of ontologies, taxonomies and dictionaries, all of which aim to define the terms we use for the benefit not only of fellow humans but also of machines.

In doing so, we are of course part of a long tradition, and one of the most important exponents of this tradition in the UK is Samuel Johnson. I have just finished watching a BBC documentary about Johnson’s life and the trials and tribulations he went through while working on the dictionary.

The issues are much the same as those we encounter today: agreeing on the definition of a term, deciding on the form a dictionary should take and, of course, ensuring that the dictionary is used (i.e. accepted) by a community of people. In our polymer work, a favourite and recurring example of the first problem is IUPAC’s definition of a macromolecule as “a molecule of high relative molecular mass, the structure of which essentially comprises the multiple repetition of units derived, actually or conceptually, from molecules of low relative molecular mass”. Is that really a definition that could satisfy a rigorous scientist? Another example is the fact that IUPAC regards “macromolecule” and “polymer molecule” as synonymous terms, but then goes on to define a polymer as “a substance composed of macromolecules”. Given the individual definitions of the terms “macromolecule” and “polymer”, does it make sense to speak of “polymer molecules” in the first place?

It is amusing, then, to see how Johnson himself defines a person engaging in this activity, a lexicographer, and thus, by extension, the modern (computational) ontologist:

Lexicographer: A writer of dictionaries; a harmless drudge[…]