Licences for Ontologies

One of the things that I have been grappling with for quite some time is the whole notion of licences for ontologies. Of course, neither I nor anybody else, for that matter, should have to worry about this. But the world is the way it is, and so the question is: what would an appropriate licence for an ontology be? The answer to that question mainly depends on what an ontology actually is. Is it a piece of software? Is it a database? A structured document (whatever that means in the context of licensing)?

I have spent quite some time talking to my colleagues about this, and we haven’t been able to come up with a satisfactory answer. Even emailing the good folks at the Open Knowledge Foundation did not elicit a response. Now, it seems that Science Commons has made an attempt to provide some answers on their website.

They state that whether an ontology is protected by copyright law will mainly depend on whether the ontology “contains a sufficient degree of creative expression” or whether it draws entirely on fact. In the latter case, it might not be protected. Now such a statement is intriguing in itself – in the communities in which I and many of the Science Commons people tend to spend most of our time, ontologies are usually understood to be representational artefacts, “whose representational units are intended to designate universals in reality and the relations between them.” Just how much “creative expression” that would allow is an interesting debate in itself, which is probably best had in the pub. But I digress.

Science Commons then goes on to cite some legal precedent in which US courts have upheld copyright in medical ontologies. So really, we don’t know. Science Commons then counsels “pre-emptive” licensing: if in doubt, slap a Creative Commons licence on your ontology (CC0 is explicitly recommended). If it is later found that copyright cannot subsist in ontologies and that your licence is therefore invalid, you haven’t lost anything; but if it turns out that copyright does indeed subsist in your ontology, your bottom is covered. Small surprise, too, that Science Commons would wish to promote the licences of their sister organisation, Creative Commons.
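
For what it is worth, here is what such pre-emptive licensing could look like in practice: a minimal Python sketch, using rdflib, that attaches a dcterms:license statement pointing at CC0 to an ontology header. The ontology IRI and file names are hypothetical examples, not a recommendation.

    # Minimal sketch: adding a CC0 licence statement to an ontology header.
    # The ontology IRI and file names are hypothetical examples.
    from rdflib import Graph, URIRef
    from rdflib.namespace import DCTERMS

    g = Graph()
    g.parse("my_ontology.owl", format="xml")  # load the existing ontology

    ontology_iri = URIRef("http://example.org/ontologies/my-ontology")
    cc0 = URIRef("https://creativecommons.org/publicdomain/zero/1.0/")

    # dcterms:license is a widely used way to state a vocabulary's licence
    g.add((ontology_iri, DCTERMS.license, cc0))

    g.serialize(destination="my_ontology_licensed.owl", format="xml")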

Again, I am not convinced that Creative Commons licences are an appropriate form of licence for ontologies, any more than I am convinced that the GPL attached to ChemAxiom is an entirely appropriate licence for an ontology. I would be interested in what the OKF experts have to say about this. The bottom line, for now at least, seems to be that we just won’t know until someone does a lot of deep thinking or the question is tested in court.

Any comments and opinions would be extremely welcome!

Tomorrow’s Giants 1 – Big Data

I recently spent an afternoon at a meeting entitled “Tomorrow’s Giants”, which was jointly organised by the Royal Society and Nature and took place here in Cambridge. The meeting was in preparation for a larger meeting, also entitled “Tomorrow’s Giants”, which is to be held on 1st July 2010 as part of the Royal Society’s 350th anniversary celebrations. The purpose of the larger event will be to bring together scientists and politicians in an effort to gather scientists’ visions for the next five decades and to ask questions such as

  • What will be required to enable academic achievement in the future?
  • What are the main goals and challenges facing science in the future?

In discussing this, funding considerations were to be left to one side. This is interesting, considering that the current fashion for, and move towards, larger and larger platform grants has profound implications for some of the questions the Royal Society and Nature wanted to debate.

As part of the preparatory Cambridge meeting, the Royal Society and Nature had singled out four questions they wished us to debate:

  • “Database Management”
  • “Science Organisation”
  • “Metrics”
  • “Career Security and Support”

For historical and other reasons, readers of this blog will not be surprised to know that my personal interests are centred on scientific data, and I shall therefore spend a few blogposts on the question of scientific data that we were asked to debate. In this context, “Database Management” was a very unfortunate name for a vastly important topic, which had everything to do with how science handles its data in the future. The questions that were asked were: (a) managing big data – what is the right infrastructure for data sharing? (b) is big data more of a concern for some disciplines than for others (e.g. biologists)? (c) how can we – and is it appropriate to – facilitate inter-laboratory dataset comparison? (d) does the type of data have an impact on the ways it can be shared? (e) future literatures in the wider sense, i.e. not just how findings are published in journals, but how can interim findings be shared and accessed? (f) what about the tension between transparency and data protection? (g) what are the implications of the growing use of web 2.0 as a resource for sharing research findings? and (h) how well organised is the current use of web 2.0 and how does this impact accessibility?

These were all wonderful questions, which must be asked in order to “future-proof” science, and to which we were expected to provide answers in 20 min (!). While I was and am glad that we were to debate these issues, the devil is – as always – in the detail, and the undifferentiated way in which the questions were asked made my heart sink again.

In this post, I would like to address the first two questions:

Managing big data – what is the right infrastructure for sharing?
The Good: What is exciting here is the recognition by the RS that data needs infrastructure, and that infrastructure is both a technical and a sociocultural problem. Some components of that infrastructure (by no means all) that are direly needed are:

  • Data Repositories (departmental, university-level, subject-specific and trans-institutional)
  • Open, non-proprietary and standards-based markup (exchange formats)
  • Computable Metadata (e.g. ontologies which can be used to give data COMPUTABLE meaning; see the sketch after this list)
  • University librarians who think that preservation of the data generated by one’s own institution falls WITHIN the remit of the library
  • Scholarly Societies who remember that they were founded in response to a scaling problem – namely the increasing availability of scientific data and the need to distribute it – and who start taking this reason for their existence seriously again, rather than trying to lock up data in inaccessible and copyrighted/DRM’ed/pdf’ed publications
  • Academics who believe that data science should be a compulsory part of every undergraduate’s course
  • Funding agencies who mandate open access publishing and data sharing as a condition of the award of a grant
  • The availability and use of appropriate data licences, such as Creative Commons licences or Open Knowledge Foundation licences
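
To make the “computable metadata” bullet concrete, here is a minimal Python sketch (using rdflib) of what giving a single data point computable meaning might look like; all the ontology terms and IRIs below are hypothetical placeholders, not an existing vocabulary:

    # Minimal sketch of computable metadata: describing one measured value
    # with ontology terms so that software, not just humans, can interpret
    # it. All IRIs below are hypothetical placeholders.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, XSD

    EX = Namespace("http://example.org/data/")
    ONT = Namespace("http://example.org/chem-ontology#")

    g = Graph()
    measurement = EX["measurement/0001"]

    g.add((measurement, RDF.type, ONT.MeltingPointMeasurement))
    g.add((measurement, ONT.hasSubstance, ONT.Benzene))
    g.add((measurement, ONT.hasValue, Literal("5.5", datatype=XSD.decimal)))
    g.add((measurement, ONT.hasUnit, ONT.DegreeCelsius))

    print(g.serialize(format="turtle"))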

Etc., etc. – I am sure there are many more things that I should mention here and that I have forgotten. Come to think of it: funding bodies and universities – don’t forget about or squeeze out the infrastructure guys. Don’t tell the infrastructure guys that the development of institutional repositories/markup languages/models/eScience tools is not science but engineering, and has no place in a research university that “does science”. Do you detect bitterness? Yes you do – some of my colleagues, even those that call themselves “chemoinformaticians”, tell me just this on a regular basis. Only thing is: without the infrastructure guys and the engineers who develop all of this stuff, and develop it in a scientific manner using scientific methods, NO science will get done, because there will be no infrastructure to support it. And which buttons will you push then to calculate your transition states, dock your molecules etc.? Yes – data needs infrastructure…now universities, senior academics and funding bodies: put your money and your recognition where your mouth is.
The Bad: The focus of the question on BIG data perturbs me immensely. Because BIG data is, well, BIG data, one of the first things that people who produce/manage/exchange BIG data have to do – almost by the very nature of the thing – is worry about infrastructure for BIG data. And while we may not have all the technical answers just yet (it is sad, in a way, that the fastest bandwidth we have for shuffling really BIG data, such as that produced by astronomers around the world, is to load it onto hard disks, load these onto trucks and send the trucks on their way), people who deal in BIG data are very aware that it needs infrastructure and hardly need convincing. It is not BIG data that is the problem. What is the problem is data that is produced in the “bog-standard” long-tail research group of between 3 and 20 people. It is these guys who usually DO NOT (unless they happen to be blessed and are biologists) have the infrastructure to make data available in such a way that it can be stored, exchanged and re-used. It is the biology/chemistry/physics…PhD student who has slaved for three years to assemble data and keeps it in an Excel spreadsheet that we need to worry about – how do we make it possible for him to publish his data and make it reusable? How about the departmental crystallographer who sits on thousands of publication-quality but unpublished crystal structures, just because the compound never quite made it into a paper? We need to develop mechanisms and infrastructure for the small “long-tail” laboratory scientists…the big data guys have this figured out anyway.
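
As a toy illustration of what a first step towards making that PhD student’s spreadsheet publishable might look like, here is a hedged Python sketch that converts a hypothetical Excel file into an open CSV file plus a small machine-readable metadata sidecar; the file names, column names and licence choice are invented for the example:

    # Minimal sketch: turning a lab Excel spreadsheet into something more
    # shareable - an open CSV file plus a JSON metadata sidecar.
    # File names and metadata values are hypothetical.
    import json
    import pandas as pd

    df = pd.read_excel("phd_measurements.xlsx")  # needs openpyxl installed

    # Write the data itself in an open, non-proprietary format
    df.to_csv("phd_measurements.csv", index=False)

    # Record minimal provenance and structure alongside the data
    metadata = {
        "title": "Measurements collected during PhD project",
        "creator": "A. Student",
        "licence": "https://creativecommons.org/publicdomain/zero/1.0/",
        "columns": list(df.columns),
        "rows": int(len(df)),
    }
    with open("phd_measurements.json", "w") as f:
        json.dump(metadata, f, indent=2)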

Is Big Data more of a concern for some disciplines than for others (e.g. biologists)?
The Good: Yes, of course it is. High-throughput screening/gene sequencing/radioastronomy produce huge amounts of data. Yes, it is a concern for them – but they are thinking about it already.
The Bad: Big data again. See above – it is not about Big data…let’s talk about the synthetic organic chemist and the data associated with the 20 compounds he makes over three years too, please.

I’ll continue to address some of the other data-related questions in other blog posts.

John Wilbanks @ NESTA Open Innovation Meeting

John Wilbanks – SCIENCE COMMONS

Starts off by talking about network effects: resources become more valuable the more connected they are (a formula for this below)
What was it that allowed the ARPANET to be turned into the Internet?
it was open transmission protocols…anyone who could build a compatible computer could connect
same for the www
the ability to make copies and derivatives – this is at odds with copyright
ignoring a law does not scale
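
(My gloss, not a slide from the talk: the network-effects claim is often summarised as Metcalfe’s law, which values a network of n compatible nodes by its number of possible pairwise connections,

    V \propto \binom{n}{2} = \frac{n(n-1)}{2} \approx \frac{n^2}{2}

i.e. each new node that can connect adds value to every node already there.)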

Science Knowledge
Science has not been disrupted by the web – journals in HTML are the same as journals on paper
Incrementalism – distributing PDFs of science papers is an ear horn
we do not have the ability to hyperlink knowledge – we cannot “compatibly communicate”, and then we wonder why we do not get network effects
Open Innovation (see Henry Chesbrough)
depends on the quantity, quality, legal availability and tech usability of open innovation
Computers – TCP/IP
documents – HTML/HTTP
knowledge – commons

Literature: Open Access under CC
Neurocommons: open scientific data; difficulty with integrating databases – different licensing schemes significantly hinder the preparation of derivative data products
CC0 1.0 Universal licence

The commons is technical, NOT just legal…

CC working on a commons to hook up physical objects with the web
CC now working on developing a fully virtual drug discovery environment, glued together by CC…for data exchange etc. and giving freedom to operate
Nike and CC: patent exchange via CC

Open Innovation
purposive inflows and outflows of knowledge
expand the capacity of the external market to generate internally useful knowledge
the business model is at the centre of value creation and capture

COST
in software
free as in speech
free as in beer
==> free as in puppy (not really free…you have to maintain the dog)
YOU ARE BUYING A DOG (not beer)

James Boyle @ NESTA Open Innovation Meeting

I am live-blogging this…hence the sketchiness and the typos…
James Boyle – Prof of Law, Duke Law School

Starts off by applauding both NESTA and the Wellcome Trust

Open Innovation in Culture and Science
Cultural Agoraphobia…fear of open methods for developing innovation and science
e.g. if someone had proposed Wikipedia as a business plan –> no funding
e.g. IBM making more money from open innovation than closed innovation

Are there new methods for achieving open innovation? Purpose of the workshop to elucidate that question.

Creative Commons
as a way to distribute creative work in the face of copyright law
make open sources searchable and findable
INITIAL EXPECTATION: CC licences allow the distribution of copyrighted works
NOT EXPECTED: people can actually make money from CC-licenced works (CC licences are not a panacea – what makes this work?)
What is it that allows people to make money from CC-licenced works?
Think of licence components as switches – turn some of them on and off (e.g. remixing, commercial exploitation etc.) – how can these be used most productively? (a toy sketch below)
Need structured research into how these tools can be used
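
(My illustration, not an actual CC tool: the “switches” framing maps naturally onto a handful of boolean toggles, with each combination naming a different licence.)

    # Toy sketch of "licence components as switches" - my illustration,
    # not a Creative Commons tool. Each CC licence is one combination
    # of a few on/off toggles.
    from dataclasses import dataclass

    @dataclass
    class Licence:
        attribution: bool = True      # BY
        share_alike: bool = False     # SA
        non_commercial: bool = False  # NC
        no_derivatives: bool = False  # ND (mutually exclusive with SA)

        def name(self) -> str:
            if not self.attribution:
                return "CC0"  # all switches off: public-domain dedication
            parts = ["BY"]
            if self.non_commercial:
                parts.append("NC")
            if self.no_derivatives:
                parts.append("ND")
            elif self.share_alike:
                parts.append("SA")
            return "CC-" + "-".join(parts)

    print(Licence(share_alike=True).name())     # CC-BY-SA
    print(Licence(non_commercial=True).name())  # CC-BY-NC
    print(Licence(attribution=False).name())    # CC0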

Science is different from culture
CC knows too little about science to map CC onto science
Reasons for starting Science Commons –> illustrated by the development of the WWW by a SCIENTIST
WWW works better for porn, sales etc. than for science
most of the info on the WWW is wrong –> idiots…
why does research on the web still work?
search engines bring us to the places where knowledgeable people think there is value, e.g. amount of linking
THE VALUE IS IN THE METADATA – it’s the second layer of linkage that makes the web useful (toy sketch below)
That second layer of linkage is not available for science…because scientific knowledge is bound up behind firewalls…we cannot get the second layer of linkage done
other example: getting from a review of a book to the book on Amazon
same doesn’t work for science: e.g. no linking from the components of an experiment to the components…
BARRIERS to scientific innovation on the web have not been thought through…we can make science work better on the internet
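
(To make the “second layer of linkage” point concrete, here is my toy sketch, not Boyle’s, of the idea behind link-based ranking à la PageRank; the link graph is made up:)

    # Toy PageRank sketch: ranking documents by who links to them is the
    # "second layer of linkage" that makes the web searchable.
    # The link graph below is made up.
    links = {
        "paper_a": ["paper_b"],
        "paper_b": ["paper_a", "paper_c"],
        "paper_c": ["paper_a"],
    }

    damping = 0.85
    rank = {page: 1.0 / len(links) for page in links}

    for _ in range(50):  # power iteration
        new_rank = {}
        for page in links:
            incoming = sum(
                rank[src] / len(outgoing)
                for src, outgoing in links.items()
                if page in outgoing
            )
            new_rank[page] = (1 - damping) / len(links) + damping * incoming
        rank = new_rank

    print(rank)  # the most linked-to documents score highest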
“It is a crime how our government funders etc have failed to understand how innovation depends on infrastructure”
“The network is not the cable along which innovation travels – the network is the cables along which the innovation travels”
Is it good to publicly fund research and then lock it down behind firewalls?
Is it good to not fund infrastructure?
We need experimentation. Suggestions:
Admit that we know less than we thought. DO EXPERIMENTS WITH OPEN INNOVATION.
Academia has been much less innovative in making its materials open than many commercial entities, particularly the media.
Research: we need to move away from seemingly well-founded assumptions to solid research about what works and what doesn’t.
