SWAT4LS2009 – Keynote Alan Ruttenberg: Semantic Web Technology to Support Studying the Relation of HLA Structure Variation to Disease
November 20, 2009
(These are live-blogging notes from Alan’s keynote, so don’t expect coherent text; use them as bullet points to follow the gist of the argument.)
The Science Commons:
- a project of the Creative Commons
- 6 people
- Science Commons specializes CC to science
- information discovery and re-use
- establish legal clarity around data sharing and encourage automated attribution and provenance
Semantic Web for biologists, because it maximizes the value of scientific work by removing repeat experimentation.
ImmPort Semantic Integration Feasibility Project
- ImmPort is an immunology database and analysis portal
- Goals: meta-analysis
- Question: how can ontology help data integration for data from many sources?
Using semantics to help integrate sequence features of HLA with disorders
Challenges:
- Curation of sequence features
- Linking to disorders
- Associating allele sequences with peptide structures with nomenclature with secondary structure with human phenotype etc etc etc…
Talks about elements of representation
- PDB structures translated into ontology-based representations
- canonical MHC molecule instances constructed from IMGT
- relate each residue in PDB to the canonical residue, if one exists
- use existing ontologies
- contact points between peptide and other chains computed using Jmol, following IMGT. Represented as relations between residue instances.
- Structural features have fiat parts
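To make the contact-point idea concrete, here is a minimal sketch. The notes do not record which criteria Jmol or IMGT apply, so the plain distance cutoff of 4.0 angstroms, the function name `residue_contacts`, and the toy coordinates are all assumptions made for illustration.

```python
from itertools import product
from math import dist

# Hypothetical toy data: (chain, residue number) -> atom coordinates in angstroms.
toy_atoms = {
    ("C", 1): [(0.0, 0.0, 0.0)],                       # peptide residue
    ("C", 2): [(10.0, 0.0, 0.0)],                      # peptide residue
    ("A", 7): [(3.0, 0.0, 0.0)],
    ("A", 99): [(10.0, 3.5, 0.0), (30.0, 0.0, 0.0)],
    ("A", 150): [(50.0, 50.0, 50.0)],
}


def residue_contacts(atoms, peptide_chain, cutoff=4.0):
    """Return sorted (peptide_residue, other_residue) pairs in contact.

    Two residues count as in contact when any pair of their atoms lies
    within `cutoff` angstroms. This cutoff rule is an assumption, not
    necessarily the IMGT definition.
    """
    peptide = {r: xyz for r, xyz in atoms.items() if r[0] == peptide_chain}
    others = {r: xyz for r, xyz in atoms.items() if r[0] != peptide_chain}
    contacts = []
    for (p_res, p_atoms), (o_res, o_atoms) in product(peptide.items(), others.items()):
        if any(dist(a, b) <= cutoff for a, b in product(p_atoms, o_atoms)):
            contacts.append((p_res, o_res))
    return sorted(contacts)


contacts = residue_contacts(toy_atoms, peptide_chain="C")
```

Each returned pair is a relation between two residue instances, which is the shape the notes describe.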
Connecting Allele Names to Disease Names
- use papers as join factors: papers mention both disease and allele – noisy
- use regex and rewrites applied to titles and abstracts to fish out links between diseases and alleles
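A rough sketch of the regex-plus-rewrite approach, to show the shape of it. The allele pattern, the tiny disease lexicon, and the sample titles are my own assumptions for illustration; the actual patterns and rewrites used in the project were not shown in the talk.

```python
import re

# Assumed pattern for allele mentions such as "HLA-B27" or "HLA-DRB1*0401".
# It is deliberately loose and will over-match (e.g. bare gene names).
ALLELE_RE = re.compile(r"\bHLA-(?:[A-C]|D[PQR][AB]?\d?)\*?\d{1,4}(?::\d{2})*\b")

# Hypothetical disease lexicon; a real one would come from an ontology.
DISEASES = {"ankylosing spondylitis", "rheumatoid arthritis", "celiac disease"}


def extract_links(text):
    """Return (allele, disease) pairs co-mentioned in one title or abstract."""
    # Rewrite step: normalize spellings like "HLA B27" to "HLA-B27".
    normalized = re.sub(r"\bHLA[\s-]+", "HLA-", text)
    alleles = sorted(set(ALLELE_RE.findall(normalized)))
    lowered = normalized.lower()
    diseases = sorted(d for d in DISEASES if d in lowered)
    # Co-mention is only a noisy hint of a link, as noted above.
    return [(a, d) for a in alleles for d in diseases]


# Invented example titles, not real citations.
links_1 = extract_links("HLA B27 and susceptibility to ankylosing spondylitis")
links_2 = extract_links("Association of HLA-DRB1*0401 with rheumatoid arthritis")
```

Every co-mention becomes a candidate link, so negated statements ("no association with...") would still be picked up; that is part of the noise.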
Correspondence of molecules with allele structures is difficult.
- use BLAST to find closest allele match between PDB and allele sequence
- every pdb and allele residue has URI
- relate matching molecules
- relate each allele residue to the canonical allele
- annotate various residues with various coordinate systems
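The residue-to-residue step could look roughly like this once a gapped pairwise alignment (as BLAST reports it) is in hand. The URI layout under example.org, the `correspondsTo` predicate, the placeholder allele name `ALLELE_X`, and the toy aligned strings are all hypothetical; only the idea of giving every residue a URI and relating matched residues comes from the talk.

```python
def map_residues(aligned_pdb, aligned_allele, pdb_start=1, allele_start=1):
    """Map PDB residue numbers to allele residue numbers from a gapped alignment."""
    if len(aligned_pdb) != len(aligned_allele):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    i, j = pdb_start, allele_start
    for a, b in zip(aligned_pdb, aligned_allele):
        if a != "-" and b != "-":
            mapping[i] = j          # both sequences have a residue in this column
        if a != "-":
            i += 1                  # advance the PDB numbering past non-gap columns
        if b != "-":
            j += 1                  # advance the allele numbering past non-gap columns
    return mapping


def residue_links(mapping, pdb_base, allele_base):
    """Turn a residue mapping into (subject, predicate, object) URI triples."""
    predicate = "http://example.org/vocab/correspondsTo"  # hypothetical predicate
    return [
        (f"{pdb_base}/{p}", predicate, f"{allele_base}/{a}")
        for p, a in sorted(mapping.items())
    ]


# Toy alignment with one gap on each side.
mapping = map_residues("GSH-MRY", "GSHSM-Y")
triples = residue_links(
    mapping,
    pdb_base="http://example.org/pdb/1AGB/A",
    allele_base="http://example.org/allele/ALLELE_X",
)
```

Residues that sit opposite a gap get no link, which is what lets annotations transfer only where the alignment supports it.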
This creates massive map that can be navigated and queried. Example queries:
- What autoimmune diseases can be indexed against a given allele?
- What are the variant residues at a position?
- Classification of amino acids
- Show alleles perturbed at contacts of 1AGB
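To give a feel for how such queries navigate the map, here is a toy in-memory version of triple-pattern matching. The real system would use SPARQL against a triple store; the predicate names (`mentionedWith`, `pos9`) and the placeholder alleles and diseases below are invented for illustration and carry no biological meaning.

```python
# Placeholder triples: (subject, predicate, object).
TRIPLES = [
    ("allele:X1", "mentionedWith", "disease:D1"),
    ("allele:X1", "mentionedWith", "disease:D2"),
    ("allele:X2", "mentionedWith", "disease:D2"),
    ("allele:X1", "pos9", "Y"),
    ("allele:X2", "pos9", "H"),
    ("allele:X3", "pos9", "Y"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "What diseases can be indexed against a given allele?"
diseases_for_x1 = sorted(o for _, _, o in match(TRIPLES, s="allele:X1", p="mentionedWith"))

# "What are the variant residues at a position?"
variants_at_9 = sorted({o for _, _, o in match(TRIPLES, p="pos9")})
```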
Summary of Progress to Date:
Elements of Approach in Place: Structure, Variation, transfer of annotation via alignment, information extraction from literature etc…
Nuts and Bolts:
- Primary source
- Local copy of source
- Scripts transform to RDF
- Exports RDF Bundles
- Get selected RDF Bundles and load into triple store
- Parsers generate in memory structures (python, java)
- Template files are instructions to format these into OWL
- Modeling is iteratively refined by editing templates
- RDF loaded into Neurocommons, some amount of reasoning
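The parser-plus-template steps might be sketched as below. The actual template format was not shown, so this uses Python's `string.Template` as a stand-in; the input line format, the `ex:` vocabulary, and the example.org base URI are assumptions. The output is Turtle with OWL terms.

```python
from string import Template

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/vocab/> .\n"
)

# The template carries the modeling decisions; editing it changes the
# output without touching the parser.
RESIDUE_TEMPLATE = Template(
    "<$base/residue/$chain/$number> a owl:NamedIndividual ;\n"
    "    rdfs:label \"$label\" ;\n"
    "    ex:partOfChain <$base/chain/$chain> .\n"
)


def parse_residue_line(line):
    """Parse a hypothetical 'CHAIN NUMBER NAME' line into an in-memory record."""
    chain, number, name = line.split()
    return {
        "chain": chain,
        "number": int(number),
        "label": f"{name} {number} of chain {chain}",
    }


def render(records, base):
    """Format parsed records as Turtle using the residue template."""
    body = "".join(RESIDUE_TEMPLATE.substitute(base=base, **r) for r in records)
    return PREFIXES + "\n" + body


turtle = render([parse_residue_line("A 9 TYR")], base="http://example.org/1AGB")
```

Keeping the modeling in the template is what makes the iterative refinement cheap: the parser stays fixed while the ontology choices change. Note that this sketch does not escape quotes in labels.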
RDFHerd package management for data
neurocommons.org/bundles
Can we reduce the burden of data integration?
- Too many people are doing data integration – wasting effort
- Use web as platform
- Too many ontologies…here’s the social pressure again
Challenges
- have lawyers bless every bit of data integration
- reasoning over triple stores
- SPARQL over HTTP
- Understand and exploit ontology and reasoning
- Grow a software ecosystem like Firefox