Archive for category ChemSpider Chemistry

How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry #ACSsanfran

This is my presentation at the ACS San Francisco Fall Meeting on August 10th 2014

How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry

The Royal Society of Chemistry hosts a growing collection of online chemistry content. For much of our work the InChI identifier is an important component underpinning our projects. This enables the integration of chemical compounds with our archive of scientific publications, the delivery of a reaction database containing millions of reactions as well as a chemical validation and standardization platform developed to help improve the quality of structural representations on the internet. The InChI has been a fundamental part of each of our projects and has been pivotal in our support of international projects such as the Open PHACTS semantic web project integrating chemistry and biology data and the PharmaSea project focused on identifying novel chemical components from the ocean with the intention of identifying new antibiotics. This presentation will provide an overview of the importance of InChI in the development of many of our eScience platforms and how we have used it to provide integration across hundreds of websites and chemistry databases across the web. We will discuss how we are now expanding our efforts to develop a platform encompassing efforts in Open Source Drug Discovery and the support of data management for neglected diseases.
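
As a rough illustration of how an InChI-derived key can link records across separate resources, here is a minimal Python sketch. It is not the RSC implementation: the function names, record layouts and article labels are hypothetical, and only the two InChIKeys (aspirin and caffeine) are real. It assumes each source already stores a standard InChIKey for every compound.

```python
import re
from collections import defaultdict

# Standard InChIKey layout: a 14-letter connectivity hash, then 8 letters for
# the remaining layers plus a standard/non-standard flag (S or N) and a
# version letter, then a single protonation character.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{8}[SN][A-Z]-[A-Z]$")


def is_inchikey(value):
    """Check only the shape of the key, not whether it matches a real structure."""
    return bool(INCHIKEY_RE.match(value))


def link_by_inchikey(*sources):
    """Group records from several (source_name, records) pairs by InChIKey."""
    linked = defaultdict(dict)
    for source_name, records in sources:
        for record in records:
            key = record["inchikey"]
            if not is_inchikey(key):
                continue  # skip malformed identifiers rather than guess
            linked[key][source_name] = record
    return dict(linked)


# Hypothetical records; only the InChIKeys themselves are real.
compound_db = [
    {"inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "name": "aspirin"},
    {"inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N", "name": "caffeine"},
]
article_index = [
    {"inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "article": "hypothetical-article-1"},
    {"inchikey": "not-an-inchikey", "article": "hypothetical-article-2"},
]

linked = link_by_inchikey(("compounds", compound_db), ("articles", article_index))
```

Because the key is derived from the structure rather than from a name or a database-specific ID, two resources that have never exchanged data can still be joined on it, which is the property the abstract relies on.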


Applying RSC cheminformatics skills to support the PharmaSea project at #ACSsanfran

This is the first presentation I gave at the ACS meeting in San Francisco, on Sunday morning (August 10th), in the CINF Natural Products session.

Applying Royal Society of Chemistry cheminformatics skills to support the PharmaSea project

The collaborative project PharmaSea brings European researchers to some of the deepest, coldest and hottest places on the planet. Scientists from the UK, Belgium, Norway, Spain, Ireland, Germany, Italy, Switzerland and Denmark are working together to collect and screen samples of mud and sediment from huge, previously untapped, oceanic trenches. The large-scale, four-year project is backed by almost 10 million euros of funding and brings together 24 partners from 13 countries, drawn from industry, academia and non-profit organisations. The PharmaSea project focuses on biodiscovery research and the development and commercialisation of new bioactive compounds from marine organisms, including deep-sea sponges and bacteria, to evaluate their potential as novel drug leads or ingredients for nutrition or cosmetic applications. The Royal Society of Chemistry is responsible for developing a number of capabilities to support the PharmaSea project, including a chemical registration system for new compounds, dereplication technologies to assist in the identification of new compounds and search techniques for mass spectrometrists within the project. This presentation will provide an overview of the project and our progress in contributing chemical information technologies to support the effort.
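
To give a flavour of what mass-based dereplication can look like, here is a minimal Python sketch that checks an observed neutral monoisotopic mass against a small list of known compounds. It is an illustration only, not the PharmaSea tooling: the function names, the two-entry compound list and the 5 ppm default tolerance are assumptions, and the formula parser handles only simple formulae made of C, H, N and O, with no brackets, charges or adducts.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(formula):
    """Mass of a simple formula such as 'C8H10N4O2' (no brackets or charges)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # An element outside the table raises KeyError instead of being ignored.
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def dereplicate(observed_mass, known, tolerance_ppm=5.0):
    """Return names of known compounds whose mass lies within the ppm window."""
    hits = []
    for name, formula in known.items():
        mass = monoisotopic_mass(formula)
        if abs(observed_mass - mass) / mass * 1e6 <= tolerance_ppm:
            hits.append(name)
    return hits


# Hypothetical reference list; a real one would come from a compound database.
KNOWN = {"caffeine": "C8H10N4O2", "aspirin": "C9H8O4"}

hits = dereplicate(194.0804, KNOWN)
```

An empty result is the interesting case for discovery work: a mass that matches nothing already known is a candidate for a new compound, although isomers and adducts mean a match or a miss on mass alone is never conclusive.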


Data Mining Dissertations and Adventures and Experiences in the World of Chemistry

Data Mining Dissertations and Adventures and Experiences in the World of Chemistry

This presentation was given at the CLIR/DLF Postdoctoral Fellowship Summer Seminar at Bryn Mawr College in Pennsylvania on July 29th 2014. The intention was to communicate what we are doing in text and data mining in the domain of chemistry, specifically around mining the RSC publication archive and chemistry dissertations and theses. How would these experiences map over to the humanities?


Current Initiatives in Developing Research Data Repositories at the Royal Society of Chemistry

I presented at the Food and Drug Administration today regarding some of our efforts to develop a research data repository for the community. The abstract and the presentation from Slideshare are below.

Current Initiatives in Developing Research Data Repositories at the Royal Society of Chemistry

Access to scientific information has changed in a manner that was likely never even imagined by the early pioneers of the internet. The quantities of data, the array of tools available to search and analyze, the devices and the shift in community participation continue to expand, and the pace of change does not appear to be slowing. RSC hosts a number of chemistry data resources for the community including ChemSpider, one of the community’s primary online public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to tens of thousands of chemists every day and is the foundation for many important international projects to integrate chemistry and biology data, facilitate drug discovery efforts and help to identify new chemicals from under the ocean. This presentation will provide an overview of the expanding reach of this cheminformatics platform and the nature of the solutions that it helps to enable, including structure validation, text mining and semantic markup. ChemSpider is limited in scope as a chemical compound database and we are presently architecting the RSC Data Repository, a platform that will enable us to extend our reach to include chemical reactions, analytical data, and diverse data depositions from chemists across various domains. We will also discuss the possibilities it offers in terms of supporting data modeling and sharing. The future of scientific information and communication will be underpinned by these efforts, influenced by increasing participation from the scientific community.


Presentation at the 2014 Allen Press Emerging Trends in Scholarly Publishing Seminar

Today I gave a presentation at the 2014 Allen Press Emerging Trends in Scholarly Publishing™ Seminar here in Washington DC. Over coffee I had very positive feedback about what we are doing at RSC and various comments about “real science exposed by a publisher”. The abstract and Slideshare presentation are below.

The Application of Text and Data Mining to Enhance the Royal Society of Chemistry Publication Archive

The Royal Society of Chemistry (RSC) is one of the world’s most prominent scientific societies and STM publishers. Our contributions to the scientific community include the delivery of a myriad of resources that help the chemistry community access chemistry-related data, information and knowledge. This includes ChemSpider, a compound-centric platform linking together over 30 million chemical compounds with internet-based resources. Using this compound database and its associated chemical identifiers as a basis, the RSC is utilizing text and data mining approaches to data-enable our archive of scientific publications. This presentation will provide an overview of our technical approaches to text- and data-enabling our archive of scientific articles, how we are developing an integrated database of chemical compounds, reactions, physical and analytical data and how it will be used to facilitate scientific discovery.


Accessing Royal Society of Chemistry resources and making chemistry mobile

This is a presentation I gave at the ACS Dallas meeting on March 19th 2014

Accessing Royal Society of Chemistry resources and making chemistry mobile

The ongoing drive towards mobile devices is now simply one of generic ubiquity. It is less a question of whether a scientist has a mobile device than of what brand, what generation and what apps they have installed. Chemistry has been moving fast to mobile devices for a number of years now, and today it is possible to draw chemical compounds and search databases both on device and in the cloud. Modeling of data using server-based platforms is increasing in scope and capabilities. The Royal Society of Chemistry was early in recognizing the potential power of mobile platforms in allowing scientists to access data and the benefits of such devices in allowing students access to data and content. This presentation will provide an overview of our efforts to date in supporting chemistry technologies on mobile devices and our recent developments in this domain.

 


Data enhancing the Royal Society of Chemistry publication archive

This is a presentation I gave at the ACS Dallas meeting on March 19th 2014

Data enhancing the Royal Society of Chemistry publication archive

The Royal Society of Chemistry has an archive of hundreds of thousands of published articles containing various types of chemistry-related data: compounds, reactions, property data, spectral data etc. RSC has a vision of extracting as much of these data as possible and providing access via ChemSpider and its related projects. To this end we have applied a combination of text-mining extraction, image conversion, and chemical validation and standardization approaches. This project will result in new chemistry-related data being added to our chemical and reaction databases and in the ability to couple web-based versions of the articles more tightly with these extracted data. The ability to search across the archive will be enhanced as a result. This presentation will report on our progress in this data extraction project and discuss how we will ultimately use similar approaches in our publishing pipeline to enhance article markup for new publications.

 


The UK National Chemical Database Service as an integration of commercial and public chemistry services to support chemists in the United Kingdom

This is a presentation I gave at the ACS National Meeting in Dallas on Wednesday 19th March 2014

The UK National Chemical Database Service – an integration of commercial and public chemistry services to support chemists in the United Kingdom

At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. The challenge is significant: combining a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., dealing with the confusion of licensing and embargoing, and providing for data exchange and integration with services and platforms external to the repository. All this comes at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Sciences Research Council (EPSRC) of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in terms of providing data to develop prediction models to further enable scientific discovery will be discussed and the potential impact on the future of scientific publishing will also be examined.

 


Ontology work at the Royal Society of Chemistry #ACSDallas

This is a presentation I gave at the ACS Spring meeting in Dallas, Texas on March 17th 2014

Ontology work at the Royal Society of Chemistry

We provide an overview of the use we make of ontologies at the Royal Society of Chemistry. Our engagement with the ontology community began in 2006 with preparations for Project Prospect, which used ChEBI and other Open Biomedical Ontologies to mark up journal articles. Subsequently Project Prospect has evolved into DERA (Digitally Enhancing the RSC Archive) and we have developed further ontologies for text markup, covering analytical methods and name reactions. Most recently we have been contributing to CHEMINF, an open-source cheminformatics ontology, as part of our work on disseminating calculated physicochemical properties of molecules via Open PHACTS. We show how we represent these properties and how this representation can serve as a template for disseminating different sorts of chemical information.

 


Big data challenges associated with building a national data repository for chemistry

I gave a presentation at the ICIC 2013 meeting in Vienna focused on the “Big data challenges associated with building a national data repository for chemistry”. The Slideshare presentation is shown below.

At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. The challenge is significant: combining a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., dealing with the confusion of licensing and embargoing, and providing for data exchange and integration with services and platforms external to the repository. All this comes at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Sciences Research Council (EPSRC) of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in terms of providing data to develop prediction models to further enable scientific discovery will be discussed and the potential impact on the future of scientific publishing will also be examined.
