Using an online database of chemical compounds for the purpose of structure identification

Online databases can be used for structure identification. The Royal Society of Chemistry provides access to an online database containing tens of millions of compounds, and this has been shown to be a very effective platform for the development of tools for structure identification. In many cases a compound that is unknown to an investigator is already known in the chemical literature or in a reference database, and these “known unknowns” are now commonly available on aggregated internet resources. Such compounds in commercial, environmental, forensic, and natural product samples can be identified by searching these large aggregated databases, querying by either elemental composition or monoisotopic mass. Searching by elemental composition is the preferred approach, but it is often difficult to determine a unique elemental composition for compounds with molecular weights greater than 600 Da. In these cases, searching by the monoisotopic mass is advantageous. In either case, the search results can be refined by appropriate filtering to identify the compounds. We will report on integrated filtering and search approaches on our aggregated compound database for the purpose of structure identification and review our progress in using the platform for natural product dereplication purposes.
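
To make the two query modes concrete, here is a minimal Python sketch of searching a compound list by elemental composition or by monoisotopic mass. It is only an illustration under stated assumptions: the in-memory `RECORDS` list, the function names and the 5 ppm default tolerance are all hypothetical and are not the ChemSpider implementation or its API.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of a few common elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}

# Hypothetical stand-in for an aggregated compound database.
RECORDS = [
    {"name": "caffeine", "formula": "C8H10N4O2"},
    {"name": "theophylline", "formula": "C7H8N4O2"},
    {"name": "theobromine", "formula": "C7H8N4O2"},
    {"name": "aspirin", "formula": "C9H8O4"},
]


def parse_formula(formula):
    """Parse a simple formula such as 'C8H10N4O2' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def search_by_formula(records, formula):
    """Return records whose elemental composition matches exactly."""
    target = parse_formula(formula)
    return [r for r in records if parse_formula(r["formula"]) == target]


def search_by_mass(records, mass, ppm=5.0):
    """Return records whose monoisotopic mass lies within +/- ppm of the query."""
    tolerance = mass * ppm / 1e6
    return [
        r for r in records
        if abs(monoisotopic_mass(r["formula"]) - mass) <= tolerance
    ]


if __name__ == "__main__":
    print(search_by_formula(RECORDS, "C9H8O4"))
    print(search_by_mass(RECORDS, 180.0647))
```

Both queries can return several isomers (theophylline and theobromine share the formula C7H8N4O2), which is why the hit list still needs to be refined by further filtering.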


This is the first presentation I gave at the ACS meeting in San Francisco on Sunday morning (August 8th) in the CINF Natural Products session.

Applying Royal Society of Chemistry cheminformatics skills to support the PharmaSea project

The collaborative project PharmaSea brings European researchers to some of the deepest, coldest and hottest places on the planet. Scientists from the UK, Belgium, Norway, Spain, Ireland, Germany, Italy, Switzerland and Denmark are working together to collect and screen samples of mud and sediment from huge, previously untapped, oceanic trenches. The large-scale, four-year project is backed by almost 10 million euros of funding and brings together 24 partners from 13 countries from industry, academia and non-profit organisations. The PharmaSea project focuses on biodiscovery research and the development and commercialisation of new bioactive compounds from marine organisms, including deep-sea sponges and bacteria, to evaluate their potential as novel drug leads or ingredients for nutrition or cosmetic applications. The Royal Society of Chemistry is responsible for developing a number of capabilities to support the PharmaSea project, including a chemical registration system for new compounds, dereplication technologies to assist in the identification of new compounds, and search techniques for mass spectrometrists within the project. This presentation will provide an overview of the project and our progress in contributing chemical information technologies to support the effort.


Today I gave a presentation at the 2014 Allen Press Emerging Trends in Scholarly Publishing™ Seminar here in Washington DC. Over coffee I had very positive feedback about what we are doing at RSC and various comments about “real science exposed by a publisher”. The abstract and Slideshare presentation are below.

The Application of Text and Data Mining to Enhance the Royal Society of Chemistry Publication Archive

The Royal Society of Chemistry (RSC) is one of the world’s most prominent scientific societies and STM publishers. Our contributions to the scientific community include a myriad of resources that help chemists access chemistry-related data, information and knowledge. These include ChemSpider, a compound-centric platform linking together over 30 million chemical compounds with internet-based resources. Using this compound database and its associated chemical identifiers as a basis, the RSC is utilizing text and data mining approaches to data-enable our published archive of scientific publications. This presentation will provide an overview of our technical approaches to text- and data-enabling our archive of scientific articles, how we are developing an integrated database of chemical compounds, reactions, physical and analytical data, and how it will be used to facilitate scientific discovery.


This is my seventh and LAST talk at the ACS Meeting in Indianapolis:

The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms

The Royal Society of Chemistry provides access to a number of databases hosting chemicals data, reactions, spectroscopy data and prediction services. These databases and services can be accessed via web services, with queries using standard data formats such as InChI and molfiles. Data can then be downloaded in standard structure and spectral formats, allowing for reuse and repurposing. The ChemSpider database is integrated with a number of projects external to RSC, including Open PHACTS, which integrates chemical and biological data. This project utilizes semantic web data standards including RDF. This presentation will provide an overview of how structure and spectral data standards have been critical in allowing us to integrate many open source tools, have eased integration with a myriad of services, and underpin many of our future developments.
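
As a small illustration of why a standard identifier such as InChI is convenient for exchange, the Python sketch below splits a standard InChI string into its layers. The helper name `inchi_layers` is hypothetical, and the sketch assumes a standard InChI in which each prefixed layer appears at most once; it is not a validator and is not part of any RSC web service.

```python
def inchi_layers(inchi):
    """Split an InChI string into a dict of its layers.

    The version goes under 'version', the unprefixed formula layer
    (if present) under 'formula', and every other layer under its
    one-letter prefix, e.g. 'c' for connectivity, 'h' for hydrogens.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")

    version, *layers = inchi[len(prefix):].split("/")
    result = {"version": version}

    # The formula layer is the only one without a lowercase prefix letter.
    if layers and not layers[0][:1].islower():
        result["formula"] = layers.pop(0)

    for layer in layers:
        result[layer[0]] = layer[1:]
    return result


if __name__ == "__main__":
    # Standard InChI for ethanol.
    print(inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

Because the formula layer can be read straight out of the identifier, any tool that understands InChI can use it for an elemental composition query without a proprietary structure format.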


Presentation given at ACS New Orleans Spring Meeting

ChemSpider is one of the chemistry community’s primary public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider now serves data to many tens of websites and software applications. This presentation will provide an overview of the expanding reach of the ChemSpider platform and the nature of solutions that it helps to enable. We will also discuss some of the future directions that are envisaged for the project and how we intend to continue expanding the impact of the platform.


The involvement of RSC with PharmaSea and a new antibiotics search to focus on the sea bed

A nice article went out today on the BBC News site regarding the work that the PharmaSea project would be undertaking…to find new classes of antibiotics deep in the ocean.


The RSC is involved in the project as a result of our skills in hosting chemicals in a publicly accessible database as well as integrating data. ChemSpider also has a rich collection of natural products already in the database and we are developing approaches to segregate the collection for use by the project. We also have the RSC Natural Product Updates database that we have already integrated with ChemSpider. There are various other aspects of work that we will be doing to support the project, including developing approaches to perform “dereplication”: determining whether or not a particular chemical has been previously isolated/identified/elucidated, in this case by searching the ChemSpider database using spectral features (NMR shifts, multiplicities, mass, fragment ions etc.). Even if the actual compound itself is not identified, dereplication approaches can certainly hint at a particular chemical class and substructures. We do NOT have spectral data for the majority of compounds in ChemSpider, so spectral prediction approaches will be useful in this regard. We will be working with some very skilled scientists who have experience with the structure elucidation of novel natural products and will have the opportunity to collaborate with ACD/Labs, a company where I worked for over a decade on their Computer-Assisted Structure Elucidation software program, Structure Elucidator, one of the tools that will be used in this project.
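
For readers who have not seen dereplication in practice, here is a rough Python sketch of one way to rank database candidates against observed 13C NMR shifts. Everything in it is an assumption made for illustration: the greedy pairing, the 2 ppm tolerance, the function names and the made-up candidate shift lists (which could equally be predicted shifts). It is not how ChemSpider or Structure Elucidator do it.

```python
def match_score(observed, reference, tol=2.0):
    """Fraction of observed shifts paired with a distinct reference shift.

    Each observed shift is greedily paired with the closest unused
    reference shift, and counts as matched if they differ by <= tol ppm.
    """
    if not observed:
        return 0.0
    remaining = list(reference)
    matched = 0
    for shift in sorted(observed):
        if not remaining:
            break
        best = min(remaining, key=lambda ref: abs(ref - shift))
        if abs(best - shift) <= tol:
            matched += 1
            remaining.remove(best)
    return matched / len(observed)


def dereplicate(observed, candidates, tol=2.0):
    """Rank candidates by how well their shifts explain the observed ones."""
    scored = [
        (match_score(observed, c["shifts"], tol), c["name"]) for c in candidates
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    # Illustrative, made-up values only; not measured or predicted data.
    observed_shifts = [14.1, 60.5, 171.0]
    candidates = [
        {"name": "candidate_B", "shifts": [20.0, 128.5, 171.5]},
        {"name": "candidate_A", "shifts": [14.2, 60.4, 170.9]},
    ]
    for score, name in dereplicate(observed_shifts, candidates):
        print(f"{name}: {score:.2f}")
```

A partial score is still informative: a candidate that explains only some of the shifts may point to a shared substructure or chemical class even when it is not the compound itself.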

It's going to be an exciting project. I am REALLY looking forward to it and heck, if we can help identify new classes of antibiotics we might contribute to addressing some of the challenges we have ahead of us!


Posted on February 16, 2013 in Nuclear magnetic resonance, PharmaSea, Vision