Archive for category ChemSpider Chemistry
This presentation was given at the ACS Denver meeting on March 22nd 2015 in a CHED Division symposium
Providing Access to a Million NMR Spectra via the web
Antony Williams, Alexey Pshenichnov, Peter Corbett, Daniel Lowe, Carlos Coba
Access to large scale NMR collections of spectral data can be used for a number of purposes in terms of teaching spectroscopy to students. The data can be used for teaching purposes in lectures, as training data sets for spectral interpretation and structure elucidation, and to underpin educational resources such as the Royal Society of Chemistry’s Learn Chemistry. These resources have been available for a number of years but have been limited to rather small collections of spectral data and specifically only about 3000 spectra. In order to expand the data collection and provide richer resources for the community we have been gathering data from various laboratories and, as part of a research project, we have used text-mining approaches to extract spectral data from articles and patents in the form of textual strings and utilized algorithms to convert the data into spectral representations. While these spectra are reconstructions of text representations of the original spectral data we are investigating their value in terms of utilizing for the purpose of structure identification. This presentation will report on the processes of extracting structure-spectral pairs from text, approaches to performing automated spectral verification and our intention to assemble a spectral collection of a million NMR spectra and make them available online.
This is a poster I presented at Pittcon on Wednesday March 11th, 2015
ChemSpider – building an online database of open spectra
ChemSpider is an online database of over 30 million chemical compounds sourced from over 500 different sources including government laboratories, chemical vendors, public resources and publications. Developed with the intention of building community for chemists ChemSpider allows its users to deposit data including structures, properties, links to external resources and various forms of spectral data. Over the past few years ChemSpider has aggregated almost 20000 high quality NMR and IR spectra and continues to expand as the community deposits additional types of data. The majority of spectral data is licensed as Open Data allowing it to be downloaded and reused in presentations, lesson plans and for teaching purposes. This poster will present our existing technology and our plans to host a million spectra in our developing online data repository.
How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry #ACSsanfran
This is my presentation at the ACS San Francisco Fall Meeting on August 10th 2014
How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry
The Royal Society of Chemistry hosts a growing collection of online chemistry content. For much of our work the InChI identifier is an important component underpinning our projects. This enables the integration of chemical compounds with our archive of scientific publications, the delivery of a reaction database containing millions of reactions as well as a chemical validation and standardization platform developed to help improve the quality of structural representations on the internet. The InChI has been a fundamental part of each of our projects and has been pivotal in our support of international projects such as the Open PHACTS semantic web project integrating chemistry and biology data and the PharmaSea project focused on identifying novel chemical components from the ocean with the intention of identifying new antibiotics. This presentation will provide an overview of the importance of InChI in the development of many of our eScience platforms and how we have used it to provide integration across hundreds of websites and chemistry databases across the web. We will discuss how we are now expanding our efforts to develop a platform encompassing efforts in Open Source Drug Discovery and the support of data management for neglected diseases.
This is the first presentation I gave at the ACS meeting in San Francisco on Sunday morning (August 8th) in the CINF Natural Products session.
Applying Royal Society of Chemistry cheminformatics skills to support the PharmaSea project
The collaborative project PharmaSea brings European researchers to some of the deepest, coldest and hottest places on the planet. Scientists from the UK, Belgium, Norway, Spain, Ireland, Germany, Italy, Switzerland and Denmark are working together to collect and screen samples of mud and sediment from huge, previously untapped, oceanic trenches. The large-scale, four-year project is backed by almost 10 million euros of funding and brings together 24 partners from 13 countries from industry, academia and non-profit organisations. The PharmaSea project focuses on biodiscovery research and the development and commercialisation of new bioactive compounds from marine organisms, including deep-sea sponges and bacteria, to evaluate their potential as novel drug leads or ingredients for nutrition or cosmetic applications. The Royal Society of Chemistry is responsible for developing a number of capabilities to support the Pharmasea project including a chemical registration system for new compounds, dereplication technologies to assist in the identification of new compounds and search techniques for mass spectrometrists within the project. This presentation will provide an overview of the project and our progress to contributing chemical information technologies to support the effort.
Data Mining Dissertations and Adventures and Experiences in the World of Chemistry
This presentation was given at the CLIR/DLF Postdoctoral Fellowship Summer Seminar at Bryn Mawr college in Pennsylvania on July 29th 2014. The intention was to communicate what we are doing in the fields of text and data mining in the domain of chemistry and specifically around mining the RSC archive publication and chemistry dissertations and theses. How would these experiences map over to the humanities?
I presented at the Food and Drug Administration today regarding some of our efforts to develop a research data repository for the community. The abstract and presentation from Slideshare is below.
Current Initiatives in Developing Research Data Repositories at the Royal Society of Chemistry
Access to scientific information has changed in a manner that was likely never even imagined by the early pioneers of the internet. The quantities of data, the array of tools available to search and analyze, the devices and the shift in community participation continues to expand while the pace of change does not appear to be slowing. RSC hosts a number of chemistry data resources for the community including ChemSpider, one of the community’s primary online public compound databases. Containing tens of millions of chemical compounds and its associated data ChemSpider serves data tens of thousands of chemists every day and it serves as the foundation for many important international projects to integrate chemistry and biology data, facilitate drug discovery efforts and help to identify new chemicals from under the ocean. This presentation will provide an overview of the expanding reach of this cheminformatics platform and the nature of the solutions that it helps to enable including structure validation and text mining and semantic markup. ChemSpider is limited in scope as a chemical compound database and we are presently architecting the RSC Data Repository, a platform that will enable us to extend our reach to include chemical reactions, analytical data, and diverse data depositions from chemists across various domains. We will also discuss the possibilities it offers in terms of supporting data modeling and sharing. The future of scientific information and communication will be underpinned by these efforts, influenced by increasing participation from the scientific community.
Today I gave a presentation at the 2014 Allen Press Emerging Trends in Scholarly Publishing™ Seminar here in Washington DC. Over coffee I had very positive feedback about what we are doing at RSC and various comments about “real science exposed by a publisher”. The abstract and Slideshare presentation are below.
The Application of Text and Data Mining to Enhance the Royal Society of Chemistry Publication Archive
The Royal Society of Chemistry (RSC) is one of the world’s most prominent scientific societies and STM publishers. Our contributions to the scientific community include the delivery of a myriad of resources to support the chemistry community to access chemistry-related data, information and knowledge. This includes ChemSpider, a compound centric platform linking together over 30 million chemical compounds with internet-based resources. Using this compound database and its associated chemical identifiers as a basis the RSC is utilizing text and data mining approaches to data enable our published archive of scientific publications. This presentation will provide an overview of our technical approaches to text and data enable our archive of scientific articles, how we are developing an integrated database of chemical compounds, reactions, physical and analytical data and how it will be used to facilitate scientific discovery.
This is a presentation I gave at the ACS Dallas meeting on March 19th 2014
Accessing Royal Society of Chemistry resources and making chemistry mobile
The ongoing drive towards mobile devices is now simply one of generic ubiquity. It is less an issue of whether a scientist has a mobile device but rather what brand, what generation and what apps do they have installed. Chemistry has fast been moving to mobile devices for a number of years now and today is it possible to draw chemical compounds, perform searches of databases both on device and in the cloud. Modeling of data using server based platforms is increasing in scope and capabilities. The Royal Society of Chemistry was early in recognizing the potential power of mobile platforms in terms to allowing scientists to access data and the benefits of such devices to allowing students access to data and content. This presentation will provide an overview of our efforts to date in supporting chemistry technologies on mobile devices and our recent developments in this domain.
This is a presentation I gave at the ACS Dallas meeting on March 19th 2014
Data enhancing the Royal Society of Chemistry publication archive
The Royal Society of Chemistry has an archive of hundreds of thousands of published articles containing various types of chemistry related data – compounds, reactions, property data, spectral data etc. RSC has a vision of extracting as much of these data as possible and providing access via ChemSpider and its related projects. To this end we have applied a combination of text-mining extraction, image conversion and chemical validation and standardization approaches. The outcome of this project will result in new chemistry related data being added to our chemical and reaction databases and in the ability to more tightly couple web-based versions of the articles with these extracted data. The ability to search across the archive will be enhanced as a result. This presentation will report on our progress in this data extraction project and discuss how we will ultimately use similar approaches in our publishing pipeline to enhance article markup for new publications.
The UK National Chemical Database Service as an integration of commercial and public chemistry services to support chemists in the United Kingdom
This is a presentation I gave at the ACS National Meeting in Dallas on Wednesday 19th March 2014
The UK National Chemical Database Service – an integration of commercial and public chemistry services to support chemists in the United Kingdom
At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. Combining a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., together with the confusion of licensing and embargoing, and providing for data exchange and integration with services and platforms external to the repository, the challenge is significant. This all at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Science Research of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in terms of providing data to develop prediction models to further enable scientific discovery will be discussed and the potential impact on the future of scientific publishing will also be examined.