Today is my last day of employment with the Royal Society of Chemistry. It has been almost six years since I joined RSC when ChemSpider was acquired. While ChemSpider was initially a “hobby project” and an attempt to create a disruption in terms of access to chemistry data, crowdsourced contribution and data validation, it has gone from strength to strength and now serves ca. 40,000 unique users a day from around the world. It won three awards in the first few months after we joined RSC and was catalytic in RSC winning three grants that allowed us to participate in the Open PHACTS project and the PharmaSea project and to become the host of the UK National Chemical Database Service. Based on the feedback I have received over the years, ChemSpider is much loved and appreciated as a contribution to the scientific community and is recognized as one of the key players in the free chemistry resources arena. I am proud to have been associated with it.
We also got to set up the ChemSpider SyntheticPages micropublishing site and tried to get the community sharing syntheses that would likely not make it into mainstream papers but were still of value to science.
During my six years at RSC I have been involved with many discussions regarding the following areas of work, study and research and how they would benefit publishing, the society and, of course, the chemistry community at large. The list includes, in particularly random order:
- Chemistry databases – both commercial and free – and how to best mesh, commercialize and license data
- Data quality in publications and databases and development of tools for data validation
- Open Data, Open Access and Open Notebook Science
- Text-mining of the RSC archive to extract and mark up compounds, reactions, property data and analytical data
- The potential of semantic web applications to scientific publishing
- Encouraging the use of Open Identifiers – especially ORCID and InChI
- The future of Micropublishing in the chemical sciences
- Analytical data and building an open spectral database for the community
- Social networking approaches to build online profiles – especially for young scientists
There are many, many more things of course but these are the big ones and, for me, they bring clarity to what my interests are – chemistry data and making it available to the appropriate communities. It is with this in mind that I am excited to join the Environmental Protection Agency next week in the National Center for Computational Toxicology.
With every move forward into a new job we leave behind our old one. I leave RSC with some sadness, and with excitement for the new opportunities. I have had the chance to work with so many good people at RSC and to engage with collaborators such as ACD/Labs, Mestre, NextMove, EBI, ChemAxon, Accelrys (as they were then), iChemLabs, Dotmatics and on and on. Apologies if you are not named but the list is very long. Thanks to everyone for your support, encouragement and opportunities to engage. It has been a blast.
And for everyone at RSC who catered to my strange diet of potatoes only…so long, and thanks for all the spuds.
This presentation was given at the ACS Denver meeting on March 22nd 2015 in a CHED Division symposium
Providing Access to a Million NMR Spectra via the web
Antony Williams, Alexey Pshenichnov, Peter Corbett, Daniel Lowe, Carlos Coba
Large-scale collections of NMR spectral data can serve a number of purposes in teaching spectroscopy to students. The data can be used in lectures, as training data sets for spectral interpretation and structure elucidation, and to underpin educational resources such as the Royal Society of Chemistry’s Learn Chemistry. These resources have been available for a number of years but have been limited to rather small collections of spectral data, specifically only about 3000 spectra. In order to expand the data collection and provide richer resources for the community we have been gathering data from various laboratories and, as part of a research project, we have used text-mining approaches to extract spectral data from articles and patents in the form of textual strings and utilized algorithms to convert the data into spectral representations. While these spectra are reconstructions of text representations of the original spectral data, we are investigating their value for structure identification. This presentation will report on the processes of extracting structure-spectral pairs from text, approaches to performing automated spectral verification and our intention to assemble a spectral collection of a million NMR spectra and make them available online.
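To make the idea of turning a textual NMR string into a spectral representation more concrete, here is a minimal sketch in Python. It is not the algorithm used in the project; the example string, the regular expression, the function names and the first-order splitting model are all assumptions chosen for illustration. It parses shifts, multiplicities, couplings and proton counts, splits each multiplet into lines with binomial intensities, and sums Lorentzian line shapes on a ppm grid.

```python
import math
import re

# Hypothetical 1H NMR report string of the kind found in experimental sections.
NMR_TEXT = ("1H NMR (400 MHz, CDCl3) δ 7.26 (s, 5H), "
            "3.70 (q, J = 7.1 Hz, 2H), 1.22 (t, J = 7.1 Hz, 3H)")

# Matches: shift (multiplicity, optional "J = x Hz", number of protons)
PEAK_RE = re.compile(
    r"(\d+\.\d+)\s*\(\s*([a-z]+)\s*,"
    r"(?:\s*J\s*=\s*(\d+(?:\.\d+)?)\s*Hz\s*,)?"
    r"\s*(\d+)\s*H\s*\)"
)

# Number of lines for simple first-order multiplets (an illustrative subset).
LINES = {"s": 1, "d": 2, "t": 3, "q": 4}


def parse_peaks(text):
    """Return (shift_ppm, multiplicity, j_hz, n_protons) tuples found in text."""
    peaks = []
    for shift, mult, j_hz, n_h in PEAK_RE.findall(text):
        peaks.append((float(shift), mult, float(j_hz) if j_hz else 0.0, int(n_h)))
    return peaks


def expand_lines(peaks, spectrometer_mhz=400.0):
    """Split each multiplet into lines with binomial intensities."""
    lines = []
    for shift, mult, j_hz, n_h in peaks:
        n = LINES.get(mult, 1)  # unknown multiplicities fall back to one line
        total = 2 ** (n - 1)    # sum of the binomial coefficients
        for k in range(n):
            # A coupling in Hz divided by the field in MHz gives an offset in ppm.
            offset_ppm = (k - (n - 1) / 2) * j_hz / spectrometer_mhz
            lines.append((shift + offset_ppm, n_h * math.comb(n - 1, k) / total))
    return lines


def render(lines, x_min=0.0, x_max=10.0, n_points=2001, half_width=0.005):
    """Sum Lorentzian line shapes (equal widths) on an evenly spaced ppm grid."""
    step = (x_max - x_min) / (n_points - 1)
    spectrum = []
    for i in range(n_points):
        x = x_min + i * step
        y = sum(h * half_width ** 2 / ((x - x0) ** 2 + half_width ** 2)
                for x0, h in lines)
        spectrum.append((x, y))
    return spectrum


peaks = parse_peaks(NMR_TEXT)
spectrum = render(expand_lines(peaks))
```

A reconstruction like this only carries what the text string reported, so line widths, impurities and second-order effects from the original spectrum are absent.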
This is a poster I presented at Pittcon on Wednesday March 11th, 2015
ChemSpider – building an online database of open spectra
ChemSpider is an online database of over 30 million chemical compounds sourced from over 500 different sources including government laboratories, chemical vendors, public resources and publications. Developed with the intention of building a community for chemists, ChemSpider allows its users to deposit data including structures, properties, links to external resources and various forms of spectral data. Over the past few years ChemSpider has aggregated almost 20,000 high-quality NMR and IR spectra and continues to expand as the community deposits additional types of data. The majority of spectral data is licensed as Open Data, allowing it to be downloaded and reused in presentations, lesson plans and for teaching purposes. This poster will present our existing technology and our plans to host a million spectra in our developing online data repository.
How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry #ACSsanfran
This is my presentation at the ACS San Francisco Fall Meeting on August 10th 2014
How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry
The Royal Society of Chemistry hosts a growing collection of online chemistry content. For much of our work the InChI identifier is an important component underpinning our projects. This enables the integration of chemical compounds with our archive of scientific publications, the delivery of a reaction database containing millions of reactions as well as a chemical validation and standardization platform developed to help improve the quality of structural representations on the internet. The InChI has been a fundamental part of each of our projects and has been pivotal in our support of international projects such as the Open PHACTS semantic web project integrating chemistry and biology data and the PharmaSea project focused on identifying novel chemical components from the ocean with the intention of identifying new antibiotics. This presentation will provide an overview of the importance of InChI in the development of many of our eScience platforms and how we have used it to provide integration across hundreds of websites and chemistry databases across the web. We will discuss how we are now expanding our efforts to develop a platform encompassing efforts in Open Source Drug Discovery and the support of data management for neglected diseases.
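As a rough illustration of why a structure-derived identifier helps with integration, the sketch below links records from two sources by their InChI strings. It is a minimal example under stated assumptions, not a description of RSC's systems: the record fields, source names and the `link_by_inchi` helper are hypothetical, and only the InChI strings themselves are real standard InChIs.

```python
# Hypothetical records from two unrelated sources. The names differ between
# sources, but the InChI is derived from the structure, so it can act as a key.
SOURCE_A = [
    {"name": "ethanol", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
     "origin": "hypothetical-article-archive"},
    {"name": "benzene", "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
     "origin": "hypothetical-article-archive"},
]
SOURCE_B = [
    {"name": "EtOH", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
     "origin": "hypothetical-vendor-catalogue"},
    {"name": "water", "inchi": "InChI=1S/H2O/h1H2",
     "origin": "hypothetical-vendor-catalogue"},
]


def link_by_inchi(*sources):
    """Group records from any number of sources under their shared InChI."""
    linked = {}
    for source in sources:
        for record in source:
            linked.setdefault(record["inchi"], []).append(record)
    return linked


linked = link_by_inchi(SOURCE_A, SOURCE_B)
for inchi, records in linked.items():
    print(inchi, "->", [r["name"] for r in records])
```

Here "ethanol" and "EtOH" end up under the same key even though their names do not match, which is the kind of cross-resource linking a shared identifier makes possible.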
This is the first presentation I gave at the ACS meeting in San Francisco on Sunday morning (August 10th) in the CINF Natural Products session.
Applying Royal Society of Chemistry cheminformatics skills to support the PharmaSea project
The collaborative project PharmaSea brings European researchers to some of the deepest, coldest and hottest places on the planet. Scientists from the UK, Belgium, Norway, Spain, Ireland, Germany, Italy, Switzerland and Denmark are working together to collect and screen samples of mud and sediment from huge, previously untapped, oceanic trenches. The large-scale, four-year project is backed by almost 10 million euros of funding and brings together 24 partners from 13 countries from industry, academia and non-profit organisations. The PharmaSea project focuses on biodiscovery research and the development and commercialisation of new bioactive compounds from marine organisms, including deep-sea sponges and bacteria, to evaluate their potential as novel drug leads or ingredients for nutrition or cosmetic applications. The Royal Society of Chemistry is responsible for developing a number of capabilities to support the PharmaSea project including a chemical registration system for new compounds, dereplication technologies to assist in the identification of new compounds and search techniques for mass spectrometrists within the project. This presentation will provide an overview of the project and our progress in contributing chemical information technologies to support the effort.
Data Mining Dissertations and Adventures and Experiences in the World of Chemistry
This presentation was given at the CLIR/DLF Postdoctoral Fellowship Summer Seminar at Bryn Mawr College in Pennsylvania on July 29th 2014. The intention was to communicate what we are doing in the fields of text and data mining in the domain of chemistry, specifically around mining the RSC publication archive and chemistry dissertations and theses. How would these experiences map over to the humanities?
I presented at the Food and Drug Administration today regarding some of our efforts to develop a research data repository for the community. The abstract and presentation from Slideshare are below.
Current Initiatives in Developing Research Data Repositories at the Royal Society of Chemistry
Access to scientific information has changed in a manner that was likely never even imagined by the early pioneers of the internet. The quantities of data, the array of tools available to search and analyze, the devices and the shift in community participation continue to expand while the pace of change does not appear to be slowing. RSC hosts a number of chemistry data resources for the community including ChemSpider, one of the community’s primary online public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to tens of thousands of chemists every day and serves as the foundation for many important international projects to integrate chemistry and biology data, facilitate drug discovery efforts and help to identify new chemicals from under the ocean. This presentation will provide an overview of the expanding reach of this cheminformatics platform and the nature of the solutions that it helps to enable, including structure validation, text mining and semantic markup. ChemSpider is limited in scope as a chemical compound database and we are presently architecting the RSC Data Repository, a platform that will enable us to extend our reach to include chemical reactions, analytical data, and diverse data depositions from chemists across various domains. We will also discuss the possibilities it offers in terms of supporting data modeling and sharing. The future of scientific information and communication will be underpinned by these efforts, influenced by increasing participation from the scientific community.
Today I gave a presentation at the 2014 Allen Press Emerging Trends in Scholarly Publishing™ Seminar here in Washington DC. Over coffee I had very positive feedback about what we are doing at RSC and various comments about “real science exposed by a publisher”. The abstract and Slideshare presentation are below.
The Application of Text and Data Mining to Enhance the Royal Society of Chemistry Publication Archive
The Royal Society of Chemistry (RSC) is one of the world’s most prominent scientific societies and STM publishers. Our contributions to the scientific community include the delivery of a myriad of resources that help the chemistry community access chemistry-related data, information and knowledge. This includes ChemSpider, a compound-centric platform linking together over 30 million chemical compounds with internet-based resources. Using this compound database and its associated chemical identifiers as a basis, the RSC is utilizing text and data mining approaches to data-enable our published archive of scientific publications. This presentation will provide an overview of our technical approaches to text- and data-enabling our archive of scientific articles, how we are developing an integrated database of chemical compounds, reactions, physical and analytical data and how it will be used to facilitate scientific discovery.
This is a presentation I gave at the ACS Dallas meeting on March 19th 2014
Accessing Royal Society of Chemistry resources and making chemistry mobile
The ongoing drive towards mobile devices is now simply one of generic ubiquity. It is less an issue of whether a scientist has a mobile device but rather what brand, what generation and what apps they have installed. Chemistry has been moving fast to mobile devices for a number of years now and today it is possible to draw chemical compounds and perform searches of databases both on device and in the cloud. Modeling of data using server-based platforms is increasing in scope and capabilities. The Royal Society of Chemistry was early in recognizing the potential power of mobile platforms in allowing scientists to access data and the benefits of such devices in allowing students access to data and content. This presentation will provide an overview of our efforts to date in supporting chemistry technologies on mobile devices and our recent developments in this domain.
This is a presentation I gave at the ACS Dallas meeting on March 19th 2014
Data enhancing the Royal Society of Chemistry publication archive
The Royal Society of Chemistry has an archive of hundreds of thousands of published articles containing various types of chemistry-related data – compounds, reactions, property data, spectral data etc. RSC has a vision of extracting as much of these data as possible and providing access via ChemSpider and its related projects. To this end we have applied a combination of text-mining extraction, image conversion and chemical validation and standardization approaches. This project will result in new chemistry-related data being added to our chemical and reaction databases and in the ability to more tightly couple web-based versions of the articles with these extracted data. The ability to search across the archive will be enhanced as a result. This presentation will report on our progress in this data extraction project and discuss how we will ultimately use similar approaches in our publishing pipeline to enhance article markup for new publications.