Last day at the Royal Society of Chemistry – So long and thanks for all the spuds

Today is my last day of employment with the Royal Society of Chemistry. It has been almost six years since I joined RSC, when ChemSpider was acquired. While ChemSpider was initially a “hobby project” and an attempt to disrupt how chemistry data were accessed, contributed through crowdsourcing and validated, it has gone from strength to strength and now serves ca. 40,000 unique users a day from around the world. It won three awards in the first few months after we joined RSC and was instrumental in RSC winning three grants that allowed us to participate in the Open PHACTS and PharmaSea projects and to become the host of the UK National Chemical Database Service. Based on the feedback I have received over the years, ChemSpider is much loved and appreciated as a contribution to the scientific community and is recognized as one of the key players in the free chemistry resources arena. I am proud to have been associated with it.

We also got to set up the ChemSpider SyntheticPages micropublishing site and tried to encourage the community to share syntheses that would likely not make it into mainstream papers but were still of value to science.

During my six years at RSC I have been involved in many discussions about the following areas of work, study and research and how they could benefit publishing, the society and, of course, the chemistry community at large. The list includes, in no particular order:

  • Chemistry databases – both commercial and free – and how best to mesh, commercialize and license data
  • Data quality in publications and databases and development of tools for data validation
  • Open Data, Open Access and Open Notebook Science
  • Text-mining of the RSC archive to extract and mark up compounds, reactions, property data and analytical data
  • The potential of semantic web applications to scientific publishing
  • Encouraging the use of Open Identifiers – especially ORCID and InChI
  • The future of micropublishing in the chemical sciences
  • Analytical data and building an open spectral database for the community
  • Social networking approaches to build online profiles – especially for young scientists

There are many, many more things of course, but these are the big ones and, for me, they bring clarity to what my interests are: chemistry data and making it available to the appropriate communities. It is with this in mind that I am excited to join the Environmental Protection Agency next week in the National Center for Computational Toxicology.

With every move forward into a new job we leave behind our old one. I leave RSC with some sadness, but also with excitement about the new opportunities. I have had the chance to work with so many good people at RSC and to engage with collaborators such as ACD/Labs, Mestre, NextMove, EBI, ChemAxon, Accelrys (as they were then), iChemLabs, Dotmatics and on and on. Apologies if you are not named, but the list is very long. Thanks to everyone for your support, encouragement and opportunities to engage. It has been a blast.

And for everyone at RSC who catered to my strange diet of potatoes only…so long, and thanks for all the spuds.

Micropublishing of 200 words isn’t new but the Journal of Brief Ideas is

Nature recently posted about a Journal that Publishes 200 Word Articles. The reporter commented “it is the latest online journal promises to bring a little brevity to science by accepting submissions of 200 words or less”. Initially I thought it was a Nature experiment, but it isn’t. The intention behind this new Journal of Brief Ideas is outlined here:

Some of the comments on the Nature post are interesting. This one from Bob Buntrock, whom I know well from the Chemical Information list server, probably represents a large number of people:

“200 words is not even a good abstract in most cases. Sop to the Twitter crowd. Since I do not nor plan to use social media for scientific communication, I’ll never use it and I’ll tend not to respect it.”

Personally, I BELIEVE in micropublishing. That’s why, when I joined RSC over 5 years ago and we unveiled ChemSpider at our first conference in Glasgow, the NEW idea that Valery Tkachenko and I pitched was to combine our knowledge of cheminformatics and chemical data handling in ChemSpider with the increasing activity in blogging and microblogging, and apply them to something called “ChemSpider Syntheses”. The ChemSpider Journal of Chemistry had already been run as an experiment and is still online. We had already shown that Open Access articles such as those from MDPI Molecules could be hosted in the ChemMantis platform and marked up with interactive chemical widgets. We were also aware of the great work done by the SyntheticPages group, and we chose to collaborate with them to create ChemSpider SyntheticPages (CSSP), as announced here.

Since then CSSP has accepted many articles and has become the host of all of the Olympicene synthetic steps. The story of Olympicene is told in this YouTube video and the list of synthetic steps is here. Peter Scott has told his story about CSSP, and submissions have continued.

I took a look at some of these articles and, if I exclude the Title, data such as the list of NMR shifts, and the Chemicals Used, then MANY ChemSpider SyntheticPages articles are about 200-250 words long (i.e. the Procedure and the Authors Comments). All articles submitted to CSSP go through a fairly light review by a member of the editorial team, generally within about 24 hours, and are then published so that the community can comment on them – open peer review.

I also believe in the possibilities of nanopublishing and nanopublications, and there is work afoot to unveil some of these from text-mining efforts.

While our micropublishing efforts are focused on chemistry, and syntheses specifically, I believe there are other opportunities. Certainly Figshare, Slideshare and Dryad can all host micropublications already. The Journal of Brief Ideas is a new approach and an experiment worth watching! Good luck to them!

A chemistry data repository to serve them all

A presentation that I am giving at universities around the UK in September/October 2014.

Over the past five years the Royal Society of Chemistry has become world-renowned for its public domain compound database, ChemSpider, which integrates chemical structures with online resources and available data. It regularly serves over 50,000 users per day who are seeking chemistry-related data. In parallel we have used ChemSpider and available software services to underpin a number of grant-based projects that we have been involved with: Open PHACTS, a semantic web project integrating chemistry and biology data; PharmaSea, seeking out new natural products from the ocean; and the National Chemical Database Service for the United Kingdom. We are presently developing a new architecture that will offer broader scope in terms of the types of chemistry data that can be hosted. This presentation will provide an overview of our cheminformatics activities at RSC, the development of a new architecture for a data repository that will underpin a global chemistry network, the challenges ahead, and our activities in releasing software and data to the chemistry community.

Dealing with the Complex Challenge of Managing Diverse Chemistry Data Online to Enable Chemistry Across the World #ACSsanfran

This is my third presentation today at the ACS meeting in San Francisco on 11th August 2014.

The Royal Society of Chemistry has provided access to data associated with millions of chemical compounds via our ChemSpider database for over 5 years. During this period the richness and complexity of the data have continued to expand dramatically, and the original vision of providing an integrated hub for structure-centric data has been delivered to hundreds of thousands of users across the world. With the intention of expanding our reach to cover more diverse aspects of chemistry-related data, including compounds, reactions and analytical data, to name just a few data types, we are in the process of implementing a new architecture to build a Chemistry Data Repository. The data repository will manage the challenges of associated metadata and the various levels of required security (private, shared and public), and will expose the data as appropriate using semantic web technologies. Ultimately this platform will become the host for all chemicals, reactions and analytical data contained within RSC publications, specifically their supplementary information. This presentation will report on how our efforts to manage chemistry-related data have impacted chemists and projects across the world, and will review specifically our contributions to projects involving natural products for collaborators in Brazil and China, to the Open Source Drug Discovery project in India, and our collaborations with scientists in Russia.
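The abstract names three security levels (private, shared and public) without saying how they are enforced. As a purely hypothetical sketch, assuming each record has an owner, a visibility level and a set of named collaborators, a visibility check could look like this. The class and function names are illustrative assumptions, not the repository's actual design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Visibility(Enum):
    PRIVATE = "private"  # only the depositor can see the record
    SHARED = "shared"    # depositor plus explicitly named collaborators
    PUBLIC = "public"    # anyone can see the record


@dataclass
class Record:
    """Hypothetical repository record (e.g. a compound, reaction or spectrum)."""
    owner: str
    visibility: Visibility = Visibility.PRIVATE
    shared_with: Set[str] = field(default_factory=set)


def can_view(record: Record, user: str) -> bool:
    """Return True if `user` may see `record` under the three-level model."""
    if record.visibility is Visibility.PUBLIC:
        return True
    if user == record.owner:
        return True
    # Non-owners only get access to SHARED records they were explicitly given.
    return record.visibility is Visibility.SHARED and user in record.shared_with
```

Keeping the rule in one function means a deposited record can move from private to shared to public (for example on publication) by changing a single field.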


Encouraging students to start publishing early in their career #ACSsanfran

My second talk of three on August 11th 2014 at the ACS Meeting in San Francisco.

Many students spend enormous amounts of their time engaged with their computers, accepting of course that mobile devices are simply computers of a different form factor. Engaged with social networks, and using computer platforms to source and share content of various forms, they contribute an enormous amount of “data” into the cloud, and in many cases into a void. What community and career benefits might result from those students spending some of their time contributing chemistry-related data to the world? What challenges lie in the way of their participation, and how might participating have a positive or negative impact on their future careers? The Royal Society of Chemistry hosts a number of chemistry data platforms to which students can actively contribute and for which their participation can be measured. Moreover, the RSC’s micropublishing platform allows chemists to learn how to write up their scientific work, obtain review from their peers and chemistry professors in a non-threatening environment, and produce an online published work in less than a day that is both citable and available as a shared resource for the community. This presentation will demonstrate how to participate and how to encourage engagement from students early in their education. There are no longer any technology barriers to sharing the majority of chemistry-related data.


Royal Society of Chemistry Activities to Develop a Data Repository for Chemistry-Specific Data

This is a presentation I am giving at the ACS Meeting in Dallas, Texas on March 17th 2014.

In recent years the Royal Society of Chemistry has become known for its development of freely accessible data platforms including ChemSpider, ChemSpider Reactions and our new chemistry data repository. In order to support drug discovery, RSC participates in a number of projects including the Open PHACTS semantic web project, the PharmaSea natural products discovery project and the Open Source Drug Discovery project in collaboration with a team in India. Our most recent developments extend our efforts to support neglected diseases through: the provision of high-quality datasets, resulting from our curation efforts, to support modeling; the delivery of enhanced application programming interfaces that allow open source drug discovery teams both to source data from and to deposit data into our chemistry databases; and the provision of a micropublishing platform to report on various aspects of work supporting neglected disease drug discovery. This presentation will review our existing efforts and our plans for extended development.

Ontology work at the Royal Society of Chemistry #ACSDallas

This is a presentation I gave at the ACS Spring Meeting in Dallas, Texas on March 17th 2014.

We provide an overview of the use we make of ontologies at the Royal Society of Chemistry. Our engagement with the ontology community began in 2006 with preparations for Project Prospect, which used ChEBI and other Open Biomedical Ontologies to mark up journal articles. Project Prospect has since evolved into DERA (Digitally Enhancing the RSC Archive), and we have developed further ontologies for text markup, covering analytical methods and name reactions. Most recently we have been contributing to CHEMINF, an open-source cheminformatics ontology, as part of our work on disseminating calculated physicochemical properties of molecules via the Open PHACTS project. We show how we represent these properties and how this representation can serve as a template for disseminating different sorts of chemical information.


The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms

This is my seventh and LAST talk at the ACS Meeting in Indianapolis:

The Royal Society of Chemistry provides access to a number of databases hosting chemical data, reactions, spectroscopy data and prediction services. These databases and services can be accessed via web services that accept queries in standard data formats such as InChI and molfiles. Data can then be downloaded in standard structure and spectral formats, allowing for reuse and repurposing. The ChemSpider database is integrated with a number of projects external to RSC, including Open PHACTS, which integrates chemical and biological data. This project uses semantic web data standards including RDF. This presentation will provide an overview of how structure and spectral data standards have been critical in allowing us to integrate many open source tools, easing integration with a myriad of services and underpinning many of our future developments.
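InChI works well as a query format partly because it is a plain, layered text string. As a minimal illustration, the sketch below splits a standard InChI into its slash-delimited layers. The function name is hypothetical and not part of any RSC or ChemSpider API; it handles only the main layers, and if a prefix letter recurs in a later sublayer (for example a fixed-H layer), only its first occurrence is kept.

```python
def parse_inchi_layers(inchi: str) -> dict:
    """Split a standard InChI string into its main layers (simplified sketch)."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    # First field is the version ("1S" = standard InChI), second the formula.
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    # Later layers start with a one-letter prefix, e.g. "c" (connections)
    # or "h" (hydrogens); keep only the first occurrence of each letter.
    for part in parts[2:]:
        if part:
            layers.setdefault(part[0], part[1:])
    return layers
```

For methanol, `parse_inchi_layers("InChI=1S/CH4O/c1-2/h2H,1H3")` returns `{"version": "1S", "formula": "CH4O", "c": "1-2", "h": "2H,1H3"}`. A production service would rely on the official InChI software rather than string splitting.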

Digitizing documents to provide a public spectroscopy database

This is my sixth presentation at the ACS Fall Meeting in Indianapolis:

RSC hosts a number of platforms providing free access to chemistry-related data. The content includes chemical compounds and associated experimental and predicted data, chemical reactions and, increasingly, spectral data. The spectral data in the ChemSpider database are primarily electronic spectra generated at the instrument, converted into standard formats such as JCAMP, and then uploaded for the community to access. As a publisher, RSC holds a rich source of spectral data within its scientific publications and associated electronic supplementary information. We have undertaken a project to Digitally Enable the RSC Archive (DERA) and, as part of this project, are converting figures of spectral data into standard spectral data formats for storage in our ChemSpider database. This presentation will report on our progress in the project and some of the challenges we have faced to date.
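The final step described above is storing digitized spectra in a standard format such as JCAMP. As a hedged sketch of only that step, assuming a figure has already been digitized into lists of x and y values, the function below writes the points as a minimal JCAMP-DX text block in the tabular `##XYPOINTS=(XY..XY)` form. The function name and default units are illustrative assumptions, not RSC's conversion code; the header is a small subset of labels (a real file would typically add others such as `##ORIGIN` and `##OWNER`) and has not been validated against the full specification.

```python
def to_jcamp_xypoints(title, xs, ys, data_type="INFRARED SPECTRUM",
                      xunits="1/CM", yunits="TRANSMITTANCE"):
    """Write digitized (x, y) points as a minimal JCAMP-DX 4.24 text block."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    lines = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        f"##DATA TYPE={data_type}",
        f"##XUNITS={xunits}",
        f"##YUNITS={yunits}",
        "##XFACTOR=1",  # values are written unscaled
        "##YFACTOR=1",
        f"##FIRSTX={xs[0]}",
        f"##LASTX={xs[-1]}",
        f"##FIRSTY={ys[0]}",
        f"##NPOINTS={len(xs)}",
        "##XYPOINTS=(XY..XY)",
    ]
    # One "x, y" pair per line, in the order they were digitized.
    lines += [f"{x}, {y}" for x, y in zip(xs, ys)]
    lines.append("##END=")
    return "\n".join(lines)
```

The tabular XYPOINTS form is used here because points digitized from a figure are rarely evenly spaced, which the compressed `##XYDATA=(X++(Y..Y))` form assumes.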



The Social Profile of a Chemist Online

This is my fifth talk at the ACS Indianapolis Conference:

The Social Profile of a Chemist Online – The Potential Profits of Participation

Unless a scientist is prevented by their employer from exposing their scientific activities through publications and presentations, their future impact, whether they are expected to be at a bench, in front of an instrument or surrounded by robotics, will largely be represented online through their published works, their citation profile and other forms of recognition of their work by their peers. Search engines are already harvesting information about scientists and aggregating it into profiles such as those offered by Google Scholar Citations and Microsoft Academic Search. Rather than be limited to the online representation provided by such services, students are encouraged to participate in creating their online profiles and to shape, as far as possible, how they are represented to future employers and collaborators. This presentation will give an overview of potential approaches to participating in the development of an online persona.



