Presentation given at ACS New Orleans Spring Meeting
Chemistry online is represented in various ways including publications, presentations, blog posts, wiki-contributions, data depositions, curations and annotations. Encouraging the community to participate in and comment on the information delivered via these various formats would likely provide for a rich dialog in some cases and improved data quality in others. At the Royal Society of Chemistry we have a number of platforms that are amenable to contribution. This presentation will provide an overview of our experiences in engaging the community to interact with our various forms of content and discuss new approaches we are utilizing to encourage crowdsourced participation.
Today I was hosted by Prof David Snyder at William Paterson University in New Jersey, where I gave a presentation with the title and abstract below. I had a wonderful day meeting with various members of the faculty and discussing the opportunities for AltMetrics, the need for young scientists to consider their public profile EARLY in their career, and how to expose themselves on the Social Network (old presentation here). Today’s presentation is available on SlideShare here.
RSC|ChemSpider – The Online Chemistry Database Where Community Contributions Count
The ChemSpider database is a resource hosted by the Royal Society of Chemistry. With over 28 million unique chemicals on the database linked out to over 400 data sources, the platform provides access to experimental and predicted data (properties, spectra etc.), links to publications, patents and a myriad of other resources. The ChemSpider database has been used as the foundation of a number of other resources for chemists including ChemSpider SyntheticPages, the Learn Chemistry Wiki and the Spectral Game. This presentation will provide an overview of ChemSpider and discuss how chemists can both derive value from and contribute to the content available from the database and its related resources. We will also discuss our view of a future platform for managing personal, institutional and public chemistry in a shared environment.
Last night in Chicago I was awarded the Jim Gray eScience Award. I didn’t know Jim personally but I know I benefit from the fruits of his work. Before Tony Hey gave me the award he played a video about the previous award winners. To be recognized for my contributions and to join scientists of the caliber of the previous winners was, to say the least, very emotional. My entire career has been focused on doing what I thought was the right thing for the role I was charged with. And when I didn’t want the role I was in I would move on. That’s migrated me through various roles in science from lab manager in academia, in industry, to start-up cheminformatics company product manager, through marketing, through sales, to community website for chemistry, to where I am today at RSC, a publisher. If I had been asked to map out my career path there is no way I would get to here…but which of us would be able to really?
Last night I presented on “The Possibilities and Pitfalls of Internet-Based Chemical Data”. I talked about how much data I have generated in the lab over the years that is now lost, and how we can change this moving forward for the existing generation of scientists. I talked about the history of ChemSpider from hobby-project to present day as one of the web’s primary sites for chemists. I talked about how scientists should PARTICIPATE in annotating and curating data online…how data sites specifically should enable commenting to capture issues. I talked about the measure of scientists and how efforts including ORCID and ImpactStory will be important to deal with the impact and notability of scientists. I hope I was able to share my view that, while technology will continue to improve in terms of allowing us to contribute, it is the personal choice to make a difference that is crucial in terms of correcting errors, annotating data and continuing the journey of creating improved resources for the chemistry community (and of course other branches of science).
I also announced our intention for RSC to create a Global Chemistry Hub (a topic for a separate post) and to “data enable the RSC archive”…extracting chemicals, reactions, data etc from our archive going back to the 1840s. We do not have all of the technologies, the processes or the approaches yet defined. But we have the intent and the courage to go for it, learning as we go and producing beneficial outcomes in an iterative manner. It’s an exciting time for the RSC cheminformatics team and it is my privilege to work alongside a great team of individuals to create a step change in terms of how we manage and deliver chemistry data to the community.
I have had a lot of trusted advisers over the years and last night I acknowledged a list of those closest to me in recent years. They include: Jean-Claude Bradley, Sean Ekins, Lee Harland, Gary Martin and Martin Walker. The closest to me however is Valery Tkachenko. I was happy that Valery was able to be at the conference with me. So much of what has been achieved to date with ChemSpider (as well as MANY projects we worked on together while at ACD/Labs) rests squarely on his shoulders. The future technical implementation of the cheminformatics projects we are undertaking at RSC is under his guiding hand. I am glad to have such a great “partner in crime”….
My thanks to Microsoft Research, to the judges for selecting me for the award and to the community who has chosen to embrace some of the fruits of my work. I am leaving Chicago proud, tired and looking forward to making an ever bigger impact with some of our new projects.
The story of Olympicene, and our intention to try and get it synthesized and analyzed, was first reported in August 2011 here. The original conversation was between Prof Graham Richards and me over a drink in Belgium at the RSC Editors Symposium in March 2010. The concept of having someone synthesize a small organic molecule that would be a molecular representation of a famous symbol of sport was a fascinating challenge. And, always one for a challenge, it was one that was pursued with great gusto!
Since we had started the ChemSpider SyntheticPages (CSSP) platform recently I thought it was appropriate to kick off a grand vision discussion with Peter Scott, one of the editors of CSSP. My original idea that I bounced off of Peter was a big one…an international competition exposed to the chemistry community. We would encourage chemistry labs around the world to submit their step-by-step syntheses to CSSP. We would be able to collect and expose all of this work to the entire chemistry community. We would set up a voting scheme for the community to give their input on what was the most elegant synthesis, the greenest, what had the best analytical data, what had the best write up. Not all categories were detailed at that time and would come later but the concept of bronze, silver and gold medal winners in an international chemistry competition made sense. We were really excited by the possibilities but for many reasons (read that as many distractions) we rolled it out as a smaller announcement and encouraged participation as best as we could with a small engagement profile via this blog. It did seem to garner a lot of attention but as is common with such projects the participation was not as high as we expected. Nevertheless one lab did step up to participate in the project, the David Fox Group at the University of Warwick. David is a colleague of Peter Scott’s…small world…
David had one of his students pursue the synthesis, not only because the olympicene molecule might be an elegant piece of synthetic work, but also because some of the envisaged properties could well be of value (more on that later!). Anish started publishing his syntheses to CSSP in November of last year as listed here. You can see the Olympicene compound coming together step by step and yes, the final step is not yet reported! Once the compound was made then the possibilities of having it analyzed seemed rather interesting, especially having seen the work reported by IBM in 2009 regarding the single molecule imaging of pentacene. Also, I had followed the work of Marcel Jaspars, who I had known during my time working at ACD/Labs when I was working on Computer-Assisted Structure Elucidation [1,2]. Marcel had recently worked on an NMR and microscopy imaging project to confirm a chemical compound structure. Again, small world. I asked Marcel for an intro to the researchers at IBM and we started a dialogue. Researchers at the University of Warwick (Dr Giovanni Costantini and Ben Moreton) had already applied Scanning Tunnelling Microscopy, and they then connected with Leo Gross with the idea of using the noncontact atomic force microscopy approach.
Within a fairly short period of time IBM had performed the very elegant work of imaging olympicene…just one of the images is shown below but there are others shown on the Flickr account.
A single olympicene molecule is just 1.2 nanometres in width, about 100,000 times thinner than a human hair. This is beautiful! For whatever reason it looks like a molecule with a smile at the success of the work too!
The story of the work is described in this video below.
The work is not over yet! There is a research paper to come from the University of Warwick and IBM Research labs, as there is definitely unique science that has come out of this work and needs to be reported. That molecule, as it were, is “NOT just a pretty face”. We will submit all the appropriate images and available analytical data onto ChemSpider and CSSP as time allows.
For now I simply smile at the story of a concept discussion between Graham and me that was taken into the hands of superb scientists and brought to fruition. Congratulations to ALL of those who worked on the project in David Fox’s and Leo Gross’s labs. Thanks to the marketing people at IBM, RSC and Warwick for bringing together all of the materials in a tight time frame to tell the story. My thanks to my colleagues at RSC who believed in the potential of this project and especially to Peter Scott for seeing the potential and willingly participating! This project is a great example of international collaboration and pushing science to its extremes. It was a pleasure to be involved if only at a concept level and HOPEFULLY I will get to meet the scientists who did the work sometime!
When writing talks I try to find interesting (and where possible fun) examples of how challenging managing chemistry data is for all of us who manage tens of thousands, or in our case millions, of compound pages for the community to use. I have told many stories over the past few years of the challenges we collectively have with regard to data quality and how it flows between our databases unabated. My latest example, used at the recent talk at the EBI (ChemSpider – An Online Database and Registration System Linking the Web), was the structure known as Terminal Dimethyl presently on PubChem, DrugBank, Wolfram Alpha and PDBe. It was originally inherited into ChemSpider also but has been deprecated. I left a comment on DrugBank a couple of weeks ago but it hasn’t been published yet…generally such errors are removed VERY quickly by the DrugBank hosts. I added a comment to Wolfram Alpha and received a canned response and no changes to the record as yet.
There ARE ways to communally resolve these issues and I will blog about that shortly.
I am presently in Barcelona at the ICIC meeting to give a presentation entitled “Mobile Chemistry and ‘Generation App’”. I have been preparing by looking at what is new in the world of Chemistry Apps and in the process have updated my ongoing list of apps and posted it on SlideShare. I intend to keep updating it every couple of months to keep track of new apps as they become available. I have not had time to update the SciMobileApps wiki as yet.
The internet is a rich source of chemistry related data and, nowadays, if a chemist knows how to initiate a search, data can be sourced for millions of chemicals online. The nature of online data varies from simple molecule diagrams, to experimental and predicted properties, encyclopedic articles, synthetic routes, analytical data, patents and publications. The array of information now accessible is distributed across thousands of sites giving rise to the information overload commonly associated with the Google-type searches on the internet. In addition the purest language of chemistry, that of chemical structures, is not fully supported on the web as yet. This presentation will provide an overview of how the internet is being meshed together using data aggregation and standardization approaches to enable a structure-searchable internet for chemistry. The speaker will present an overview of the ChemSpider platform (http://www.chemspider.com), the challenges of linking together over 400 internet resources and 26 million unique chemicals, and discuss how members of the chemistry community can directly contribute to enhancing the availability of quality data online.
This is a movie of the talk I gave using the BigBlueButton platform to students and faculty at the University of Arkansas, Little Rock.
I had the pleasure of co-presenting with my friend Jean-Claude Bradley today at the “3rd Annual Drug Discovery Partnership: Filling the Pipeline”. Jean-Claude gave a great talk, available on Slideshare here, and discussed the issue of data quality, how improved data gives improved models, the cross-validation of data and proliferation of errors. My talk is on Slideshare here and embedded below. In many ways I discussed similar issues, though not focused on melting point data but rather on structures, structure-identifier relationships, the cross-linking of multiple resources on the internet and how online resources can support Open Drug Discovery Systems. In this presentation I discussed some of the work we are doing on Open PHACTS.
My final presentation at ACS Denver yesterday was, I think, the clearest presentation I gave all week. As with most presentations I gave last week I was up at 4am to finish it off based on conversations I had been having during the week. A lot of people came to the booth after the presentation to acknowledge that they had been dealing with such challenges for years and that it was time that a drug collection was finally available. It took months to get 152 drugs “right”. It would take a looong time to reproduce something of the quality of the Merck Index!
“Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs
Internet-based public domain databases containing chemical compounds have grown in number, capability and content in recent years. There are now many databases containing millions of chemical compounds associated with different types of data including chemical names, properties, analytical data, and with associated mapping to proteins, assay data, clinical information and so on. These disparate data sources suffer from one common issue – quality of data. This presentation will provide an overview of our efforts to source the appropriate structural representations for 200 top-selling drugs from public domain sources. This intra- and inter-laboratory comparison of approaches, processes and necessary agreements exposed the challenges associated with aggregating structure-based data. The project also provided data regarding the distribution of quality issues associated with many of the community’s popular databases.”
This is my presentation at the Skolnik Symposium at ACS Denver to honor the contributions of Alexander “Sandy” Lawson to our domain of Cheminformatics.
ChemSpider – Does Community Engagement work to Build a Quality Online Resource for Chemists?
With an intention to provide a high quality free internet resource of chemistry related data for the community, ChemSpider has aggregated almost 25 million compounds linked out to over 400 data sources and provided a platform for the community to both deposit and curate data. This experiment in crowdsourcing for chemistry has now been running for over three years. This presentation will review a number of aspects of the project including (a) the level of community participation in depositing and curating data; (b) the nature of data and content supplied by the community; (c) how ChemSpider is used by the community; (d) using game-based systems to assist in data curation; (e) algorithmic-based approaches to data validation and filtering; and (f) sharing data curation efforts with other online databases.