Another Response to Constructive Feedback from Peter Murray-Rust…

15 Oct

Since ChemSpider went live in March of this year we have received a lot of feedback and questions regarding our understanding of science, our purpose and our passions. We have an excellent Advisory Group who participate in dialogs and constructive discussions. Much of the feedback we have received has been from one individual, Peter Murray-Rust (PMR).

Before proceeding with this post I want to clarify my perceptions. I believe PMR brings a lot of value to the Chemistry Blogosphere. Over the past decade I have watched Peter’s activities with interest as he has participated with many other evangelists to pursue the cause of ODOSOS (Open Data, Open Source and Open Standards). Over the years I will confess a level of hero-worship. I had enjoyed watching what he was doing in regards to enabling the web for chemists. He is prolific…I don’t know where he finds the time to write so much. He travels the world and informs us all of what is going on “out there”. He does a great service. In contrast to these positive traits, which I honor, I am of the opinion that Peter is overly harsh and judgmental in some cases. Often he posts without the necessary research and his perceptions become the “truth”. This is dangerous when he has such a public profile and such influence. For evidence of influence visit the graph here and notice the incredible spike in traffic resulting from his post about the Monkeys at ChemZoo in April of this year. It is unlikely those visitors ever returned to our site or blog to hear our comments. Potential damage was done. This blog post is in regard to his most recent judgments of ChemSpider.

When ChemSpider was set up for the benefit of the chemistry community I had assumed that this humble effort by a small group of dedicated individuals would be welcomed by PMR and other Open Access advocates. In general I believe that’s true. Our actions, policies and status have drawn a significant amount of feedback from PMR on his blog. New feedback was posted late last week and I’ll get to that shortly. As a review, in keeping with the trend being set by Rich Apodaca (1,2,3), I am listing what’s happened to date.

“Constructive Feedback” for Newbies

The Challenge to ChemSpider Chemistry

When Sodium chloride dimers are bad science..but are on NIST Webbook and PubChem

Calcium Carbonate is not soluble and can’t have a logP PLUS Lipinski says Calcium Carbonate CAN have a logP

Prussian Blue on ChemSpider is Terrible…but still as good as Pubchem and Emolecules.

Is Stereochemistry on Taxol important? Should the public data be curated?

ChemSpider VERSUS PubChem or ChemSpider SUPPORTS PubChem

ChemSpider ripped off PubChem…damn them.

ChemSpider and Their Openness and non-Web 2.0

ChemSpider don’t understand what Web 2.0 is.

ChemSpider contribute to the community…and support PubChem

Spectral Data are Declared Open Data

Helping out the community with Web Services

There are a lot more…and so to the latest. I’ll identify the recent post comments in italics.

PMR> Recently the Chemspider company has announced an “Open Chemistry Web” which in my opinion misuses the word “Open”.

Open Chemistry Web is the name of a new blog set up and hosted by Will Griffiths. It’s not ODOSOS. It’s the NAME of a blog. If we are in an environment where the name of a blog cannot include the word “Open” then we are living in sad times. Will’s passion is in text-mining OPEN ACCESS Chemistry Articles…or others if people will allow it. Can he not name his own blog? Hmmm….

PMR> and its associates are commercial organization which have aggregated a large number of chemical connection tables and have started by calculating their properties and extracting literature references which they make freely accessible but not Open. The freedom is for an unspecified timescale and you cannot download significant amounts of the data and you cannot re-use it without permission. ”

Yes we are “commercial”. I dealt with this same comment previously. If you have interest in this please browse it. A later post outlines the present status of the project and whether or not it will survive.

Yes, we have aggregated a large number of connection tables and have started by calculating their properties and extracting literature references, which we make freely accessible. We have done a lot more. We have made multiple services available to the community (1,2,3,4) but, with no surprise, have received no acknowledgment.

Regarding “not open”. We are giving away the ChemSpider database to those who ask for it. It will be published in PubChem. We USE Open Source components (1,2,3,4). We have not generated any Open Source components yet and our source code is not Open. We index Open Access articles on ChemRefer. We work with the Open Source data community to help.

Regarding “you cannot download significant amounts of the data and you cannot re-use it without permission”. We are giving away the ChemSpider database to those who ask for it. We do NOT have a server farm to support downloads. The FAQ page says

May I download the data and use it in my own database(s)?

You have limited rights in this regard. You can only assemble a database of 5000 structures or less, and their associated properties, from our database without our permission. You can download up to 1000 structures per day from the website. Please contact us at feedbackATchemspiderDOTcom to request an extension outside this constraint. We are willing to provide the ENTIRE database of ChemSpider structures at your request – the file will consist of InChI Strings, InChIKeys and ChemSpider IDs. These constraints are under regular review so please feel free to engage us in conversation.”
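For anyone scripting against the site, the two limits quoted above can be tracked on the client side. What follows is only a minimal sketch under stated assumptions: the `DownloadLedger` class and its methods are hypothetical names invented for illustration, not part of any ChemSpider service, and it assumes every downloaded structure is distinct and ends up in the assembled database.

```python
from dataclasses import dataclass, field
from datetime import date

DAILY_LIMIT = 1000      # structures per day, as quoted in the FAQ
DATABASE_LIMIT = 5000   # structures in an assembled database without permission


@dataclass
class DownloadLedger:
    """Hypothetical client-side ledger; not part of any ChemSpider API."""
    per_day: dict = field(default_factory=dict)  # maps a date to structures fetched that day
    total: int = 0                               # structures fetched overall

    def can_download(self, count: int, day: date) -> bool:
        """Return True if fetching `count` more structures stays inside both limits."""
        used_today = self.per_day.get(day, 0)
        return (used_today + count <= DAILY_LIMIT
                and self.total + count <= DATABASE_LIMIT)

    def record(self, count: int, day: date) -> None:
        """Log a download, refusing any request that would break a limit."""
        if not self.can_download(count, day):
            raise ValueError("request exceeds the FAQ limits; ask for an extension")
        self.per_day[day] = self.per_day.get(day, 0) + count
        self.total += count
```

Anything beyond these limits is a matter for the feedback address in the FAQ, not for a cleverer script.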

PMR>”Initially I was concerned about the complete lack of quality in these calculations and said so – I believe there has been some improvement in quality but I do not check and do not intend to do so. I do not follow Chemspider regularly but they appear to have added the ability for anyone to add annotations and curation. I have serious concerns about the lack of thought given to metadata and I do not expect Chemspider to be able to scale or to compete against modern approaches.”

I acknowledge the judgments and opinions. A question…in terms of online data sources for chemistry I believe that approximately 20 million structures ranks in the top 3. We have about 1500 chemists per day using the site with thousands of transactions including text and structure/substructure searching. Please compare with other services in this domain and, if you do this, provide quantitative information. We welcome any feedback on metadata. We are presently working on RDF’ing ChemSpider thanks to the guidance and support of Egon Willighagen. I have dealt with the metadata discussion previously here and abstracted below.

“Other comments include “I see very little difference between Chemfinder and Chemspider. They are both closed, proprietary, do not expose data, or metadata, or algorithms; have closed code, do not allow downloads or re-use. They lose metadata in their aggregation process. I have nothing personal against Chemspider (or, if they are associated, ACDLabs) – I just think the Web 1.0 model is out of date for chemistry.”

To respond…yes, the code is proprietary and closed…we don’t know of any Open Source code that would quickly search >10 million structures by structure and substructure (that will be covered in a separate blog as I have the utmost respect for the commercial entities that do this well! It’s DIFFICULT.) Oh…but Open Source isn’t part of the Web 2.0 definition. We don’t expose algorithms…correct…many are provided by collaborators and we do not have the right to expose their code. But that isn’t part of Web 2.0 either.

And next…the beloved “metadata” term. What exactly IS metadata? Let’s refer again to our web-friendly Wikipedia regarding metadata. In brief it’s “data about data” and a perfect example is an XML schema vs XML. An XML schema is metadata. According to my interpretation this means InChI and SMILES are not metadata since these data can be interchanged with the structure itself. I may be wrong. The hypothetical entity describing what data can be bound to a structure would be metadata: not necessarily data related somehow to the structure, but rather more general data describing the data model. The source of the data, for example, IS metadata. ChemSpider doesn’t lose the metadata…we retain the only metadata currently available, the data source, and use it as our link out to the provider. Our primary role again, for now, is to connect silos of information via chemical structures.”
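To make the distinction concrete, here is a small sketch of a record that keeps the structure identifiers (data) apart from the data source (metadata) and uses the source to build the link out. It is illustrative only: the class names, the field names and the `example.org` URL template are assumptions made for this example and do not describe how ChemSpider actually stores anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMetadata:
    """Data about the record: which provider it came from (hypothetical fields)."""
    provider: str
    link_template: str  # URL pattern with an {external_id} placeholder
    external_id: str    # the provider's own identifier for the compound

    def link_out(self) -> str:
        """Build the URL that points back to the provider's page."""
        return self.link_template.format(external_id=self.external_id)


@dataclass(frozen=True)
class StructureRecord:
    """InChI and SMILES are interchangeable views of the structure, so they are data."""
    inchi: str
    smiles: str
    source: SourceMetadata  # the metadata retained through aggregation


# Methane, attributed to a made-up provider.
record = StructureRecord(
    inchi="InChI=1S/CH4/h1H4",
    smiles="C",
    source=SourceMetadata(
        provider="ExampleProvider",
        link_template="https://example.org/compound/{external_id}",
        external_id="12345",
    ),
)
```

Calling `record.source.link_out()` returns the provider URL for the record, which is the “link out to the provider” role described above.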

PMR> Chemspider also encourages Uploading Spectra Onto ChemSpider. These spectra by default all belong to Chemspider. They are not Open. If you can convince the world at large to donate IPR to you for free, you deserve some form of congratulations for sheer bravado. Note that even if you upload data and metadata you are not allowed to download it (there is a limit of 100 structures).

Thanks, again, for the judgments. We have been testing out the system with two of our advisory group and myself. Only JC Bradley’s Lab and Bob Lancashire have deposited and with the understanding, I believe, that the data would be “Open”. Since PMR’s blog posts continue to do damage to our reputation we have no choice but to respond. We do this with coding. Within 24 hours of his comments Open Data was declared and spectra can now be downloaded. The intention was always there to do this…we just had higher priorities.

PMR>”We have ca. 250,000 calculations on molecules and 130,000 crystal structures which Chemspider have suggested we upload to them. I’m not yet sure why we should do this.”

Well, if they are Open Data, as marked at the CrystalEye website, and seeing as people would like to access the data via ChemSpider, we should just be able to download. But we don’t want all the data…we just want the structures and the appropriate URL structure to link back to CrystalEye. This is what we do with all data sources including NMRShiftDB.

PMR>”Chemrefer appears to allow searching of Open chemistry articles by keyword. Unexceptional, but why shouldn’t we simply use Pubchem? AFAIK it will index all these journals.”

PubChem indexes these journals? No, I think it’s PubMed. We’ll check on whether everything ChemRefer indexes is in PubMed. However, what they don’t do, yet, or ever, is connect the chemical names in those journals to chemical structures. That’s what’s been done for patents.

“PMR> The IPR model of Chemspider seems clear. No data, metadata and author contributions are Open.


“PMR>That allows them, at some stage in the future to close some or all of the site and to charge for data and services”

The site, as it exists today, is intended to stay free for all. We may, as we OPENLY acknowledge, introduce services that carry a charge.

“PMR> and – like eMolecules and their tie-up with Wiley (Wiley and eMolecules: unacceptable; an explanation would be welcome) – I predict this will happen within 5 years (unless Chemspider fails to survive in its current form).

I have posted on what I believe is an inappropriate judgment by Peter that the data on Chemgate is extracted from the journals. I put a trackback to Peter’s original post. He never responded. He did comment separately though about busyness and commenting. Unfortunately Wiley and Chemgate now show up again…with no effort to clean up the previous comments and, unfortunately, more incorrect information about ChemSpider.

“PMR> So all the authors who are contributing metadata are, in effect, donating IP to Chemspider. I have no moral objection to this – it just seems retrograde when we have Open collections of molecules such as PubChem and our own crystalEye.”

ChemSpider data will all go onto PubChem shortly. This was announced at the recent PubChem meeting. I have asked PMR to point me to where I can download the CrystalEye collection if it is indeed Open Data.

“PMR>But a number of my friends in the Open Chemistry area are on the Chemspider advisory board, so I must be missing something. Perhaps they can show how donating IP to a commercial closed company advances the cause of Open Chemistry.”

I hope they discuss this with you. This group is a powerful team of intellect, capabilities, insight and support. I value the opportunity to work with them.

“PMR> And I applaud Chemspiderman’s efforts to clean up chemistry. Sometimes this gets muddled with the association with a commercial organisation based on possessing chemical IP so sometimes my messages have been less than generous and I apologized.”

Yes, you did. And I accept it willingly. It was very gentlemanly.

“PMR> I am not anti-capitalist – I do not attack companies per se. But I do attack people who use the word “Open” incorrectly and to promote themselves. I have done this when publishers come up with “Open Access” offerings which appear to be less than satisfactory ( see “open access products” at Nature obscures the debate, Why Open Access metrics are necessary) and for which the community has to pay. “Open” is now used by commercial organisations in the same way as “healthy” – please feel good about us and our activities as we use the word “Open”. We know it’s meaningless, but it makes us look good. Well, it isn’t meaningless. A number of people are trying carefully to describe what is meant by Open access, Open Data, Open source and Open Services. And when others use it to mean something less, I take exception. If nothing else it makes our job much harder.”

I will comment on this in a couple of later posts. I do not support the “marketing” use of Open and do not believe we are doing so. However, I want to comment more on this, but at a later date. Marketing statements bug me too. You’d think that “…the world’s most comprehensive openly accessible search engine for chemical structures” would be PubChem. But it’s not according to this marketing statement …who is it?

There have been comments about PubChem being the model of Openness. I think the effort is great. FULLY support it. But let’s wake up. If funding ceases then PubChem could go away. The data is Open. The software is NOT. PubChem is built around some home-built services and on top of commercial modules such as CACTVS and OpenEye. I discussed it here and it has not been challenged. Am I wrong?

“PMR>: There is nothing Open about this. Even the blog is not Open (it does not carry a CC licence). The services may be free, and they may be useful, but they are not Open. The text that they index may indeed be Open Access in its own right (and probably is because otherwise the publishers will sue them) but this is no especial credit to Chemrefer. We also index Open resources but we make our results Open.Chemrefer could disappear tomorrow. Only if the data, and the source code are made Openly available under licence can they be called Open.”

There is a CC license on the page. Peter acknowledged this. Who said the services were Open? If we did, point me to it and we will rectify. I have asked Peter separately whether all articles linked to CrystalEye are Open Access or whether some are included with permission from the publishers. This is very interesting.

This has been a long post. I understand I have likely added fuel to the fire. I have done it in a public way. I judge that ChemSpider is being harmed by the ongoing misinformation. I wish it to stop. What I want is advice and support to make this a better service for our users. However, I refuse to make it my personal mission to satiate PMR’s requests and objectives. ChemSpider is developed for its users and the community in general, NOT for its non-users. PMR is not a user. Not everything has to be Open for it to be of high value. I believe we deliver value.


About tony

Antony (Tony) J. Williams received his BSc in 1985 from the University of Liverpool (UK) and PhD in 1988 from the University of London (UK). His PhD research interests were in studying the effects of high pressure on molecular motions within lubricant-related systems using Nuclear Magnetic Resonance. He moved to Ottawa, Canada to work for the National Research Council performing fundamental research on the electron paramagnetic resonance of radicals trapped in single crystals. Following his postdoctoral position he became the NMR Facility Manager for Ottawa University. Tony joined the Eastman Kodak Company in Rochester, New York as their NMR Technology Leader. He led the laboratory to develop quality control across multiple spectroscopy labs and helped establish walk-up laboratories providing NMR, LC-MS and other forms of spectroscopy to hundreds of chemists across multiple sites. This included the delivery of spectroscopic data to the desktop, automated processing and his initial interests in computer-assisted structure elucidation (CASE) systems. He also worked with a team to develop the world’s first web-based LIMS system, WIMS, capable of allowing chemical structure searching and spectral display. With his developing cheminformatic skills and passion for data management he left corporate America to join a small start-up company working out of Toronto, Canada. He joined ACD/Labs as their NMR Product Manager and held various roles, including Chief Science Officer, during his 10 years with the company. His responsibilities included managing over 50 products at one time prior to developing a product management team, managing sales, marketing, technical support and technical services. ACD/Labs was one of Canada’s Fast 50 Tech Companies, and Forbes Fast 500 companies in 2001. His primary passions during his tenure with ACD/Labs were the continued adoption of web-based technologies and developing automated structure verification and elucidation platforms.
While at ACD/Labs he suggested the possibility of developing a public resource for chemists attempting to integrate internet available chemical data. He finally pursued this vision with some close friends as a hobby project in the evenings and the result was the ChemSpider database. Even while running out of a basement on hand-built servers the website developed a large community following that eventually culminated in the acquisition of the website by the Royal Society of Chemistry (RSC) based in Cambridge, United Kingdom. Tony joined the organization, together with some of the other ChemSpider team, and became their Vice President of Strategic Development. At RSC he continued to develop cheminformatics tools, specifically ChemSpider, and was the technical lead for the chemistry aspects of the Open PHACTS project, a project focused on the delivery of open data, open source and open systems to support the pharmaceutical sciences. He was also the technical lead for the UK National Chemical Database Service and the RSC lead for the PharmaSea project attempting to identify novel natural products from the ocean. He left RSC in 2015 to become a Computational Chemist in the National Center of Computational Toxicology at the Environmental Protection Agency where he is bringing his skills to bear working with a team on the delivery of a new software architecture for the management and delivery of data, algorithms and visualization tools. The “Chemistry Dashboard” was released on April 1st, no fooling, and provides access to over 700,000 chemicals, experimental and predicted properties and a developing link network to support the environmental sciences. Tony remains passionate about computer-assisted structure elucidation and verification approaches and continues to publish in this area. He is also passionate about teaching scientists to benefit from the developing array of social networking tools for scientists and is known as the ChemConnector on the networks.
Over the years he has had adjunct roles at a number of institutions and presently enjoys working with scientists at both UNC Chapel Hill and NC State University. He is widely published with over 200 papers and book chapters and was the recipient of the Jim Gray Award for eScience in 2012. In 2016 he was awarded the North Carolina ACS Distinguished Speaker Award.

Posted by on October 15, 2007 in ChemSpider Chemistry, Community Building

