Are students at risk using ChemSpider? It seems so, based on recent commentary by Peter Murray-Rust. Peter has done us the service of test-driving ChemSpider from the point of view of someone interested in inorganic and organometallic complexes. The majority of users perform either text-based or structure/substructure searches on organic molecules, and their feedback is mostly congratulatory. It is excellent to receive feedback on the area of chemistry we suspected would be very challenging: inorganics and organometallics. I believe that we all struggle with these types of compounds, and I have therefore compared ChemSpider with the two other databases of note with over 5 million compounds, PubChem and eMolecules.
Peter identified an issue with the display of sodium hydride. We have NOT manually examined 10.6 million records so were not aware of this bug. The Sodium Hydride record is now curated with Peter’s comments and the display bug is now fixed. THIS is the power of community feedback. It will take some time to repopulate the images across 10 million records though. By comparison, a search of eMolecules produces no hits. A search of PubChem produces a number of hits, one containing a sodium ion and a hydride ion, bonded by a dative bond.
Peter also identified issues with Prussian Blue, as excerpted below: “… the chemical formula has been represented as separated iron ions and cyanide ions.” The Prussian Blue record is now also curated with Peter’s comments. These complexes are challenging for all of us… so warn your students! The record in question for ChemSpider is here, for PubChem are here and for eMolecules is here. Look at the display for PubChem 182606 as an example of the challenge.
Also, check the eMolecules display. If you search eMolecules for Prussian Blue you will find 3 results. Check each of them. Here’s an example. Notice any issues?
The conversion of search structures via SDF files as well as the display of such compounds is challenging for all of us! The work has already been done this evening to deal with the dative bonds and coordination bonds in such complexes and these structures will be updated in the near future.
While searching millions of organic molecules is not easy, the truth is that it is more challenging for organometallics, and we were conscious there would be issues here. I judge there to be two organizations with the ability to handle these complex molecules appropriately. One is CAS and the other is the Cambridge Crystallographic Data Centre. Certainly it remains a challenge for us, as well as for others. In theory this will be addressed well in CrystalEye, and when these data are made available we will work with the group to determine a path to migrate such complex structures via SDF if possible. This will likely be done if they are to be deposited in PubChem. InChIs are not the solution since, as identified in the InChI FAQ, InChI does not support complex organometallics.
Are students at risk using ChemSpider? There have been recent reports about errors on Wikipedia and whether or not Wikipedia should be trusted. I know people working hard on populating Wikipedia, and they are passionate individuals attempting to give back to the community. ChemSpider has already challenged the statement about calcium carbonate solubility on this blog: Wikipedia states that calcium carbonate is insoluble, yet the same page discusses its solubility (this might be because there is a Wikipedia-accepted definition of insoluble). The ChemSpider team is also working hard and is passionate about what we are doing. What we need is continuing feedback. The best warning we can give at present is that ChemSpider is beta. But it is here to stay, and we are working on all reported bugs in an appropriate order. As with all other large database resources, students should exercise caution. We are all imperfect.
We are very grateful to Peter for his ongoing feedback regarding ChemSpider. So much so that we have voted Peter our “Tester of the Month”. The feedback is welcome. We’ve already fixed all the bugs… publishing the update to more than 10 million structures will take time though.