Following on from the many comments made on the recent post about the NPC Browser, Markus Sitzmann highlighted a “fun molecule” that he found on ChemSpider. It was here as ChemSpider ID 19053748, shown below, but it has now been deprecated… I logged in and deprecated it.
Markus also commented on Sean Ekins’ blog here:
“Well, particularly ChemSpider belongs to the group of “polluters” in PubChem. Count the number of Aspirin, Benzene or Ethanol structures submitted by ChemSpider to PubChem (only linking to a “deprecated” ChemSpider record). Or make an advanced search for ChemSpider records containing also Argon, here is an example:
There are many other examples.”
Markus is CORRECT. I have commented on this publicly myself on a number of occasions, and many people have noticed that there are data in PubChem that are in error and originally came from ChemSpider. There’s no point denying it as it’s there for all to see! We have intended for a LONG time to deprecate this data from PubChem and replace it with an updated deposition of cleaner data. The intention remains, but the challenge is finding the time to do it. We will do it.
Where did the data come from? These “argon” issues are really NOT argon issues… they are the result of molfiles finding their way into ChemSpider from “patent molecules” where -Ar is meant to represent a Markush structure, with Ar meaning “Aryl”. Because Ar is also the element symbol for argon, the structure gets read as an argon atom bonded to the molecule. This is like -Alk meaning alkyl. Similar issues arise when molecules are drawn with -X, -Y and -Z and lists of X, Y and Z substitutions are given, for example X = CH3, C2H5; Y = F, Br; and Z = Br, Cl. Unfortunately, Y is not only a substitution label, it’s also an element, yttrium. So when a molecule is drawn with a supposed Markush bond to -Y, we end up with a REAL molecule with yttrium attached. Agh.
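For anyone curious how such records might be caught automatically, here is a minimal sketch, assuming structures arrive as V2000 molfiles. It is NOT how ChemSpider actually curates; the names `parse_v2000`, `flag_markush_suspects` and `SUSPECT_LABELS` are hypothetical, and the rule used (flag an element symbol that doubles as a common Markush label when it sits at a terminal position with exactly one bond) is just an illustrative heuristic. It only reads the atom and bond blocks and ignores charges, R-group records and everything else in the file.

```python
# Hypothetical heuristic: element symbols that also double as Markush labels
# in patent drawings. Illustrative only, not an exhaustive list.
SUSPECT_LABELS = {
    "Ar": "argon, or Aryl in a Markush drawing",
    "Y": "yttrium, or a generic substituent Y",
}


def parse_v2000(molblock: str):
    """Return (atom symbols, 0-based bond pairs) from a V2000 molfile string."""
    lines = molblock.splitlines()
    counts = lines[3]                      # counts line follows the 3-line header
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])
    # The atom symbol occupies columns 32-34 of each atom line.
    atoms = [lines[4 + i][31:34].strip() for i in range(n_atoms)]
    bonds = []
    for j in range(n_bonds):
        line = lines[4 + n_atoms + j]
        a1, a2 = int(line[0:3]), int(line[3:6])
        bonds.append((a1 - 1, a2 - 1))     # molfile atom numbers are 1-based
    return atoms, bonds


def flag_markush_suspects(molblock: str):
    """Flag terminal atoms whose symbol is also a common Markush label."""
    atoms, bonds = parse_v2000(molblock)
    degree = [0] * len(atoms)
    for a1, a2 in bonds:
        degree[a1] += 1
        degree[a2] += 1
    return [
        (idx + 1, sym, SUSPECT_LABELS[sym])   # report 1-based atom number
        for idx, sym in enumerate(atoms)
        if sym in SUSPECT_LABELS and degree[idx] == 1
    ]


# Usage: a methyl group with a patent-style "-Ar" drawn onto it.
example = "\n".join([
    "CH3-Ar from a patent drawing",
    "",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 Ar  0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])
print(flag_markush_suspects(example))
# -> [(2, 'Ar', 'argon, or Aryl in a Markush drawing')]
```

The terminal-atom rule is deliberately crude: a real yttrium complex with several bonds would pass, while a lone -Y or -Ar hanging off a carbon, which is what a mis-read Markush drawing tends to produce, gets flagged for a human to look at.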
A list of examples of “interesting Ar molecules” is shown below.
At this point these have all been deprecated (it takes about 30 seconds per molecule), but if they were in our original deposition to PubChem they will remain there until we deprecate them. Ahh… the ongoing joys of data curation.