I was approached today with a question regarding the contents of the ChemSpider database. I have commented previously that some of the depositions have quality issues, but that these are being cleaned up fairly quickly thanks to our curation processes, both robotic and manual. The question concerned two structures on ChemSpider that share the registry number 34090-76-1. This in itself is not uncommon. Sometimes a registry number applies to a particular salt form while the associated structure is the neutral compound, so the registry number ends up on the database for both the neutral compound and the salt. However, this situation was different: the two structures differed in the position of a double bond. The person wanted to confirm the position of that double bond, and it was not easy for me to do so.
What was MORE confusing was that the person had already extracted information from an STN Registry Search. That search provided the following information:
CAS Name: 1,3-Isobenzofurandione, tetrahydro-5-methyl- (CA INDEX NAME)
Other listed names:
Cyclohexene-1,2-dicarboxylic anhydride, 4-methyl- (8CI)
4-Methyltetrahydrophthalic acid anhydride
And the following structure:
Compare this structure with the other two from ChemSpider, shown below in the array of three.
Every single name from STN implies one C=C double bond in the six-membered ring: two of them are “tetrahydro” names and the 8CI name is a cyclohexene. So there needs to be a double bond in the ring. If there isn’t, then the compound is a “hexahydro” compound.
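The tetrahydro/hexahydro argument is simple hydrogen bookkeeping. Starting from methylphthalic anhydride (C9H6O3), “tetrahydro” adds four hydrogens (C9H10O3) and “hexahydro” adds six (C9H12O3). A minimal sketch using the standard degree-of-unsaturation formula, assuming the skeleton always contributes two rings and two C=O bonds:

```python
# Sketch: degree of unsaturation (rings + pi bonds) for a formula CcHhOx.
# Divalent oxygen does not change the count, so only C and H are needed.
def degrees_of_unsaturation(c: int, h: int) -> int:
    return (2 * c + 2 - h) // 2


# Assumed fixed features of the methylphthalic anhydride skeleton:
# two rings (six-membered ring + anhydride ring) and two C=O bonds.
FIXED_FEATURES = 2 + 2


def ring_cc_double_bonds(c: int, h: int) -> int:
    """C=C double bonds left over once rings and C=O bonds are accounted for."""
    return degrees_of_unsaturation(c, h) - FIXED_FEATURES


for label, (c, h) in {"tetrahydro, C9H10O3": (9, 10),
                      "hexahydro, C9H12O3": (9, 12)}.items():
    print(f"{label}: {ring_cc_double_bonds(c, h)} C=C in the ring")
```

The tetrahydro formula leaves exactly one C=C to place; the count alone cannot say where it goes, which is precisely the ambiguity in this record.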
Obviously one of the alternative names for the compound was derived from phthalic acid anhydride and this suggests that the “missing double bond” should be at the ring junction as shown.
The STN record also includes the tag “IDS” in the “CI” or “chemical indexing” field. IDS stands for “Incompletely Defined Substance”. So this is an example of a registry number allocated to a compound that, in this case, is known to have an additional double bond; that double bond is not shown on the chemical structure displayed in the STN search results, and the IDS tag declares the substance “incompletely defined”. Some might argue that it is appropriate for ChemSpider to associate the registry number with two structures, each with the double bond in a different position. But those specific compounds likely have their OWN registry numbers. So, what should we do?
1) Remove the registry number 34090-76-1 associated with both structures?
2) Leave as is?
3) Add a new term IDS for such records and submit the incompletely defined substance as a new form of structure?
4) Add NEW registry numbers associated with the individual structures (which someone will need to source, since I don’t have them)?
5) Something else?
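Whichever option is chosen, cases like this first have to be found. A minimal sketch, assuming a hypothetical export of (record id, registry number, structure key) tuples, that flags registry numbers attached to more than one distinct structure and notes whether the source marked them IDS. The record ids, the SMILES strings (written by hand for two double-bond isomers, not copied from ChemSpider) and the `ids_rns` set are illustrative assumptions. The comparison also assumes the structure keys are already canonical (for example InChIKeys), since raw SMILES from different sources can differ for the same molecule.

```python
from collections import defaultdict

# Hypothetical records: (record id, CAS RN, structure key).
# Assumption: structure keys are canonical, so equal molecules give equal keys.
records = [
    ("rec-1", "34090-76-1", "CC1=CCC2C(C1)C(=O)OC2=O"),  # C=C away from the junction
    ("rec-2", "34090-76-1", "CC1CCC2=C(C1)C(=O)OC2=O"),  # C=C at the ring junction
    ("rec-3", "85-44-9", "O=C1OC(=O)c2ccccc21"),          # phthalic anhydride
]


def structures_by_rn(recs):
    """Group distinct structure keys under each registry number."""
    groups = defaultdict(set)
    for _rec_id, rn, key in recs:
        groups[rn].add(key)
    return groups


def ambiguous_rns(recs, ids_rns=frozenset()):
    """Return {rn: (sorted structure keys, flagged IDS in source)} for RNs with >1 structure."""
    return {
        rn: (sorted(keys), rn in ids_rns)
        for rn, keys in structures_by_rn(recs).items()
        if len(keys) > 1
    }


# Hypothetical input: RNs whose source record carried the IDS class tag.
for rn, (keys, is_ids) in ambiguous_rns(records, ids_rns={"34090-76-1"}).items():
    note = "IDS in source" if is_ids else "no IDS flag"
    print(rn, len(keys), "structures,", note)
```

An RN that is both multi-structure and IDS-flagged is a candidate for option 3; one without the flag is more likely a plain error (options 1 or 4).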
I welcome any or all input. Based on that input I will simply log in to ChemSpider and make the edit, and the information will be changed (for addition or removal of identifiers). Working together like this improves the quality of structure-name pairs iteratively for the benefit of chemists, just as shown with the recent Wikipedia examination of Taxol.
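When identifiers are added (as in option 4), a cheap first check is the CAS check digit: the last digit of a registry number equals the sum of the other digits, each multiplied by its position counted from the right, modulo 10. A minimal sketch; the accepted pattern (two to seven digits, two digits, one check digit) is an assumption about the format:

```python
import re

# Assumed CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_RN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_check_digit_ok(rn: str) -> bool:
    m = CAS_RN.fullmatch(rn)
    if not m:
        return False
    body = m.group(1) + m.group(2)
    # Weight digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == int(m.group(3))


print(cas_check_digit_ok("34090-76-1"))
```

For 34090-76-1 the weighted sum is 101, so the check digit 1 is consistent. A valid check digit only catches transcription errors; it says nothing about which structure the number belongs to, which is the real question here.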