A number of comments have been made regarding the appropriateness of predicting physicochemical parameters for the structures in ChemSpider. This culminated in my recent suggestion that these parameters should not be calculated for certain types of compounds in the database. Specifically, the suggestions were to:
Filter the ChemSpider database and remove the following PhysChem predictions (ACD/LogP, ACD/LogD (pH 5.5), ACD/LogD (pH 7.4), Number of Rule of 5 Violations, Number of H bond acceptors, Number of H bond donors, Number of Freely Rotating Bonds, Polar Surface Area) for substances with the following properties:
• Exclude multi-component substances
• Exclude substances represented as a single atom
• Exclude radicals
• Exclude structures with a delocalized charge
• Exclude structures containing isotopes
• Exclude substances containing elements not supported by the prediction algorithms (self-excluding really!)
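To make the proposed criteria concrete, here is a minimal Python sketch of how such a filter might be expressed. It is only an illustration and not how ChemSpider implements anything: the `Atom` and `Substance` records, the `exclusion_reasons` function and the `SUPPORTED_ELEMENTS` set are all hypothetical names, the element list is a placeholder rather than the real list supported by the prediction algorithms, and the sketch assumes that every atom (hydrogens included) is listed explicitly and that delocalized charge has already been flagged upstream.

```python
from dataclasses import dataclass

# Placeholder only: the post does not say which elements the predictors support.
SUPPORTED_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}


@dataclass
class Atom:
    element: str
    isotope: int = 0            # 0 means no isotope label (natural abundance)
    radical_electrons: int = 0  # number of unpaired electrons on this atom


@dataclass
class Substance:
    components: list                  # one list of Atom per disconnected component
    delocalized_charge: bool = False  # assumed to be detected elsewhere


def exclusion_reasons(substance, supported=SUPPORTED_ELEMENTS):
    """Return the proposed exclusion criteria that a substance matches.

    An empty list means none of the criteria apply, so predictions
    would be kept for that substance.
    """
    atoms = [atom for component in substance.components for atom in component]
    reasons = []
    if len(substance.components) > 1:
        reasons.append("multi-component")
    if len(atoms) == 1:
        reasons.append("single atom")
    if any(atom.radical_electrons > 0 for atom in atoms):
        reasons.append("radical")
    if substance.delocalized_charge:
        reasons.append("delocalized charge")
    if any(atom.isotope != 0 for atom in atoms):
        reasons.append("isotope")
    if any(atom.element not in supported for atom in atoms):
        reasons.append("unsupported element")
    return reasons


# Small hand-built examples (explicit hydrogens, as assumed above).
water = Substance([[Atom("O"), Atom("H"), Atom("H")]])
heavy_water = Substance([[Atom("O"), Atom("H", isotope=2), Atom("H", isotope=2)]])
sodium_chloride = Substance([[Atom("Na")], [Atom("Cl")]])
iron = Substance([[Atom("Fe")]])

for name, substance in [("water", water), ("heavy water", heavy_water),
                        ("sodium chloride", sodium_chloride), ("iron", iron)]:
    print(name, exclusion_reasons(substance))
```

Returning the list of matched criteria, rather than a single yes/no, makes it easy to switch individual rules on or off later without touching the rest of the filter.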
PMR had previously added comments to his post regarding my questions. Based on this feedback, and on other comments in blog postings and email exchanges, it’s time to summarize our path forward and the reasons for our decisions.
Regarding isotopes: I see no reason to exclude isotopes at this point. True, D2O does have a different boiling point than H2O, by 1.4 degrees. However, we are considering the prediction of properties for what is now a database containing almost 13.4 million chemical structures. Some of the properties being predicted are not easy to measure reproducibly and with high accuracy (think LogP and LogD), while for others, for example boiling point, predictions within a few percent should suffice. I very much doubt that labeling a couple of isolated sites with either C13 or N15, for example, would make any significant difference. Deuterium labeling might have a small effect on pKa for the ionizable protons and thereby change the logD values. However, I doubt that the difference could be measured experimentally in terms of logD values. These are predictions and should be treated as such. Details regarding prediction accuracy for some of the properties are given elsewhere (1,2).
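To put the boiling point example in numbers, the short Python sketch below compares the 1.4 degree shift quoted above with a prediction tolerance of "a few percent". The variable and function names are illustrative only, the 100 °C boiling point of ordinary water at 1 atm is the usual textbook value, and comparing on the absolute (kelvin) scale is my assumption about how the percentage would be judged.

```python
# Boiling points at 1 atm in degrees Celsius.
BP_H2O_C = 100.0
BP_D2O_C = BP_H2O_C + 1.4  # the 1.4 degree difference quoted above


def relative_difference_percent(a_celsius, b_celsius):
    """Percentage difference between two temperatures, taken in kelvin."""
    a_kelvin = a_celsius + 273.15
    b_kelvin = b_celsius + 273.15
    return abs(a_kelvin - b_kelvin) / b_kelvin * 100.0


shift = relative_difference_percent(BP_D2O_C, BP_H2O_C)
print(f"Isotope shift in boiling point: {shift:.2f}% of the absolute value")
```

On that basis the isotope effect comes to well under half a percent, which is smaller than the error one would accept from the prediction itself.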
What we will likely proceed with is to:
1) Exclude multi-component substances (but be very careful about waters of crystallization and counterions)
2) Exclude substances represented as a single atom
4) Exclude structures with a delocalized charge