InChI strings and InChIKeys are very much the backbone of ChemSpider and have quickly become a way of linking online databases to one another. The InChIKey is a hash of the InChI string, and when the hash was adopted it was suggested that the likelihood of a collision was very small. The estimate, as quoted from the official InChI site, is:
“An example of InChI with its InChIKey equivalent is shown below. There is a finite, but very small probability of finding two structures with the same InChIKey. For duplication of only the first block of 14 characters this is 1.3% in 10^9, equivalent to a single collision in one of 75 databases of 10^9 compounds each.”
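The quoted figure can be roughly reproduced with a birthday-problem estimate. The sketch below is only an illustration: it assumes the 14-character first block carries about 65 bits of hash (an assumption about the InChIKey design, not something stated in the quote) and that hash values are uniformly distributed. The function name is made up for this example.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-problem estimate of at least one collision among n_items.

    Assumes hash values are uniformly distributed over 2**hash_bits slots.
    """
    space = 2 ** hash_bits
    pairs = n_items * (n_items - 1) / 2
    # 1 - exp(-pairs / space), written with expm1 for numerical accuracy
    return -math.expm1(-pairs / space)


# ASSUMPTION: 65 bits for the first 14-character block of the InChIKey
p = collision_probability(10**9, 65)
print(f"P(collision in one database of 10^9 compounds) = {p:.2%}")
print(f"roughly one database in {1 / p:.0f} would contain a collision")
```

Under those assumptions the result comes out a little over 1.3%, or about one database in 75, which is in line with the quoted estimate.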
At a previous ACS Meeting, Prof. Jonathan Goodman of the University of Cambridge announced that he had identified a collision. The collision was between two isomers of spongistatin, a rather complex chemical structure with many stereocenters.
Jonathan has “done it again”…what a troublemaker he is (in a supremely gentlemanly way!). I was fortunate enough to receive the news about this collision from him just as I was getting on the flight home from ACS Denver tonight, and I asked his permission to blog it because it is both exciting and, I believe, quite surprising news. Why? In this case the collision is between two distinctly different chemicals with totally different formulae and with NO stereochemistry! Very surprising!
As you can see in the figure below, the two chemical compounds are simply long branched alkyl chains, one an alcohol and one a ketone.
In case the software tool Jonathan was using to connect to the InChI generation software was doing something untoward with the molfile, I confirmed the observation myself by drawing the structures in ACD/ChemSketch and generating the InChIKeys there. And, sure enough…I see exactly the same Standard InChIKeys for both molecules, as shown in the movie below. VERY interesting!
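For anyone who wants to script the same kind of check, here is a minimal sketch. It takes InChI strings and InChIKeys that have already been generated by some tool and flags a collision when two different InChI strings share the same key. The helper names are hypothetical, and because the two colliding structures are only shown in the figure, ethanol and methanol are used here purely as stand-ins (they do not collide).

```python
def split_inchikey(key: str) -> tuple[str, str, str]:
    """Split an InChIKey into its three hyphen-separated blocks (14-10-1)."""
    blocks = key.strip().split("-")
    if len(blocks) != 3 or [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return blocks[0], blocks[1], blocks[2]


def is_collision(inchi_a: str, key_a: str, inchi_b: str, key_b: str) -> bool:
    """True when two different InChI strings hash to the same InChIKey."""
    split_inchikey(key_a)  # validate the format before comparing
    split_inchikey(key_b)
    return inchi_a != inchi_b and key_a == key_b


# Stand-in molecules (NOT the pair from the post): ethanol and methanol
ETHANOL_INCHI = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
ETHANOL_KEY = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
METHANOL_INCHI = "InChI=1S/CH4O/c1-2/h2H,1H3"
METHANOL_KEY = "OKKJLVBELUTLKV-UHFFFAOYSA-N"

print(is_collision(ETHANOL_INCHI, ETHANOL_KEY, METHANOL_INCHI, METHANOL_KEY))
```

Comparing the InChI strings as well as the keys matters: two identical keys only count as a collision if the underlying InChI strings really are different.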