My blog has been fairly inactive for the past few months, primarily because of my move from working on cheminformatics at the Royal Society of Chemistry to working at the National Center for Computational Toxicology at the Environmental Protection Agency. While I stopped working on ChemSpider about 18 months before I left RSC (to focus on developing the RSC Data Repository), my focus on data quality and my long-standing interest in “accuracy in chemical structure representations” have never dwindled. At the EPA-NCCT we are very focused on producing high-quality chemical structure databases, following on from the work of my colleague Ann Richard, who initiated work on DSSTox over a decade ago.
It was therefore with great interest that I became aware of the confusion regarding the chemical structure of BIA-10-2474, a drug that has attracted a lot of attention because of a clinical trial with negative outcomes. I am entering the story late compared to my long-time collaborators and friends Sean Ekins, Chris Southan and Alex Clark, but more about their work later. The news to date is best summarized at Derek’s In the Pipeline blog and in David Kroll’s post on Forbes.
Based on my history of helping to curate chemical structures on Wikipedia (starting one Christmas in 2008), my experience is that Wikipedia is a GOOD PLACE to source high-quality structures, especially after the work invested in curating chemical data over the years. The first structure for BIA-10-2474 that was reported on Wikipedia is shown below.
On January 16th Chris performed his usual thorough examination of structure integrity and links to public sources (he is a master in this domain!) but commented specifically that “The molecular identity of BIA-10-2474 can only be formally verified directly by BIAL or indirectly from regulatory documentation they may have submitted”, as the chemical structure itself was inferred from the name.
Nevertheless, my friends Sean Ekins and Alex Clark were already investigating what OPEN MODELS might be able to predict about the chemical: see here, here and here. You should be impressed by what is possible when running a molecular structure through several Bayesian models in Alex’s mobile app, PolyPharma!
By January 21st Chris was commenting that the structure had changed, and he highlighted an extract from the material published by Le Figaro that listed the chemical name: 3-(1-(cyclohexyl(methyl)carbamoyl)-1H-imidazol-4-yl)pyridine 1-oxide. Want to know what that name means as a structure? Take the name “3-(1-(cyclohexyl(methyl)carbamoyl)-1H-imidazol-4-yl)pyridine 1-oxide” and paste it into the free online service OPSIN. The results are shown below.
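For anyone who would rather script the name-to-structure conversion than paste into a web page, here is a minimal sketch in Python. It assumes that the public OPSIN web service answers requests of the form base URL + encoded name + ".smi" with a SMILES string. That base URL and URL pattern, and the helper names opsin_url and name_to_smiles, are my assumptions for illustration, so check the OPSIN documentation before relying on them.

```python
from urllib.parse import quote
from urllib.request import urlopen

# ASSUMPTION: base URL of the public OPSIN web service; verify before use.
OPSIN_BASE = "https://opsin.ch.cam.ac.uk/opsin/"

# The systematic name quoted in the post.
NAME = "3-(1-(cyclohexyl(methyl)carbamoyl)-1H-imidazol-4-yl)pyridine 1-oxide"


def opsin_url(name, fmt="smi"):
    """Build a request URL for a chemical name (hypothetical helper).

    The name is percent-encoded so that spaces and parentheses are
    safe to place in a URL path; fmt is the assumed output extension.
    """
    return f"{OPSIN_BASE}{quote(name, safe='')}.{fmt}"


def name_to_smiles(name, timeout=10):
    """Fetch the SMILES string for a name (needs network access).

    Raises urllib.error.HTTPError if the service rejects the request,
    for example when the name cannot be parsed.
    """
    with urlopen(opsin_url(name), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling name_to_smiles(NAME) should return a SMILES string if the service behaves as assumed. Treat the result the way Chris treats any inferred structure: it tells you what the name means, not whether the name is the right one for the compound.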
That structure has now found its way to Wikipedia (updated on January 21st; check out the edits between the two forms of the article here).
Sean Ekins has maintained a running series of blog posts here. Using a stack of openly accessible algorithms and websites, Sean has now produced a whole series of predictions for the “final molecule”. Chris Southan has also continued to expand his work, and I direct you to his latest blog post for more information. Nice stuff, Chris.
It took days after the news of the drug trial results first appeared before the chemical structure was actually identified (i.e. the structure was blinded). How much work and how much confusion were created by keeping the drug structure blinded? We have to imagine that the authorities had faster access to the details!
It is understandable that companies keep their chemical structures hidden. Patents are intentionally obfuscating (with a compound going into a trial commonly hidden among hundreds, if not tens of thousands, of chemicals that could be enumerated from a Markush structure). Until that changes, Chris Southan will continue to educate the world about how competitive intelligence investigations are done.