My final presentation at ACS Denver yesterday was, I think, the clearest one I gave all week. As with most of the presentations I gave last week, I was up at 4am to finish it off, based on conversations I had been having during the week. A lot of people came to the booth after the presentation to say that they had been dealing with such challenges for years and that it was about time a drug collection was finally available. It took months to get 152 drugs “right”. It would take a looong time to reproduce something of the quality of the Merck Index!
“Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs
Internet-based public domain databases containing chemical compounds have grown in number, capability and content in recent years. There are now many databases containing millions of chemical compounds associated with different types of data including chemical names, properties, analytical data, and with associated mapping to proteins, assay data, clinical information and so on. These disparate data sources suffer from one common issue – quality of data. This presentation will provide an overview of our efforts to source the appropriate structural representations for 200 top-selling drugs from public domain sources. This intra- and inter-laboratory comparison of approaches, processes and necessary agreements exposed the challenges associated with aggregating structure-based data. The project also provided data regarding the distribution of quality issues associated with many of the community’s popular databases.”