
Tag Archives: NCGC collection

Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs

I think my final presentation at ACS Denver yesterday was the clearest one I gave all week. As with most of my presentations last week, I was up at 4am to finish it off, based on conversations I had been having during the week. A lot of people came to the booth after the presentation to say that they had been dealing with such challenges for years and that it was time a drug collection was finally available. It took months to get 152 drugs “right”. It would take a looong time to reproduce something of the quality of the Merck Index!

Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs

Internet-based public domain databases containing chemical compounds have grown in number, capability and content in recent years. There are now many databases containing millions of chemical compounds associated with different types of data including chemical names, properties, analytical data, and with associated mapping to proteins, assay data, clinical information and so on. These disparate data sources suffer from one common issue – quality of data. This presentation will provide an overview of our efforts to source the appropriate structural representations for 200 top-selling drugs from public domain sources. This intra- and inter-laboratory comparison of approaches, processes and necessary agreements exposed the challenges associated with aggregating structure-based data. The project also provided data regarding the distribution of quality issues associated with many of the community’s popular databases.

 

 


Searching for “Complete Synonyms” in PubChem and the NPC Browser

I am interested in feedback on what behavior users expect from a search in online databases. PubChem has a Complete Synonym search that limits a chemical name based search to the synonym field. Without that fielded search, I assume the search runs across all text in a record. The difference in the results is shown below. The first image shows a search for Taxol, which returns 59 results.

A search for Taxol in PubChem

Below is a search on Taxol[completesynonym]. This search returns 5 hits for Taxol.

I wonder whether most users of PubChem know that they need to add the [completesynonym] qualifier to limit the search. You might want to try Diamond and Diamond[completesynonym] as searches and compare the results.
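For anyone who wants to compare the two searches programmatically rather than through the web page, here is a minimal sketch. It assumes the PubChem Compound database is reachable through NCBI's E-utilities `esearch` endpoint under the database name `pccompound`, and that the JSON response carries the hit count under `esearchresult.count`. The helper names `build_query_url` and `hit_count` are my own, purely for illustration. The code only builds the URLs and parses a response; fetching them is left to you.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed entry point for PubChem Compound).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query_url(name, complete_synonym=False, db="pccompound"):
    """Build an esearch URL for a name, optionally limited to the synonym field."""
    # Appending [completesynonym] restricts the match to the synonym field;
    # without it the term is searched as free text.
    term = f"{name}[completesynonym]" if complete_synonym else name
    params = {"db": db, "term": term, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def hit_count(esearch_json):
    """Extract the number of hits from a decoded esearch JSON response."""
    # The count is assumed to arrive as a string, so convert it to an int.
    return int(esearch_json["esearchresult"]["count"])


if __name__ == "__main__":
    for fielded in (False, True):
        print(build_query_url("Taxol", complete_synonym=fielded))
```

Fetching both URLs and passing the decoded JSON to `hit_count` should show the same kind of gap as the two searches above, although the actual numbers will have changed since this post was written.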

I am assuming that a similar type of search can be conducted on the NPC Browser to limit results, as a search on the drug Lidocaine returns 14 chemicals, all of them different. If this search exists I have missed it. Can anyone comment?

With ChemSpider we do our utmost to return a single structure for a clearly unique name such as Taxol and Lidocaine. We believe that’s what most people would expect. Thoughts and comments welcome.

 
