Author Archives: admin

Community Views and Trust in Public Domain Chemistry Resources

Over the past 4 weeks I have been involved with some new and old friends in the world of chemistry to initiate an analysis of “quality” in public chemistry resources. This is a work in progress and involves 3 separate groups (let’s call them labs) looking at various resources. Here’s a short description of the project.

The questions we are attempting to answer are:

Core question : what is the quality of data online in public chemistry databases? How accurate and unambiguous is the representation of chemical structures and their measured properties in public chemistry databases?

How capable are the present cheminformatics tools of handling the complexities of structure representations, limited to “small” organic molecules?
How hard is it to generate a reference set of highly curated, “gold standard” data (chemical structure and activity) for a database of “known drugs”?

We’ve started with the top 200 selling drugs. The three labs had to come to an agreement about which of the top 200 drugs were small molecules (many of the top 200 are monoclonal antibodies or polymers, for example). We then had to decide whether we would deal with mixtures and combination drugs. Just identifying the list of NAMES of drugs we wanted to deal with was an iterative negotiation.

Then we decided that each lab would work independently and that each lab would have at least two members working on the same problem independently, so that we would have both intra-lab and inter-lab comparisons. We decided to start on a set of 10 drug names, using the GENERIC name as the name to work from. I started my part of the work just before I had to give a presentation at the EBI last week and was able to gather a lot of the data before the talk.
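As a rough sketch of the bookkeeping that intra-lab and inter-lab comparisons imply (this is not the code any of the labs used), the snippet below compares every pair of curators and labels each disagreement by whether the two curators sit in the same lab. The function name, the nested-dictionary layout and the lab, curator, drug and structure labels are all hypothetical placeholders; in practice the structure identifier would be some canonical string agreed on by the labs.

```python
from itertools import combinations


def compare_assertions(results):
    """Find disagreements between curators.

    `results` is a hypothetical layout: {lab: {curator: {drug_name: structure_id}}}.
    Returns a list of (kind, drug, "lab/curator", "lab/curator") tuples, where
    kind is "intra-lab" or "inter-lab".
    """
    # Flatten to one entry per curator so every pair can be compared once.
    flat = [
        (lab, curator, answers)
        for lab, curators in results.items()
        for curator, answers in curators.items()
    ]
    disagreements = []
    for (lab_a, cur_a, ans_a), (lab_b, cur_b, ans_b) in combinations(flat, 2):
        kind = "intra-lab" if lab_a == lab_b else "inter-lab"
        # Only drugs that both curators have worked on can be compared.
        for drug in sorted(set(ans_a) & set(ans_b)):
            if ans_a[drug] != ans_b[drug]:
                disagreements.append(
                    (kind, drug, f"{lab_a}/{cur_a}", f"{lab_b}/{cur_b}")
                )
    return disagreements


if __name__ == "__main__":
    # Placeholder data only: labs, curators, drugs and structure ids are made up.
    example = {
        "lab1": {"a": {"drugX": "S1", "drugY": "S2"},
                 "b": {"drugX": "S1", "drugY": "S3"}},
        "lab2": {"c": {"drugX": "S9", "drugY": "S2"}},
    }
    for row in compare_assertions(example):
        print(row)
```

A disagreement here only says that two people asserted different structures; deciding who is right still needs the cross-referencing and negotiation between people.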

Starting with a chemical name, how do you determine what the “correct” structure for that drug is? Think it’s easy? Try it! Where would you look? How would you confirm? What would the iterative loop look like in order for YOU to assert the chemical structure(s) for the drug “Vytorin”?

For me the process looks something like this. Use a level of “trust and experience” with previously used resources as a starting point and declare “This is the structure of X based on searching on the drug name for X”. Now, cross-reference, iterate, reiterate, find consistencies and collisions in order to come up with a final assertion, a list of consistent structures and the associated sources, and a set of other resources with inconsistent structures and a list of why they differ. Where possible, and if necessary, make edits to change the information (e.g. ChemSpider and Wikipedia). You can see an example of this for Vitamin K in the talk I gave at EBI here. For ten structures I came up with a number of observations for a number of drugs. The screenshot below summarizes some of the results (Click on the image to see the detail).

Represented in the table is the following information, which was true at the time of the search and may already be out of date:

1) A search for thalidomide in ChEBI gives no hits

2) The structures of Zocor and Crestor on Drugbank are incorrect

3) There are no hits for Voglibose and Crestor on Common Chemistry

4) There were 3 incorrect structures on ChemSpider (now edited of course)

5) For most searches on a drug name on PubChem there are MULTIPLE hits and, for the set examined, the correct structure is in the results set. For example, there are 44 structures of Taxol retrieved with the search and the one I assert to be correct is there.

6) There were two incorrect structures on Wikipedia and one drug without an associated structure.
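The cross-referencing loop I described before the table (collect what each source returns for a drug name, find the consistencies and collisions, then list who agrees and who differs) can be sketched in a few lines. This is a minimal illustration under stated assumptions: each source's answer has already been reduced to a comparable identifier string, the source names and identifiers below are hypothetical placeholders, and the "most common answer" is treated only as a candidate to be checked by a person, not as proof of correctness.

```python
from collections import Counter


def cross_reference(assertions):
    """Group sources by the structure identifier each one returned.

    `assertions` maps a source name to a structure identifier string,
    or to None when a search on the drug name gave no hits.
    """
    found = {src: key for src, key in assertions.items() if key is not None}
    no_hits = sorted(src for src, key in assertions.items() if key is None)
    if not found:
        return {"consensus": None, "consistent": [],
                "inconsistent": {}, "no_hits": no_hits}
    # The most frequently returned identifier is the candidate structure.
    consensus, _count = Counter(found.values()).most_common(1)[0]
    consistent = sorted(src for src, key in found.items() if key == consensus)
    # Sources that collide with the candidate, with what they returned instead.
    inconsistent = {src: key for src, key in found.items() if key != consensus}
    return {"consensus": consensus, "consistent": consistent,
            "inconsistent": inconsistent, "no_hits": no_hits}


if __name__ == "__main__":
    # Placeholder source names and identifiers, not real search results.
    report = cross_reference({
        "SourceA": "KEY-1",
        "SourceB": "KEY-1",
        "SourceC": "KEY-2",
        "SourceD": None,
    })
    print(report)
```

Majority agreement can still be wrong, since databases copy from each other, which is why the final assertion stays with the curator and the "inconsistent" list is kept with its reasons rather than thrown away.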

When I started the work I had a “trust” level for a number of the databases. My basic position at that time was as follows. I could rarely find the correct structure for a drug based on a text-based search of PubChem. I would generally find a set of hits and it was a lot of work to determine what was correct. Common Chemistry was excellent…but limited. DailyMed was generally good but structure representations could be abysmal. ChEBI, DrugBank, ChemIDplus and Wikipedia were generally VERY good. Of all of the sources I used, despite the rich data on PubChem, I struggled most with this resource to find the correct structure. The results started to show that my trust perceptions were being challenged.

In parallel with the work to prepare this small dataset for the presentation at EBI I decided that it was appropriate to ask the community for their views on some of the databases I was looking at in this work…specifically asking for how much they “trusted” a resource. Trust means different things to different people. The word, and the question I was asking in the survey, would be interpreted in different ways. But that’s the way we work…so why fight it? The survey is online here…and if you haven’t filled it in PLEASE DO!

The answers to date, from the 46 responders, are below (Click on the image to see the detail):

There are some very interesting results in here…and, I willingly admit, some I find VERY surprising. 1 person has no experience with Wikipedia? Wow. The majority of people have not heard of PDSP, ChemIDplus, DailyMed or DrugBank…without knowing who the people are that are providing feedback of course I should not be shocked. Most of these resources are not for chemists per se but for Life Scientists. The numbers of votes for “Always Trust” for ChemSpider and PubChem are very high and, one might say, a compliment. The results are clearly ChemSpider-biased since I asked the question to my social network. The difference between the number of people who Always Trust PubChem and the number who Commonly Trust PubChem is one person only. This is wildly different from my own views. I have heard people say that PubChem is the equivalent in quality to CAS except it’s free. Sorry folks…afraid not! (I have since heard at the EBI meeting from one of the people from PubChem that it is possible to do searches in certain ways to limit hits. It should be noted that this does not guarantee that the correct structure is retrieved.) On the flip side, the number of people rarely trusting PubChem is also quite high, so someone has had some interesting experiences!

There are a small number of people who NEVER Trust the resources, and early on one person declared they trusted none of them. I trusted myself to tell a colleague…that will be “Egon Willighagen” and this was later confirmed in his blog “Trust has No Place in Science”. That may be true, and the topic of a separate post, but my judgment is pretty good!

How would I fill in the questionnaire? I would NEVER flag “Always Trust” for any of the databases. I would be able to rank order the databases in terms of my perceptions/experience and extracted trust for the quality of results I would find. The answers WOULD be different before I had conducted the work on the first 10 drugs compared with now, after the pre-work.

As the host of ChemSpider I would prefer that no one “Always Trusts” the resource, as that will stop people from taking care with the data and thereby remove the possible value of them curating the data. However, I am more than happy to have it Commonly Trusted and we have been working hard to gain the community’s trust in this area.

This work has triggered a number of responses….I’ll make my own comments on their positions separately… but their opinions are worth reading:

Egon Willighagen: Trust has no place in Science

Egon Willighagen: Trust has no place in Science Part 2

Christina Pikas: The role of trust in science. Christina has a comment: “I think that Anthony (sp.) could have chosen a better word than trust in his survey. “which of these have you evaluated and decided you could use? which of these would you prefer to use based on your evaluations of their merit?”” Christina is right. I could have chosen a different word, but I judge (chosen carefully!) that the responses would not have differed much.

There is also a healthy exchange happening on Friendfeed.

This work has only just started. An examination of >150 “small molecule drugs” by three labs is going to provide a lot of data. The work isn’t over and we have much to do. We’re learning a lot in the process about assertional loops, iterative process, collaboration and agreement. It’s a great adventure.


Posted on December 11, 2010 in Community Building, Data Quality


Presentation at European Bioinformatics Institute

Last week was quite the trip to the United Kingdom. I was hit by the flu, which put me into bed without a voice for an entire day, and then gave the rescheduled talk the next day feeling a little beaten up. The talk discussed the recently conducted survey of public domain databases that I initiated last week (results embedded in the talk) as well as some of the observations comparing data for 10 drugs across a series of Public Domain databases. The meeting was a good chance to meet some of the hosts of some of the databases including PubChem, DrugBank, ChEBI/ChEMBL and SureChem. I’m sorry I missed the first day…


Please Participate in a Poll about Public Chemistry Databases

I will be giving a presentation next week and have been working on some validation of online public domain chemistry databases. In doing this work I realized that it would be of benefit to hear from the community which databases you feel can be trusted and to what level. Please visit the online survey and give me your feedback. This would be very useful for my presentation. If you could do this in the next 48 hours I would be very grateful. Thanks!

Click here to take my survey!!!

Posted on November 27, 2010 in Community Building


MedChemComm Cover And GSK Malaria Article

Sean Ekins and I recently submitted an article to RSC’s MedChemComm. It is now published and FREE TO ACCESS. The title of the article is “Meta-analysis of molecular property patterns and filtering of public datasets of antimalarial “hits” and drugs”.

The abstract for the article is “Neglected infectious diseases such as tuberculosis (TB) and malaria kill millions of people annually and the oral drugs used are subject to resistance requiring the urgent development of new therapeutics. Several groups, including pharmaceutical companies, have made large sets of antimalarial screening hit compounds and the associated bioassay data available for the community to learn from and potentially optimize. We have examined both intrinsic and predicted molecular properties across these datasets and compared them with large libraries of compounds screened against Mycobacterium tuberculosis in order to identify any obvious patterns, trends or relationships. One set of antimalarial hits provided by GlaxoSmithKline appears less optimal for lead optimization compared with two other sets of screening hits we examined. Active compounds against both diseases were identified to have larger molecular weight (~350-400) and logP values of ~4.0, values that are, in general, distinct from the less active compounds. The antimalarial hits were also filtered with computational rules to identify potentially undesirable substructures. We were surprised that approximately 75-85% of these compounds failed one of the sets of filters that we applied during this work. The level of filter failure was much higher than for FDA approved drugs or a subset of antimalarial drugs. Both antimalarial and antituberculosis drug discovery should likely use simple available approaches to ensure that the hits derived from large scale screening are worth optimizing and do not clearly represent reactive compounds with a higher probability of toxicity in vivo.”

My friend and co-author Sean put together a great image for the article and we were happy to see that it was taken as cover art for the issue. Nicely done Sean!

Posted on November 26, 2010 in General Communications


Harry Potter Sings the Elements

Tom Lehrer’s song “The Elements” is a favorite for chemists. It’s clever, entertaining and, well, purely chemical. The song has been used in Theo Gray’s iPad version of his book The Elements, as shown below, but the iPad version is way more than just the song. If you have an iPad and don’t have The Elements, I recommend you get it!

But now a generation of children will get introduced to Tom Lehrer’s song because of Harry Potter, aka Daniel Radcliffe. His rendition is on YouTube. Young Radcliffe…very impressive!

Posted on November 13, 2010 in Humor


Elf Yourself in 2010

Elf Yourself has been around for a few years and every year it gets more complete. It’s already available for this year and, as always, lots of fun to play with. Our family tradition gets extended one more year …

Posted on November 10, 2010 in Humor



A YouTube Overview of Our Book: Collaborative Computational Technologies for Biomedical Research

This movie provides an overview of the book “Collaborative Computational Technologies for Biomedical Research” edited by Sean Ekins, Maggie Hupcey and Antony Williams and published by Wiley and Sons. All of the authors either have extensive backgrounds in computational software for biomedical research or have done wet lab research for drug discovery. Many have worked in software companies, pharmaceutical companies or consulting companies and have the appropriate skills to produce an excellent overview of present activities in the area of Collaborative Computational Technologies for Biomedical Research.

Posted on November 9, 2010 in Book Reviews, General Communications


Finding the Structure of Vitamin K1 Online

You would think that finding the correct structure of Vitamin K1 online in public domain resources would be an easy exercise. But not so fast. Using the assertion that the chemical structure is correct in the Merck Index, and then wandering through CAS’s Common Chemistry to validate this assumption, this short movie takes us through Wikipedia, Wolfram Alpha, KEGG, DrugBank, PubChem and other online resources to show how complex and impure the public domain databases are as sources of good quality name-structure associations for chemicals. Vitamin K1 is actually a rather simple chemical structure. Finding the correct chemical structure online…not so simple.


Lab on a Chip Article features in the Top Ten AGAIN

Articles can take a long time to write. I still try to keep my publishing record up and collaborate with some terrific scientists on an ongoing basis: Sean Ekins, Kirill Blinov, Gary Martin and Mikhail Elyashberg especially.

It’s always nice to get the recognition from the readers! RSC sends out emails like this when you are listed in the Top 10. We are in the Top 10 again this month for our Lab on a Chip article. Nice!

Dear Dr Williams,

Precompetitive preclinical ADME/Tox data: set it free on the web to facilitate computational model building and assist drug development

We are delighted to tell you that your article has been highly accessed again this month. It features in the list of top ten most accessed Lab on a Chip articles on the web. You can browse the full list of top 10 articles here.

Many thanks for choosing to publish this work in Lab on a Chip.
We look forward to receiving your next submission soon.

With best wishes
Harp Minhas
Editor, Lab on a Chip

Posted on November 4, 2010 in General Communications


Guard the Prince and No One Leave the Room

Over the years I have been involved with various leadership courses, participating as both a student and a leader. I’ve helped lead courses for personal growth around the country (USA) as part of the Mankind Project as well as part of the Leadership Challenge when working in corporate America. I’m involved in a number of personal projects collaborating with some great people. We are making great progress. One of the greatest things to learn when trying to get through a collaborative project efficiently is that progress often comes through doing what is asked. That takes listening, agreement and then EXECUTION. When I think of listening, agreement and execution I always go back to this famous scene from Monty Python….loop the “guard the prince” dialog….

Posted on November 3, 2010 in Humor