Continuing Review of the NPC Browser Content – Most Cleanup is the Responsibility of the Hosts

30 Jul

In the past two weeks I have been in a number of discussions regarding my blog posts about the NPC Browser. My last blog post brought a comment from Ajit Jadhav, one of the authors of the original Science Translational Medicine publication about the NPC Browser. Ajit commended Sean and me on our light-hearted approach to discussing the issues of quality.

Specifically, he picked me up on the fact that American Cockroach IS listed on DailyMed as a medication. VERY interesting! He commented:

“Tony, Thanks for the amusing post. See here for more details of one example, american cockroach, which is Antigen Laboratories’ allergenic extract: http://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?id=12809.

And we can go on. But I would rather keep moving in a forward direction in life.

Regarding NPC… in case if it’s not clear yet, the collection is a small subset of HTS amenable compounds. The other content in the NPC Browser is supplementary.

Regarding you and Sean Ekins, you guys should go on the road as a comedic duo act. After all the serious scientific talks, the two of you can be the entertainment. One can be called Spinning and the other can be called Wheel.

I will volunteer to do the drum rolls for you :)

Gents, have fun working. Or… spinning if you enjoy that more. Apparently, the NPC Browser has hit a nerve in each of you. So I will check back on the blog to see what other entertainment you’re dishing out.  The more outrageous, the better! It just reveals more about you than the NPC Browser :)

Ajit”

My response is here and I insert a slice below.

“NPC was not ORIGINALLY described as a small set of HTS amenable compounds according to the Science Trans Med paper that describes it. According to the paper, and I quote “…the NCGC Pharmaceutical Collection (NPC) – a definitive collection of drugs registered or approved for use in humans or animals.” It also states that the “NPC is the most comprehensive and accurate exposition to date of MEs registered or approved for human or veterinary use worldwide.” Having reviewed a subset of structures related to a particular class of compounds, over 140 entities, with a >70% failure in “accuracy”, I have to question this statement. I judge that the Merck Index (book form or electronic form) is a better collection. In case you are not aware of this resource, the details are:

http://www.merckbooks.com/mindex/referenceset.html

As I blogged in my post “Rabbits, Potatoes and other Vegetables in the NCGC Database” there are some interesting things in the database. Responding to a comment on that post I commented on other things listed in the database.

WATERCRESS
WATERMELON
WHEAT
WHEAT BRAN
WHEAT ENDOSPREM
WHEAT GERM
WHEAT GLUTEN
WHEAT GLUTEN
WHEAT MIDDLINGS
WHEAT MIDDLINGS
WHEAT MIDDLINGS
WHEY
WHITE FISH
WHITE MUSTARD
WHITE OAK BARK
WHITE PEPPER
WHITE WILLOW EXTRACT
WILD ROSE EXTRACT
WINE

I’ve searched these in DailyMed …not much luck I’m afraid 🙁

I DO believe that the list below would give me hits in DailyMed, but these members of the NCGC pharmaceutical collection are likely just a little generic!

List of "generics" in the NCGC pharmaceutical collection

It’s likely that almost all DailyMed labels contain “ingredients, water and additives”. I wonder how many of them contain “self heal” though.

As defined in the original paper, “…the NCGC Pharmaceutical Collection (NPC) – a definitive collection of drugs registered or approved for use in humans or animals.” Also, “NPC is the most comprehensive and accurate exposition to date of MEs registered or approved for human or veterinary use worldwide.” I challenge that based on the observations above.

I have to argue that it is time to do some very basic browsing of the entries in the database that are simply text entries with no structures. MANY of them name distinct chemicals whose structures can easily be located. There are also many common terms, hundreds in fact, that should simply be deleted from the dataset. I judge that one good evening of work would catch many of the most obvious terms that are in error. I doubt that a crowdsourcing approach will address this, and this very basic cleanup is the responsibility of the database hosts. It’s certainly a reputation issue. Ajit commented “The other content in the NPC Browser is supplementary”. I am trying to understand how. It doesn’t align with my interpretation of the paper or that of many of the people who have been discussing the data set with me in recent weeks.

 

 

About tony

Founder of ChemZoo Inc., the host of ChemSpider (www.chemspider.com). ChemSpider is an open access online database of chemical structures and property transaction based services to enable chemists around the world to data mine chemistry databases. The Royal Society of Chemistry acquired ChemSpider in May 2009. Presently working as a consortium member of the OpenPHACTS IMI project (http://www.openphacts.org/). This focuses on how drug discovery can utilize semantic technologies to improve decision making and brings together 22 European team members to develop an infrastructure to link together public and private data for the drug discovery community. I am also involved with the PharmaSea FP7 project (http://www.pharma-sea.eu/) trying to identify new classes of marine natural products with potential pharmacological activity. I am also one of the hosts for three wikis for Science: ScientistsDB, SciMobileApps and SciDBs. Over the past decade I held many responsibilities including the direction of the development of scientific software applications for spectroscopy and general chemistry, directing marketing efforts, sales and business development collaborations for the company. Eight years’ experience of analytical laboratory leadership and management. Experienced in experimental techniques, implementation of new NMR technologies, walk-up facility management, research and development, manufacturing support and teaching. Ability to provide situation analysis, creative solutions and establish good working relationships. Prolific author with over 150 peer-reviewed scientific publications, 3 patents and over 300 public presentations. Specialties: Leadership in the domain of free access Chemistry, Product and project management, Organizational and Leadership development, Competitive analysis and Business Development, Entrepreneurial.

2 Responses to Continuing Review of the NPC Browser Content – Most Cleanup is the Responsibility of the Hosts

  1. trung

    August 1, 2011 at 11:49 pm

    Hi Tony, it’s tough to keep up with you. I don’t have enough sleep as it is 8-). My colleague Noel Southall, who is a walking pharmacopeia, should be able to tell you exactly why these things showed up, but he’s on vacation at the moment, so I’ll do my best to emulate him. Most of the entries in the first list (e.g., watercress, watermelon, etc.) can be found in the FDA’s NDC directory, which means that at some point they were actual marketed products. You can search for them here: http://www.accessdata.fda.gov/scripts/cder/ndc/activeingredient.cfm
    The “Self heal” entry took a bit longer to track down. It came from Health Canada. If you search for the drug identification number (DIN) 01911171 at this page http://webprod3.hc-sc.gc.ca/dpd-bdpp/start-debuter.do?lang=eng, you’ll see “SELF HEAL” is one of the active ingredients listed with the strength of “55 MG / 100 ML”. When time permits, we hope to eventually incorporate the original source for every piece of data.
    Trung

     
  2. tony

    August 2, 2011 at 12:03 pm

    Trung…I probably don’t get enough sleep either….always bed after midnight (commonly 2am) and up by 6am. It helps get a lot done…

    I look forward to hearing back from Noel about all these weird and wacky “drugs”. Self-heal sounds very interesting! The Ingredients one is a classic 🙂 I was fascinated to see Water Cress listed. But also American Cockroach is a good one and is of course in DailyMed. Did you ever find Rabbit? See earlier post.

    I’m up for a phone conversation anytime regarding the continuing work I am doing. The latest post with the 25 common drugs should be useful to help curate the data. Cheers

     
