Continuing Review of the NPC Browser Content – Most Cleanup is the Responsibility of the Hosts


In the past two weeks I have been in a number of discussions regarding my blog posts about the NPC Browser. My last blog post brought a comment from Ajit Jadhav, one of the authors of the original Science Translational Medicine publication about the NPC Browser. Ajit commended Sean and me on our light-hearted approach to discussing the issues of quality.

Specifically, he picked me up on the fact that American Cockroach IS listed on DailyMed as a medication. VERY interesting! He commented:

“Tony, Thanks for the amusing post. See here for more details of one example, american cockroach, which is Antigen Laboratories’ allergenic extract: http://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?id=12809.

And we can go on. But I would rather keep moving in a forward direction in life.

Regarding NPC… in case if it’s not clear yet, the collection is a small subset of HTS amenable compounds. The other content in the NPC Browser is supplementary.

Regarding you and Sean Ekins, you guys should go on the road as a comedic duo act. After all the serious scientific talks, the two of you can be the entertainment. One can be called Spinning and the other can be called Wheel.

I will volunteer to do the drum rolls for you :)

Gents, have fun working. Or… spinning if you enjoy that more. Apparently, the NPC Browser has hit a nerve in each of you. So I will check back on the blog to see what other entertainment you’re dishing out.  The more outrageous, the better! It just reveals more about you than the NPC Browser :)

Ajit”

My response is here and I insert a slice below.

“NPC was not ORIGINALLY described as a small set of HTS amenable compounds according to the Science Trans Med paper that describes it. According to the paper, and I quote, “…the NCGC Pharmaceutical Collection (NPC) – a definitive collection of drugs registered or approved for use in humans or animals.” It also states that the “NPC is the most comprehensive and accurate exposition to date of MEs registered or approved for human or veterinary use worldwide.” Having reviewed a subset of structures related to a particular class of compounds, over 140 entities, with a >70% failure in “accuracy”, I have to question this statement. I judge that the Merck Index (book form or electronic form) is a better collection. In case you are not aware of this resource, the details are:

http://www.merckbooks.com/mindex/referenceset.html

As I blogged in my post “Rabbits, Potatoes and other Vegetables in the NCGC Database”, there are some interesting things in the database. Responding to a comment on that post, I pointed out other things listed in the database.

WATERCRESS
WATERMELON
WHEAT
WHEAT BRAN
WHEAT ENDOSPREM
WHEAT GERM
WHEAT GLUTEN
WHEAT GLUTEN
WHEAT MIDDLINGS
WHEAT MIDDLINGS
WHEAT MIDDLINGS
WHEY
WHITE FISH
WHITE MUSTARD
WHITE OAK BARK
WHITE PEPPER
WHITE WILLOW EXTRACT
WILD ROSE EXTRACT
WINE

I’ve searched for these in DailyMed… not much luck, I’m afraid :-(

I DO believe that the list below would give me hits in DailyMed, but these members of the NCGC pharmaceutical collection are likely just a little generic!

List of "generics" in the NCGC pharmaceutical collection

It’s likely that almost all DailyMed labels contain “ingredients, water and additives”. I wonder how many of them contain “self heal” though.

As defined in the original paper, “…the NCGC Pharmaceutical Collection (NPC) – a definitive collection of drugs registered or approved for use in humans or animals.” Also, “NPC is the most comprehensive and accurate exposition to date of MEs registered or approved for human or veterinary use worldwide.” I challenge that based on the observations above.

I have to argue that it is time to do some very basic browsing of the entries in the database that are simply text entries with no structures. There are MANY that are distinct chemicals for which the structure can easily be located. There are also many common terms that should simply be deleted from the dataset. Hundreds, in fact. I judge that one good evening of work would catch many of the most obvious terms that are in error. I doubt that a crowdsourcing approach will address this, and this very basic cleanup is the responsibility of the database hosts. It’s certainly a reputation issue. Ajit commented “The other content in the NPC Browser is supplementary”. I am trying to understand how. It doesn’t align with my interpretation of the paper or that of many of the people who have been discussing the data set with me in recent weeks.
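
For anyone wondering what such a first pass might look like, here is a minimal sketch. It assumes the entries can be exported as (name, structure) pairs, which is my assumption and not the NPC Browser's actual schema. The function name and the list of generic terms are hypothetical too. It simply flags text-only entries that are repeated or that match an obviously generic term, leaving the decision on each one to a human curator.

```python
from collections import Counter

# Hypothetical list of terms too generic to stand as a drug entry (an assumption).
GENERIC_TERMS = {"INGREDIENTS", "WATER", "ADDITIVES"}


def flag_entries(entries):
    """Flag suspicious text-only entries.

    entries: list of (name, structure) pairs, where structure is None or ""
    when the record has no chemical structure attached.
    Returns (generic, duplicates) as two sorted lists of normalized names.
    """
    # Keep only records without a structure, normalizing case and whitespace.
    text_only = [name.strip().upper() for name, structure in entries if not structure]
    counts = Counter(text_only)
    generic = sorted(name for name in counts if name in GENERIC_TERMS)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    return generic, duplicates


# Illustrative sample only; the names echo the list above, the layout is assumed.
sample = [
    ("WHEAT GLUTEN", None),
    ("WHEAT GLUTEN", None),
    ("WHEAT MIDDLINGS", None),
    ("Water", None),
    ("ASPIRIN", "CC(=O)OC1=CC=CC=C1C(=O)O"),
]
generic, duplicates = flag_entries(sample)
print("Generic terms:", generic)
print("Repeated text-only entries:", duplicates)
```

A script like this only produces a shortlist. Deciding whether a flagged entry should be deleted, merged, or given a structure still takes someone who knows the chemistry.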

 

  1. #1 by trung on August 1, 2011 - 11:49 pm

    Hi Tony, it’s tough to keep up with you. I don’t have enough sleep as it is 8-). My colleague Noel Southall, who is a walking pharmacopeia, should be able to tell you exactly why these things showed up, but he’s on vacation at the moment, so I’ll do my best to emulate him. Most of the entries in the first list (e.g., watercress, watermelon, etc.) can be found in the FDA’s NDC directory, which means that at some point they were actual marketed products. You can search for them here: http://www.accessdata.fda.gov/scripts/cder/ndc/activeingredient.cfm
    The “Self heal” entry took a bit longer to track down. It came from Health Canada. If you search for the drug identification number (DIN) 01911171 at this page http://webprod3.hc-sc.gc.ca/dpd-bdpp/start-debuter.do?lang=eng, you’ll see “SELF HEAL” is one of the active ingredients listed with the strength of “55 MG / 100 ML”. When time permits, we hope to eventually incorporate the original source for every piece of data.
    Trung

  2. #2 by tony on August 2, 2011 - 12:03 pm

    Trung…I probably don’t get enough sleep either….always bed after midnight (commonly 2am) and up by 6am. It helps get a lot done…

    I look forward to hearing back from Noel about all these weird and wacky “drugs”. Self-heal sounds very interesting! The Ingredients one is a classic :-) I was fascinated to see Water Cress listed. But also American Cockroach is a good one and is of course in DailyMed. Did you ever find Rabbit? See earlier post.

    I’m up for a phone conversation anytime regarding the continuing work I am doing. The latest post with the 25 common drugs should be useful to help curate the data. Cheers
