No wonder I’m so healthy…I like rabbits, eat lots of potatoes, drink “green sludge in a bottle” to get my vegetables and also have lots of other “ingredients” in my diet. According to the NPC Browser on my desktop, these are all part of the NCGC data collection that I have been browsing. When I was chatting with a number of pharmaceutical scientists last week about data quality in public domain databases, the NPC Browser was used as an example of data content “to review”. If you’d like to review the contents yourself, you will find many issues with stereochemistry, valency and charge balance, as well as incorrect associations between structures and identifiers. You can also review the data in table format, look at the entries that don’t have structures associated and scratch your head at some of the content. To see the errors, go to tabular mode as shown below and scroll down past record 8000.
Notice that the drugs “water Lily”, “Water Cress” and “Water Hemlock” are listed. I wonder what water cress is used for? It all becomes much more fun when you see some of the others listed below. Rabbit…now that’s a good…take two in the morning, without food, and repeat dosage for 7 days. Cures “big sharp pointy teeth”. Vegetables…ah yes, nothing specific. Just “vegetables”. Kind of a cure-all, really. Take 5 portions per day, with food, obviously. And potatoes…good for a stiff neck (starch collar syndrome). If you browse through you’ll also find “ingredients” listed as a compound. Glad about that really…most drugs have ingredients. I am sure there is a reason that these are listed, though I cannot imagine what it would be. If there is no good reason, it is time to decommission this dataset until it has had a major cleanup. Clearly the contents are suspect at best.
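For anyone who would rather script the check than scroll, here is a minimal sketch of the "name but no structure" review described above. It assumes the table has been exported to CSV with columns called `name` and `smiles`; those column names, the helper `records_without_structure` and the sample rows are all hypothetical and are not taken from the NPC Browser itself.

```python
import csv
import io


def records_without_structure(rows, name_field="name", structure_field="smiles"):
    """Return the names of records whose structure field is missing or blank.

    The field names are assumptions about a hypothetical CSV export.
    """
    flagged = []
    for row in rows:
        structure = (row.get(structure_field) or "").strip()
        if not structure:
            flagged.append((row.get(name_field) or "").strip())
    return flagged


# Made-up sample standing in for a tabular export (not real NPC data).
SAMPLE = """name,smiles
Aspirin,CC(=O)OC1=CC=CC=C1C(=O)O
Rabbit,
Vegetables,
Potatoes,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
flagged = records_without_structure(rows)
print(flagged)
```

A missing structure is not an error in itself, but it is a quick way to pull out the entries worth a second look.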

#1 by Markus Sitzmann on July 19, 2011 - 3:37 pm
Did they maybe use FDA’s Unique Ingredient Identifier (UNII) list (http://www.cancer.gov/cancertopics/cancerlibrary/terminologyresources/fda) for the generation of the database? Sounds like it.
Markus
#2 by tony on July 19, 2011 - 9:30 pm
Markus…that would actually make good sense based on what I am seeing in the list. For example,
WATERCRESS
WATERMELON
WHEAT
WHEAT BRAN
WHEAT ENDOSPREM
WHEAT GERM
WHEAT GLUTEN
WHEAT GLUTEN
WHEAT MIDDLINGS
WHEAT MIDDLINGS
WHEAT MIDDLINGS
WHEY
WHITE FISH
WHITE MUSTARD
WHITE OAK BARK
WHITE PEPPER
WHITE WILLOW EXTRACT
WILD ROSE EXTRACT
WINE
However, it cannot be all-encompassing, as it doesn’t list Water Lily or “Ingredients”, both of which I have seen listed. So I think this is only one of the included sets influencing the data.
#3 by Sean Ekins on July 20, 2011 - 12:24 pm
…and what about “Cockroach, American”? When was that an ingredient in drugs? Folk medicine, or a delicacy in countries outside the US?