Posts Tagged ChEBI
I had the pleasure of co-presenting with my friend Jean-Claude Bradley today at the “3rd Annual Drug Discovery Partnership: Filling the Pipeline”. Jean-Claude gave a great talk, available on Slideshare here, and discussed the issue of data quality, how improved data gives improved models, the cross-validation of data and the proliferation of errors. My talk is on Slideshare here and embedded below. In many ways I discussed similar issues, though focused not on melting point data but rather on structures, structure-identifier relationships, the cross-linking of multiple resources on the internet and how online resources can support Open Drug Discovery Systems. In this presentation I also discussed some of the work we are doing on Open PHACTS.
Last week was quite the trip to the United Kingdom…I was hit by the flu, which put me in bed without a voice for an entire day, and then gave the rescheduled talk the next day feeling a little beaten up. The talk discussed the recently conducted survey of public domain databases that I initiated last week (results embedded in the talk) as well as some observations from comparing data for 10 drugs across a series of public domain databases. The meeting was a good chance to meet the hosts of some of the databases, including PubChem, DrugBank, ChEBI/ChEMBL and SureChem. I’m sorry I missed the first day…
I have been looking at the state of curated data on the internet and blogged last night about the messy world of curated data. I should emphasize…none of these commentaries are meant to be harsh. Believe me, I’ve gone through the process of validating data and it’s difficult. There will be mistakes, but what we need are processes and systems to clean these data up efficiently. If I see an error, I want to annotate it and let people know there is an error. With today’s technologies it is not difficult.
Let’s take another example from DrugBank.
That listed chemical name above the structure doesn’t look very consistent…I don’t see any stereochemistry, certainly no “dihydroxy” and overall…yes, it’s definitely wrong. The actual structure for that name is shown below. Looks like an entire half of the molecule is missing. The InChI and InChIKey are for the molecule shown in DrugBank but the link to KEGG is to the molecule shown below…here.
The links on DrugBank to PubChem and ChEBI are to the molecule to the left. All of the outlinks in the DrugBank record are for the structure on the left. The exception is the structure actually drawn on the record: it, along with its associated SMILES and InChIs, is for the “2-amino-3,5-dihydro-4H-pyrrolo[2,3-d]pyrimidin-4-one” moiety only. Oops.
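A mismatch like this is the kind of thing a simple automated check could flag. Below is a minimal sketch, assuming the InChIKey for the record's own structure and for each linked record has already been retrieved. The function name, the source labels and the keys themselves are hypothetical placeholders for illustration, not real DrugBank data or a real API.

```python
from collections import defaultdict


def find_identifier_conflicts(records):
    """Group sources by the InChIKey they report.

    `records` maps a source label to an InChIKey string.
    Returns {inchikey: [sources]} when more than one distinct key
    is present, or an empty dict when all sources agree.
    """
    by_key = defaultdict(list)
    for source, inchikey in records.items():
        # Normalize whitespace and case so trivial differences do not count.
        by_key[inchikey.strip().upper()].append(source)
    return dict(by_key) if len(by_key) > 1 else {}


# Placeholder keys (NOT real InChIKeys): the drawn structure disagrees
# with the records it links out to.
records = {
    "drugbank_structure": "AAAAAAAAAAAAAA-AAAAAAAAAA-N",
    "kegg_link": "BBBBBBBBBBBBBB-BBBBBBBBBB-N",
    "pubchem_link": "BBBBBBBBBBBBBB-BBBBBBBBBB-N",
    "chebi_link": "BBBBBBBBBBBBBB-BBBBBBBBBB-N",
}

conflicts = find_identifier_conflicts(records)
for key, sources in conflicts.items():
    print(key, "<-", ", ".join(sorted(sources)))
```

Comparing full InChIKeys is deliberately strict. A check like this only says that the sources disagree, not which one is right, so a flagged record still needs a curator to look at it.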
Recently I pointed out to David Wishart, host of DrugBank, some of the issues I had been seeing, and it appears there will be a major update to DrugBank in the next few weeks that, in theory, will address some, and hopefully all, of these observations.