Eventually there will be simple answers to the question chemists commonly ask: “What is the chemical structure of INSERT NAME?” This is going to be true for drugs as the various online databases work together to clean up, curate, qualify and declare what the chemical structure is for a particular drug. We can have the purist's argument about structure drawings not representing reality (compounds are atoms bonded together by shared clouds of electrons that at any point in time may be changing, reorganizing, tautomerizing, etc.), but the reality is also that we need a common language for information exchange, and in the world of visual depictions for chemistry the layout in a 2D structure diagram is it. As we come together as a community to agree on preferred ways to standardize chemicals, for example to assist in representations in databases, this situation will improve. The efforts of the FDA to define structure representation standards, with the support of pharma, will contribute. For now we are left with the challenges of different representations in different databases, as well as simply the quality of the data being fed into these databases. These are some of the issues we are trying to resolve as we build Open PHACTS. We are trying to link data from various resources, noting and resolving conflicts when we can, and curating as necessary, with the ultimate intention that this information will flow out into the community and be picked up by the database hosts and addressed, fixed or challenged as appropriate.
I’ve been looking for a new example showing the challenges of data integration, considering that in Open PHACTS at present we are integrating chemistry from three primary data sets (for now): DrugBank, ChEBI and ChEMBL. So, let’s consider Fluvastatin. Determining the “correct” chemical structure representation for a compound is usually an iterative loop, so let’s see what we can find in our datasets as we iterate. I KNOW from 4 years of looking at chemistry on Wikipedia that the data quality for chemical compound representations is very good. So, starting there we find the Wikipedia record here. The DrugBox links to a number of records in other databases.
One of these is ChemSpider and it has the SAME representation. On ChEBI the representation is inconsistent with these: it has no defined stereochemistry (except the E double bond). Since ChEBI is manually curated and the compound carries 3 stars, this should be correct. There are two records LINKED from this ChEBI record.
On DrugBank the compound has INVERTED stereochemistry from that on ChemSpider and Wikipedia… WP and ChemSpider have 3R,5S while DrugBank has 3S,5R, but it DOES say in the pharmacodynamics section “It is prepared as a racemate of two erythro enantiomers of which the 3R,5S enantiomer exerts the pharmacologic effect.”, confirming that the 3R,5S form is the ACTIVE form.
ChEMBL matches Wikipedia and ChemSpider here.
So, to summarize what we get when we search for Fluvastatin:
Stereo 3R,5S for Wikipedia, ChemSpider, ChEMBL
Stereo 3S,5R for DrugBank
No stereo for ChEBI
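To make the comparison concrete, here is a minimal Python sketch of how the conflict could be flagged automatically. It is only an illustration: the stereo labels are typed in by hand from the summary above, the function names (`group_by_stereo`, `enantiomer_conflicts`) are hypothetical, and nothing here queries the databases or reflects how Open PHACTS actually does it. It assumes that flipping every R/S label gives the mirror-image form.

```python
from collections import defaultdict

# Stereo labels copied by hand from the summary above (not fetched from any database).
# None means the record defines no stereocentres.
REPORTED = {
    "Wikipedia": {3: "R", 5: "S"},
    "ChemSpider": {3: "R", 5: "S"},
    "ChEMBL": {3: "R", 5: "S"},
    "DrugBank": {3: "S", 5: "R"},
    "ChEBI": None,
}

FLIP = {"R": "S", "S": "R"}


def stereo_key(assignment):
    """Turn {3: 'R', 5: 'S'} into '3R,5S'; undefined stereo becomes 'undefined'."""
    if not assignment:
        return "undefined"
    return ",".join(f"{pos}{label}" for pos, label in sorted(assignment.items()))


def invert(assignment):
    """Flip every R/S label (assumed here to give the enantiomer)."""
    return {pos: FLIP[label] for pos, label in assignment.items()}


def group_by_stereo(reported):
    """Group source names by the stereo descriptor they report."""
    groups = defaultdict(list)
    for source, assignment in reported.items():
        groups[stereo_key(assignment)].append(source)
    return dict(groups)


def enantiomer_conflicts(reported):
    """Return pairs of sources whose assignments are mirror images of each other."""
    defined = [(s, a) for s, a in reported.items() if a]
    conflicts = []
    for i, (s1, a1) in enumerate(defined):
        for s2, a2 in defined[i + 1:]:
            if a1 != a2 and invert(a1) == a2:
                conflicts.append((s1, s2))
    return conflicts


if __name__ == "__main__":
    for key, sources in group_by_stereo(REPORTED).items():
        print(key, "<-", ", ".join(sources))
    for s1, s2 in enantiomer_conflicts(REPORTED):
        print(f"Inverted stereo between {s1} and {s2}")
```

A record with no stereo, like the ChEBI one, is deliberately kept out of the conflict list: it is less specific than the others but does not contradict them.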
Welcome to the complexities of name-structure relationships. These are some of the challenges we need to deal with on Open PHACTS. DailyMed describes fluvastatin sodium as “Fluvastatin sodium is sodium (±)-(3R*, 5S*, 6E)-7-[3-(p-fluorophenyl)-1-isopropylindol-2-yl]-3,5-dihydroxy-6-heptanoate”, so the relative form….
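The relative descriptor in that name, with its starred labels, covers both enantiomers at once. As a rough sketch of what that means (the helper `expand_relative` is a hypothetical name of my own, and it only handles simple starred R/S labels such as `3R*`), the descriptor can be expanded into the two absolute forms, which turn out to be exactly the two forms found in the databases above:

```python
import re

FLIP = {"R": "S", "S": "R"}


def expand_relative(descriptor):
    """Expand a relative descriptor such as '(3R*, 5S*, 6E)' into both enantiomers.

    Only starred R/S labels are read; unstarred parts (for example the 6E
    double-bond label) are ignored.
    """
    centres = re.findall(r"(\d+)([RS])\*", descriptor)
    if not centres:
        raise ValueError("no starred R/S labels found in descriptor")
    as_written = ",".join(f"{pos}{label}" for pos, label in centres)
    mirrored = ",".join(f"{pos}{FLIP[label]}" for pos, label in centres)
    return as_written, mirrored


if __name__ == "__main__":
    forms = expand_relative("(3R*, 5S*, 6E)")
    print(forms)
    # Both absolute forms seen in the databases fall under the relative descriptor.
    for found in ("3R,5S", "3S,5R"):
        print(found, "covered:", found in forms)
```

Seen this way, the 3R,5S and 3S,5R records each depict one half of the racemate, while the relative form describes the material as a whole.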