Category Archives: Purpose and People

There are a number of people involved with ChemSpider, and that number will continue to grow. They will be introduced through this category. The purpose(s) behind ChemSpider remain flexible and will also be discussed under this category.

Honored to be a Recipient of the Jim Gray eScience Award from Microsoft Research

Last night in Chicago I was awarded the Jim Gray eScience Award. I didn’t know Jim personally, but I know I benefit from the fruits of his work. Before Tony Hey gave me the award, he played a video about the previous award winners. To be recognized for my contributions and to join scientists of the caliber of the previous winners was, to say the least, very emotional. My entire career has been focused on doing what I thought was the right thing for the role I was charged with, and when I didn’t want the role I was in, I moved on. That has taken me through various roles in science: lab manager in academia and in industry, product manager at a start-up cheminformatics company, marketing, sales, a community website for chemistry, and now RSC, a publisher. If I had been asked to map out my career path, there is no way I would have ended up here…but which of us really could?

Last night I presented on “The Possibilities and Pitfalls of Internet-Based Chemical Data”. I talked about how much of the data I have generated in the lab over the years is now lost, and how we can change this for the current generation of scientists. I talked about the history of ChemSpider, from hobby project to its present role as one of the web’s primary sites for chemists. I talked about how scientists should PARTICIPATE in annotating and curating data online…how data sites specifically should enable commenting to capture issues. I talked about the measure of scientists and how efforts including ORCID and ImpactStory will be important in capturing the impact and notability of scientists. I hope I was able to share my view that, while technology will continue to improve our ability to contribute, it is the personal choice to make a difference that is crucial to correcting errors, annotating data and continuing the journey of creating improved resources for the chemistry community (and of course other branches of science).

I also announced our intention for RSC to create a Global Chemistry Hub (a topic for a separate post) and to “data enable the RSC archive”…extracting chemicals, reactions, data etc. from our archive going back to the 1840s. We have not yet defined all of the technologies, processes or approaches. But we have the intent and the courage to go for it, learning as we go and producing beneficial outcomes iteratively. It’s an exciting time for the RSC cheminformatics team, and it is my privilege to work alongside a great team of individuals to create a step change in how we manage and deliver chemistry data to the community.

I have had a lot of trusted advisers over the years, and last night I acknowledged those closest to me in recent years: Jean-Claude Bradley, Sean Ekins, Lee Harland, Gary Martin and Martin Walker. The closest to me, however, is Valery Tkachenko. I was happy that Valery was able to be at the conference with me. So much of what has been achieved to date with ChemSpider (as well as MANY projects we worked on together while at ACD/Labs) rests squarely on his shoulders. The future technical implementation of the cheminformatics projects we are undertaking at RSC is under his guiding hand. I am glad to have such a great “partner in crime”…

My thanks to Microsoft Research, to the judges for selecting me for the award and to the community that has chosen to embrace some of the fruits of my work. I am leaving Chicago proud, tired and looking forward to making an even bigger impact with some of our new projects.





Open Notebook Science and One Future for Scientific Research

A few weeks ago I was invited to give a presentation to the Board of Directors at Burroughs Wellcome. I was very interested in taking this opportunity to discuss my views on Open Science, Open Notebook Science, Open Data etc. with this group of very esteemed scientists. However, it clashed with a planned vacation. Since my friend and frequent co-author Sean Ekins is an evangelist for open science in drug discovery, for improving data quality and for Mobile Apps, and since we think alike on so many levels, I asked Sean whether he’d want to give the presentation. Always welcoming adventure, Sean jumped at the chance to present.

As it turned out, Hurricane Rina forced us to cancel our vacation, so I ended up attending the presentation with Sean. While we had bounced the slides back and forth before the presentation, Sean did a terrific job as the presenter, and we had some very interesting questions: what is standing in the way of open science, especially around chemistry databases (of compounds); what are good examples of successful bioinformatics projects; and whether there are “risks” inherent to Open Science, especially with regard to what is shared online in public compound databases. I thoroughly enjoyed the meeting, short as it was, and am glad that we were given the opportunity.

Sean has eloquently outlined the nature of the presentation at his site (he is Collabchem) and the presentation is below for your comments and review. I recommend that you check out Sean’s other presentations too!



Comments from @UntangledHealth re. Data Quality

I have recently blogged about the quality of data in the NCGC Dataset that was made available with the NPC Browser. Jeff Harris from the UntangledHealth Blog has made some interesting comments about how this carries over to healthcare, and I am posting them below because I thought they deserved exposure to this community in case you missed them in the comments…

“I want to thank you for including those of us on the front lines as practitioners and patients in your thoughtful research. Whether you recognize it or not (I am sure you do!), the profound discovery of issues in data integrity within the life sciences translates all the way down to the level of a therapeutic outcome such as blood pressure, and ultimately to what we tend to call a therapeutic misadventure (read my post ‘My first experience with Computer Assisted Clinical Decision Support’ on the Untangled Health Blog; it dates back to 1982).

I am sure you are aware of the current pressures (in both carrot and stick form) from our government to deploy electronic health records which include the elements of clinical and administrative data exchange between providers, clinics, patients and various registries. The Office of the National Coordinator for HIT is accountable for managing multiple advisory committees who are setting the requirements for the technology we ultimately use. The Health Information Technology Standards Panel has done an excellent job over the last several years developing use cases, standard nomenclature and message structures.

What I find alarming in your work is the fact that we continue to have issues with the credibility of the foundation of our communication; the source data.

• In your world of life science innovation these issues may sort themselves out during primary investigation phases but our data are best thought of as meta-data which are used by humans AND computers for critical decisions relating to individual patient management and population health targeting.
• For example: we utilize HL7 (Health Level 7) as a structure to embed data such as CCD (Continuity of Care Document), NCPDP standards for prescription data and HIPAA X12 standards for the electronic representation of claims between entities. Lately we have also started using new standards for Quality Reporting (QRDA) and Geocoded Interoperable Population Summary Exchange (GIPSE). We have also chosen vocabulary standards including: SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms), used for clinical problems and procedures; UNII (Unique Ingredient Identifier), used for ingredient allergies; LOINC (Logical Observation Identifiers Names and Codes), used for lab tests, which facilitates the exchange and pooling of clinical results for clinical care, outcomes management and research by providing a set of universal codes and names to identify laboratory and other clinical observations; and UCUM (Unified Code for Units of Measure), used for units of measure. UCUM is a code system intended to include all units of measure in contemporary use in international science, engineering and business. Its purpose is to facilitate unambiguous electronic communication of quantities together with their units; the focus is on electronic communication, as opposed to communication between humans. Typical applications of UCUM are electronic data interchange (EDI) protocols.
Herein lies the rub: we still have not sorted out, from an industrial perspective, how rules that affect patient treatment, and hence safety, will account for the reliability of the source record. For example, an insurance company attempting to identify individuals with hypertension might use its X12 transaction sets, which include ICD codes (International Classification of Diseases) and CPT codes (procedural codes for payment), and then target those individuals for disease management by a special team of patient advocates. The reliability of X12 transactions is always debated, since practitioners who are paid for their services based on the complexity of the encounter will often add every diagnostic code (ICD) that applies to the patient to maximize reimbursement. We are working on these ethical issues, but they persist.
I personally received a call from my insurance company’s nurse after she had enrolled me in a depression management program because their Pharmacy Benefits Management company recorded that I was taking Cymbalta, as coded in their NCPDP data. Cymbalta as a single identifier for depression does not work from an algorithmic perspective, since it is also used for diabetic neuropathy (the reason for my treatment). The nurse and I had a good laugh over this. Ideally, face-to-face claims encounters with at least two occurrences of an ICD or DSM code for depression should be included in the equation before contacting the patient. In this case the insurance group had made a big mistake, and if I had less of a sense of humor it would have ended differently.
So, we are working on algorithms, yet I can assure you that the logic used can be quite different between manufacturers.
To add another issue: having worked in the industry, I have had experiences where legacy HL7 data had been customized so that an unused field assigned to one clinical parameter was filled with another. This was fine until the next generation of employees came along and tried to run reports on the lab data using basic HL7 standards. Kind of like discovering that your average patient has an average blood glucose value of October 31st, 2001.
What you are unveiling at the molecular level runs rampant in our industry, at a time when we are forcing technology into the market that IMHO still requires a lot of validation, as opposed to deploying beta products.
The Office of the National Coordinator for HIT Strategic Plan for 2011 through 2015 has the following objectives:
Goal I: Achieve Adoption and Information Exchange through Meaningful Use of Health IT
Goal II: Improve Care, Improve Population Health, and Reduce Health Care Costs through the Use of Health IT
Goal III: Inspire Confidence and Trust in Health IT
Goal IV: Empower Individuals with Health IT to Improve their Health and the Health Care System
You can predict the problems with Goal III if we do not perform stellar validation and reliability testing across all manufacturers. I doubt that this is possible given the number of players in the market place.”
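Jeff’s Cymbalta story boils down to a rule: a pharmacy record alone should not trigger enrollment; diagnosis evidence from more than one encounter should. A minimal Python sketch of that rule follows. It is purely illustrative: the `Claim` record, the function names and the choice of ICD-10 prefixes F32/F33 (depressive episode, recurrent depressive disorder) are our assumptions, not any insurer’s actual algorithm, and it treats one encounter per date as a simplification.

```python
from dataclasses import dataclass

# Hypothetical claim record: one row per face-to-face encounter.
@dataclass
class Claim:
    patient_id: str
    encounter_date: str          # ISO date string, e.g. "2011-03-02"
    diagnosis_codes: list[str]   # ICD codes attached to the encounter

# Illustrative ICD-10 prefixes for depressive disorders (assumption: a real
# program would use a vetted, versioned value set including DSM mappings).
DEPRESSION_PREFIXES = ("F32", "F33")

def has_depression_code(claim: Claim) -> bool:
    # str.startswith accepts a tuple of prefixes.
    return any(code.startswith(DEPRESSION_PREFIXES) for code in claim.diagnosis_codes)

def flag_for_depression_program(claims: list[Claim], min_encounters: int = 2) -> bool:
    """Flag only when distinct encounters carry a depression diagnosis code.

    Pharmacy data alone (e.g. a duloxetine/Cymbalta fill) is deliberately not
    a trigger, because the drug is also prescribed for diabetic neuropathy.
    Distinct encounters are approximated by distinct dates (simplification).
    """
    encounter_dates = {c.encounter_date for c in claims if has_depression_code(c)}
    return len(encounter_dates) >= min_encounters

# A patient seen only for diabetic neuropathy (E11.40) is not flagged.
print(flag_for_depression_program([Claim("p1", "2011-03-02", ["E11.40"])]))  # False
```

Requiring a minimum count of distinct encounters trades some sensitivity for fewer false positives like the one Jeff describes; where that threshold sits is a policy decision, not something the code can settle.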


Posted on May 15, 2011 in Purpose and People


Taverna Workflows Hook-up to ChemSpider Services for Metabolomics

Yesterday I announced the availability of the MassSpec web services for ChemSpider and, less than 24 hours later, I am happy to announce that they have already been integrated. Egon Willighagen, one of the members of our Advisory Group, has already reported on his integration with ChemSpider, with the intention of speeding up metabolomics analysis. He used Taverna, a workflow and pipelining tool, to set up his workflows. What’s good to see is how easy this was for him to do…well, I assume it was easy since he didn’t need to consult with us. We released the MassSpec web service and voilà, he was integrated.

This is what is happening with our other web services too. A number of organizations are now integrated with ChemSpider and using the services on a daily basis.


Curators Perform Heroic Duties. They Should be Celebrated!

Recently a comment was made about the “highly curated data” on Wikipedia. To me curators are heroes. They are detail oriented, committed to the cause and simply “care”.

As a result of reading that post, you saw me go off and check the Taxol entry, post a few comments and come out the other end with a “more highly curated record” on Wikipedia.

Then I commented that there are better ways to ensure the quality of structure drawings than redrawing them…specifically dictionary look-up and optical structure recognition.

I don’t mind being taken to task on my opinions. As my late father said, “Opinions are like nostrils, everybody has them.” Okay, the body cavity was a little further south, but you get the point. However, this opinion stirred me…

“If you wish to spend your life recording typos in chemical documents, I hope it is fulfilling.”

Now, sometimes when you are stirred emotionally, it helps to sit down and think about it.


So, I’ve thought about it… and I’m happy about where I’ve ended up.

My life IS fulfilling. I might need therapy for this particular passion but I DO actually enjoy checking typos in “documents” – of course our conversations are about chemical documents (structures) and I DO confess I like it. Why? I care about Quality.

When I see an acknowledgment that Wikipedia is highly curated, and I know I have contributed to that, I take a certain pride in having contributed to community science. Those of us cleaning up the historical record for others to benefit from are doing a lot of the grunt work that others say is necessary when they espouse the need for platforms to do it. You can throw a palette of colors and a brush on a floor, but someone has to pick them up and do something with them. Platforms, tools and visions are great…we need thinkers, but we also need doers. Doers are important and necessary, and people who find typos in chemical documents likely do find it fulfilling. I’m a thinker and a doer. Until I have experienced the challenges of curating historical records, I do not feel I am sufficiently immersed in the challenge. Oh…there’s another nostril (opinion).

So, who are my heroes? Some of them in this domain are:

1) Barrie Walker, ChemSpider Advisory Group member and our KING OF QUALITY.

2) Ann Richards, EPA, founder of the DSSTox effort and quality guru extraordinaire. Ann and her team have taken on the task of assembling, from various sources (and of various quality levels), a public resource of incredible value to the Tox community. This paper explains in detail. With her fine eye for and commitment to detail (checking CAS numbers to the digit, the stereochemistry of each bond and the accuracy of the chemical names), her databases are likely the cleanest and most highly curated databases from any government lab (no intention to offend others here, and if your DB is as good as DSSTox, you are my heroes too!). In particular I acknowledge Marti Wolf from Ann’s lab, who has spent thousands of hours assembling data, “recording typos in chemical documents” and correcting them to the benefit of the community.

3) People like Peter Corbett. He really seems to care about what’s in a database and the quality of what’s there. He is discovering these issues by observation and checking. His careful eye, clearly necessary for the development of OSCAR, makes him a hero (I look forward to meeting him!)

4) The people I worked with at ACD/Labs in the database compilation office are heroes. Over the years this group of tens of individuals has manually curated hundreds of thousands of structures and associated properties (physchem parameters, NMR shifts, name-structure pairs). They have done it with a fine eye. THEIR efforts were the basis of the industry-leading NMR prediction algorithms that were recently used to provide feedback to the Blue Obelisk team member Christoph Steinbeck, to help clean up errors in the NMRSHIFTDB. While others were attacking the open data effort, those of us concerned with the details helped curate the data.

5) The curators at CAS, at MDL (now Symyx), at GVKBio, and in software houses and labs all over the world who manually curate data, and, from their experience, build robots to help their processes and improve the data for all.
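Some of the checking described above, such as verifying CAS numbers “to the digit”, is partly mechanical: the last digit of a CAS Registry Number is a checksum over the other digits, each multiplied by its position counted from the right, summed, and taken modulo 10. A minimal Python sketch of that check follows; the function name is ours, for illustration. It only confirms that a number is internally consistent, not that it belongs to the structure it is attached to, which is exactly where human curators remain essential.

```python
import re

# CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
# [0-9] is used instead of \d so non-ASCII digits are rejected.
CAS_PATTERN = re.compile(r"([0-9]{2,7})-([0-9]{2})-([0-9])")

def cas_check_digit_ok(cas: str) -> bool:
    """Return True if the CAS number is well-formed and its check digit matches."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(pos * int(d) for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))

print(cas_check_digit_ok("7732-18-5"))   # True  (water)
print(cas_check_digit_ok("33069-62-4"))  # True  (paclitaxel/Taxol)
print(cas_check_digit_ok("7723-18-5"))   # False (two digits transposed)
```

The checksum catches many, but not all, single-digit typos and transpositions, so a passing number still needs the kind of human cross-checking against names and structures that these curators do.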

For all of you who wish to spend your life recording typos in chemical documents, it is likely very fulfilling if you care about quality.

I find it fulfilling. It’s a necessary part of understanding the problem. Quality is hard to define. But we’ve been challenged often enough on the quality of the science on ChemSpider. We’ve been challenged over sodium chloride dimers and shown that they are valid science. We’ve been challenged over the logP prediction for calcium carbonate and had an industry great acknowledge our attention to detail. We’ve been challenged on inorganic chemistry and compared ourselves to others.

We Monkeys have been told to close the gates of ChemZoo. We didn’t. Instead we are doing great things for the community, I hope. We have opened up a series of services that the Open Access world likes (specifically the Blue Obelisk players...), we are donating our database to PubChem shortly, and we are working with some of the best people on our advisory group to meet their needs. It’s pretty damn fulfilling.

* I will acknowledge that the comment “If you wish to spend your life recording typos in chemical documents, I hope it is fulfilling.” is taken out of the context of the entire post. So read the post. Then read all the others I’ve mentioned. I made my interpretation of the comment based on the overall flavor of the discussion. Maybe my nostril was clogged…


Posted on October 4, 2007 in Purpose and People, Quality and Content


Who Is Behind ChemSpider?

The ChemSpider team is a small group of passionate individuals. We all have day jobs. ChemSpider is innovated, extended and maintained during evenings and weekends, with the support of friends, family, collaborators and chemists wanting to make a difference. Here’s a disclaimer regarding one member of the team, from the recent press release issued when ChemSpider integrated ACD/Labs properties: “Disclaimer: ChemZoo, Inc., is founded by Dr. Antony Williams, who is also serving as VP and Chief Science Officer for ACD/Labs. ChemZoo and ChemSpider services are not affiliated with Advanced Chemistry Development, Inc., (ACD/Labs) and are developed independently of ACD/Labs initiatives.”

Antony Williams…that’s me. To be clear, I work at ACD/Labs. I have just celebrated 10 years of employment with them, and am proud to have done so.

ChemSpider is a passion project, one that the team producing it really cares about. It is NOT our “job”. It is something we want to do. We have many of the skills necessary to make a valuable contribution to the chemistry community, so we are going to do what we can to make a difference. We will likely make mistakes along the way…but we’ve already had many successes. Having worked at a commercial chemistry software company for 10 years, I know that we will not make everyone happy: we will have our evangelists and our critics. We will navigate the community feedback and provide value where we can, with the intention of making a difference. ChemSpider went live on March 24th…only one month ago. To date we have successfully performed thousands of searches and transactions for chemists around the world on a beta release. That is a result we are proud of. What is to come in the next month will only add to the value of ChemSpider. Watch this space…


Posted on April 28, 2007 in Purpose and People