Tag Archives: Collaborative Computational Technologies for Biomedical Research

Encouraging Collaboration in Washington as a Hub for Chemistry Databases

On August 25/26 I will be attending the 5th Meeting on U.S. Government Chemical Databases and Open Chemistry. I will have the opportunity to spend time with people I appreciate for the contributions they are making to chemistry: Martin Walker, JC Bradley, Andy Lang, Markus Sitzmann, Ann Richard, Frank Switzer, Evan Bolton, Marc Zimmermann, Wolf Ihlenfeldt, Steve Heller, John Overington, Noel O’Boyle, and many others. It is surely going to be an excellent meeting. The agenda is given here.

Some of the people listed above are associated with “Washington-based databases”: databases that are developed in or around Washington by government-funded organizations – the FDA, NIH, NCBI/NLM, NCI and NIST. Other government-funded databases that are not Washington-based are also represented – those of the EPA and CDC. If you are not sure what all those acronyms stand for, here you go.

FDA – Food and Drug Administration

NIH – National Institutes of Health

NCBI/NLM – National Center for Biotechnology Information/National Library of Medicine

NCI – National Cancer Institute

EPA – Environmental Protection Agency

CDC – Centers for Disease Control and Prevention

NIST – National Institute of Standards and Technology

One organization with a chemistry database that is conspicuous by its absence is the NCGC, whose data collection is contained in the NPC Browser. I’ve written a lot about that one on this blog.

NCGC – NIH Chemical Genomics Center

I am still hoping to talk to some members of the team if they attend the meeting.

There will be a LOT of government databases represented at this meeting, and I have experience with many of the databases provided by these institutions. Based on my review of the data, the DSSTox database is one of the most highly curated. The NCI resolver is an excellent resource with good quality data in terms of the accuracy of name-structure relationships.

The various databases are developed independently of each other. True, some of the databases contain content from some of the others but, as far as I can tell, there is not much collaboration in terms of coordinated curation of data. What would it be like if each of these organizations participated in a roundtable discussion to agree on a process by which to collaboratively validate and curate the data, once and for all? Maybe this meeting can catalyze such a discussion. I would also encourage the organizations to take advantage of other sources that can share their data – ChEBI/ChEMBL is one example! If these various groups coordinated their work, the result could be a dataset of massively improved quality to share across the databases and across the community. If this work were done, the group that assembled the NPC Browser would likely have had a lot less work to do in assembling the data. The various database providers should certainly have provided clean, curated data for many of the best-known drugs. While working on a manuscript reviewing the quality of public domain chemistry databases, I assembled a table of 25 of the top-selling drugs in the US and checked the data quality in the NPC Browser relative to a gold standard set. The assembly of the data will be discussed in its entirety in a later publication.

25 of the Top Selling Drugs in the USA - Data Quality in the NPC Browser

The errors listed in the table are:

1 Correct skeleton, no stereochemistry
2 Correct skeleton, missing stereochemistry
3 Correct skeleton, incorrect stereochemistry
4 Single component of multicomponent structure
5 Multiple components for single component structure
6 No structure returned based on name search
7 Incorrect skeleton
8 Multiple structures based on name search
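For readers who want to run this kind of check themselves, here is a minimal sketch of how the skeleton and stereochemistry categories could be told apart by comparing a candidate structure against a gold standard using Standard InChI strings. It is not the procedure used to build the table; the function and class names are hypothetical, and it assumes both structures are already available as Standard InChI strings. It treats the stereo layers (/b, /t, /m, /s) as "stereochemistry" and everything else as "skeleton", and it does not try to separate missing from incorrect stereochemistry or to handle the name search and multicomponent categories.

```python
from enum import Enum


class Check(Enum):
    """Coarse outcome of comparing a candidate structure to a gold standard."""
    MATCH = "matches gold standard"
    NO_STEREO = "correct skeleton, no stereochemistry"
    STEREO_DIFFERS = "correct skeleton, missing or incorrect stereochemistry"
    WRONG_SKELETON = "incorrect skeleton"


# Standard InChI stereo layers: /b (double bond), /t (tetrahedral),
# and the accompanying /m and /s flags.
STEREO_PREFIXES = ("b", "t", "m", "s")


def split_layers(inchi: str):
    """Split an InChI string into (non-stereo layers, stereo layers)."""
    layers = inchi.split("/")[1:]  # drop the "InChI=1S" version prefix
    skeleton = tuple(l for l in layers if not l.startswith(STEREO_PREFIXES))
    stereo = tuple(l for l in layers if l.startswith(STEREO_PREFIXES))
    return skeleton, stereo


def compare_to_gold(candidate: str, gold: str) -> Check:
    """Classify a candidate InChI relative to a gold standard InChI."""
    cand_skeleton, cand_stereo = split_layers(candidate)
    gold_skeleton, gold_stereo = split_layers(gold)
    if cand_skeleton != gold_skeleton:
        return Check.WRONG_SKELETON
    if cand_stereo == gold_stereo:
        return Check.MATCH
    if not cand_stereo:
        return Check.NO_STEREO
    return Check.STEREO_DIFFERS


if __name__ == "__main__":
    # Illustrative strings only: L-alanine as the gold standard and the
    # same skeleton with the stereo layers left out as the candidate.
    gold = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
    flat = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
    print(compare_to_gold(flat, gold).value)
```

A plain string comparison like this is deliberately crude. A real curation workflow would generate the InChI strings with a cheminformatics toolkit and would inspect individual stereocenters to distinguish partially missing stereochemistry from wrong stereochemistry.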


Clearly there are a lot of errors in the structures associated with 25 of the best-selling drugs on the US market. These should be the easy ones to get right as they are so well known!!! Collaboration between the domain’s top database providers would almost certainly have helped. This would not necessarily be an issue of meshing technologies but of agreeing on a common goal to have the highest quality data available. Since the government puts so much money into the development of these databases, it would be appropriate to have some oversight and a push for aligning efforts. Collaboration is essential!

With that in mind…a shameless pointer to how Sean Ekins, Maggie Hupcey and I BELIEVE in the need for collaboration…our book. If we can encourage others working on the government chemistry databases to adopt active collaborative approaches, wonderful things could happen.

Collaborative Computational Technologies for Biomedical Research




A YouTube Overview of Our Book: Collaborative Computational Technologies for Biomedical Research

This movie provides an overview of the book “Collaborative Computational Technologies for Biomedical Research”, edited by Sean Ekins, Maggie Hupcey and Antony Williams and published by John Wiley & Sons. All of the authors either have extensive backgrounds in computational software for biomedical research or have done wet lab research for drug discovery. Many have worked in software companies, pharmaceutical companies or consulting companies and have the appropriate skills to produce an excellent overview of present activities in the area of Collaborative Computational Technologies for Biomedical Research.


Posted on November 9, 2010 in Book Reviews, General Communications



Future Book: Collaborative Computational Technologies for Biomedical Research

For the past few months I have been working with Sean Ekins and Maggie Hupcey to edit a book entitled “Collaborative Computational Technologies for Biomedical Research”, to be published by Wiley next year. It’s been a work of passion for all three of us, as we all believe that collaborative computational technologies will make a major impact on biomedical research. This book represents a point in time. We are working at a time when technologies are moving so quickly that in a couple of years parts of the future vision of the book will likely already be in place. Some of the concepts about what could be will certainly have grown in scope, and the world of open data, open science and open source will have made even more significant impacts on the Life Sciences. This was an exciting project. It represents the exciting shifts in collaboration happening every day. We hope you’ll be interested in reading it when it is released next year. The outline of the book, its chapters and its authors are listed below.


1. The Need for Collaborative Technologies in Drug Discovery
Chris L. Waller, Ramesh V. Durvasula and Nick Lynch

2. Collaborative Innovation: the Essential Foundation of Scientific Discovery
Robert Porter Lynch

3. Models for Collaborations and Computational Biology
Shawnmarie Mayrand-Chung, Gabriela Cohen-Freue, and Zsuzsanna Hollander

4. Precompetitive Collaborations in the Pharmaceutical Industry
Jackie Hunter

5. Collaborations in Chemistry
Sean Ekins, Antony J. Williams and Christina K. Pikas

6. Consistent Patterns in Large Scale Collaboration
Robin W. Spencer

7. Collaborations Between Chemists and Biologists
Victor J. Hruby

8. Ethics of Collaboration
Richard J. McGowan, Matthew K. McGowan and Garrett J. McGowan

9. Intellectual Property Aspects of Collaboration
John Wilbanks


10. Scientific Networking and Collaborations
Edward D. Zanders

11. Cancer Commons: Biomedicine in the Internet Age
Jeff Shrager, Jay M. Tenenbaum, and Michael Travers

12. Collaborative Development of Large-Scale Biomedical Ontologies
Tania Tudorache and Mark A. Musen

13. Standards for Collaborative Computational Technologies for Biomedical Research
Sean Ekins, Antony J. Williams and Maggie A.Z. Hupcey

14. Collaborative Systems Biology: Open Source, Open Data, and Cloud Computing
Brian Pratt

15. Eight Years Using GRIDS for Life Sciences
Vincent Breton, Lydia Maigne, David Sarramia and David Hill

16. Enabling Precompetitive Translational Research – A Case Study
Sándor Szalma

17. Collaboration in the Cancer Research Community: The cancer Biomedical Informatics Grid (caBIG)
George A. Komatsoulis

18. Leveraging Information Technology for Collaboration in Clinical Trials
O.K. Baek


19. The Evolution of Electronic Laboratory Notebooks
Keith T. Taylor

20. Collaborative Tools to Accelerate Neglected Disease Research: the Open Source Drug Discovery Model
Anshu Bhardwaj, Vinod Scaria, Zakir Thomas, Santosh Adayikkoth, Open Source Drug Discovery (OSDD) Consortium and Samir K. Brahmachari

21. Pioneering Use of the Cloud for Development of the Collaborative Drug Discovery (CDD) Database
Sean Ekins, Moses M. Hohman and Barry A. Bunin

22. ChemSpider: a Platform for Crowdsourced Collaboration to Curate Data Derived From Public Compound Databases
Antony J. Williams

23. Collaborative Based Bioinformatics Applications
Brian D. Halligan

24. Collaborative Cheminformatics Applications
Rajarshi Guha, Ola Spjuth and Egon Willighagen


25. Collaboration Using Open Notebook Science in Academia
Jean-Claude Bradley, Andrew S.I.D. Lang, Steve Koch and Cameron Neylon

26. Collaboration and the Semantic Web
Christine Chichester and Barend Mons

27. A Collaborative Visual Analytics Environment for Imaging Genetics
Zhiyu He, Kevin Ponto and Falko Kuester

28. Current and Future Challenges for Collaborative Computational Technologies for the Life Sciences
Antony J. Williams, Renée J.G. Arnold, Cameron Neylon, Robin Spencer, Stephan Schürer and Sean Ekins


Posted on October 11, 2010 in Book Reviews, General Communications

