What data do we trust now in the world of high-throughput screening and public compound databases
Posted by tony in Data Quality, Publications and Presentations on May 3, 2013
Let’s face it, the world of experimentation is fun, rewarding, challenging and depressing. Ok, that has been MY experience of the world of lab-based experimentation. I have made many discoveries and celebrated the true joy of being a lab-rat. Love it…always did. I remain polarized to this day by the number of hours I spent around large NMR magnets. No bias, but still polarized. But lab work is also challenging…sometimes not in a good way. Hours of “experiences”…read that as wasted time because of bad preparation on my part, or on a collaborator’s part, or bad chemicals, poorly calibrated equipment, the “person who came before me” scenario etc. Then there is the truly depressing side that I experienced in some of my lab work: repeating work that someone else in my lab had already done because the lack of a LIMS system didn’t allow me to know that; colleagues not checking materials shipped to them at a crucial stage of a synthesis and finding out that what was ordered was not in the bottle (still their fault for not checking!); NMR solvents being really wet and causing nasty side effects on the compound; and, in my life….two magnet quenches in one day….a 500 MHz and a 300 MHz. I shrugged and went home…
Some of my lab experiences were depressing but then I moved into cheminformatics. And in the past few years I have been depressed by the sad state of our public compound databases and the quality of data online. I have given dozens of presentations on the matter of data quality and these two blog posts are representative. We’ve also published on the issues of chemical compounds in the public databases and their correctness.
A Quality Alert and Call for Improved Curation of Public Chemistry Databases, A.J. Williams and S.Ekins, Drug Discovery Today, Link
Towards a gold standard: regarding quality in public domain chemistry databases and approaches to improving the situation, A.J. Williams, S. Ekins, V. Tkachenko, Drug Discovery Today, 5, 2012 Link
This work was always focused on chemical compound structure representations and their matches with synonyms, names etc. The common question was: were they what their names said they should be? After a couple of years of working on this, and publishing with Sean Ekins, we wondered about the data quality of the measured experimental data, especially in the public domain assay screening databases, PubChem of course being the granddaddy of them all. While work could be done to confirm name-structure relationships in PubChem, the experimental data is what it is, as submitted. How do we check the quality of measured experimental data – reproducibility, comparison between labs etc.? Not easy.
When the opportunity came to investigate the possibilities of errors in experimental data we didn’t quite expect the results we obtained. Rather than explain the work in detail I encourage you to read the paper, Open Access on PLOS One and available here. The article, entitled “Dispensing Processes Impact Apparent Biological Activity as Determined by Computational and Statistical Analyses” can be summarized as follows:
* Serial dilution and dispensing using pipette tips versus acoustic dispensing with direct dilution can differ by orders of magnitude with no correlation
* The resulting computational 3D pharmacophores generated from data from both acoustic and tip-based transfer differ significantly
* Traditional dispensing processes are another important source of error in high-throughput screening that impacts computational and statistical analyses.
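To make the first point concrete, here is a minimal sketch of how paired measurements from the two dispensing methods might be compared. The IC50 values are placeholders invented purely for illustration (they are NOT taken from the paper), and the function names are my own; the calculation is simply a log10 fold difference per compound and a Pearson correlation on the log-transformed values.

```python
import math
from statistics import mean


def log10_fold_differences(tip_ic50, acoustic_ic50):
    """Return log10(tip / acoustic) for each paired IC50 value.

    A value of 1.0 means the two methods differ by one order of magnitude.
    """
    return [math.log10(t / a) for t, a in zip(tip_ic50, acoustic_ic50)]


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical IC50 values in micromolar for the same four compounds,
# invented for illustration only; substitute real paired measurements.
tip = [0.5, 10.0, 2.0, 40.0]
acoustic = [0.05, 0.01, 1.0, 0.4]

diffs = log10_fold_differences(tip, acoustic)
r = pearson_r([math.log10(v) for v in tip],
              [math.log10(v) for v in acoustic])

for d in diffs:
    print(f"log10 fold difference: {d:.2f}")
print(f"Pearson r on log10(IC50): {r:.2f}")
```

Working on the log scale is a deliberate choice here: potency data span orders of magnitude, so a correlation on raw values would be dominated by the weakest compounds.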
Derek Lowe on the “In the Pipeline” blog made some strong comments in his post about the paper. He called it a “truly disturbing paper” and said “…people who’ve actually done a lot of biological assays may well feel a chill at the thought, because this is just the sort of you’re-kidding variable that can make a big difference.” And he’s right. There is cause for concern. First of all, we don’t know enough yet from this very small study to understand what classes of compounds are going to exhibit this pipette vs. acoustic discrepancy. Secondly, there is no metadata associated with the assay data itself (that we are aware of) that captures the distinction in the dispensing process, and this paper SHOULD encourage screeners to include this info in their data.
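As a rough sketch of what capturing that metadata could look like, the record below carries the dispensing method alongside the measured value. Everything here is hypothetical: the class names, field names and identifiers are my own illustration and do not correspond to any existing database schema or deposition format.

```python
from dataclasses import dataclass
from enum import Enum


class DispensingMethod(Enum):
    """How the compound was delivered to the assay plate."""
    TIP_BASED_SERIAL_DILUTION = "tip-based serial dilution"
    ACOUSTIC_DIRECT_DILUTION = "acoustic direct dilution"
    UNKNOWN = "unknown"  # legacy data where the method was never recorded


@dataclass(frozen=True)
class AssayResult:
    """One measured value plus the (hypothetical) dispensing metadata."""
    compound_id: str
    assay_id: str
    ic50_micromolar: float
    dispensing: DispensingMethod = DispensingMethod.UNKNOWN


def split_by_dispensing(results):
    """Group results by dispensing method so they can be modelled separately."""
    groups = {}
    for result in results:
        groups.setdefault(result.dispensing, []).append(result)
    return groups


# Placeholder records, invented for illustration only.
results = [
    AssayResult("CMPD-1", "ASSAY-A", 0.5,
                DispensingMethod.TIP_BASED_SERIAL_DILUTION),
    AssayResult("CMPD-1", "ASSAY-A", 0.05,
                DispensingMethod.ACOUSTIC_DIRECT_DILUTION),
    AssayResult("CMPD-2", "ASSAY-A", 3.0),  # method not recorded
]

for method, group in split_by_dispensing(results).items():
    print(method.value, len(group))
```

Defaulting to UNKNOWN rather than guessing keeps older data honest: anything without the annotation is flagged, not silently lumped in with one method or the other.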
The differences between tip and acoustic dispensing are of course only one of many issues that can accompany data measurements for compounds. Other obvious issues include the purity of what’s being screened – is it one component or many….is an impurity showing the response? And in terms of modeling, does the compound being screened match the suggested compound that was purchased/synthesized? Classify this as analytical data required prior to screening. Then there are reproducibility and replicates, assay performance, decomposition in storage, etc. Check out the comments on Derek’s blog in response to his post; clearly the screening community understands many of the challenges and has to deal with them.
Once upon a time someone from pharma made a couple of comments that I found very interesting….1) it likely costs more to store the screening data long term and support the informatics systems than it does to regenerate the data with new and improved assays on an ongoing basis. 2) As assay performance is understood, and assuming that materials are available, it is likely appropriate to flush any data older than three years and remeasure. Certainly with this observation of pipette vs. acoustic bias, data measured with tips may need to get flushed and remeasured with acoustic dispensing methods.
This work describes the observed differences between tips and acoustic methods and improved pharmacophore correlations. It highlights issues that likely exist in the data sitting in the assay screening databases (compounded with chemistry issues) and brings into focus the question of what can be trusted in the data. For sure not all the data is bad but how to separate good from bad and what of the models that can be derived? As Derek summarized in his blog post “How many other datasets are hosed up because of this effect? Now there’s an important question, and one that we’re not going to have an answer for any time soon.” And it’s depressing to think about how many data sets might be hosed….
There is an entire back story to this publication also…namely the challenges that we had getting the work published and the multiple rejections we had in the process. But Sean has told that story in detail here. There’s also the story about the press release …and how editorial control extended from the paper itself to the press release (described here), a situation that I found inappropriate, over-reaching and simply not right. But it happened anyway…..
So…data quality is an issue. It is confusing and hard to tease out in all its complexities. But it’s science, it’s incremental learning and it’s trial by fire. And we have to wonder how many projects might have been burned simply by the dispensing processes.
The future of scientific information & communication presented at the SUNY Potsdam Academic Festival
Posted by tony in AltMetrics, Chemical Database Service, ChemSpider Chemistry, Community Building, ImpactStory, Open Access Publishing, Open Science..all its forms on April 13, 2013
This is a LONG presentation….I talk about the “It’s All About Me” attitude that can positively feed science….we want to share OUR science, we want people to know about our opinions, our activities, our collaborators, we want to get funding, recognition and attribution. And why not…it can all be to the benefit of science.
This presentation was given at the SUNY Potsdam Academic Festival
The future of scientific information & communication
Our access to scientific information has changed in ways that were hardly imagined even by the early pioneers of the internet. The immense quantities of data and the array of tools available to search and analyze online content continue to expand while the pace of change does not appear to be slowing. While scientists now have access to the enormous capacities and capability of the internet, the vast majority of scientific communication continues to be through peer-reviewed scientific journals. The measure of a scientist’s contribution is primarily represented by their publication profile and the citations to their published works, and this offers an incomplete view of their activities. However, we are at the beginning of a new revolution where the ability to communicate offers the opportunity to embrace new forms of publishing and where scientific participation and influence will be measured in new ways. This presentation will provide an overview of our new generation of “openness” in which open source, open standards, open access and open data are proliferating. The future of scientific information and communication will be underpinned by these efforts, influenced by increasing participation from the scientific community and facilitated collaboration, and will ultimately accelerate scientific progress.
Navigating scientific resources using wiki-based resources
Posted by tony in ACS Meetings, AltMetrics, ChemSpider SyntheticPages, CINF, Data Quality, Division of Chemical Information, ImpactStory, SciDBs Wiki, ScientistsDB, SciMobile Apps Wiki, Wikipedia Chemistry, Wikipedia Services, XCITR on April 10, 2013
Presentation given at ACS New Orleans Spring Meeting
There is an overwhelming number of new resources for chemistry that would likely benefit both librarians and students in terms of improving access to data and information. While commercial solutions provided by an institution may be the primary resources there is now an enormous range of online tools, databases, resources, apps for mobile devices and, increasingly, wikis. This presentation will provide an overview of how wiki-based resources for scientists are developing and will introduce a number of developing wikis. These include wikis that are being used to teach chemistry to students as well as to source information about scientists, scientific databases and mobile apps.
Engaging students in publishing on the internet early in their careers
Posted by tony in ACS Meetings, AltMetrics, ChemSpider Chemistry, ChemSpider Syntheses, ChemSpider SyntheticPages, CINF, Division of Chemical Information, ImpactStory, Presentations, Publications and Presentations on April 10, 2013
Presentation given at ACS New Orleans Spring Meeting
As a result of the advent of internet technologies supporting participation on the internet via blogs, wikis and other social networking approaches, chemists now have an opportunity to contribute to the growing chemistry content on the web. As scientists an important skill to develop is the ability to succinctly report in a published format the details of scientific experimentation. The Royal Society of Chemistry provides a number of online systems to share chemistry data, the most well known of these being the ChemSpider database. In parallel the ChemSpider SyntheticPages (CSSP) platform is an online publishing platform for scientists, and especially students, to publish the details of chemical syntheses that they have performed. Using the rich capabilities of internet platforms, including the ability to display interactive spectral data and movies, CSSP is an ideal environment for students to publish their work, especially syntheses that might not support mainstream publication.
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
Posted by tony in ACS Meetings, AltMetrics, Chemical Database Service, ChemSpider Chemistry, PharmaSea on April 10, 2013
Presentation given at ACS New Orleans Spring Meeting
ChemSpider is one of the chemistry community’s primary public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to many tens of websites and software applications at this point. This presentation will provide an overview of the expanding reach of the ChemSpider platform and the nature of solutions that it helps to enable. We will also discuss some of the future directions envisaged for the project and how we intend to continue expanding the impact of the platform.
Challenging, cajoling and rewarding the community for their contributions to online chemistry
Posted by tony in ACS Meetings, ChemSpider Chemistry, ChemSpider Syntheses, ChemSpider SyntheticPages, CINF, Division of Chemical Information on April 10, 2013
Presentation given at ACS New Orleans Spring Meeting
Chemistry online is represented in various ways including publications, presentations, blog posts, wiki-contributions, data depositions, curations and annotations. Encouraging the community to participate in and comment on the information delivered via these various formats would likely provide for a rich exchange of dialog in some cases and improved data quality in others. At the Royal Society of Chemistry we have a number of platforms that are amenable to contribution. This presentation will provide an overview of our experiences in engaging the community to interact with our various forms of content and discuss new approaches we are utilizing to encourage crowdsourced participation.
RSC eScience heroes rewarded through Microsoft prize
Posted by tony in Uncategorized on April 5, 2013
A press release from today….I hope to say more after the week at ACS New Orleans….
Last year, the Royal Society of Chemistry’s Antony Williams was given the Microsoft Jim Gray eScience Award, recognising his pioneering contribution to ChemSpider, the free chemical structure database.
Now, Tony has chosen to pass on the $20,000 prize money to recognise eight colleagues who have made contributions to eScience through their own research.
Tony explains that the monetary prize has traditionally gone to the winner’s institution to invest in research. As the first non-academic to win the award, and because he works in the publishing wing of his organisation, a different approach was needed.
Tony says: “I wanted to reward and recognise the efforts of the many people I’ve worked with and whose data, systems and services I have used over the years – every one of them has contributed in some ways to my own work in this area.
“In science you commonly stand on the shoulders of the giants that come before you. In eScience it is very possible to benefit from the efforts of others and implement, and I am fortunate to be able to take advantage of the brilliance of others.
“This is my giveaway in recognition of what they do and my thank you to them.”
One of Tony’s choices is Jean-Claude Bradley, Associate Professor at Drexel University in Philadelphia, who has pioneered work on Open Notebook Science. Tony Williams is deeply impressed with Jean-Claude’s approach, saying “I believe we will look back at what he’s doing as being hugely important – he really has his finger on the pulse of the future”.
Jean-Claude feels equally strongly that a more open approach to science is vital for the acceleration of science and chemical projects. When asked about the importance of receiving the extra financial support, he said: “It’s fantastic, I’m very honoured by the award and certainly we don’t have any government funding for this so it’s very encouraging.
“Crowd funding and these kinds of initiatives, I think there are situations where it really works, so it definitely means a lot.
“ChemSpider itself has been key in the technical components of our work because we use ChemSpider IDs as our primary key for the molecules that we track and it has a lot of useful web services you can use, so working with Tony and the tools available from ChemSpider has been fantastic”.
Another recipient is Professor Martin Walker who, as well as teaching organic chemistry at the State University of New York at Potsdam, has worked on improving the chemical compound pages on Wikipedia Chemistry. That work is “incredibly important”, according to Tony Williams. He adds: “Martin came in to work with the RSC for a year, so he’s someone I know and respect hugely. I also worked with him at Wikipedia Chemistry and he’s co-ordinated some incredibly important work”.
In turn, Martin Walker says “I admire Tony Williams immensely, and his award was certainly well deserved. I feel very honoured that he has chosen to recognise my work in this way.”
The award winners are recognised for their contributions as described in Tony’s own words below:
1) JC Bradley – Open Notebook Science: He coined the term Open Notebook Science and has set the vision for its applications to chemistry. http://www.drexel.edu/chemistry/contact/facultyDirectory/Jean-Claude%20Bradley/
2) Martin Walker – Wikipedia Chemistry. Leads and coordinates many efforts around Chemistry on Wikipedia contributing a significant number of the chemistry articles: http://www2.potsdam.edu/walkerma/
3) Bob Hanson – Jmol: Leads development of the Jmol applet, one of the most useful tools for chemistry on the web: http://www.stolaf.edu/people/hansonr/
4) Robert Lancashire – JSpecView: Project lead for the JSpecView Applet, an enabling component to allow for display of spectra data on the web: http://wwwchem.uwimona.edu.jm/chrl.html
5) Egon Willighagen – Open Source Chemistry: A contributor to the world of Open Code for Chemistry, especially to the Chemistry Development Kit and semantics for chemistry: http://egonw.github.com/
6) Igor Pletnev – InChI: the InChI software manager and sole developer of all enhancements to the original InChI software code: http://analyt.chem.msu.ru/eng/preconcentration/pletnev/default.htm
7) Daniel Lowe – Nomenclature conversion and Open Reactions. Daniel managed the development of the OPSIN name-to-structure conversion software for 3 years and has contributed hundreds of thousands of chemical reactions to the world of Open Data. http://www-ucc.ch.cam.ac.uk/members/dl387
8) Peter Corbett – Text-mining: Peter developed the OSCAR3 open source package for chemistry text mining and also was the original developer of OPSIN. http://scholar.google.co.uk/citations?user=RSDcspMAAAAJ&hl=en
Press release online
http://www.rsc.org/AboutUs/News/PressReleases/2013/escience-heroes-award.asp
More information
Edwin Silvester
Media Relations Executive
Royal Society of Chemistry,
Thomas Graham House, Science Park,
Milton Road, Cambridge CB4 0WF, UK
Tel +44 (0)1223 432294, Mob +44 (0)7825 186342
Do we place too much trust in experts?
Over the weekend I spent about 4 hours making some videos, writing some short Powerpoints with some images from online sources and assembling some rather random terms and making up “Shtuff” for April 1st. The result was the video shown below…and if you want to get the full story watch it end to end. See if you UNDERSTAND what I was talking about and whether it was convincing enough to be believable. The results might be different if you are a chemist versus a friend or family member without a scientific background.
The next day I came clean. It was an April Fools joke. I described in detail how I created the effect in a separate video.
I expected most chemists to call me on the scam very early but that didn’t happen. In fact a number of scientists I know quite well commented very positively on it. Some were in emails and some on the social media platforms. Only one person called me on it early. Maybe lots of other chemists spotted the problems. Maybe they didn’t watch the video through and just trusted me. Nevertheless I know that people I respect were ready to repeat the experiment with their kids, try it in class and so on. One person commented on Facebook what I thought would happen “I think it is more a reflection of the credibility you have with the science community …”. I think this is likely true…for my family, my friends and many others. I am “trusted” by people and seen to be credible….but on April 1st I am not to be trusted for sure!!!
I have many examples of where credible people are trusted when there are obvious flaws in their statements. In the Open Science/Open Access/Open Data arena I see trust simply granted to many so-called experts when their arguments are full of assumptions/declarations/opinions and not fact-based. I encourage more questions and less granted trust!
I also believe that the premise of my second video is correct…language in our specialty areas allows us to isolate, confuse and, for some, stay aloof. The brilliant people I know around me are able to tell their stories of science in language that non-specialists can understand. That’s a special skill that we should all work on. Except, of course, on April 1st, where it helps!
Kitchen Chemistry with my Kids and the fun of the BEMEWS Reaction – THIS WAS AN APRIL 1ST JOKE!!!
Posted by tony in Presentations on March 31, 2013
This weekend I spent some time with my boys teaching them a little more Kitchen Chemistry. I’ve been doing a whole series of kitchen chemistry experiments but this one was definitely a lot of fun. I got to teach them a little more about magnetism (as I did in this movie: Magnetism in the Human Body: Lessons for Ten Year Olds).
This time I was teaching them all about rare-earth magnets and how they could be used in a solution of Borax and Bemews catalyst to form an extended hydration network and grow blue water balls. Admittedly these hydration spheres centered around the magnetic ions don’t persist for very long but nevertheless the experiment is a fun one for kids!
The BEMEWS reaction using magnetic centered dendritic network growth of hydration spheres
ACS New Orleans Special Events
Posted by tony in ACS Meetings on March 26, 2013
If you are going to ACS New Orleans in just a couple of weeks then I hope to see you there. Stop by the booth and say hi or attend one of our many presentations listed below. There is a lot going on in our eScience team and we will be reporting on some of our latest efforts at the conference.
If you are arriving on Saturday we would love you to attend the CINF gathering to discuss the future of the division. How to access, use, validate and integrate Chemical Information is such an important part of our everyday lives that we want to ensure the activities of the division support your needs, and we are hosting this dinner in order to garner your input and engage you in discussion. The dinner is on Saturday from 6:30-9:30pm at Calcasieu Private Dining, 930 Tchoupitoulas St, New Orleans. Please contact me via email at tony27587 AT gmail DOT com to confirm you wish to attend so we have space.
We will also have a special guest at the ACS: Science Comedian Brian Malow. He will be the speaker at the CINF Luncheon AND will have a one hour session on communicating science. Details below. Please contact me directly if you want a ticket for the CINF Luncheon!
Monday, April 8
- Division of Chemical Information Featured Presentation, 4:30-5:30pm, MCC 352
- Brian Malow – “Science comedian’s guide to communicating science”
- Sponsored by the ACS Division of Chemical Information
Tuesday, April 9
- CINF Luncheon (Ticketed Event – Contact Division Chair, Tony Williams), 12:00 – 1:30 pm, MCC Room R08
- Speaker: Brian Malow – “Science Comedy”
- Sponsored exclusively by the Royal Society of Chemistry
Presentations that members of our eScience team are involved with.
1) PAPER ID: 22619 PAPER TITLE: “ChemSpider: Disseminating data and enabling an abundance of chemistry platforms” DIVISION: CINF: Division of Chemical Information SESSION: Public Databases Serving the Chemistry Community
2) PAPER ID: 18784 PAPER TITLE: “Engaging students in publishing on the internet early in their careers” DIVISION: CHED: Division of Chemical Education SESSION: Increasing Student Comprehension and Retention in the Undergraduate Organic or Inorganic Curriculum
3) PAPER ID: 21634 PAPER TITLE: “Navigating scientific resources using wiki-based resources” DIVISION: CINF: Division of Chemical Information SESSION: Library Cafes, Intellectual Commons and Virtual Services, Oh My! Charting New Routes for Users into Research Libraries
4) PAPER ID: 19389 PAPER TITLE: “Challenging, cajoling and rewarding the community for their contributions to online chemistry” DIVISION: CINF: Division of Chemical Information SESSION: Scholarly Communication: New Models, New Media, New Metrics
5) PAPER ID: 18792 PAPER TITLE: “Data enhancing the RSC Archive” DIVISION: CINF: Division of Chemical Information SESSION: Scholarly Communication: New Models, New Media, New Metrics
6) PAPER ID: 19947 PAPER TITLE: “RSC chemical validation and standardization platform: A potential path to quality-conscious databases” DIVISION: CINF: Division of Chemical Information SESSION: Public Databases Serving the Chemistry Community
7) PAPER ID: 16409 PAPER TITLE: “Carbohydrate structure representation and public chemistry databases” DIVISION: CARB: Division of Carbohydrate Chemistry SESSION: Current Topics in Glycoscience
8) PAPER ID: 19458 PAPER TITLE: “Cheminformatics career at the Royal Society of Chemistry, UK” DIVISION: CINF: Division of Chemical Information SESSION: Food for Thought: Alternative Careers in Chemistry
9) PAPER ID: 20185 PAPER TITLE: “ChemSpider reactions: Delivering a free community resource of chemical syntheses” DIVISION: CINF: Division of Chemical Information SESSION: Public Databases Serving the Chemistry Community
10) PAPER ID: 16427 PAPER TITLE: “Evolving with our community: The RSC’s approach to the challenges and opportunities of scientific communication” DIVISION: CINF: Division of Chemical Information SESSION: Scholarly Communication: New Models, New Media, New Metrics
11) PAPER ID: 21004 PAPER TITLE: “Open PHACTS: Meaningful linking of preclinical drug discovery knowledge” DIVISION: CINF: Division of Chemical Information SESSION: Linking Bioinformatic Data and Cheminformatic Data
12) PAPER ID: 13382 PAPER TITLE: “Challenges and recommendations for obtaining chemical structures of industry-provided repurposing candidates” DIVISION: CINF: Division of Chemical Information SESSION: Public Databases Serving the Chemistry Community
13) PAPER ID: 21524 PAPER TITLE: “Navigating between patents, papers, abstracts and databases using public sources and tools” DIVISION: CINF: Division of Chemical Information SESSION: Public Databases Serving the Chemistry Community
14) PAPER ID: 13433 PAPER TITLE: “Dispensing processes profoundly impact biological assays and computational and statistical analyses” DIVISION: CINF: Division of Chemical Information SESSION: Advances in Visualizing and Analyzing Biomolecular Screening Data