I am a fan of Altmetrics. At least in concept. But I am starting to get very concerned about both the tools used to measure them and what the “numbers” are expected to indicate. We would expect a high “number” in an Altmetric.com “donut” to indicate, in some way, the relative importance or “impact” of that article. One would hope it at least points to how widely read the article is, whether readers like the science, and the potential for the article to, for example, move understanding forward or bring data into further use. I am not sure this is true…at least for some of the articles I am involved with.
Let’s take, for example, the recent Zika Virus article that Sean Ekins led. The F1000 site gives us some stats regarding Views and Downloads, and the Metrics section shows the Altmetric stats. I would assume that at least some of the 48 DOWNLOADers actually read the article. Some of the VIEWers are likely to have read it and maybe printed it. For the Altmetric stats, the 33 tweets are likely people pointing to the article and, because of the way I use Twitter, I am going to suggest that Tweets are less indicative of the number of readers. There is a definition on the Altmetric site of how Twitter stats are compiled.
If we use the Altmetric Bookmarklet we can navigate to the page with the score.
The score of “41” is essentially the sum of bloggers, tweets, Facebook posts etc. summarized below (1+1+1+33+1+3+1 for being on Altmetric.com???)
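As a quick check on that reading of the score, here is the arithmetic as a small Python sketch. It assumes, as the question marks above suggest, that the number is an unweighted sum of the listed counts plus one. Altmetric.com describes its score as a weighted count rather than a plain sum, so this only tests the hypothesis; `naive_score` is a hypothetical helper.

```python
# Per-source counts as listed in the post (1+1+1+33+1+3).
# The post does not label which count belongs to which source, so they stay anonymous.
mention_counts = [1, 1, 1, 33, 1, 3]

def naive_score(counts, bonus=1):
    """Unweighted sum of mentions plus a flat bonus.

    This is the post's guess at how the number is built, NOT Altmetric's
    documented algorithm; the bonus of 1 is the unexplained remainder.
    """
    return sum(counts) + bonus

print(naive_score(mention_counts))  # 41
```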
When I asked F1000Research via Twitter why they don’t show the “number” I appreciated their answer. I AGREE with their sentiment.
Yesterday I received an email about our Journal of Cheminformatics article “Ambiguity of non-systematic chemical identifiers within and between small-molecule databases“, part of which is shown below.
On the actual Journal of Cheminformatics page it says there have been 1444 accesses (not 2216 as cited in the email).
So there have been somewhere between 1400 and 2200 accesses (and it is safe to assume some proportion of those people actually read it!), yet the Altmetric score is only 8. Compare that with an Altmetric score of >40 for the Zika Virus paper, which had far fewer accesses, and many of whose altmetrics probably don’t indicate reads of the article at all, as they are Tweets, many of them from the authors out to the world.
Using PlumX I am extremely disappointed by what it reflects about the JChemInf article! Only 10 HTML Views versus the 1400-2200 accesses reported above, and only 7 readers and 1 save! UGH. But 13 Tweets are noted, so I would expect an Altmetric.com score of at least 13 or 14, rather than the 8 marked on the article.
I also tried to sign into ImpactStory to check stats but got an “Uh oh, looks like we’ve got a system error…feel free to let us know, and we’ll fix it.” message, so I will report back on that.
Altmetrics should be maturing now to a point where reads, accesses and downloads are fed into some overall metric. I think that reads/accesses/downloads should carry more weight than a Tweet in terms of the impact of an article. If someone read it, whether they agree with it or not, they are MORE aware of the content than if someone simply shared a link to an article that then didn’t get read. The platforms are so desync’ed in terms of the various numbers that we must wonder how things got so badly broken. I would imagine that stats gathered in some way through CrossRef or ORCID will ultimately help this to mature, but until then treat them all with a level of suspicion. I believe that altmetrics will be an important part of helping to define impact for an article. But there is still a long way to go, I’m afraid….
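To make the mismatch concrete, this Python sketch tabulates the access figures quoted above for the same JChemInf article and reports how far apart the sources are. The "one point per tweet" check assumes the naive reading of the score discussed earlier, not Altmetric's own weighting; `spread` is a hypothetical helper.

```python
# Access counts quoted in this post for the same J. Cheminformatics article.
reported_accesses = {
    "journal page": 1444,
    "publisher email": 2216,
    "PlumX HTML views": 10,
}

def spread(values):
    """Return (min, max, max/min ratio) for one metric reported by several sources."""
    values = list(values)
    low, high = min(values), max(values)
    return low, high, high / low

low, high, ratio = spread(reported_accesses.values())
print(f"accesses: {low} to {high} ({ratio:.1f}x between sources)")

# Under the "one point per tweet" reading, 13 tweets imply a score of at least 13.
altmetric_score, plumx_tweets = 8, 13
print("score consistent with tweet count:", altmetric_score >= plumx_tweets)
```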
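The weighting argued for above can be sketched as a toy composite score. The weights (1.0 per read or download, 0.25 per tweet) and the `composite` function are invented purely for illustration; no platform is known to compute this. The counts are the ones quoted in this post, and views are left out because no view count is given for the Zika paper.

```python
from dataclasses import dataclass

# Hypothetical weights: engagement that implies reading counts more than a share.
WEIGHTS = {"reads": 1.0, "tweets": 0.25}

@dataclass
class ArticleStats:
    name: str
    reads: int   # downloads or full-text accesses
    tweets: int

def composite(stats: ArticleStats, weights=WEIGHTS) -> float:
    """Weighted sum of engagement counts (illustrative, not an existing metric)."""
    return weights["reads"] * stats.reads + weights["tweets"] * stats.tweets

articles = [
    ArticleStats("Zika virus (F1000Research)", reads=48, tweets=33),
    ArticleStats("Ambiguity of identifiers (J. Cheminf.)", reads=1444, tweets=13),
]
for a in sorted(articles, key=composite, reverse=True):
    print(f"{a.name}: {composite(a):.2f}")
```

With these assumed weights the ordering flips relative to the Altmetric scores (41 versus 8); different weights would give different numbers, which is exactly why any such scheme would need community agreement.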
The needs for chemistry standards, database tools and data curation at the chemical-biology interface
This presentation was given at the Society for Laboratory Automation and Screening (SLAS) conference in San Diego, California on January 25th 2016.
This presentation will highlight known challenges with the production of high quality chemical databases and outline recent efforts made to address these challenges. Specific examples will be provided illustrating these challenges within the U.S. Environmental Protection Agency (EPA) Computational Toxicology Program. These include consolidating EPA’s ACToR and DSSTox databases, augmenting computed properties and list search features, and introducing quality metrics to assess confidence in chemical structure assignments across hundreds of thousands of chemical substance records. The past decade has seen enormous investments in the generation and release of data from studies of chemicals and their toxicological effects. Commonly, however, little attention is given to provenance and, more generally, to the quality of the data. The presentation will emphasize the importance of rigorous data review procedures, progress in web-based public access to accurate chemical data sets for use in predictive modeling, and the benefits that these efforts will deliver in helping toxicologists embrace the “Big Data” era.
This abstract does not necessarily represent the views of the U.S. Environmental Protection Agency.
The presentation is available from the EPA’s Science Inventory site as a PDF file here.
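The abstract does not say how the structure-confidence metrics are calculated. As a purely hypothetical illustration of the general idea, and not the EPA’s actual procedure, one simple metric is the fraction of independent sources that agree on the structure assigned to a substance. The identifiers (`KEY-A`, `KEY-B`) and the high/medium/low thresholds below are placeholders.

```python
from collections import Counter

def consensus_confidence(assignments):
    """Return (consensus structure, fraction of sources agreeing with it).

    `assignments` maps a source name to the structure identifier (for example
    an InChIKey) that source gives for one substance; None means no structure.
    """
    structures = [s for s in assignments.values() if s is not None]
    if not structures:
        return None, 0.0
    best, votes = Counter(structures).most_common(1)[0]
    # Divide by all sources, so a source with no structure lowers confidence.
    return best, votes / len(assignments)

def confidence_level(fraction, high=0.75, medium=0.5):
    """Map an agreement fraction onto hypothetical high/medium/low bins."""
    if fraction >= high:
        return "high"
    if fraction >= medium:
        return "medium"
    return "low"

record = {"source 1": "KEY-A", "source 2": "KEY-A", "source 3": "KEY-A", "source 4": "KEY-B"}
structure, fraction = consensus_confidence(record)
print(structure, fraction, confidence_level(fraction))  # KEY-A 0.75 high
```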
Scientists from EPA, NTP and NCATS have used high-throughput screening (HTS) assays to evaluate the potential health effects of thousands of chemicals. The Transform Tox Testing Challenge: Innovating for Metabolism is calling on innovative thinkers to find new ways to incorporate physiological levels of chemical metabolism into HTS assays. Since current HTS assays do not fully incorporate chemical metabolism, they may miss chemicals that are metabolized to a more toxic form. Adding metabolic competence to HTS assays will help researchers more accurately assess chemical effects and better protect human health.
A new paper, “Reversible Bergman cyclization by atomic manipulation”, was published in Nature Chemistry today (it will be featured on the cover of the March issue). I have so much appreciation for what these scientists are doing, and selfishly I want to continue to applaud them for the breakthrough science they keep producing. I have never met the “IBM molecular microscopy” team (my chosen label) but I have had the chance to work with them on two separate occasions. The high-profile one was on Olympicene, a fun story reminisced about here: Olympicene From Concept to Completion. It was a lot of fun to work with scientists who found the work interesting, and in reality it was NOT just a marketing story for RSC, as some people, including some of my own colleagues, mocked at the time! In fact, if you look at the number of articles that I have now linked (and continue to add to) on my Kudos page, you will see that a LOT of publications came out of the work (Kudos’ed Olympicene article plus linked articles), so it was not just “fun science”. Science is fun, and real utility and understanding can clearly come out of researching fun science.
The other chance I had to work with the team was on one of my personal interests: Structure Elucidation by NMR and the applications of Computer-Assisted Structure Elucidation (CASE) software/algorithms. The work “A Combined Atomic Force Microscopy and Computational Approach for the Structural Elucidation of Breitfussin A and B: Highly Modified Halogenated Dipeptides from Thuiaria breitfussi” combined CASE-based approaches with single-molecule microscopy to elucidate new structures.
Now the team has demonstrated a reversible Bergman cyclization for the first time, using atomic manipulation and verifying the products by non-contact atomic force microscopy with atomic resolution. I will let the movie below tell the story and refer you to the original paper. FASCINATING WORK. Congrats to all. How many reactions will now come under the team’s scrutiny and validation? We will see…
My blog has been fairly inactive for the past few months, driven primarily by my move from working on cheminformatics at the Royal Society of Chemistry to working at the National Center for Computational Toxicology at the Environmental Protection Agency. While I stopped working on ChemSpider about 18 months before I left RSC (to focus on the developing RSC Data Repository), my interest in data quality and my long-standing interest in “accuracy in chemical structure representations” have never dwindled. At the EPA-NCCT we are very focused on producing high quality chemical structure databases, following on from the work of my colleague Ann Richard, who initiated work on DSSTox over a decade ago.
It was therefore with great interest that I became aware of the confusion regarding the chemical structure of BIA-10-2474, a drug that has attracted a lot of interest because of a clinical trial with negative outcomes. I am entering the story late compared to my long-time collaborators and friends Sean Ekins, Chris Southan and Alex Clark, but more about their work later. The news to date is best summarized on Derek Lowe’s In the Pipeline blog and in David Kroll’s post on Forbes.
Based on my previous history of helping to curate chemical structures on Wikipedia (starting one Christmas in 2008), my experience is that Wikipedia is a GOOD PLACE to source high quality structures, especially after the work invested in curating chemical data over the years. The first structure for BIA-10-2474 that was reported on Wikipedia is shown below.
On January 16th Chris performed his usual thorough examination of structure integrity and links to public sources (he is a master in this domain!) but commented specifically that “The molecular identity of BIA-10-2474 can only be formally verified directly by BIAL or indirectly from regulatory documentation they may have submitted”, as the chemical structure itself was inferred from the name.
Nevertheless, my friends Sean Ekins and Alex Clark were already investigating what OPEN MODELS might be able to predict about the chemical: see here, here and here. You should be impressed by what is possible when running a molecular structure through several Bayesian models in Alex’s mobile app called PolyPharma!
By January 21st Chris was commenting that the structure had changed, highlighting an extract from what had been exposed by Le Figaro and listing the chemical name: 3-(1-(cyclohexyl(methyl)carbamoyl)-1H-imidazol-4-yl)pyridine 1-oxide. Want to know what that name means as a structure? Take the name “3-(1-(cyclohexyl(methyl)carbamoyl)-1H-imidazol-4-yl)pyridine 1-oxide” and paste it into the free online service OPSIN. The results are shown below.
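PolyPharma’s models are not reproduced here. As a hedged illustration of the general technique, one common form of Bayesian activity model in cheminformatics is the Laplacian-corrected naive Bayesian classifier, sketched below in Python; whether PolyPharma uses exactly this form is not stated in this post. Molecules are assumed to be already reduced to sets of integer feature IDs (stand-ins for hashed fingerprint fragments), and the training data is made up.

```python
import math
from collections import defaultdict

def train_laplacian_bayes(molecules):
    """Train feature weights from (feature_set, is_active) pairs.

    weight(f) = log((A_f + 1) / (T_f * p + 1)), where T_f is the number of
    training molecules containing feature f, A_f how many of those are active,
    and p the overall fraction of actives.
    """
    total = defaultdict(int)
    active = defaultdict(int)
    n_active = 0
    for features, is_active in molecules:
        n_active += is_active
        for f in set(features):
            total[f] += 1
            if is_active:
                active[f] += 1
    p = n_active / len(molecules)
    return {f: math.log((active[f] + 1) / (total[f] * p + 1)) for f in total}

def score(weights, features):
    """Sum the weights of the query's features; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in set(features))

# Toy training set: feature IDs are hypothetical, not real fingerprint bits.
training = [
    ({1, 2, 3}, True),
    ({1, 2, 4}, True),
    ({3, 5, 6}, False),
    ({4, 5, 6}, False),
]
weights = train_laplacian_bayes(training)
print(round(score(weights, {1, 2}), 3))  # 0.811
print(round(score(weights, {5, 6}), 3))  # -1.386
```

Features seen only in actives push the score up, features seen only in inactives push it down, and the +1 correction keeps rare features from dominating.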
That structure has now found its way to Wikipedia (updated on the 21st January – check out the edits between the two forms of the article here).
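For anyone who would rather script the name-to-structure step than paste names into a web page, here is a minimal Python sketch. It assumes OPSIN’s web service accepts requests of the form `https://opsin.ch.cam.ac.uk/opsin/<name>.<format>` (for example `.smi` for SMILES); check the OPSIN documentation before relying on that URL pattern. `opsin_url` and `name_to_structure` are hypothetical helper names, and the live call is left commented out.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint pattern for OPSIN's web service: <base>/<name>.<format>
OPSIN_BASE = "https://opsin.ch.cam.ac.uk/opsin/"

def opsin_url(name: str, fmt: str = "smi") -> str:
    """Build a request URL for a chemical name; parentheses and spaces are percent-encoded."""
    return f"{OPSIN_BASE}{quote(name, safe='')}.{fmt}"

def name_to_structure(name: str, fmt: str = "smi") -> str:
    """Fetch the converted structure from the service (network access required)."""
    with urlopen(opsin_url(name, fmt), timeout=30) as response:
        return response.read().decode("utf-8").strip()

if __name__ == "__main__":
    name = "3-(1-(cyclohexyl(methyl)carbamoyl)-1H-imidazol-4-yl)pyridine 1-oxide"
    print(opsin_url(name))
    # print(name_to_structure(name))  # uncomment to query the live service
```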
Sean Ekins has maintained a running series of blog posts here. Using a stack of openly accessible algorithms and websites Sean has now produced a whole series of predictions for the “final molecule”. Chris Southan has also continued to expand his work and I direct you to his latest blogpost for more information. Nice stuff Chris.
After news of the drug trial results started to appear, it took days before the chemical structure was actually identified (i.e. the structure was blinded). How much work, and how much confusion, was created by keeping the drug structure blinded? We have to imagine that the authorities had faster access to the details!
It is understandable that companies keep their chemical structures hidden. Patents are intentionally obfuscating (a compound going into a trial is commonly hidden among hundreds if not tens of thousands of chemicals that could be enumerated from a Markush structure). Meanwhile, Chris Southan will continue to educate the world about how competitive intelligence investigations work.
This is a talk I gave at the 5th Brazilian Conference on Natural Products as part of my “spare time” activities and to remain engaged with my passion for NMR, structure elucidation and computational spectroscopy applications.
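To see why enumerating a Markush structure produces so many candidates, here is a toy Python sketch. The template and R-group lists are invented for illustration (a SMILES string with `[R1]`-style placeholders), and real Markush claims also vary attachment points, ring systems and repeat counts, so actual patents can enumerate far more. The count is simply the product of the number of choices at each position.

```python
from itertools import product
from math import prod

# Hypothetical Markush-style template: a SMILES string with R-group placeholders.
TEMPLATE = "c1cc([R1])ccc1C(=O)N([R2])[R3]"
R_GROUPS = {
    "[R1]": ["F", "Cl", "Br", "C", "OC"],
    "[R2]": ["C", "CC", "C1CCCCC1"],
    "[R3]": ["C", "CC", "CCC", "C(C)C"],
}

def enumerate_markush(template, r_groups):
    """Yield one concrete SMILES string per combination of R-group choices."""
    labels = list(r_groups)
    for choice in product(*(r_groups[label] for label in labels)):
        smiles = template
        for label, fragment in zip(labels, choice):
            smiles = smiles.replace(label, fragment)
        yield smiles

count = prod(len(options) for options in R_GROUPS.values())
print(count)  # 60
print(next(enumerate_markush(TEMPLATE, R_GROUPS)))  # c1cc(F)ccc1C(=O)N(C)C
```

Adding just one more position with ten options would take this toy example from 60 to 600 structures, which is how a single trial compound can disappear into a patent.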
Integrating Cheminformatics and Spectroscopy to Elucidate the Structures of Natural Products
The elucidation of natural product structures from analytical data, specifically NMR and MS, remains a major challenge. With an enormous palette of NMR experiments to choose from, supported by breakthrough technologies in hardware, generating data of high enough quality to determine even the most complex natural product structures is no longer the major hurdle. The challenge is in the analysis of the data. We are in a new era of approaches to structure elucidation: one where computers, databases, and a synergy between scientists and algorithms can offer an accelerated path forward. Software tools are capable of digesting spectroscopic data to elucidate extremely complex natural products. Scientists can now elucidate chemical structures using multinuclear chemical shift data and correlation data from an array of 2D NMR experiments, and can use existing data sets for dereplication and computer-assisted structure elucidation. With the explosion of online data, especially in public databases such as PubChem and ChemSpider, many tens of millions of chemical structures are available to seed fragment databases for use in the elucidation process. This presentation will provide an overview of how cheminformatics and chemical databases have been brought together to assist in the identification of natural products. It will include an examination of state-of-the-art developments in Computer-Assisted Structure Elucidation.
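As a much-simplified illustration of dereplication against a shift database (not the algorithm of any particular CASE package), the Python sketch below ranks library entries by the mean absolute deviation between sorted 13C shift lists. The candidate names and shift values are made up for the example, and the 2 ppm tolerance is an arbitrary assumption.

```python
def shift_deviation(observed, reference):
    """Mean absolute deviation (ppm) between two 13C shift lists after sorting.

    Lists of different length are treated as a mismatch (infinite deviation);
    real tools handle symmetry, missing peaks and multiplicity far more carefully.
    """
    if len(observed) != len(reference):
        return float("inf")
    pairs = zip(sorted(observed), sorted(reference))
    return sum(abs(o - r) for o, r in pairs) / len(observed)

def dereplicate(observed, library, tolerance=2.0):
    """Rank library entries by deviation and keep those within `tolerance` ppm."""
    scored = sorted((shift_deviation(observed, shifts), name) for name, shifts in library.items())
    return [(name, dev) for dev, name in scored if dev <= tolerance]

# Hypothetical shift library (ppm); values are invented for illustration.
library = {
    "candidate A": [14.1, 22.7, 31.9, 128.5, 131.2, 172.0],
    "candidate B": [20.5, 35.0, 60.2, 110.4, 150.3, 168.9],
    "candidate C": [14.0, 22.6],
}
observed = [14.3, 22.5, 32.1, 128.9, 131.0, 171.6]
matches = dereplicate(observed, library)
print([(name, round(dev, 2)) for name, dev in matches])  # [('candidate A', 0.27)]
```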
This is a presentation I gave at North Carolina State University hosted by Denis Fourches.
Data integration and building a profile for yourself as an online scientist
Many of us nowadays invest significant amounts of time in sharing our activities and opinions with friends and family via social networking tools. However, although many platforms are available for scientists to connect and share with their peers in the scientific community, the majority of scientists do not make use of them, despite their promise and their potential impact and influence on our future careers. We are being indexed and exposed on the internet via our publications, presentations and data. We also have many more ways to contribute to science, to annotate and curate data, and to “publish” in new ways, and many of these activities are part of a growing crowdsourcing network. This presentation will provide an overview of the various types of networking and collaborative sites available to scientists and ways to expose your scientific activities online. Many of these can ultimately contribute to the developing measures of you as a scientist, as identified in the new world of alternative metrics. Participating offers a great opportunity to develop a scientific profile within the community and may ultimately be very beneficial, especially to scientists early in their careers.
My talk at ACS Boston: Value of the mediawiki platform for providing content to the chemistry community
At this time, in a culture where online access is now imperative, Wikipedia has become the definitive encyclopedia. In terms of its support for chemistry, it is rich in encyclopedic pages including named reactions, chemical and drug pages, articles about chemists, and many other forms of chemistry-related information. Wikipedia is hosted on Mediawiki, an open source platform that can be used by anybody as the basis of their own hosted content collection. As a general contribution to the community, a number of chemists have used Mediawiki as a collaborative environment to create resources that have become very popular with the chemistry community. These include VIPEr to support inorganic chemistry, ChemWiki as an online textbook and other educational resources, and a Chemical Information Wikibook. Mediawiki has also been used by the author to host open source collections of data including scientists, scientific databases and mobile apps for science: the ScientistsDB, SciDBs and SciMobileApps wikis. This presentation will provide an overview of some of the chemistry resources that presently exist and celebrate the major contributions that Wikipedia and Mediawiki have made to the collaborative dissemination of chemistry.
ACS Boston: The driving needs for analytical data exchange standards and the potential impacts on the chemical sciences
This presentation was given at the ACS Boston meeting with the following abstract:
Analytical science underpins so many different types of chemistry that it is clearly indispensable. Nuclear Magnetic Resonance and infrared spectroscopy, mass spectrometry and chromatography, and a myriad of other forms of analytical science are easily available to scientists today, commonly in open access walk up labs. While instrumentation is now compact and highly flexible, and the controlling software is both powerful and easy to use, significant challenges remain in terms of the management and integration of various forms of analytical data and, more importantly, the exchange of data between scientists. In general the reporting of data in peer-reviewed journals is limited to electronic supplementary information in the form of PDF files or, occasionally, webpages. Much of the strength of analytical data resides in the ability to store diverse data types in databases and later interrogate them with searches based on metadata, spectral features and related chemical structure information. Export from, and conversion of, the binary file formats associated with the majority of analytical instrumentation remains a major objective in the field. While file formats such as JCAMP and NetCDF have enabled data exchange for a number of years, the need for more advanced formats (such as AnIML and mzML) has continued. This presentation will review existing activities in the development of exchangeable formats and progress in utilizing existing formats for the delivery of reusable analytical data to the community.
Today is my last day of employment at the Royal Society of Chemistry. It has been almost six years since I joined RSC when ChemSpider was acquired. While ChemSpider was initially a “hobby project” and an attempt to create a disruption in terms of access to chemistry data, crowdsourced contribution and data validation, it has gone from strength to strength and now serves ca. 40,000 unique users a day from around the world. It won three awards in the first few months after we joined RSC and was a catalyst for RSC winning three grants that allowed us to participate in the Open PHACTS project and the PharmaSea project, and to become the host of the UK National Chemical Database Service. Based on the feedback I have received over the years, ChemSpider is much-loved and appreciated as a contribution to the scientific community and is recognized as one of the key players in the free chemistry resources arena. I am proud to have been associated with it.
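To illustrate what exchanging data through a text format such as JCAMP-DX involves, here is a minimal Python reader. It is a sketch under stated assumptions: it handles only a single block whose `##XYDATA=(X++(Y..Y))` lines are plain space-separated (AFFN) numbers, expands X from `FIRSTX`, `LASTX` and `NPOINTS`, and applies `XFACTOR`/`YFACTOR`. Compressed ASDF encodings, multi-block files and NTUPLES are not handled, and the sample spectrum is invented.

```python
import re

def _normalize_label(label):
    """Labels are compared ignoring case, spaces, dashes, slashes and underscores."""
    return re.sub(r"[\s\-/_]", "", label).upper()

def parse_jcamp_xydata(text):
    """Parse a single-block JCAMP-DX file whose XYDATA is plain AFFN (X++(Y..Y)).

    Returns (headers, points) where points is a list of (x, y) tuples.
    """
    headers, rows, in_data = {}, [], False
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # drop $$ comments
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = _normalize_label(label)
            headers[label] = value.strip()
            in_data = label == "XYDATA"
        elif in_data:
            rows.append([float(tok) for tok in line.split()])
    first_x, last_x = float(headers["FIRSTX"]), float(headers["LASTX"])
    n_points = int(headers["NPOINTS"])
    x_factor = float(headers.get("XFACTOR", 1))
    y_factor = float(headers.get("YFACTOR", 1))
    delta_x = (last_x - first_x) / (n_points - 1)
    points = []
    for row in rows:
        x0 = row[0] * x_factor  # each line starts with the X of its first Y value
        for i, y in enumerate(row[1:]):
            points.append((x0 + i * delta_x, y * y_factor))
    return headers, points

SAMPLE = """##TITLE=demo
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##FIRSTX=400
##LASTX=405
##NPOINTS=6
##XFACTOR=1
##YFACTOR=0.001
##XYDATA=(X++(Y..Y))
400 100 200 300
403 400 500 600
##END=
"""
headers, points = parse_jcamp_xydata(SAMPLE)
print(headers["DATATYPE"], len(points), points[0])
```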
We also got to set up the ChemSpider SyntheticPages micropublishing site and tried to get the community sharing syntheses that would likely not make it into mainstream papers but were still of value to science.
During my six years at RSC I have been involved with many discussions regarding the following areas of work, study and research and how they would benefit publishing, the society and, of course, the chemistry community at large. The list includes, in particularly random order:
- Chemistry databases – both commercial and free, and how to best mesh, commercialize and license data
- Data quality in publications and databases and development of tools for data validation
- Open Data, Open Access and Open Notebook Science
- Text-mining of the RSC archive to extract & mark up compounds, reactions, property data and analytical data.
- The potential of semantic web applications to scientific publishing
- Encouraging the use of Open Identifiers – especially ORCID and InChI
- The future of Micropublishing in the chemical sciences
- Analytical data and building an open spectral database for the community
- Social networking approaches to build online profiles – especially for young scientists
There are many, many more things of course, but these are the big ones and, for me, they bring clarity to what my interests are: chemistry data and making it available to the appropriate communities. It is with this in mind that I am excited to join the Environmental Protection Agency next week in the National Center for Computational Toxicology.
With every move forward into a new job we leave behind our old one, and I leave RSC with some sadness but also with excitement about the new opportunities. I have had the chance to work with so many good people at RSC, and to engage with collaborators such as ACD/Labs, Mestre, NextMove, EBI, ChemAxon, Accelrys (as they were then), iChemLabs, Dotmatics and on and on. Apologies if you are not named, but the list is very long. Thanks to everyone for your support, encouragement and opportunities to engage. It has been a blast.
And for everyone at RSC who catered to my strange diet of potatoes only…so long, and thanks for all the spuds.