Archive for category Nuclear magnetic resonance

Providing Access to a Million NMR Spectra via the web

This presentation was given at the ACS Denver meeting on March 22nd, 2015, in a CHED Division symposium.

Providing Access to a Million NMR Spectra via the web

Antony Williams, Alexey Pshenichnov, Peter Corbett, Daniel Lowe, Carlos Coba

Large-scale collections of NMR spectral data serve a number of purposes in teaching spectroscopy to students. The data can be used in lectures, as training data sets for spectral interpretation and structure elucidation, and to underpin educational resources such as the Royal Society of Chemistry’s Learn Chemistry. These resources have been available for a number of years but have been limited to rather small collections of spectral data, specifically only about 3000 spectra. To expand the data collection and provide richer resources for the community we have been gathering data from various laboratories and, as part of a research project, we have used text-mining approaches to extract spectral data from articles and patents in the form of textual strings and used algorithms to convert the data into spectral representations. While these spectra are reconstructions of text representations of the original spectral data, we are investigating their value for structure identification. This presentation will report on the processes of extracting structure-spectral pairs from text, approaches to performing automated spectral verification and our intention to assemble a collection of a million NMR spectra and make them available online.
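To illustrate the idea of turning a textual peak list into a spectral representation, here is a minimal Python sketch. It is not the project's actual text-mining pipeline: the example string, the function names and the Lorentzian line width are all assumptions made for illustration, and the parser only copes with a simple 13C peak list that follows the δ symbol (a 1H list with multiplicities and coupling constants would need more careful handling).

```python
import re

# Hypothetical peak list, written in the style of an experimental section.
EXAMPLE = "13C NMR (CDCl3) δ 170.2, 142.0, 133.2, 128.7, 52.1"


def parse_shifts(text):
    """Return the chemical shifts (ppm) listed after the delta symbol."""
    _, _, tail = text.partition("δ")
    return [float(value) for value in re.findall(r"-?\d+(?:\.\d+)?", tail)]


def lorentzian(x, center, half_width):
    """Lorentzian line shape with unit height at its center."""
    return half_width ** 2 / ((x - center) ** 2 + half_width ** 2)


def reconstruct(shifts, start=0.0, stop=220.0, step=0.1, half_width=0.2):
    """Build a synthetic spectrum: one unit-height Lorentzian per shift.

    The ppm range, step and line width are illustrative assumptions.
    """
    points = int(round((stop - start) / step)) + 1
    axis = [start + i * step for i in range(points)]
    intensity = [sum(lorentzian(x, s, half_width) for s in shifts) for x in axis]
    return axis, intensity


shifts = parse_shifts(EXAMPLE)
axis, intensity = reconstruct(shifts)
```

Every peak gets the same height here because a bare list of shifts carries no intensity information; a real reconstruction would have to decide how to treat that.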

Teaching analytical spectroscopy using online spectroscopic data #ACSsanfran

My first talk of three on August 11th, 2014, at the ACS San Francisco meeting.

Teaching analytical spectroscopy using online spectroscopic data

The teaching of spectroscopy can be a complex and challenging task. The Royal Society of Chemistry has been developing online resources for a number of years that provide access to analytical data as well as interactive quizzes and challenge sets. The RSC data repository currently houses over 250,000 spectra, including mass spectrometry, NMR and IR data. These are used to provide online games that test students’ capabilities, to underpin the SpectraSchool training website, and to supply source data that students and teachers alike can use in their teaching and self-training efforts. This presentation will provide an overview of RSC resources that can be used to teach spectroscopy using our online data and tools.

Applying RSC cheminformatics skills to support the PharmaSea project at #ACSsanfran

This is the first presentation I gave at the ACS meeting in San Francisco on Sunday morning (August 8th) in the CINF Natural Products session.

Applying Royal Society of Chemistry cheminformatics skills to support the PharmaSea project

The collaborative project PharmaSea brings European researchers to some of the deepest, coldest and hottest places on the planet. Scientists from the UK, Belgium, Norway, Spain, Ireland, Germany, Italy, Switzerland and Denmark are working together to collect and screen samples of mud and sediment from huge, previously untapped, oceanic trenches. The large-scale, four-year project is backed by almost 10 million euros of funding and brings together 24 partners from 13 countries from industry, academia and non-profit organisations. The PharmaSea project focuses on biodiscovery research and the development and commercialisation of new bioactive compounds from marine organisms, including deep-sea sponges and bacteria, to evaluate their potential as novel drug leads or ingredients for nutrition or cosmetic applications. The Royal Society of Chemistry is responsible for developing a number of capabilities to support the PharmaSea project, including a chemical registration system for new compounds, dereplication technologies to assist in the identification of new compounds and search techniques for mass spectrometrists within the project. This presentation will provide an overview of the project and our progress in contributing chemical information technologies to support the effort.

Being ignored during the review process and how I would address issues in a paper today

MOST people reading this blog post have likely performed peer review. I have reviewed a lot of manuscripts over the years, and the process has changed a lot over the past decade in many ways. A couple of examples of how things have changed for me:

1) More requests to review papers – and I increasingly turn down requests because they are from journals I have never heard of (some may call them “predatory publishers”), because some are in areas in which I have no expertise (e.g. electrical engineering), and sometimes because I simply don’t have time.

2) I have seen papers I have reviewed show up essentially untouched in other journals (no edits and simply reformatted) and commonly these “refused papers” are accepted into what I deem to be “lower quality” publications.

Of course over the past ten years I’ve also had a lot of papers go through peer review for myself and my co-authors. This experience has also been very interesting, if not entertaining. Some examples:

1) I have experienced the “third reviewer”, where an editor has held up a manuscript or demanded changes to match their own expectations while the other reviewers recommended publishing as is.

2) I have had the request to shorten excellent manuscripts to help with “page limits”….in the electronic age???

3) I have been on the receiving end of non-scientific reviews that have blocked a paper. My personal favorite: “Mobile apps are a fad of the youth.”

My best story of peer review, and an example where modern technologies would have been so enabling at the time, is as follows.

I was asked to review this paper on the performance of Carbon-13 NMR prediction. A slice of the abstract says:

“Further we compare the neural network predictions to those of a wide variety of other 13C chemical shift prediction tools including incremental methods (CHEMDRAW, SPECTOOL), quantum chemical calculation (GAUSSIAN, COSMOS), and HOSE code fragment-based prediction (SPECINFO, ACD/CNMR, PREDICTIT NMR) for the 47 13C-NMR shifts of Taxol, a natural product including many structural features of organic substances. The smallest standard deviations were achieved here with the neural network (1.3 ppm) and SPECINFO (1.0 ppm).”

This was an important time for me as this paper was comparing various NMR predictors and comparing the performance based on ONE chemical structure. And while any one point of comparison is up for discussion there were 47 shifts so you could argue it is a bigger data set. One of the programs under review was a PRODUCT that I managed at ACD/Labs, CNMR Predictor. Therefore I clearly had a concern as, essentially, the success of this product was partly responsible for my income. Any comparison that made the software look poor in performance was an issue. Was this a conflict of interest…maybe…but I judge myself to still be objective.

Table 3 listed the experimental shifts as well as the predicted shifts from the different algorithms, and the size of the accompanying circle/ellipse was a visual indicator of the size of the difference between experimental and predicted values. We will assume that all experimental assignments are correct and that there are no transcription errors between the values predicted by each algorithm and the values entered into the table. A piece of Table 3 is shown below.

A portion of Table 3

I kind of pride myself on being a little bit of a stickler for detail when it comes to reviewing data quality. Those of you who read this blog will know that. As I reviewed the data I was a little puzzled by the magnitude of the errors for certain Carbon nuclei, specifically for Carbons 23 and 27.

The ACD/CNMR 6.0 predicted values are in the right hand column. The size of the circles indicates size of errors – I suspected that 132.8 and 142.7 ppm had been switched. That led to a deeper analysis.

What was interesting to me was that the experimental shifts for Carbons 23 and 27 were 142.0 and 133.2 ppm, respectively, yet the predicted shifts were 132.8 and 142.7 ppm, respectively. It struck me that they looked like they were switched. This was what drew my attention to reviewing the data in more detail. To cut a long story short, I redrew the molecule of Taxol as input into the same version of the software that was used for the publication and got a DIFFERENT answer than that reported. I was able to determine WHY it was different…it was down to the orientation of a bond in the molecule as drawn by the reporting authors, and this made the CNMR prediction worse.
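The initial suspicion, that two predicted values look switched, is easy to express as a check. The Python sketch below uses only the four shift values quoted above; the function name and the 5 ppm threshold are assumptions for illustration. It flags any pair of carbons for which exchanging the two predicted values would reduce the combined absolute error by at least that threshold. As the story shows, such a flag is only a prompt to look deeper, since the real cause here turned out to be the drawn input structure rather than a simple transposition.

```python
from itertools import combinations

# Shift values (ppm) quoted in the post for Carbons 23 and 27 of Taxol.
experimental = {"C23": 142.0, "C27": 133.2}
predicted = {"C23": 132.8, "C27": 142.7}


def suspected_swaps(exp, pred, min_gain=5.0):
    """List (a, b, gain) where swapping the predictions for a and b
    lowers the summed absolute error by at least min_gain ppm.

    min_gain is an arbitrary illustrative threshold.
    """
    found = []
    for a, b in combinations(sorted(exp), 2):
        as_reported = abs(exp[a] - pred[a]) + abs(exp[b] - pred[b])
        if_swapped = abs(exp[a] - pred[b]) + abs(exp[b] - pred[a])
        gain = as_reported - if_swapped
        if gain >= min_gain:
            found.append((a, b, round(gain, 1)))
    return found


flags = suspected_swaps(experimental, predicted)
```

With the reported values the two errors are 9.2 and 9.5 ppm; with the predictions exchanged they would be 0.7 and 0.4 ppm, which is why the pair stands out.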

I reported this to the editors in a detailed letter and recommended the manuscript for publication with the caveat that the numbers in the column representing CNMR 6.0 be edited to accurately reflect the performance of the algorithm, and I provided the details. I was shocked to see the manuscript published later WITHOUT any of the edits made to the numbers, inaccurately representing the performance of the algorithm. I contacted the editors and after a couple of exchanges received quite a dressing-down: the editor overseeing the manuscript refused to get between a commercial concern and reported science.

What does this mean? That software companies don’t do science and only academics do? I have similar experience of my colleagues in industry being treated with bias relative to my colleagues in academia. I believe my friends in industry, commercial concerns and academia can all be objective scientists….and after all, doesn’t academia teach the chemists that come out to industry and the commercial software world? These are my experiences…I welcome any comments you may have about the bias. BUT, back to the story…

The manuscript was published in June 2002 and as product manager I had to deal with questions around algorithmic performance for many months because “the peer-review literature said…”. This was NOT the only instance of a situation like this. A couple of years later it was reported that ACD/CNMR could not handle stereochemistry, only for us to determine, together with the scientist who wrote the paper, that he had thrown a software switch that affected his results. Software can be tricky and unfortunately the best performance can often come through the hands of those that write the software. Sad but true in many cases.

In August 2004 we published an addendum with one of the original authors describing the entire situation in detail. It was over two years from the original publication to the final addendum. I do not believe there was any malicious intent on behalf of the authors of the original manuscript, but that was in the days when the only place to issue a rebuttal was in the journal and we could not get editorial support to do it. How would it play out today if a suspicious paper came out? There are a myriad of tools available now….

A Comparison of Errors – Left Column is Original Paper and Right Hand Side is Rebuttal. Notice the SMALL circles for the final paper – SMALL errors

Yes, I would blog the story here, as I am doing now. Yes, I would express concern at the situation on Twitter with the hope of gaining redress. I would likely tell the story in a Slideshare presentation and make a narrated movie and make it available via an embed in the Slideshare presentation on my account. I would hope that the publisher nowadays would at least allow me to add a comment to the article but I do understand that this comment would likely be monitored and mediated and they may choose not to expose it to the readers. I like the implementation on PLoS and have used it on one of our articles previously.

Could I maybe make use of a technology like Kudos, which I have started using? I have reported on it on this blog already here. I certainly could not claim the ORIGINAL article and start associating information with it regarding the performance of the algorithms…and that is a shame. But MAYBE in the future Kudos would consider letting OTHER people make comments and associate information/data with an article on Kudos. Risky? Maybe. However, I can claim the rebuttal that I was a co-author on and start associating information with that….certainly the original paper and ultimately a link to this blog. In fact, in the future is a rebuttal going to be a manuscript that I publish out on something like Figshare, grab a DOI there and maybe ask Kudos to treat that as a published rebuttal? Peer review of that rebuttal could then happen as comments on Figshare and Kudos directly, and maybe in the future Kudos Views and Altmetric measures of it become a measure of its importance. We live in very interesting times as these technologies expand, mesh and integrate.

The involvement of RSC with PharmaSea and a new antibiotics search to focus on the sea bed

A nice article went out today on the BBC News site regarding the work that the PharmaSea project would be undertaking…to find new classes of antibiotics deep in the ocean.

The RSC is involved in the project as a result of our skills in hosting chemicals in a publicly accessible database as well as integrating data. ChemSpider also has a rich collection of natural products already in the database and we are developing approaches to segregate the collection for use by the project. We also have the RSC Natural Product Updates database that we have already integrated with ChemSpider. There are various other aspects of work that we will be doing to support the project, including developing approaches to perform “dereplication” – determining whether or not a particular chemical has been previously isolated/identified/elucidated, in this case by searching the ChemSpider database using spectral features (NMR shifts, multiplicities, mass, fragment ions etc.). If the actual compound itself is not identified then dereplication approaches can certainly hint at a particular chemical class and substructures. We do NOT have spectral data for the majority of compounds in ChemSpider so spectral prediction approaches will be useful in this regard. We will be working with some very skilled scientists who have experience with the structure elucidation of novel natural products and will have the opportunity to collaborate with ACD/Labs, a company where I worked for over a decade on their Computer-Assisted Structure Elucidation software program, Structure Elucidator, one of the tools that will be used in this project.

It’s going to be an exciting project, I am REALLY looking forward to it and heck, if we can help identify new classes of antibiotics we might contribute to some of the challenges we have ahead of us!!!!
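As a rough illustration of what shift-based dereplication involves, the Python sketch below ranks library entries by how many query shifts they can account for within a tolerance. Everything in it is hypothetical: the compound names, the shift lists, the 1 ppm tolerance and the greedy matching rule are assumptions for illustration, and it does not use or represent the ChemSpider search itself.

```python
# Hypothetical reference library: compound name -> 13C shifts (ppm).
LIBRARY = {
    "compound_A": [170.1, 142.3, 133.0, 52.4],
    "compound_B": [205.0, 128.5, 77.1, 21.3],
}

# Hypothetical query: shifts measured for an unknown isolate.
QUERY = [170.2, 142.0, 133.2, 52.1]


def match_score(query, reference, tolerance=1.0):
    """Fraction of query shifts matched to a distinct reference shift.

    Greedy nearest-neighbour matching; each reference shift is used once.
    """
    remaining = sorted(reference)
    matched = 0
    for shift in sorted(query):
        best = min(remaining, key=lambda r: abs(r - shift), default=None)
        if best is not None and abs(best - shift) <= tolerance:
            remaining.remove(best)
            matched += 1
    return matched / len(query) if query else 0.0


def dereplicate(query, library, tolerance=1.0):
    """Rank library entries by match score, best first."""
    ranked = [
        (name, match_score(query, shifts, tolerance))
        for name, shifts in library.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


hits = dereplicate(QUERY, LIBRARY)
```

A partial score short of a full match is where the hint at a chemical class or substructure would come from. Where no measured spectra exist for a library entry, predicted shifts could fill the reference lists instead.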

Our article Structure Revision of Asperjinone Using Computer-Assisted Structure Elucidation Methods

Our article “Structure Revision of Asperjinone Using Computer-Assisted Structure Elucidation Methods” is now available on the Journal of Natural Products website here.

This was a long time coming…almost a year in the review process and iterations. I continue to see the reports from many publishers about how fast articles are published but my experience in 2012 is that it is many months past the published averages! The pr