This presentation was given at the ACS Denver meeting on March 22nd 2015 in a CHED Division symposium
Providing Access to a Million NMR Spectra via the web
Antony Williams, Alexey Pshenichnov, Peter Corbett, Daniel Lowe, Carlos Coba
Large-scale collections of NMR spectral data can serve a number of purposes in teaching spectroscopy to students. The data can be used in lectures, as training data sets for spectral interpretation and structure elucidation, and to underpin educational resources such as the Royal Society of Chemistry’s Learn Chemistry. These resources have been available for a number of years but have been limited to rather small collections of spectral data, specifically only about 3000 spectra. In order to expand the data collection and provide richer resources for the community we have been gathering data from various laboratories and, as part of a research project, we have used text-mining approaches to extract spectral data from articles and patents in the form of textual strings and utilized algorithms to convert the data into spectral representations. While these spectra are reconstructions from text representations of the original spectral data, we are investigating their value for structure identification. This presentation will report on the processes of extracting structure-spectral pairs from text, approaches to performing automated spectral verification and our intention to assemble a spectral collection of a million NMR spectra and make them available online.
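To give a feel for what "converting textual strings into spectral representations" can look like, here is a minimal sketch. It is not the project's actual pipeline: the example peak string, the function names `parse_shifts` and `lorentzian_spectrum`, and the line width are all assumptions made for illustration. It pulls chemical shifts out of a reported peak list and places one Lorentzian line of equal height at each shift.

```python
import re

def parse_shifts(text):
    """Extract chemical shifts (ppm) from a reported peak-list string.

    Assumes the shifts follow a delta symbol and that annotations such as
    frequency, solvent or multiplicity sit inside parentheses. Negative
    shifts and ranges are not handled in this sketch.
    """
    # Drop parenthesised annotations, e.g. "(100 MHz, CDCl3)"
    stripped = re.sub(r"\([^)]*\)", "", text)
    # Keep only the part after the delta symbol; empty if there is none
    _, _, body = stripped.partition("δ")
    return [float(value) for value in re.findall(r"\d+(?:\.\d+)?", body)]

def lorentzian_spectrum(shifts, start=0.0, stop=200.0, step=0.1, width=0.5):
    """Rebuild a stick-free spectrum: one unit-height Lorentzian per shift."""
    n_points = int(round((stop - start) / step)) + 1
    axis = [start + i * step for i in range(n_points)]
    half = width / 2.0  # half width at half maximum, in ppm
    intensity = [
        sum(half ** 2 / ((x - s) ** 2 + half ** 2) for s in shifts)
        for x in axis
    ]
    return axis, intensity

# Hypothetical peak string, invented for this example
example = "13C NMR (100 MHz, CDCl3) δ 170.3, 142.0, 133.2, 21.0"
shifts = parse_shifts(example)
axis, intensity = lorentzian_spectrum(shifts)
print(shifts)
```

Peak strings in real articles and patents are far messier than this one, which is exactly why the extraction step needs proper text-mining rather than a single regular expression.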
My first talk of three on August 11th 2014 at the ACS San Francisco meeting
Teaching analytical spectroscopy using online spectroscopic data
The teaching of spectroscopy can be a complex and challenging task. The Royal Society of Chemistry has been developing online resources for a number of years that provide access to analytical data as well as interactive quizzes and challenge sets. The RSC data repository houses over 250,000 spectra at this time including mass spectrometry, NMR and IR data and these are utilized to provide online games to test students’ capabilities, to underpin the SpectraSchool training website and to produce source data for students and teachers alike to use in their teaching and self-training efforts. This presentation will provide an overview of RSC resources that can be used to teach spectroscopy using our online data and tools.
This is the first presentation I gave at the ACS meeting in San Francisco on Sunday morning (August 8th) in the CINF Natural Products session.
Applying Royal Society of Chemistry cheminformatics skills to support the PharmaSea project
The collaborative project PharmaSea brings European researchers to some of the deepest, coldest and hottest places on the planet. Scientists from the UK, Belgium, Norway, Spain, Ireland, Germany, Italy, Switzerland and Denmark are working together to collect and screen samples of mud and sediment from huge, previously untapped, oceanic trenches. The large-scale, four-year project is backed by almost 10 million euros of funding and brings together 24 partners from 13 countries from industry, academia and non-profit organisations. The PharmaSea project focuses on biodiscovery research and the development and commercialisation of new bioactive compounds from marine organisms, including deep-sea sponges and bacteria, to evaluate their potential as novel drug leads or ingredients for nutrition or cosmetic applications. The Royal Society of Chemistry is responsible for developing a number of capabilities to support the PharmaSea project including a chemical registration system for new compounds, dereplication technologies to assist in the identification of new compounds and search techniques for mass spectrometrists within the project. This presentation will provide an overview of the project and our progress in contributing chemical information technologies to support the effort.
MOST people who are reading this blog post have likely performed peer review over the years. I have reviewed a lot of manuscripts myself. The process has changed a lot over the past decade in many ways. A couple of examples of how things have changed for me:
1) More requests to review papers – and I increasingly turn down requests because they are from journals I have never heard of (some may call them “predatory publishers”), some are in areas for which I have no expertise (e.g. electrical engineering), and sometimes because I simply don’t have time.
2) I have seen papers I have reviewed show up essentially untouched in other journals (no edits and simply reformatted) and commonly these “refused papers” are accepted into what I deem to be “lower quality” publications.
Of course over the past ten years I’ve also had a lot of papers go through peer review for myself and my co-authors. This experience has also been very interesting, if not entertaining. Some examples:
1) I have experienced the “third reviewer”, where an editor has held up a manuscript or demanded changes to match some of their own expectations while the other reviewers recommended publishing as is.
2) I have had the request to shorten excellent manuscripts to help with “page limits”….in the electronic age???
3) I have been on the receiving end of non-scientific reviews that have blocked a paper. My personal favorite “Mobile apps are a fad of the youth.”
My best story of peer review, and an example where modern technologies would have been so enabling at the time, is as follows.
I was asked to review a paper regarding the performance of Carbon-13 NMR prediction. A slice of the abstract says:
“Further we compare the neural network predictions to those of a wide variety of other 13C chemical shift prediction tools including incremental methods (CHEMDRAW, SPECTOOL), quantum chemical calculation (GAUSSIAN, COSMOS), and HOSE code fragment-based prediction (SPECINFO, ACD/CNMR, PREDICTIT NMR) for the 47 13C-NMR shifts of Taxol, a natural product including many structural features of organic substances. The smallest standard deviations were achieved here with the neural network (1.3 ppm) and SPECINFO (1.0 ppm).”
This was an important time for me as this paper was comparing various NMR predictors and comparing the performance based on ONE chemical structure. And while any one point of comparison is up for discussion there were 47 shifts so you could argue it is a bigger data set. One of the programs under review was a PRODUCT that I managed at ACD/Labs, CNMR Predictor. Therefore I clearly had a concern as, essentially, the success of this product was partly responsible for my income. Any comparison that made the software look poor in performance was an issue. Was this a conflict of interest…maybe…but I judge myself to still be objective.
Table 3 listed the experimental shifts as well as the predicted shifts from the different algorithms and the size of the accompanying circle/ellipse was a visual indicator of a large difference between experimental and predicted. We will assume that all experimental assignments are correct and that there are no transcription errors between the predicted values from each algorithm and input into the table. A piece of Table 3 is shown below.
I kind of pride myself on being a little bit of a stickler for detail when it comes to reviewing data quality. Those of you who read this blog will know that. As I reviewed the data I was a little puzzled by the magnitude of the errors for certain Carbon nuclei, specifically for Carbons 23 and 27.
What was interesting to me was that the experimental shifts for carbons 23 and 27 were 142.0 and 133.2 ppm respectively, yet the predicted shifts were 132.8 and 142.7 ppm respectively. It struck me that they looked like they were switched. This was what drew my attention to reviewing the data in more detail. To cut a long story short, I redrew the molecule of Taxol as input into the same version of the software that was used for the publication and got a DIFFERENT answer from that reported. I was able to determine WHY it was different…it was down to the orientation of a bond in the molecule as drawn by the reporting authors, and this made the CNMR prediction worse.
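The "these look switched" observation is easy to turn into a quick sanity check. The sketch below is only an illustration of that first suspicion, not what I actually did in the review (that involved redrawing the molecule and rerunning the software), and the function name `find_possible_swaps` and the 5 ppm threshold are my own assumptions. It flags any pair of atoms whose predicted shifts agree much better with experiment when exchanged, using the four values quoted above.

```python
from itertools import combinations

def find_possible_swaps(experimental, predicted, min_gain=5.0):
    """Flag atom pairs whose predicted shifts fit better when exchanged.

    experimental, predicted: dicts mapping atom number -> shift in ppm.
    min_gain: reduction in summed absolute error (ppm) needed to flag a
    pair; the default is an arbitrary choice for this sketch.
    """
    flagged = []
    for a, b in combinations(sorted(experimental), 2):
        as_reported = (abs(experimental[a] - predicted[a])
                       + abs(experimental[b] - predicted[b]))
        exchanged = (abs(experimental[a] - predicted[b])
                     + abs(experimental[b] - predicted[a]))
        gain = as_reported - exchanged
        if gain >= min_gain:
            flagged.append((a, b, round(gain, 1)))
    return flagged

# Values for carbons 23 and 27 as quoted in the text
experimental = {23: 142.0, 27: 133.2}
predicted = {23: 132.8, 27: 142.7}
print(find_possible_swaps(experimental, predicted))
```

A flagged pair is only a prompt to look closer. In this case the underlying cause turned out to be the way the input structure was drawn rather than a simple transposition in the table.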
I reported this detail to the editors in a detailed letter and recommended the manuscript for publication with the caveat that the numbers for the column representing CNMR 6.0 be edited to accurately reflect the performance of the algorithm and provide the details. I was shocked to see the manuscript published later WITHOUT any of the edits made for the numbers and inaccurately representing the performance of the algorithm. I contacted the editors and after a couple of exchanges received quite a dressing down that the editor overseeing the manuscript refused to get between a commercial concern and reported science.
What does this mean? That software companies don’t do science and only academics do? I have similar experience of my colleagues in industry being treated with bias relative to my colleagues in academia. I believe my friends in industry, commercial concerns and academia can all be objective scientists….and after all, doesn’t academia teach the chemists that come out to industry and the commercial software world? These are my experiences…I welcome any comments you may have about the bias. BUT, back to the story…
The manuscript was published in June 2002 and as product manager I had to deal with questions around algorithmic performance for many months because “the peer-review literature said…”. This was NOT the only instance of a situation like this as a couple of years later it was reported that ACD/CNMR could not handle stereochemistry only to determine with the scientist who wrote the paper that he had thrown a software switch that affected his results. Software can be tricky and unfortunately the best performance can often come through the hands of those that write the software. Sad but true in many cases.
In August 2004 we published an addendum with one of the original authors regarding the work describing the entire situation in detail. It was over two years from the original publication to the final addendum. I do not believe there was any malicious intent on behalf of the authors of the original manuscript but that was in the days where the only place to issue a rebuttal was in the journal and we could not get editorial support to do it. How would it happen today if a paper came out that was suspicious? There are a myriad of tools available now….
Yes, I would blog the story here, as I am doing now. Yes I would express concern at the situation on Twitter with the hope of gaining redress. I would likely tell the story in a Slideshare presentation and make a narrated movie and make it available via an embed in the Slideshare presentation on my account. I would hope that the publisher nowadays would at least allow me to add a comment to the article but I do understand that this comment would likely be monitored and mediated and they may choose not to expose it to the readers. I like the implementation on PLoS and have used it on one of our articles previously.
Could I maybe make use of a technology like Kudos that I have started using? I have reported it on this blog already here. I certainly could not claim the ORIGINAL article and start associating information with it regarding the performance of the algorithms…and that is a shame. But MAYBE in the future Kudos would consider letting OTHER people make comments and associate information/data with an article on Kudos. Risky? Maybe. However, I can claim the rebuttal that I was a co-author on and start associating information with that….certainly the original paper and ultimately linking to this blog. In fact, in the future is a rebuttal going to be a manuscript that I publish out on something like Figshare, grab a DOI there and maybe ask Kudos to treat that as a published rebuttal? Peer review of that rebuttal could then happen as comments on Figshare and Kudos directly and maybe in the future Kudos Views and Altmetric measures of that become a measure of the importance. We live in very interesting times as these technologies expand, mesh and integrate.
The RSC is involved in the project as a result of our skills in hosting chemicals in a publicly accessible database as well as integrating data. ChemSpider also has a rich collection of natural products already in the database and we are developing approaches to segregate the collection for use by the project. We also have the RSC Natural Product Updates database that we have already integrated with ChemSpider. There are various other aspects of work that we will be doing to support the project including developing approaches to perform “dereplication” – determining whether or not a particular chemical has been previously isolated/identified/elucidated, in this case by searching the ChemSpider database using spectral features (NMR shifts, multiplicities, mass, fragment ions etc). If the actual compound itself is not identified then dereplication approaches can certainly hint at a particular chemical class and substructures. We do NOT have spectral data for the majority of compounds in ChemSpider so spectral prediction approaches will be useful in this regard. We will be working with some very skilled scientists who have experience with the structure elucidation of novel natural products and will have the opportunity to collaborate with ACD/Labs, a company I worked for for over a decade on their Computer-Assisted Structure Elucidation software program, Structure Elucidator, one of the tools that will be used in this project.
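As a rough idea of how dereplication by spectral features might work, here is a minimal sketch that ranks library entries by how many of a query's 13C shifts they can account for. It is not the ChemSpider implementation: the greedy matching, the 1 ppm tolerance, the function names and the two library entries (placeholder compounds with invented shifts) are all assumptions for illustration.

```python
def match_score(query, reference, tolerance=1.0):
    """Fraction of query shifts paired with an unused reference shift.

    Greedy nearest-neighbour pairing; a pair counts only if the two
    shifts differ by no more than `tolerance` ppm.
    """
    remaining = sorted(reference)
    matched = 0
    for shift in sorted(query):
        best = min(remaining, key=lambda r: abs(r - shift), default=None)
        if best is not None and abs(best - shift) <= tolerance:
            matched += 1
            remaining.remove(best)  # each reference shift is used once
    return matched / len(query) if query else 0.0

def dereplicate(query, library, tolerance=1.0):
    """Rank library entries by match score, best first."""
    ranked = [(name, match_score(query, shifts, tolerance))
              for name, shifts in library.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Placeholder library with invented shift lists, for illustration only
library = {
    "compound_A": [170.2, 142.1, 133.0, 75.4, 21.1],
    "compound_B": [199.8, 128.5, 55.2, 30.1],
}
query = [170.0, 142.0, 133.2, 75.0, 21.0]
print(dereplicate(query, library))
```

Where no experimental spectrum exists for a compound, predicted shifts could stand in for the reference list, and a partial match can still point towards a chemical class or substructure.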
It’s going to be an exciting project, I am REALLY looking forward to it and heck, if we can help identify new classes of antibiotics we might contribute to some of the challenges we have ahead of us!!!!
Our article “Structure Revision of Asperjinone Using Computer-Assisted Structure Elucidation Methods”, is now available on the Journal of Natural Products website here.
This was a long time coming…almost a year in the review process and iterations. I continue to see the reports from many publishers about how fast articles are published but my experience in 2012 is that it is many months past the published averages! The primary hurdles appear to be the speediness of reviewers and the willingness of editors to pursue them! When I ask for updates the general response is “We will contact the reviewers…”
The URL http://pubs.acs.org/articlesonrequest/AOR-IKzXYYSAAQCbXVAc4Fva can be used for us to distribute 50 e-prints of the article so please feel free to grab one. Details below…
“As part of the ACS Articles on Request e-prints service, ACS authors may choose to e-mail or post this link on their website to distribute up to 50 free e-prints of their final published articles to interested colleagues during the first 12 months of publication. After that 12-month period, any author’s article may be accessed without restriction via the same author-directed link that appears above. The link seamlessly directs subscribers to the full text version of the article on the ACS Publications website.”
When writing a publication how many of us conduct complete literature searches? For those of us who do not have access to Scifinder how are we finding our literature? Probably through Google Scholar? When I write a paper I admit that some of my searches may be less than complete but I do try and stay informed in regards to what is going on in my domain. VERY occasionally I get feedback from reviewers pointing me to references that they feel I either ignored or was unaware of. Many times they are co-authored by the reviewer themselves…and it is pretty easy to figure out who the reviewers are 🙂
Today I received an email in my inbox about the latest article in the Journal of Cheminformatics. It is OMG: Open Molecule Generator. The article is here. The abstract opens with “Computer Assisted Structure Elucidation has been used for decades to discover the chemical structure of unknown compounds. In this work we introduce the first open source structure generator, Open Molecule Generator (OMG), which for a given elemental composition produces all non-isomorphic chemical structures that match that elemental composition.”
Having been involved with Computer-Assisted Structure Elucidation for many years, having co-authored a book about it (here) and probably the definitive review article from the past 5 years (here) I would have assumed that our work would have been referenced. I was surprised to see that our work was not referenced while other CASE systems were. Articles we’ve issued over the past few years are below. I’ve gathered them here to point the authors to in case they want to reference any of them and missed them in the literature search.
I am taking advantage of the fact that I can leave comments on the provisional manuscript here (what a great capability!!!) and will let them know about this list. It would be good to compare the performance of the OMG with the structure generator under ACD/Structure Elucidator sometime….
1) M. E. Elyashberg, K. A. Blinov, and A. J. Williams, Computer-aided Molecular Structure Elucidation on the Basis of 1D and 2D NMR Spectra, Applied Magnetic Resonance (May 2000).
2) K. A. Blinov, M. E. Elyashberg, S. G. Molodtsov, A. J. Williams, and E. R. Martirosian, An Expert System for Automated Structure Elucidation Utilizing 1H-1H, 13C-1H, and 15N-1H 2D NMR Correlations, Fresenius J. Anal. Chem., 369, 709 (2001).
3) G. E. Martin, C. E. Hadden, D. J. Russell, B. D. Kaluzny, J. E. Guido, W. K. Duholke, B. A. Stiemsma, T. J. Thamann, R. C. Crouch, K. A. Blinov, M. E. Elyashberg, E. R. Martirosian, S. G. Molodtsov, A. J. Williams, and P. L. Schiff, Jr., Identification of Degradants of a Complex Alkaloid Using NMR Cryoprobe Technology and ACD/Structure Elucidator, J. Heterocyclic Chem., 39, 1241-1250 (2002).
4) M. E. Elyashberg, K. A. Blinov, A. J. Williams, E. R. Martirosian, and S. G. Molodtsov, Application of a New Expert System for the Structure Elucidation of Natural Products from the 1D and 2D NMR Data, J. Nat. Prod., 65, 693 (2002).
5) K. A. Blinov, D. Carlson, M. E. Elyashberg, G. E. Martin, E. R. Martirosian, S. G. Molodtsov, and A. J. Williams, Computer-Assisted Structure Elucidation of Natural Products with Limited 2D NMR Data: Applications of the StrucEluc System, Magn. Reson. Chem., 41, 359-372 (2003).
6) G. E. Martin, D. J. Russell, K. A. Blinov, M. E. Elyashberg, and A. J. Williams, Applications and Advances in Cryogenic NMR Probes & Computer-Assisted Structure Elucidation, Ann. Magn. Reson., 2, 1-31 (2003).
7) K. Blinov, M. Elyashberg, E. R. Martirosian, S. G. Molodtsov, A. J. Williams, M. H. M. Sharaf, P. L. Schiff, Jr., R. C. Crouch, G. E. Martin, C. E. Hadden, and J. E. Guido, Quindolinocryptotackieine: The Elucidation of a Novel Indoloquinoline Alkaloid Structure through the Use of Computer-Assisted Structure Elucidation and 2D-NMR, Magn. Reson. Chem., 41, 577-584 (2003).
8) M. E. Elyashberg, K. A. Blinov, E. R. Martirosian, S. G. Molodtsov, A. J. Williams, and G. E. Martin, Automated Structure Elucidation – The Benefits of a Symbiotic Relationship between the Spectroscopist and the Expert System, J. Heterocyclic Chem., 40, 1017-1029 (2003).
9) M. E. Elyashberg, K. A. Blinov, A. J. Williams, S. G. Molodtsov, G. E. Martin, and E. R. Martirosian, Structure Elucidator: A Versatile Expert System for Molecular Structure Elucidation from 1D and 2D NMR Data and Molecular Fragments, J. Chem. Inf. Comput. Sci., 44, 771-792 (2004).
10) S. G. Molodtsov, M. E. Elyashberg, K. A. Blinov, A. J. Williams, E. R. Martirosian, G. E. Martin, and B. Lefebvre, Structure Elucidation from 2D NMR Spectra Using the StrucEluc Expert System: Detection and Removal of Contradictions in the Data, J. Chem. Inf. Comput. Sci., 44, 1737-1751 (2004).
11) G. J. Sharman, I. C. Jones, M. P. Parnell, M. C. Willis, M. F. Mahon, D. V. Carlson, A. J. Williams, M. E. Elyashberg, K. A. Blinov, and S. G. Molodtsov, Automated structure elucidation of two products in a reaction of an α,β-unsaturated pyruvate, Magn. Reson. Chem., 42, 567 (2004).
12) Y. D. Smurnyy, M. E. Elyashberg, K. A. Blinov, B. A. Lefebvre, G. E. Martin, and A. J. Williams, Computer-Aided Determination of Relative Stereochemistry and 3D Models of Complex Organic Molecules from 2D NMR Spectra, Tetrahedron, 61, 9980-9989 (2005).
13) M. E. Elyashberg, K. A. Blinov, A. J. Williams, S. G. Molodtsov, and G. E. Martin, Are Deterministic Expert Systems for Computer-Assisted Structure Elucidation Obsolete? J. Chem. Inf. Model., 46, 1643-1656 (2006).
14) M. E. Elyashberg, K. A. Blinov, S. G. Molodtsov, A. J. Williams, and G. E. Martin, Fuzzy Structure Generation: An Efficient New Tool for Computer-Aided Structure Elucidation (CASE), J. Chem. Inf. Model., 47, 1053-1066 (2007). DOI: 10.1021/ci600528g
15) M. E. Elyashberg, A. J. Williams, and G. E. Martin, Computer-Assisted Structure Verification and Elucidation Tools in NMR-Based Structure Elucidation (review article), Progress in NMR Spectroscopy (2007). DOI: 10.1016/j.pnmrs.2007.04.003
16) Y. D. Smurnyy, K. A. Blinov, T. S. Churanova, M. E. Elyashberg, and A. J. Williams, Toward More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comparison of Neural-Network and Least-Squares Regression Based Approaches, J. Chem. Inf. Model., 48, 128-134 (2008).
17) M. E. Elyashberg, A. J. Williams, D. C. Lankin, G. E. Martin, J. Porco, W. F. Reynolds, and C. Singleton, Applying Computer-Assisted Structure Elucidation Algorithms for the Purpose of Structure Validation – Revising the NMR Assignments of Hexacyclinol, J. Nat. Prod., 71, 581-588 (2008).
18) M. E. Elyashberg, K. A. Blinov, and A. J. Williams, A Systematic Approach for the Generation and Verification of Structural Hypotheses, Magn. Reson. Chem., 47, 371-389 (2009).
19) M. E. Elyashberg, A. J. Williams, and K. A. Blinov, The Application of Empirical Methods of 13C NMR Chemical Shift Prediction as a Filter for Determining Possible Relative Stereochemistry, Magn. Reson. Chem., 47, 333-341 (2009).
20) Y. D. Smurnyy, K. A. Blinov, T. S. Churanova, M. E. Elyashberg, and A. J. Williams, Development of a fast and accurate method of 13C NMR chemical shift prediction, Chemometrics and Intelligent Laboratory Systems, 97(1), 91-97 (2009).
21) M. E. Elyashberg, A. J. Williams, and K. A. Blinov, Structural revisions of natural products by Computer Assisted Structure Elucidation (CASE) Systems, Nat. Prod. Rep. (2010). DOI: 10.1039/c002332a
22) A. Moser, M. E. Elyashberg, A. J. Williams, K. A. Blinov, and J. C. DiMartino, Blind trials of computer-assisted structure elucidation software, Journal of Cheminformatics, 4(1), 5.
23) M. Elyashberg, K. Blinov, S. Molodtsov, and A. Williams, Elucidating ‘undecipherable’ chemical structures using computer-assisted structure elucidation approaches, Magn. Reson. Chem., 50(1), 22–27 (2012). DOI: 10.1002/mrc.2849
BOOK: Contemporary Computer Assisted Approaches to Molecular Structure Elucidation by Kirill Blinov, Mikhail Elyashberg and Antony J. Williams, Royal Society of Chemistry
This presentation was just given at the ACS meeting in San Diego…
The Royal Society of Chemistry hosts an online resource, ChemSpider, as a structure centric database for chemists linking over 25 million chemicals to 400 internet sites. As a crowdsourced environment members of the chemistry community can deposit spectral data to the database. Almost 2000 NMR spectra have been submitted to the database and these are the basis of both a gaming environment for learning NMR spectroscopy, the SpectralGame, as well as a new teaching environment known as SpectraSchool. This presentation will provide an overview of these two online resources and how they may be utilized for the purpose of teaching NMR spectroscopy in an Undergraduate Curriculum.
Almost two years of work, a collaboration and friendship developed over many years of my tenure at Advanced Chemistry Development (with Mikhail Elyashberg and Kirill Blinov), a story about a decade of work to develop what we believe is the world’s premier Computer Assisted Structure Elucidation software, and multiple iterations later, our book is now at the printers.
Our book is “Contemporary Computer-Assisted Approaches to Molecular Structure Elucidation” and is already listed on Amazon here.
“Computer Assisted Structure Elucidation (CASE) systems are powerful software applications capable of outperforming human data interpretation in terms of both speed and reliability. They combine software algorithms with tools for molecular structure elucidation using spectroscopic data. This book describes the principles on which CASE systems are based and concisely explains the algorithmic concepts behind the programs. It puts the technique in the context of its origins and describes the challenges that have been overcome to produce modern CASE systems. It uses the authors’ software development experience to discuss the present state-of-the-art and explains how the synergistic marrying of man and machine can provide superior results. Readers will gain a firm grounding in the fundamentals of CASE, an understanding of the challenges associated with algorithms, and an appreciation of the technologies underlying NMR prediction and structure verification. Scientists who have never used CASE systems before will find all the information necessary to master this new and very effective approach. Those with some experience will benefit from details on the latest developments.”
I willingly admit I’m glad it’s over…it feels great to have it finished, great to know it’s at the printers and good to know that we have likely written the definitive volume in this area for the time being. Now time to let my eyes recover before getting back to writing two more volumes about NMR applied to Natural Products, to be released next year all being well!
I recently posted about the project that will become known as NMRCAVES, NMR Computer-Assisted Verification and Elucidation Systems. This will be a workshop to be held at SMASH. There will be no workshop without two essential ingredients: participants and data.
The participants will need to be willing to work with us, bringing their software, algorithms and approaches to test their systems on data. The data will be supplied by the community and provided to the participants in a blind study to test their systems.
Populating the workshop is the first challenge. If we cannot get enough participants then, even though we might get an abundance of data, there will be no workshop to hold because we will not have engaged the groups to work with it. There are a limited number of groups/individuals working in the areas of computer-assisted structure verification and elucidation by NMR. I have listed them below. No offense meant if I have accidentally missed anyone out. Also, they are listed in alphabetical order so no favoritism either…
Mestre Labs MNova
ACD/Labs Structure Elucidator
Can anyone point me to groups or software solutions that I am missing and other potential solutions out in the community that I should approach? I will be approaching the listed groups with an invite to participate in NMRCAVES and then will be asking the community if you are willing to provide data for the project!