
Category Archives: Nuclear magnetic resonance

#NMRCAVES is NMR Computer Assisted Verification and Elucidation Systems

I am honored to have been invited to lead a workshop at the SMASH NMR conference later this year. I will be co-hosting with Michael Bernstein, whom I have known for many years and with whom I have spent many hours (if not days!) discussing the ins and outs of NMR prediction and structure verification by NMR.

The workshop will provide an environment for developers of software packages and associated algorithms allowing for structure verification and elucidation to engage with interested members of the NMR community attending the SMASH NMR meeting. Presenters may include both commercial and non-commercial software packages and the workshop will allow the participants to report on their respective approaches as well as report on the performance of their algorithms against a large set of data provided by the community.

The one-day workshop will be separated into Structure Verification and Structure Elucidation segments, each made up of participants who have chosen to take part in the project. We are hoping for participants from both the academic and commercial sectors.

I’ve called the workshop NMRCAVES: NMR Computer Assisted Verification and Elucidation Systems. Below is an outline to initiate a conversation with interested parties. It is a suggested outline for the project and I welcome feedback.

The data analysis components of the workshop are outlined below.
CASV: Four sets of data will be made available to the participants.
(1)    HNMR only, minimum of 25 spectra and 25 suggested structures (random distribution of correct/incorrect with at least 50% correct)
(2)    HNMR and 2D HSQC, minimum of 25 spectra and 25 suggested structures (random distribution of correct/incorrect with at least 50% correct)
(3)    HNMR only, minimum of 25 spectra and 25 sets of 3 structures (1 of the 3 in each set is always the correct structure)
(4)    HNMR and 2D HSQC (preferably multiplicity-edited HSQC), minimum of 25 sets of spectra and 25 sets of 3 structures (1 of the 3 in each set is always the correct structure)

The participants will receive the data via download from an FTP site with each folder numbered in an ambiguous manner. All structures will be known to only two parties: the laboratories acquiring the data and the host of the workshop (AJW). The participants will have the responsibility to provide a report identifying the correct/incorrect structures in test sets (1) and (2) and identifying the correct structure out of the combination of 3 provided in (3) and (4). When all reports have been submitted each participant will receive a report identifying the correct structures for their review and in order for them to report on their successes and to further review and report on the data during the workshop.
The overall performance statistics comparing the results of the various participants will be reviewed and presented at the workshop by the workshop host.
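As a rough illustration of how the submitted verification reports might be scored, here is a minimal Python sketch. The function names, the folder IDs and the use of simple accuracy as the headline statistic are my own assumptions for the purpose of discussion; the actual statistics for the workshop have not been decided.

```python
def score_verification(truth, report):
    """Score test sets (1) and (2): one proposed structure per folder.

    truth  -- folder ID -> True if the proposed structure is actually correct
    report -- folder ID -> True if the participant judged it correct
    (Hypothetical data layout, not a format defined by the workshop.)
    """
    if not truth:
        raise ValueError("no reference answers supplied")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for folder, is_correct in truth.items():
        judged_correct = report[folder]
        if is_correct and judged_correct:
            counts["tp"] += 1
        elif not is_correct and not judged_correct:
            counts["tn"] += 1
        elif judged_correct:
            counts["fp"] += 1  # an incorrect structure was accepted
        else:
            counts["fn"] += 1  # a correct structure was rejected
    total = counts["tp"] + counts["tn"] + counts["fp"] + counts["fn"]
    counts["accuracy"] = (counts["tp"] + counts["tn"]) / total
    return counts


def score_selection(truth, report):
    """Score test sets (3) and (4): pick the correct structure out of 3.

    truth, report -- folder ID -> index (0, 1 or 2) of the chosen structure.
    Folders missing from the report count as misses.
    """
    if not truth:
        raise ValueError("no reference answers supplied")
    hits = sum(1 for folder, answer in truth.items()
               if report.get(folder) == answer)
    return hits / len(truth)


if __name__ == "__main__":
    # Invented folder IDs and answers, for illustration only.
    truth = {"F07": True, "F13": False, "F21": True, "F34": False}
    report = {"F07": True, "F13": True, "F21": False, "F34": False}
    print(score_verification(truth, report))
    print(score_selection({"G01": 2, "G02": 0}, {"G01": 2, "G02": 1}))
```

Keeping false positives and false negatives separate seems worthwhile here, since accepting a wrong structure and rejecting a right one have rather different consequences in practice.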

CASE: The objective should be to test the ability of algorithms to correctly elucidate the skeletons of unknowns with the provision of “high-quality datasets” where sensitivity is deemed not to be a limitation. While it is acknowledged that sensitivity is an issue in CASE approaches, this particular hurdle should be removed when challenging the algorithms. Data will be requested from a series of laboratories. The minimum dataset should include high-resolution MS, 1H, COSY and HSQC/HMBC. Additional data can include TOCSY, DEPT-HSQC, HSQC-TOCSY, 1H-15N direct and long-range correlation, and NOESY/ROESY.
The participants will receive the data via download from an FTP site with each folder numbered in an ambiguous manner. All structures will be known to only two parties: the laboratories acquiring the data and the host of the workshop (AJW). All elucidations will be done blind and the participants will have the responsibility to provide a report including a table of the top 3 structures for each dataset, rank-ordered if possible, from most-likely to least-likely. When all reports have been submitted each participant will receive a report containing the correct structures for their review and in order for them to report on their successes and to further review and report on the data during the workshop.
The overall performance statistics comparing the results of the various participants will be reviewed and presented at the workshop by the workshop host.
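One way the rank-ordered CASE reports could be compared is by a simple top-k hit rate: how often the correct structure is ranked first, and how often it appears anywhere in the top 3. The Python sketch below is only a suggestion. It assumes every structure has been reduced to a canonical identifier string (for example an InChI) so that a plain string comparison is meaningful; the function name and the placeholder identifiers are hypothetical.

```python
def top_k_rate(truth, ranked_reports, k):
    """Fraction of datasets whose correct structure is in the top k candidates.

    truth          -- dataset ID -> canonical identifier of the correct structure
    ranked_reports -- dataset ID -> list of candidate identifiers, most likely first
    k              -- how many of the ranked candidates to consider
    Datasets with no submitted candidates count as misses.
    """
    if not truth:
        raise ValueError("no reference answers supplied")
    hits = 0
    for dataset, correct in truth.items():
        candidates = ranked_reports.get(dataset, [])[:k]
        if correct in candidates:
            hits += 1
    return hits / len(truth)


if __name__ == "__main__":
    # Placeholder identifiers standing in for canonical structure strings.
    truth = {"D1": "struct-A", "D2": "struct-B", "D3": "struct-C"}
    reports = {
        "D1": ["struct-A", "struct-X", "struct-Y"],  # correct, ranked first
        "D2": ["struct-X", "struct-B", "struct-Y"],  # correct, ranked second
        "D3": ["struct-X", "struct-Y", "struct-Z"],  # correct structure missing
    }
    print("top-1:", top_k_rate(truth, reports, 1))
    print("top-3:", top_k_rate(truth, reports, 3))
```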

Outcome of Project
1)    A review of the state of contemporary computer-based structure verification and elucidation
2)    All data to be publicly shared and made available as Open Data for download and to become a gold standard reference set of data for the community to utilize for further testing and development
3)    All processed spectra to be uploaded and available on a public domain database (e.g. ChemSpider) and associated with the correct chemical structure
4)    A minimum of one co-authored publication reviewing the results of the workshop and associated studies

Your feedback, comments and questions are welcomed. We are especially looking for laboratories who are willing to provide sets of data for analysis during the project as well as software groups who develop algorithms for structure verification and elucidation and who wish to participate in the project.

 

How are NMR Prediction Algorithms and AFM Related?

There’s a really nice News piece over on Nature News regarding “Feeling the Shapes of Molecules“. The work reports on how Atomic Force Microscopy is being used to deduce chemical structure directly, one molecule at a time. It is, quite simply, stunning. This work is an extension of the original work reported on pentacene that many scientists thought was spectacular. This work is even one step closer to the dream of single molecule structure identification. The work is entitled “Organic structure determination using atomic-resolution scanning probe microscopy” and, as well as the IBM group responsible for the AFM work, it involves Marcel Jaspars, someone whose work I have watched for many years, as I am trained as an NMR spectroscopist and have spent a lot of time working on computer-assisted structure elucidation (CASE) approaches to examine natural product structures (see references in here…).

The molecule that they studied was cephalandole A, which had previously been mis-assigned. Interestingly, my old colleagues from ACD/Labs, where I worked for over a decade, and I had published an article in RSC’s Natural Product Reviews where we studied “Structural revisions of natural products by Computer-Assisted Structure Elucidation (CASE) systems“. The basic premise of the article is that there are incorrect structures making it into the literature because of the misinterpretation of the analytical data and that computer algorithms, specifically NMR prediction and CASE algorithms, can be used to rule out structures elucidated by the scientists. It is hard to do justice to the entire review article, as we detail the approaches to CASE and NMR prediction, and doing it in a blog post is tough. So, I do recommend reading the NPR article. However, I am extracting the part that applies to the elucidation of the structure of cephalandole A and how algorithms would be of value in negating the incorrect structure.

“In 2006 Wu et al isolated a new series of alkaloids, particularly cephalandole A, 16. Using 2D NMR data (not tabulated in the article) they performed a full 13C NMR chemical shift assignment as shown on structure 16.

Mason et al synthesized compound 16 and after inspection of the associated 1H and 13C NMR data concluded that the original structure assigned to cephalandole A was incorrect. The synthetic compound displayed significantly different data from those given by Wu et al. The 13C chemical shifts of the synthetic compound are shown on structure 16A.

Cephalandole A was clearly a closely related structure with the same elemental composition as 16, and structure 17 was hypothesized as the most likely candidate. Compound 17 was described in the mid 1960s and this structure was synthesized by Mason et al. The spectral data of the reaction product fully coincided with those reported by Wu et al. The true chemical shift assignment is shown in structure 17. For clarity the differences between the original and revised structures are shown in Figure 17.

We expect that 13C chemical shift prediction, if originally performed for structure 16, would encourage caution by the researchers (we found dA=3.02 ppm). Figure 18 presents the correlation plots of the 13C chemical shift values predicted for structure 16 by both the HOSE and NN methods versus experimental shift values obtained by Wu et al. The large point scattering, the regression equation, the low R2 =0.932 value (an acceptable value is usually R2 ≥ 0.995) and the significant magnitude of the g-angle between the correlation plot and the 45-grade line (a visual indication for disagreement between the experiment and model) could indicate inconsistencies with the proposed structure and should encourage close consideration of the structure. Our experience has demonstrated that a combination of warning attributes can serve to detect questionable structures even in those cases when the StrucEluc system is not used for structure elucidation.

Figure 18. Correlation plots of the 13C chemical shift values predicted for structure 16 by HOSE and NN methods versus experimental shift values obtained by Wu et al. Extracted statistical parameters: R2(HOSE)=0.932, dHOSE=1.20dexp-25.6.
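For readers who want to try this kind of check on their own assignments, the warning attributes mentioned in the excerpt (regression line, R2 and an average deviation) can be computed with a few lines of Python. This is a minimal sketch and not the ACD/Labs implementation: the function name is hypothetical, the shift values are invented for illustration, and I am assuming the average deviation is simply the mean absolute difference between predicted and experimental shifts.

```python
def shift_agreement(experimental, predicted):
    """Compare predicted and experimental 13C shifts (ppm), paired by atom.

    Returns the least-squares line predicted = slope * experimental + intercept,
    the squared Pearson correlation (R2) and the mean absolute deviation.
    """
    n = len(experimental)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long lists with at least 2 shifts")
    mean_x = sum(experimental) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(experimental, predicted))
    if sxx == 0 or syy == 0:
        raise ValueError("shifts must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    mean_abs_dev = sum(abs(y - x)
                       for x, y in zip(experimental, predicted)) / n
    return {"slope": slope, "intercept": intercept,
            "r_squared": r_squared, "mean_abs_dev": mean_abs_dev}


if __name__ == "__main__":
    # Invented shift values, NOT the cephalandole A data.
    experimental = [112.4, 121.0, 128.7, 136.2, 148.9, 160.3]
    predicted = [110.1, 123.5, 127.9, 140.8, 145.0, 163.2]
    stats = shift_agreement(experimental, predicted)
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
    # The excerpt treats R2 >= 0.995 as the usual acceptable value.
    print("flag for review:", stats["r_squared"] < 0.995)
```

A slope far from 1 or an intercept far from 0 corresponds to the correlation line tilting away from the 45-degree line described in the excerpt.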

So, for those NMR jocks who don’t have access to the genius of IBM scientists performing AFM, and yet want to have tools to help in the elucidation process, you’d be doing well to use NMR prediction algorithms and CASE systems to help….it’s rather embarrassing to have to issue a retraction on a paper with your name on it.

Meanwhile I am in awe of the work reported by Marcel and his colleagues at IBM. Clearly there’s a long way to go before such approaches are mainstream but the flag is in the sand…this is where things will speed up and we are surely destined, I hope (!) to see many more reports of this type of work and how it is progressing. Let’s hope. Feedback on the NPR article welcomed!!!


 


Good Science Takes Time: 16 months to examine NMR Prediction Performance

In October 2007 I got involved in an exchange with Peter Murray-Rust from Cambridge University about Open Notebook NMR. The original post is here and my response is here. The basic premise of the exchange was that I believed that quantum-mechanical NMR predictions had a lot of limitations relative to empirical predictions. I made the comment based on over two decades working in NMR – the first decade managing a number of NMR laboratories and the second decade involved in the delivery of commercial software solutions, including NMR predictions, to the marketplace.

In my original response I stated “This has the potential to be a very exciting project. While I wouldn’t write the paper myself without doing the work I’ll certainly try the approach. Let’s see what the truth is. The challenge now is to get to agreement on how to compare the performance of the algorithms. We are comparing very different beasts with the QM vs. non-QM approaches so, in many ways, this should be much easier than the challenges discussed so far around comparing non-QM approaches between vendors.” and asked Peter to participate in a collaboration with us to do the comparison.

I then posted the blogpost below. It is included in its entirety as it defines what my thought process was almost two years ago and the approach that could be taken. In the blogpost I address a post directly to Peter. If you know the story then go past the history to the conclusions where I discuss the conclusion of the work we have done since this discussion started.

“Previously I blogged about “An Invitation to Collaborate on Open Notebook Science for an NMR Study“. I judged it was a great opportunity to “help build a bridge between the Open Data community, the academic community and the commercial software community for the benefit of science.” In particular I believe the project offers an opportunity to answer a longstanding question I have had. Specifically, I have seen a lot of publications in recent years utilizing complex, time-consuming GIAO NMR predictions. Having been involved with the development of NMR prediction algorithms for the past few years (while working with the scientists at ACD/Labs) my judgment is that these complex calculations can be replaced by calculations which can take just a couple of seconds on a standard PC. I believe this to be true for most organic molecules. I do not believe such calculations would outperform GIAO predictions for inorganic molecules or organometallic complexes or solid state shift tensors. However, there has never been a rigorous examination comparing performance differences. I believe this project offered an excellent opportunity to validate the hypothesis that HOSE code/Neural Network/Increment based predictions could, in general, outperform GIAO predictions.

The study was to be performed on the NMRShiftDB now available on ChemSpider. I’ve blogged previously about the validation of the database (1,2). The conversation about the NMR project has continued and Peter has talked about some of the challenges of Open Notebook Science based on Cameron Neylon’s comments. I’ve posted the comments below to the post and they will likely be moderated in shortly. I post them here for the purpose of conclusion since I don’t think my original hopes will come to fruition. Thanks to those of you who have been engaged both on and off blog. I suggest we all help with Peter’s intention to help explain identifiers that are being extracted in the work.

“Can you provide some more details regarding your concerns here:”it would be possible for someone to replicate the whole work in a day and submit it for publication (on the same day) and ostensibly legitimately claim that they had done this independently. They might, of course use a slightly different data set, and slightly different tweaks.”

I have two interpretations:

1) Someone could repeat the GIAO calculations in a day and identify outliers and submit for publication

2) Someone could do the calculations using other algorithms and identify outliers etc and submit for publication

Maybe you mean something else?

For 1) the GIAO calculations CANNOT be repeated since no one has access to Henry’s algorithms and based on your comments he is modifying them on an ongoing basis as a result of this work. Even if they did have their own GIAO calculations, unless they have improved the performance dramatically or have access to a “boat load” of computers the calculations will take weeks (based on your own estimates). That said, comparing one GIAO algorithm to another is valid science and absolutely appropriate and publishable. Also, if they had used the same dataset as you, with another algorithm to check predictions and identify outliers, it WOULD be independent. Related to the work you are doing for sure but independent.

For 2) using other algorithms on the same dataset is valid and appropriate science. This is what people do with logP prediction (or MANY other parameters)…they validate their algorithms on the same dataset many times over. It’s one of the most common activities in the QSAR and modeling world in my opinion. And people do use slightly different tweaks…it’s one of the primary manners to shift the algorithms. Henry’s doing this right now to deal with halogens according to your earlier post. Wolfgang Robien at University of Vienna, ACD/Labs and others use their own approaches but both at a minimum can use HOSE code and Neural Networks. Same general approaches with tweaks. They give different results…all is appropriate science.

Returning to the comment “it would be possible for someone to replicate the whole work in a day and submit it for publication (on the same day) and ostensibly legitimately claim that they had done this independently.”

Wolfgang Robien has taken the NMRShiftDB dataset and performed an analysis. It’s posted here. ACD/Labs performed a similar analysis as discussed on Ryan’s blog here. One of the outputs is this document. This resulted in further exchanges and dialog. The parties have discussed this on the phone and face to face with Ryan talking with Wolfgang recently in Europe at a conference.

This was heated and opinionated for sure. STRONG scientific wills and GREAT scientists defending their approaches and performance. Wolfgang is NOT an enemy for ACD/Labs…he has made some of the greatest contributions to the domain of NMR prediction and, in many ways, has been one to emulate in terms of his approach to quality and innovation to create breakthroughs in performance. He is a worthy colleague and drives improvement by his ongoing search for improvements in his own algorithms. I honor him.

The bottom line is this: approaches for the identification of outliers in NMRShiftDB have been DONE already. It’s been discussed online for months…just do a search on “Robien NMRshiftDB” on google or “ACD/Labs nmrshiftdb”. There are hundreds of pages. We/I just published on the validation of the NMRShiftDB. I blogged about it and you posted it here. Feedback on outliers has been returned to Christoph and changes made already. So in many ways you are doing repeat work – just using a different algorithm and identifying new outliers. Neither ACD/Labs’ nor Wolfgang’s work was exhaustive. It was very much a first cut but did help edit many records already. NO DOUBT you will find new outliers.

I’ve gone back to the original post and extracted two purposes for the work:

1) To perform Open Notebook Science

2) quote “To show that the philosophy works, that the method works, and that NMRShiftDB has a measurable high-quality.”

1) has already changed and is an appropriate outcome from the work. (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=743)

2) The method of NMR prediction applied to NMRShiftDB to prove quality…high or not…has been done already. Wolfgang and ACD/Labs did it already. I judge you’ll have similar conclusions…it’s the same dataset.

Stated here http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=737 is “We shall continue on the project, one of whose purposes is to investigate the hypothesis that QM calculations can be used to evaluate the quality of NMR spectra to a useful level.” It’s a valid investigation and this is testing whether QM can provide good predictions. This is of course known already from the work done by Rychnovsky on hexacyclinol.

To summarize:

1) Using NMR predictions to identify outliers – already done (Robien and ACD/Labs)

2) Validating that GIAO predictions are useful to validate structures – already done (hexacyclinol study)

3) Validating the quality of NMRShiftDB – already done (Robien, ACD/Labs)

All this brings me down to what I “think” are the intentions or outcomes for the project at this point..but I likely have missed something..

1) Identify more outliers that were not identified by the studies of others

2) Deliver back to Christoph and the NMRShiftDB team a list of outliers/concerns/errors with annotations/metadata in order to improve the Open Data source of NMRShiftDB

3) Allow Nick Day to use a lot of what was learned delivering CrystalEye for a second application around NMR and useful for his thesis (A VERY valid goal..good luck Nick)

4) Show the power of blogging to drive Collaboration via Open Collaborative NMR

Some additional project deliverables I think include:

1) make online GIAO NMR predictions available

The project deliverables you are working on are defined here and I believe are consistent: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=742

* create a small subset of NMRShiftDB which has been freed from the main errors we – and hopefully the community – can identify.

* Use this to estimate the precision and variance of our QM-based protocol for calculating shifts.

* refine the protocol in the light of variance which can be scientifically explained.

What I still would like to see, BUT this project belongs to you/Henry/Nick of course and you define what it is, is:

1) “to help build a bridge between the Open Data community, the academic community and the commercial software community for the benefit of science.” Wolfgang is in academia, so are you, ACD/Labs is commercial and I’m independent (but of course am associated with ChemSpider…I am an NMR spectroscopist…it’s why I’m interested)

2) To validate the performance of GIAO vs HOSE/NN/Inc by providing the final dataset that you used and statistics of performance for GIAO on that dataset. I’d like to publish the results jointly, if you would be willing to work with the “dark side”

3) To identify where GIAO can outperform the HOSE/NN/Inc approaches

Wolfgang also has thoughts where he says “What would be great to the scientific community: Do calculations on compounds where sophisticated NMR-techniques either fail or are very difficult to perform – e.g. proton-poor compounds or simply ask for a list of compounds which are really suspicious (either the structure is wrong or the assignment is strange, but the puzzle can’t be solved, because the compound is not available for additional measurements).”

I’ve put a lot of effort into blogging onto this project over the past few days. I’m about to invest some time in making sure that you get information about outliers so you are not doing repeat work. I judge that my hopes for deeper collaboration will remain unfulfilled so I’ll give up on asking.

I’ll do what I can to help from this point forward and keep my own rhetoric off of this blog and restrain it to ChemSpider so as to not distract your readers. I look forward to helping for the benefit of the community.”

While I was at ACD/Labs I worked with a number of truly excellent scientists. These people were at the forefront of developing NMR prediction technologies as well as Computer Assisted Structure Elucidation (CASE) software. Over the past year and a half I have had the privilege of continuing some of the work I was involved with while at ACD/Labs and our publication regarding “Empirical and DFT GIAO quantum-mechanical methods of 13C chemical shifts prediction: competitors or collaborators?” was released recently. The abstract states:

“The accuracy of 13C chemical shift prediction by both DFT-GIAO quantum-mechanical (QM) and empirical methods was compared using 205 structures for which experimental and QM-calculated chemical shifts were published in the literature. For these structures, 13C chemical shifts were calculated using HOSE code and neural network (NN) algorithms developed within our laboratory. In total, 2531 chemical shifts were analyzed and statistically processed. It has been shown that, in general, QM methods are capable of providing similar but inferior accuracy to the empirical approaches, but quite frequently they give larger mean average error values. For the structural set examined in this work, the following mean absolute errors (MAEs) were found: MAE(HOSE) = 1.58 ppm, MAE(NN) = 1.91 ppm and MAE(QM) = 3.29 ppm. A strategy of combined application of both the empirical and DFT GIAO approaches is suggested. The strategy could provide a synergistic effect if the advantages intrinsic to each method are exploited.”

The conclusion includes the following statements: “It has been shown that, in general, QM methods are capable of providing similar but inferior accuracy to the empirical approaches, but quite frequently they give larger mean average error values. This is accounted for mainly with difficulties in selecting the appropriate calculation protocols and difficulties arising from molecular flexibility. The data show that the average accuracy of the QM methods is 1.5–2 times lower than the accuracy shown by the empirical methods. For the structural set examined in this work, the following MAEs were found: MAE(HOSE) = 1.58 ppm, MAE(NN) = 1.91 ppm, MAE(QM) = 3.29 ppm.”
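To make the arithmetic behind these figures concrete, here is a small Python sketch. The three MAE values are the ones quoted from the paper; the pooled_mae helper, its data layout and the toy shifts fed to it are hypothetical, and I am assuming the reported MAE is pooled over all shifts rather than averaged molecule by molecule.

```python
def pooled_mae(molecules):
    """Mean absolute error over all shifts pooled across molecules.

    molecules -- list of (experimental_shifts, predicted_shifts) pairs,
                 one pair per molecule, shifts in ppm and paired by atom.
    """
    errors = []
    for experimental, predicted in molecules:
        if len(experimental) != len(predicted):
            raise ValueError("each molecule needs paired shift lists")
        errors.extend(abs(p - e) for e, p in zip(experimental, predicted))
    if not errors:
        raise ValueError("no shifts supplied")
    return sum(errors) / len(errors)


# MAE values (ppm) as reported in the paper for 2531 shifts in 205 structures.
REPORTED_MAE = {"HOSE": 1.58, "NN": 1.91, "QM": 3.29}


def qm_error_ratios(mae):
    """How many times larger the QM error is than each empirical method's."""
    return {name: mae["QM"] / value
            for name, value in mae.items() if name != "QM"}


if __name__ == "__main__":
    # Toy shifts, invented purely to exercise the helper.
    toy = [([20.0, 30.0], [21.0, 28.0]), ([100.0], [103.0])]
    print("toy pooled MAE:", pooled_mae(toy))
    for name, ratio in qm_error_ratios(REPORTED_MAE).items():
        print(f"MAE(QM) / MAE({name}) = {ratio:.2f}")
```

With the reported values the ratios come out at roughly 2.1 (versus HOSE) and 1.7 (versus NN), which is in line with the "1.5–2 times" statement in the conclusion.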

In order to demonstrate that empirical approaches outperform QM methods in general, we examined 2531 chemical shifts associated with 205 molecules. It was a rather complete study! It took a long time to do the work but it wasn’t done as Open Notebook NMR. It’s published in Magnetic Resonance in Chemistry here: DOI: 10.1002/mrc.2571. Enjoy!

 
