
#NMRCAVES is NMR Computer Assisted Verification and Elucidation Systems

18 Jan

I am honored to have been invited to lead a workshop at the SMASH NMR conference later this year. I will be co-hosting with Michael Bernstein, whom I have known for many years and with whom I have spent many hours (if not days!) discussing the ins and outs of NMR prediction and structure verification by NMR.

The workshop will provide an environment in which developers of software packages and associated algorithms for structure verification and elucidation can engage with interested members of the NMR community attending the SMASH NMR meeting. Presentations may cover both commercial and non-commercial software packages, and the workshop will allow the participants to describe their respective approaches as well as report on the performance of their algorithms against a large set of data provided by the community.

The one-day workshop will be separated into Structure Verification and Structure Elucidation segments, featuring those who have chosen to participate in the project. We are hoping for participants from both the academic and commercial sectors.

I’ve called the workshop NMRCAVES: NMR Computer Assisted Verification and Elucidation Systems. Below is an outline to initiate a conversation with interested parties. It is a suggested outline for the project and I welcome feedback.

The data analysis components of the workshop are outlined below.
CASV: Four sets of data will be made available to the participants.
(1)    HNMR only, minimum of 25 spectra and 25 suggested structures (random distribution of correct/incorrect with at least 50% correct)
(2)    HNMR and 2D HSQC, minimum of 25 spectra and 25 suggested structures (random distribution of correct/incorrect with at least 50% correct)
(3)    HNMR only, minimum of 25 spectra and 25 sets of 3 structures (in each set, 1 of the 3 is always the correct structure)
(4)    HNMR and 2D HSQC (preferably multiplicity-edited HSQC), minimum of 25 sets of spectra and 25 sets of 3 structures (in each set, 1 of the 3 is always the correct structure)
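As a rough illustration of how test sets like (1) and (2) could be assembled, here is a minimal Python sketch. It is only an assumption about one possible approach, not the procedure the workshop will use: the function names, the `case_` prefix, and the four-digit codes are all hypothetical. It draws a random mix of correct/incorrect proposed structures with at least 50% correct, and generates folder names that reveal nothing about the answer.

```python
import math
import random


def assign_truth_labels(n_cases=25, min_correct_fraction=0.5, seed=None):
    """Return a shuffled list of booleans (True = proposed structure is correct).

    Hypothetical helper: the number of correct cases is drawn at random,
    but never falls below min_correct_fraction of the set.
    """
    rng = random.Random(seed)
    min_correct = math.ceil(n_cases * min_correct_fraction)
    n_correct = rng.randint(min_correct, n_cases)
    labels = [True] * n_correct + [False] * (n_cases - n_correct)
    rng.shuffle(labels)
    return labels


def blinded_folder_names(n_cases, seed=None):
    """Return unique folder names whose numbering carries no information."""
    rng = random.Random(seed)
    codes = rng.sample(range(1000, 10000), n_cases)  # unique 4-digit codes
    return [f"case_{code}" for code in codes]


if __name__ == "__main__":
    labels = assign_truth_labels(seed=1)
    names = blinded_folder_names(len(labels), seed=2)
    answer_key = dict(zip(names, labels))  # held only by the data providers and the host
    print(len(answer_key), "cases,", sum(labels), "correct")
```

The answer key produced this way would stay with the data providers and the host; participants would see only the folder names.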

The participants will receive the data via download from an FTP site with each folder numbered in an ambiguous manner. All structures will be known to only two parties: the laboratories acquiring the data and the host of the workshop (AJW). The participants will have the responsibility to provide a report identifying the correct/incorrect structures in test sets (1) and (2) and identifying the correct structure out of the combination of 3 provided in (3) and (4). When all reports have been submitted, each participant will receive a report identifying the correct structures so that they can review their results, report on their successes, and further examine and discuss the data during the workshop.
The overall performance statistics comparing the results of the various participants will be reviewed and presented at the workshop by the workshop host.
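To make the idea of performance statistics concrete, here is a hedged Python sketch of how a single participant's CASV report might be scored against the answer key. The function names, the dictionary layout (folder ID mapped to an answer), and the choice of statistics are my own assumptions for discussion; the actual metrics are still to be agreed.

```python
def score_verification(truth, calls):
    """Score test sets (1) and (2).

    truth, calls: dicts mapping folder ID -> bool (True = structure is correct).
    A complete report is assumed; a missing folder ID raises KeyError.
    """
    tp = fp = tn = fn = 0
    for case_id, is_correct in truth.items():
        called_correct = calls[case_id]
        if is_correct and called_correct:
            tp += 1  # correct structure accepted
        elif is_correct and not called_correct:
            fn += 1  # correct structure rejected
        elif not is_correct and called_correct:
            fp += 1  # incorrect structure accepted
        else:
            tn += 1  # incorrect structure rejected
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else None,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else None,
    }


def score_selection(truth, picks):
    """Score test sets (3) and (4).

    truth, picks: dicts mapping folder ID -> index (0, 1 or 2) of the
    structure believed to be correct. Returns the fraction picked correctly.
    """
    hits = sum(1 for case_id, idx in truth.items() if picks.get(case_id) == idx)
    return hits / len(truth)


if __name__ == "__main__":
    truth = {"case_1": True, "case_2": True, "case_3": False, "case_4": False}
    calls = {"case_1": True, "case_2": False, "case_3": True, "case_4": False}
    print(score_verification(truth, calls))
```

Reporting false positives and false negatives separately, rather than accuracy alone, seems worthwhile here because accepting a wrong structure and rejecting a right one have quite different consequences in practice.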

CASE: The objective should be to test the ability of algorithms to correctly elucidate the skeletons of unknowns when provided with “high-quality datasets” for which sensitivity is deemed not to be a limitation. While it is acknowledged that sensitivity is an issue in CASE approaches, this particular hurdle should be removed when challenging the algorithms. Data will be requested from a series of laboratories. The minimum dataset should include “High-resolution MS”, 1H, COSY, HSQC/HMBC. Additional data can include TOCSY, DEPT-HSQC, HSQC-TOCSY, 1H-15N direct and long-range correlation, NOESY/ROESY.
The participants will receive the data via download from an FTP site with each folder numbered in an ambiguous manner. All structures will be known to only two parties: the laboratories acquiring the data and the host of the workshop (AJW). All elucidations will be done blind and the participants will have the responsibility to provide a report including a table of the top 3 structures for each dataset, rank-ordered if possible, from most-likely to least-likely. When all reports have been submitted, each participant will receive a report containing the correct structures so that they can review their results, report on their successes, and further examine and discuss the data during the workshop.
The overall performance statistics comparing the results of the various participants will be reviewed and presented at the workshop by the workshop host.
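For the CASE reports, one way the rank-ordered top-3 tables might be summarized is sketched below in Python. This is an illustration under stated assumptions rather than an agreed scoring scheme: the function name is hypothetical, structures are assumed to be compared through some canonical string identifier (for example an InChIKey), and the three statistics shown (top-1 rate, top-3 rate, mean reciprocal rank) are simply common choices for ranked results.

```python
def score_elucidation(truth, ranked):
    """Summarize ranked top-3 CASE results.

    truth:  dict mapping folder ID -> identifier of the correct structure.
    ranked: dict mapping folder ID -> list of up to 3 identifiers,
            most likely first. A missing dataset counts as a miss.
    """
    top1 = 0
    top3 = 0
    reciprocal_rank_sum = 0.0
    for case_id, correct in truth.items():
        candidates = ranked.get(case_id, [])[:3]  # ignore anything past rank 3
        if correct in candidates:
            rank = candidates.index(correct) + 1
            top3 += 1
            if rank == 1:
                top1 += 1
            reciprocal_rank_sum += 1.0 / rank
    n = len(truth)
    return {
        "top1_rate": top1 / n,
        "top3_rate": top3 / n,
        "mean_reciprocal_rank": reciprocal_rank_sum / n,
    }


if __name__ == "__main__":
    truth = {"case_1": "A", "case_2": "B", "case_3": "C"}
    ranked = {
        "case_1": ["A", "Q", "R"],  # correct structure ranked first
        "case_2": ["Q", "B", "R"],  # correct structure ranked second
        "case_3": ["Q", "R", "S"],  # correct structure not in the top 3
    }
    print(score_elucidation(truth, ranked))
```

If a participant cannot rank-order their candidates, only the top-3 rate would be meaningful for that report.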

Outcome of Project
1)    A review of the state of contemporary computer-based structure verification and elucidation
2)    All data to be publicly shared and made available as Open Data for download and to become a gold standard reference set of data for the community to utilize for further testing and development
3)    All processed spectra to be uploaded and available on a public domain database (e.g. ChemSpider) and associated with the correct chemical structure
4)    A minimum of one co-authored publication reviewing the results of the workshop and associated studies

Your feedback, comments and questions are welcomed. We are especially looking for laboratories who are willing to provide sets of data for analysis during the project as well as software groups who develop algorithms for structure verification and elucidation and who wish to participate in the project.

 

2 Responses to #NMRCAVES is NMR Computer Assisted Verification and Elucidation Systems

  1. Stan Sykora

    January 24, 2011 at 6:37 am

    Hi Antony
    I have linked to this article (and named you as well) in an entry about NMRCAVES on Stan’s NMR Blog (Jan 23). As a co-developer of an automatic structure verification system, I am of course quite excited about this. In particular, however, I would like to underline the importance of the data set you will need to create. It would be extremely meritorious, I think, if you, Mike, and whoever else could consider not only making that particular set of [data+structures+tasks] permanently available on a public web place, but also take steps for its future maintenance and development. The CAVES task is a huge and tricky one (full of fuzzy logic) and it might take decades before it reaches full maturity. The existence of a benchmark maintained by a neutral authority (meaning one not linked to a particular development group) would no doubt speed it up and help individual developers understand where they stand at any particular moment.
    All the best, Stan

     
  2. tony

    January 24, 2011 at 10:34 pm

    Stan, thanks for the feedback. I believe that the benchmark set can be hosted by the RSC as part of the ChemSpider database. The details regarding “how” this will be hosted can be settled as the project proceeds but RSC is certainly “neutral” as they do not develop NMR prediction or CASE systems. Best wishes and thanks so much for the feedback.

     
