I have the pleasure of collaborating with Emma Schymanski, and we are in daily contact, bouncing ideas around on how to improve the state of the science and informatics for mass spectrometry non-target screening. We are both out at conferences representing the effort and, together with the many colleagues we get to work with, are moving things forward iteratively so that each presentation reports on the latest developments. Emma presented in Rome this week at the SETAC Europe 28th Annual Meeting and showed the work under way to integrate the CompTox Chemistry Dashboard and MetFrag. More on that will be reported in detail soon, but for now her slides from the meeting are available on SlideShare and embedded here.
Category Archives: MS Structure Identification
My friend and frequent collaborator Emma Schymanski gave a talk at Analytica Munich this week (wish I was there) on “Finding small molecules in big data”. I am fortunate to collaborate with Emma on many aspects of using cheminformatics approaches to interrogate, interpret and integrate data associated with mass spectrometry analyses and structure identification. It’s been an interesting year working on the challenges together.
Metabolomics and exposomics are amongst the youngest and most dynamic of the omics disciplines. While the molecules involved are smaller than those in proteomics and the other, larger “omics”, the challenges are in many ways greater. Elements are less constrained, there are no given “puzzle pieces” and there is a resulting explosion in terms of potential chemical space. It is impossible to even enumerate all chemically possible small molecules. The challenges and complexity of identifying small molecules, even using the most advanced analytical technologies available today, are immense. Current “big data” methods for small molecules rely heavily on chemical databases, the largest of which presently contain ~100 million chemicals. Despite this large number, high resolution mass spectrometry (HR-MS) measurements contain tens of thousands of features, of which only a few percent can be annotated as “known” and confirmed as metabolites or chemicals of interest using these chemical databases. How can we find relevant small molecules in the ever increasing data loads? How can we annotate more of the unknown features in HR-MS experiments? This talk will present European, US and worldwide initiatives to help find small molecules in big data – from chemical databases to spectral libraries, real-time monitoring to retrospective screening. It will touch on the challenges of standardized structure representations, data curation and deposition. Finally, it will show how interdisciplinary communication, data sharing and pushing the boundaries of current capabilities can facilitate research efforts in metabolomics, exposomics and beyond. This abstract does not necessarily represent U.S. EPA policy.
PRESENTATION ACS Spring 2018: Curating and Sharing Structures and Spectra for the Environmental Community
Curating and sharing structures and spectra for the environmental community
Presented by Emma Schymanski
The increasing popularity of high mass accuracy non-target mass spectrometry methods has yielded extensive identification efforts based on spectral and chemical compound databases in the environmental community and beyond. Increasingly, new methods are relying on open data resources. Candidate structures are often retrieved with either exact mass or molecular formula from large resources such as PubChem, ChemSpider or the EPA CompTox Chemistry Dashboard. Smaller, selective lists of chemicals (also called “suspect lists”) can be used to perform more efficient annotation. Mass spectral libraries can then be used to increase the confidence in tentative identification. Additional metadata (e.g. exposure and hazard information, reference and data source information) can be extremely useful to prioritize substances of high environmental interest. Exchanging information and “sharing structural linkages” between these resources requires extensive curation to ensure that the correct information is shared correctly, yet many valuable datasets arise from scientists and regulators with little official cheminformatics training. This talk will cover curation efforts undertaken to map spectral libraries (e.g. MassBank.EU, mzCloud) and suspect lists from the NORMAN Suspect Exchange (http://www.norman-network.com/?q=node/236) to unique chemical identifiers associated with the US EPA CompTox Chemistry Dashboard. The curation workflow takes advantage of years of experience, as well as contact with the original data providers, to enable open access to valuable, curated datasets to support environmental scientists and the broader research community (e.g. https://comptox.epa.gov/dashboard/chemical_lists). Note: This abstract does not reflect US EPA policy.
Identifying “known unknowns” via suspect and non-target screening of environmental samples with the in silico fragmenter MetFrag (http://msbi.ipb-halle.de/MetFragBeta/) typically relies on the large compound databases ChemSpider and PubChem (see e.g. Ruttkies et al 2016). The size of these databases (over 50 and 90 million structures, respectively) yields many false positive hits for structures that were never produced in sufficient amounts to be realistically found in the environment (e.g. McEachran et al 2016). One motivation behind the US EPA’s CompTox Chemistry Dashboard is to provide access to compounds of environmental relevance – currently approx. 760,000 chemicals. While the web services needed to incorporate the Dashboard in MetFrag as a database like ChemSpider and PubChem are not yet available, a number of features in MetFragBeta enable users to use the CompTox Chemistry Dashboard to perform “known unknown” identification with MetFrag. This post highlights the Suspect Screening Functionality.
First we have our (charged) mass. Take m/z = 256.0153. This was measured in positive mode and we assume (correctly) that it’s [M+H]+. Make sure you set this correctly in MetFrag.
Then retrieve your candidates, e.g. using ChemSpider or PubChem and a 5 ppm error margin:
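The arithmetic behind these first two steps can be sketched in a few lines of Python. This is only an illustration, not MetFrag's own code: the function names are made up for this post, and it assumes that the neutral monoisotopic mass is obtained by subtracting the proton mass from the [M+H]+ m/z and that the ppm margin is taken relative to that neutral mass.

```python
PROTON_MASS = 1.007276  # Da; mass of H+ (a hydrogen atom minus one electron)


def neutral_mass_from_mh(mz):
    """Neutral monoisotopic mass for a singly charged [M+H]+ ion."""
    return mz - PROTON_MASS


def ppm_window(mass, ppm):
    """Lower and upper mass bounds for a relative error margin given in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


# Values from the example: m/z 256.0153, positive mode, 5 ppm margin.
neutral = neutral_mass_from_mh(256.0153)
low, high = ppm_window(neutral, 5.0)
print(f"neutral mass: {neutral:.4f}")
print(f"search window: {low:.4f} to {high:.4f}")
```

For m/z 256.0153 this gives a neutral mass of about 255.0080 and a window of roughly ±0.0013 Da around it, which is why getting the adduct right matters: the wrong adduct shifts the search by far more than the window width.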
Take the peak list from MassBank here: https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=EA267612&dsn=Eawag and copy into the Fragmentation settings:
You could now process the candidates … but we have not done anything with the Dashboard! This is hidden in the middle in the “Candidate Filter & Score Settings” tab:
You can use the Candidate Filter to process ONLY candidates that are in the CompTox Chemistry Dashboard, excluding all other candidates, by clicking on “Suspect Inclusion Lists” and selecting the “DSSTox” box (see screenshot), which retains (currently) 11 of the 156 ChemSpider candidates:
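Conceptually, an inclusion list works like the sketch below. It is a hypothetical illustration rather than MetFrag's implementation: the candidate records, the function name and the exact-match rule on the full InChIKey are all assumptions (a real tool may, for instance, compare only the first InChIKey block), and the compounds are well-known examples, not the candidates for this mass.

```python
def filter_by_suspects(candidates, suspects):
    """Keep only candidates whose InChIKey appears in the suspect list."""
    suspect_keys = {key.strip().upper() for key in suspects}
    return [c for c in candidates if c["inchikey"].strip().upper() in suspect_keys]


# Illustrative data only: not the candidates from the example in this post.
candidates = [
    {"name": "aspirin", "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
    {"name": "caffeine", "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
    {"name": "ethanol", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
]
suspects = ["RYYVLZVUVIJVGH-UHFFFAOYSA-N", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"]

kept = filter_by_suspects(candidates, suspects)
print([c["name"] for c in kept])
```

Everything not on the list is discarded before fragmentation, which is what shrinks the candidate set so sharply.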
Once processing has finished, the plot in the “Statistics” tab should look something like this – depending on what additional scores you selected:
It is also possible to use one (or more!) suspect lists to SCORE the different candidates without excluding any matches from ChemSpider or PubChem, by selecting the same box under the “MetFrag Scoring Terms” part instead (see screenshot). Additional lists like the Swiss Pharma list shown below can be downloaded from the NORMAN Suspect Exchange (http://www.norman-network.com/?q=node/236) and also viewed under the lists tab in the CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard/chemical_lists). MetFrag only needs a text file containing InChIKeys of the substances for the upload – which can be obtained from the Dashboard or Suspect Exchange downloads.
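To make the difference between filtering and scoring concrete, here is a small Python sketch of the scoring idea. It is an assumption-laden illustration, not MetFrag's code: the helper names are hypothetical, the score is simply the number of suspect lists that contain a candidate's full InChIKey, and the temporary file stands in for a list downloaded from the Dashboard or the Suspect Exchange.

```python
import tempfile
from pathlib import Path


def load_inchikeys(path):
    """Read a suspect list: one InChIKey per line, blank lines ignored."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().upper() for line in lines if line.strip()}


def suspect_scores(candidate_keys, suspect_lists):
    """Score each candidate by the number of suspect lists that contain it."""
    return {
        key: sum(key.upper() in suspects for suspects in suspect_lists)
        for key in candidate_keys
    }


# Illustrative data only; the file stands in for a downloaded suspect list.
with tempfile.TemporaryDirectory() as tmp:
    list_path = Path(tmp) / "suspects.txt"
    list_path.write_text(
        "RYYVLZVUVIJVGH-UHFFFAOYSA-N\n\nBSYNRYMUTXBXSQ-UHFFFAOYSA-N\n",
        encoding="utf-8",
    )
    suspect_list = load_inchikeys(list_path)

candidate_keys = ["RYYVLZVUVIJVGH-UHFFFAOYSA-N", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"]
scores = suspect_scores(candidate_keys, [suspect_list])
for key, score in scores.items():
    print(key, score)
```

Unlike an inclusion filter, no candidate is dropped here: those absent from every list simply receive a score of zero and can still rank well on their other scores.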
Using the Suspect Lists as a “Scoring term”, along with some other criteria and restrictions, will give you a results plot looking more like this:
Curious to find out more? MetFrag comes with a built-in example and you can try this exact example yourself by visiting http://msbi.ipb-halle.de/MetFragBeta/ and using the peak list copied from the bottom of the spectrum available at https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=EA267612&dsn=Eawag
There are many more features to discover: try the website, read the paper (Ruttkies et al 2016) and if you have any questions, please comment below!
Author: Emma Schymanski, 21/11/2017
I am happy to announce the publication of an article on “Open Science for Identifying “Known Unknown” Chemicals” at http://dx.doi.org/10.1021/acs.est.7b01908. I have been involved with two other articles about the identification of “known unknowns”.
The first one was a ChemSpider article: “Identification of “known unknowns” utilizing accurate mass data and ChemSpider”. Journal of The American Society for Mass Spectrometry. 23: 179–185. doi:10.1007/s13361-011-0265-y.
The second one was a recent article from the EPA: “Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard”. Analytical and Bioanalytical Chemistry. 409: 1729–1735. doi:10.1007/s00216-016-0139-z.
The most recent publication was a collaboration with Emma Schymanski from Eawag, and it was a real pleasure to write this together. If you are interested in how Open Science can help address the challenges associated with the identification of known unknowns, check out our latest publication!
It’s almost ten years, this April, since ChemSpider was released to the public at the 233rd ACS meeting in Chicago. For two years, prior to ChemSpider being acquired by RSC in May 2009, we worked very closely with a number of mass spectrometry vendors including Waters (Micromass), Thermo and Agilent. I always believed that the work we did with ChemSpider could be highly valuable to the mass spectrometry community. This was especially true after we published the work on the identification of known unknowns with James Little (http://link.springer.com/article/10.1007/s13361-011-0265-y). ChemSpider has certainly become highly recognized, and used, by an increasing number of mass spectrometry vendors (through the ChemSpider Web Services).
A few months ago Andrew McEachran joined our team as a postdoc. By combining my experience of applying ChemSpider to structure identification, his mass spectrometry skills and experience, and the work of our tremendous development team on the CompTox Chemistry Dashboard, we were able to make some further advances in the “identification of known unknowns”. Our efforts were recently reported in the publication “Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard” (http://link.springer.com/article/10.1007%2Fs00216-016-0139-z). Readers are pointed to the summary tables in the results section of the article, which demonstrate the improved performance of the CompTox Chemistry Dashboard based on high quality data sources and new approaches to rank ordering results from formula and mass searching.
We recently rolled out new functionality and “MS-Ready structure batch-based searching” to offer even greater support for MS structure identification. We will report on further extensions to this work at the Spring ACS Meeting.
The Altmetrics for the article are shown below.