I have the pleasure of collaborating with Emma Schymanski, and we are in daily contact, bouncing ideas around on how to improve the state of the science and the informatics for mass spectrometry non-target screening. We are both out at conferences representing the effort and, together with the many other colleagues we get to work with, are moving things forward iteratively so that each presentation reports on the latest developments. Emma presented in Rome this week at the SETAC Europe 28th Annual Meeting and had the chance to show the work that has been going on to integrate the CompTox Chemistry Dashboard and MetFrag. More on that will be reported in detail soon, but for now her slides from the meeting are available on SlideShare and embedded here.
My friend and frequent collaborator gave a talk at Analytica Munich this week (wish I was there) on “Finding small molecules in big data”. I am fortunate to collaborate with Emma on many aspects of using cheminformatics approaches to interrogate, interpret and integrate data associated with mass spectrometry analyses and structure identification. It’s been an interesting year working on the challenges together.
Metabolomics and exposomics are amongst the youngest and most dynamic of the omics disciplines. While the molecules involved are smaller than those studied in proteomics and the other, larger “omics”, the challenges are in many ways greater. Elements are less constrained, there are no given “puzzle pieces” and there is a resulting explosion in terms of potential chemical space. It is impossible to even enumerate all chemically possible small molecules. The challenges and complexity of identifying small molecules, even using the most advanced analytical technologies available today, are immense. Current “big data” methods for small molecules rely heavily on chemical databases, the largest of which presently contain ~100 million chemicals. Despite this large number, high resolution mass spectrometry (HR-MS) measurements contain tens of thousands of features, of which only a few percent can be annotated as “known” and confirmed as metabolites or chemicals of interest using these chemical databases. How can we find relevant small molecules in the ever-increasing data loads? How can we annotate more of the unknown features in HR-MS experiments? This talk will present European, US and worldwide initiatives to help find small molecules in big data: from chemical databases to spectral libraries, real-time monitoring to retrospective screening. It will touch on the challenges of standardized structure representations, data curation and deposition. Finally, it will show how interdisciplinary communication, data sharing and pushing the boundaries of current capabilities can facilitate research efforts in metabolomics, exposomics and beyond. This abstract does not necessarily represent U.S. EPA policy.
PRESENTATION ACS SPRING 2018: Structure identification by Mass Spectrometry Non-Targeted Analysis using the US EPA’s CompTox Chemistry Dashboard
Identification of unknowns in mass spectrometry based non-targeted analyses (NTA) requires the integration of complementary pieces of data to arrive at a confident, consensus structure. Researchers use chemical reference databases, spectral matching, fragment prediction tools, retention time prediction tools, and a variety of other data to arrive at tentative, probable and, where possible, confirmed identifications. With the diverse, robust data contained within the US EPA’s CompTox Chemistry Dashboard (https://comptox.epa.gov), the goal of this research is to identify and implement a harmonized identification tool and workflow using previously generated chemistry data. Data have been compiled from product use, functional use prediction models, environmental media occurrence prediction models, and PubMed references, among other sources. We will report on our development of a visualization tool whereby users can visualize the relative contribution of identification-based metrics on a list of candidate structures and observe the greatest likelihood of occurrence. These data and visualization tools support NTA identification via the Dashboard and demonstrate an open, accessible tool for all users of HRMS data. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
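To make the idea of "relative contribution of identification-based metrics" concrete, here is a minimal Python sketch. It is not the Dashboard's scoring scheme: the metric names, the weights, the candidate values and the max-normalized weighted sum are all assumptions chosen purely for illustration.

```python
# Hypothetical sketch: combine several identification metrics into one score
# per candidate and keep the per-metric contributions for visualization.
# Metric names, weights and values are illustrative assumptions, not EPA data.

def score_candidates(candidates, weights):
    """Return {name: {"total": float, "contributions": {metric: float}}}."""
    # Scale each metric by its maximum across candidates so that
    # metrics with different units become comparable (range 0..1).
    maxima = {
        metric: max(values.get(metric, 0.0) for values in candidates.values())
        for metric in weights
    }
    results = {}
    for name, values in candidates.items():
        contributions = {}
        for metric, weight in weights.items():
            top = maxima[metric]
            normalized = values.get(metric, 0.0) / top if top > 0 else 0.0
            contributions[metric] = weight * normalized
        results[name] = {
            "total": sum(contributions.values()),
            "contributions": contributions,
        }
    return results


# Assumed weights (sum to 1) and made-up candidate metrics.
WEIGHTS = {"data_sources": 0.5, "pubmed_refs": 0.3, "product_use": 0.2}
CANDIDATES = {
    "candidate_X": {"data_sources": 40, "pubmed_refs": 10, "product_use": 0},
    "candidate_Y": {"data_sources": 20, "pubmed_refs": 100, "product_use": 5},
    "candidate_Z": {"data_sources": 4, "pubmed_refs": 0, "product_use": 1},
}

scores = score_candidates(CANDIDATES, WEIGHTS)
ranking = sorted(scores, key=lambda n: scores[n]["total"], reverse=True)

for name in ranking:
    print(name, round(scores[name]["total"], 3), scores[name]["contributions"])
```

Keeping the per-metric contributions alongside the total is what would let a stacked bar chart show why one candidate outranks another.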
PRESENTATION ACS SPRING 2018: US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for chemical sources of risk
Chemical risk assessment is both time-consuming and difficult because it requires the assembly of data for chemicals generally distributed across multiple sources. The US EPA CompTox Chemistry Dashboard is a publicly accessible web-based application providing access to various data streams on ~760,000 chemical substances. These data include experimental and predicted physicochemical property data, bioassay screening data associated with the ToxCast program, consumer product and functional use information and a myriad of related data of value to environmental scientists and toxicologists. At this stage of development, the public dashboard provides access to almost 20 predicted physicochemical and environmental fate and transport endpoints with full transparency in terms of model performance. Experimental and predicted human and ecological toxicity data are also available, as are in vitro to in vivo extrapolation dosimetry predictions and predicted exposure and functional use. In parallel to the CompTox Chemistry Dashboard we are developing RapidTox, a web-based application that enables a rapid, flexible and transparent prioritization process for sets of chemicals using several previously used workflows focused on scoring of traditional risk metrics and the inclusion of alternative hazard and exposure estimates. This presentation will give an overview of the CompTox Chemistry Dashboard, RapidTox, our approaches to building transparent and open prediction models, and our efforts to provide access to real time predictions. This abstract does not necessarily represent U.S. EPA policy.
PRESENTATION ACS SPRING 2018: Accessing information for chemicals in hydraulic fracturing fluids using the US EPA CompTox Chemistry Dashboard
EPA’s National Center for Computational Toxicology is developing automated workflows for curating large databases and providing accurate linkages of data to chemical structures, exposure and hazard information. The data are being made available via the EPA’s CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard), a publicly accessible website providing access to data for almost 760,000 chemical substances, the majority of these represented as chemical structures. The web application delivers a wide array of computed and measured physicochemical properties, in vitro high-throughput screening data and in vivo toxicity data as well as integrated chemical linkages to a growing list of literature, toxicology, and analytical chemistry websites. In addition, several specific search types are in development to directly support the mass spectrometry non-targeted screening community, who are generating important data for detecting and assessing environmental exposures to chemicals contained within DSSTox. The application provides access to segregated lists of chemicals that are of specific interest to relevant stakeholders including, for example, scientists interested in algal toxins and hydraulic fracturing chemicals. This presentation will provide an overview of the challenges associated with the curation of data from EPA’s December 2016 Hydraulic Fracturing Drinking Water Assessment Report that represented chemicals reported to be used in hydraulic fracturing fluids and those found in produced water. The data have been integrated into the dashboard with a number of resulting benefits: a searchable database of chemical properties, with hazard and exposure predictions, and open literature. The application of the dashboard to support mass spectrometry non-targeted analysis studies will also be reviewed. This abstract does not reflect U.S. EPA policy.
PRESENTATION ACS SPRING 2018: Development of a Tool for Systematic Integration of Traditional and New Approach Methods for Prioritizing Chemical Lists
Multiple regulatory bodies (EPA, ECHA, Health Canada) are currently tasked with prioritizing chemicals for data collection and risk assessments. These prioritization efforts are in response to regulatory mandates to identify chemicals for further assessment. We have developed a web-based application that enables a rapid, flexible and transparent prioritization process. The tool includes multiple data streams related to human and ecological hazard, exposure, and physicochemical properties (persistence and bioaccumulation). For human hazard, the data streams include quantitative points of departure (PODs) that are compiled from multiple sources such as EPA ToxRefDB, ECHA, COSMOS; estimated PODs from high-throughput in vitro screening assays and computational models; and qualitative measurements and predictions of specific endpoints (e.g., genotoxicity, endocrine activity). For ecological hazard, quantitative PODs are taken from the EPA ECOTOX database. Exposure information includes production volume, quantitative predictions using the EPA ExpoCast and SHEDS models, biomonitoring data, and qualitative information such as media occurrence, use profiles and likelihood of consumer and childhood exposures. The use of the tool is illustrated by prioritizing chemicals related to TSCA and the Safer Choice Ingredient List. The underpinning data streams for this application are already available in the EPA CompTox Chemistry Dashboard and have been repurposed to deliver this application. This is in keeping with our overarching software development methodology of providing multiple “building blocks” in the form of databases, web services and visualization components to deliver fit-for-purpose applications to the relevant audiences. This abstract does not necessarily represent U.S. EPA policy.
PRESENTATION ACS SPRING 2018: New developments in delivering public access to data from the National Center for Computational Toxicology at the EPA
Researchers at EPA’s National Center for Computational Toxicology integrate advances in biology, chemistry, and computer science to examine the toxicity of chemicals and help prioritize chemicals for further research based on potential human health risks. The goal of this research program is to quickly evaluate thousands of chemicals, but at a much reduced cost and in a shorter time frame relative to traditional approaches. The data generated by the Center include characterization of thousands of chemicals across hundreds of high-throughput screening assays, consumer use and production information, pharmacokinetic properties, literature data, physical-chemical properties as well as the predictive computational modeling of toxicity and exposure. We have developed a number of databases and applications to deliver the data to the public, academic community, industry stakeholders, and regulators. This presentation will provide an overview of our work to develop an architecture that integrates diverse large-scale data from the chemical and biological domains, our approaches to disseminate these data, and the delivery of models supporting predictive computational toxicology. In particular, this presentation will review our new CompTox Chemistry Dashboard and the developing architecture to support real-time property and toxicity endpoint prediction. This abstract does not reflect U.S. EPA policy.
PRESENTATION ACS SPRING 2018: Overview of open resources to support automated structure verification and elucidation
Cheminformatics methods form an essential basis for providing analytical scientists with access to data, algorithms and workflows. There is an increasing number of free online databases (compound databases, spectral libraries, data repositories) and a rich collection of software approaches that can be used to support automated structure verification and elucidation, specifically for Nuclear Magnetic Resonance (NMR) and Mass Spectrometry (MS). This presentation will provide an overview of freely available data, tools, databases and approaches that support chemical structure verification and elucidation, highlight some of the known issues regarding data quality, and suggest approaches for resolving some of them. The importance of structure and spectral standards for data exchange will be discussed, especially with regard to how spectral data can be made openly available to the community via online tools and through scientific publishing. This work does not necessarily reflect U.S. EPA policy.
PRESENTATION ACS SPRING 2018: Sharing chemical structures with peer-reviewed publications. Are we there yet?
In the domain of chemistry, one of the greatest benefits of publishing research is that data can be shared. Unfortunately, the vast majority of chemical structure data associated with scientific publications remain locked up in document form, primarily in PDF files or trapped on webpages. Despite the explosive growth of online chemical databases and the overall maturity of cheminformatics platforms, many barriers stifle the exchange of chemical structures via publications. These challenges include incomplete support by accepted standards (especially InChI) for advanced stereochemistry, organometallic compounds and generic “Markush” representations, the difference between human-readable and computer-readable forms of data, and challenges with the computer representation of chemical structures. To address these obstacles to chemical structure sharing, US EPA National Center for Computational Toxicology scientists are using a combination of cheminformatics applications and online repositories to distribute chemical structure data associated with their publications. This presentation will describe how EPA-NCCT chemical structure data that are amenable to indexing and distribution are shared and highlight the benefit of open data sharing for modeling, data integration, and increasing research impact. This abstract does not reflect U.S. EPA policy.
PRESENTATION ACS SPRING 2018: Using the US EPA’s CompTox Chemistry Dashboard for structure identification and non-targeted analyses
Antony J. Williams, Andrew D. McEachran, Seth Newton, Kristin Isaacs, Katherine Phillips, Nancy Baker, Christopher Grulke and Jon R. Sobus
High resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) are advancing the identification of emerging contaminants in environmental matrices, improving the means by which exposure analyses can be conducted. However, confidence in structure identification of unknowns in NTA presents challenges to analytical chemists. Structure identification requires integration of complementary data types such as reference databases, fragmentation prediction tools, and retention time prediction models. The goal of this research is to optimize and implement structure identification functionality within the US EPA’s CompTox Chemistry Dashboard, an open chemistry resource and web application containing data for ~760,000 substances. Rank-ordering the number of sources associated with chemical records within the Dashboard (Data Source Ranking) improves the identification of unknowns by bringing the most likely candidate structures to the top of a search results list. Database searching has been further optimized with the generation of MS-Ready Structures. MS-Ready structures are de-salted, stripped of stereochemistry, and mixture separated to replicate the form of a chemical observed via HRMS. Functionality to conduct batch searching of molecular formulae and monoisotopic masses was designed and released to improve searching efforts. Finally, a scoring-based identification scheme was developed, optimized, and surfaced via the Dashboard using multiple data streams contained within the database underlying the Dashboard. The scoring-based identification scheme improved the identification of unknowns over previous efforts using data source ranking alone. Combining these steps within an open chemistry resource provides a freely available software tool for structure identification and NTA. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
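The combination of a monoisotopic mass search with Data Source Ranking can be sketched in a few lines of Python. This is only an illustration of the idea described in the abstract, not the Dashboard's implementation: the `Candidate` class, the 5 ppm default tolerance, and all masses and source counts below are hypothetical placeholders.

```python
# Hypothetical sketch: find candidates within a ppm window of an observed
# monoisotopic mass, then rank them by number of associated data sources.
# All names and numbers are made-up placeholders, not Dashboard records.
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    monoisotopic_mass: float  # in Da
    source_count: int         # number of data sources citing the record


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def rank_candidates(observed_mass, candidates, tolerance_ppm=5.0):
    """Keep candidates inside the ppm window, most-cited first."""
    hits = [
        c for c in candidates
        if abs(ppm_error(observed_mass, c.monoisotopic_mass)) <= tolerance_ppm
    ]
    return sorted(hits, key=lambda c: c.source_count, reverse=True)


CANDIDATES = [
    Candidate("candidate_A", 200.0000, 3),
    Candidate("candidate_B", 200.0004, 57),
    Candidate("candidate_C", 200.0100, 120),  # about 49 ppm away: excluded
    Candidate("candidate_D", 199.9995, 12),
]

ranked = rank_candidates(200.0002, CANDIDATES)
for c in ranked:
    print(c.name, c.source_count)
```

A batch search is then just this function applied to each observed mass in a list; a scoring-based scheme would replace the single `source_count` sort key with a combined score.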