
PRESENTATION ACS SPRING 2018: Accessing information for chemicals in hydraulic fracturing fluids using the US EPA CompTox Chemistry Dashboard

Accessing information for chemicals in hydraulic fracturing fluids using the US EPA CompTox Chemistry Dashboard

EPA’s National Center for Computational Toxicology is developing automated workflows for curating large databases and providing accurate linkages of data to chemical structures, exposure and hazard information. The data are being made available via the EPA’s CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard), a publicly accessible website providing access to data for almost 760,000 chemical substances, the majority of these represented as chemical structures. The web application delivers a wide array of computed and measured physicochemical properties, in vitro high-throughput screening data and in vivo toxicity data as well as integrated chemical linkages to a growing list of literature, toxicology, and analytical chemistry websites. In addition, several specific search types are in development to directly support the mass spectrometry non-targeted screening community, who are generating important data for detecting and assessing environmental exposures to chemicals contained within DSSTox. The application provides access to segregated lists of chemicals that are of specific interest to relevant stakeholders including, for example, scientists interested in algal toxins and hydraulic fracturing chemicals. This presentation will provide an overview of the challenges associated with the curation of data from EPA’s December 2016 Hydraulic Fracturing Drinking Water Assessment Report that represented chemicals reported to be used in hydraulic fracturing fluids and those found in produced water. The data have been integrated into the dashboard with a number of resulting benefits: a searchable database of chemical properties, with hazard and exposure predictions, and open literature. The application of the dashboard to support mass spectrometry non-targeted analysis studies will also be reviewed. This abstract does not reflect U.S. EPA policy.

https://doi.org/10.6084/m9.figshare.6027326.v1

 

Posted on March 26, 2018 in ACS Meetings

 

PRESENTATION ACS SPRING 2018: Development of a Tool for Systematic Integration of Traditional and New Approach Methods for Prioritizing Chemical Lists

Development of a Tool for Systematic Integration of Traditional and New Approach Methods for Prioritizing Chemical Lists

Multiple regulatory bodies (EPA, ECHA, Health Canada) are currently tasked with prioritizing chemicals for data collection and risk assessments. These prioritization efforts are in response to regulatory mandates to identify chemicals for further assessment. We have developed a web-based application that enables a rapid, flexible and transparent prioritization process. The tool includes multiple data streams related to human and ecological hazard, exposure, and physicochemical properties (persistence and bioaccumulation). For human hazard, the data streams include quantitative points of departure (PODs) that are compiled from multiple sources such as EPA ToxRefDB, ECHA, COSMOS; estimated PODs from high-throughput in vitro screening assays and computational models; and qualitative measurements and predictions of specific endpoints (e.g., genotoxicity, endocrine activity). For ecological hazard, quantitative PODs are taken from the EPA ECOTOX database. Exposure information includes production volume, quantitative predictions using the EPA ExpoCast and SHEDS models, biomonitoring data, and qualitative information such as media occurrence, use profiles and likelihood of consumer and childhood exposures. The use of the tool is illustrated by prioritizing chemicals related to TSCA and the Safer Choice Ingredient List. The underpinning data streams for this application are already available in the EPA CompTox Chemistry Dashboard and have been repurposed to deliver this application. This is in keeping with our overarching software development methodology of providing multiple “building blocks” in the form of databases, web services and visualization components to deliver fit-for-purpose applications to the relevant audiences. This abstract does not necessarily represent U.S. EPA policy.

https://doi.org/10.6084/m9.figshare.6027068.v1

 
 

PRESENTATION ACS SPRING 2018: New developments in delivering public access to data from the National Center for Computational Toxicology at the EPA

New developments in delivering public access to data from the National Center for Computational Toxicology at the EPA

Researchers at EPA’s National Center for Computational Toxicology integrate advances in biology, chemistry, and computer science to examine the toxicity of chemicals and help prioritize chemicals for further research based on potential human health risks. The goal of this research program is to quickly evaluate thousands of chemicals at a much reduced cost and in a shorter time frame relative to traditional approaches. The data generated by the Center include characterization of thousands of chemicals across hundreds of high-throughput screening assays, consumer use and production information, pharmacokinetic properties, literature data, physical-chemical properties as well as the predictive computational modeling of toxicity and exposure. We have developed a number of databases and applications to deliver the data to the public, academic community, industry stakeholders, and regulators. This presentation will provide an overview of our work to develop an architecture that integrates diverse large-scale data from the chemical and biological domains, our approaches to disseminate these data, and the delivery of models supporting predictive computational toxicology. In particular, this presentation will review our new CompTox Chemistry Dashboard and the developing architecture to support real-time property and toxicity endpoint prediction. This abstract does not reflect U.S. EPA policy.

https://doi.org/10.6084/m9.figshare.6026957.v1

 
 

PRESENTATION ACS SPRING 2018: Overview of open resources to support automated structure verification and elucidation

Overview of open resources to support automated structure verification and elucidation

Cheminformatics methods form an essential basis for providing analytical scientists with access to data, algorithms and workflows. There are an increasing number of free online databases (compound databases, spectral libraries, data repositories) and a rich collection of software approaches that can be used to support automated structure verification and elucidation, specifically for Nuclear Magnetic Resonance (NMR) and Mass Spectrometry (MS). This presentation will provide an overview of freely available data, tools, databases and approaches available to support chemical structure verification and elucidation and highlight some of the known issues regarding data quality and suggest approaches for resolving some of the issues. The importance of structure and spectral standards for data exchange will be discussed, especially with regard to how spectral data can be made openly available to the community via online tools and through scientific publishing. This work does not necessarily reflect U.S. EPA policy.

https://doi.org/10.6084/m9.figshare.6026930.v1

 
 

PRESENTATION ACS SPRING 2018: Sharing chemical structures with peer-reviewed publications. Are we there yet?

Sharing chemical structures with peer-reviewed publications. Are we there yet?

In the domain of chemistry one of the greatest benefits to publishing research is that data can be shared. Unfortunately, the vast majority of chemical structure data associated with scientific publications remain locked up in document form, primarily in PDF files or trapped on webpages. Despite the explosive growth of online chemical databases and the overall maturity of cheminformatics platforms, many barriers stifle the exchange of chemical structures via publications. These challenges include incomplete support by accepted standards (especially InChI) for advanced stereochemistry, organometallic compounds and generic “Markush” representations, the difference between human-readable and computer-readable forms of data, and challenges with the computer representation of chemical structures. To address these obstacles to chemical structure sharing, US EPA National Center for Computational Toxicology scientists are using a combination of cheminformatics applications and online repositories to distribute chemical structure data associated with their publications. This presentation will describe how EPA-NCCT chemical structure data that is amenable to indexing and distribution are shared and highlight the benefit of open data sharing for modeling, data integration, and increasing research impact. This abstract does not reflect U.S. EPA policy.

https://doi.org/10.6084/m9.figshare.6026906.v1

 
 

PRESENTATION ACS SPRING 2018: Using the US EPA’s CompTox Chemistry Dashboard for structure identification and non-targeted analyses

Using the US EPA’s CompTox Chemistry Dashboard for structure identification and non-targeted analyses

Antony J. Williams, Andrew D. McEachran, Seth Newton, Kristin Isaacs, Katherine Phillips, Nancy Baker, Christopher Grulke and Jon R. Sobus

High resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) are advancing the identification of emerging contaminants in environmental matrices, improving the means by which exposure analyses can be conducted. However, confidence in structure identification of unknowns in NTA presents challenges to analytical chemists. Structure identification requires integration of complementary data types such as reference databases, fragmentation prediction tools, and retention time prediction models. The goal of this research is to optimize and implement structure identification functionality within the US EPA’s CompTox Chemistry Dashboard, an open chemistry resource and web application containing data for ~760,000 substances. Rank-ordering the number of sources associated with chemical records within the Dashboard (Data Source Ranking) improves the identification of unknowns by bringing the most likely candidate structures to the top of a search results list. Database searching has been further optimized with the generation of MS-Ready Structures. MS-Ready structures are de-salted, stripped of stereochemistry, and mixture separated to replicate the form of a chemical observed via HRMS. Functionality to conduct batch searching of molecular formulae and monoisotopic masses was designed and released to improve searching efforts. Finally, a scoring-based identification scheme was developed, optimized, and surfaced via the Dashboard using multiple data streams contained within the database underlying the Dashboard. The scoring-based identification scheme improved the identification of unknowns over previous efforts using data source ranking alone. Combining these steps within an open chemistry resource provides a freely available software tool for structure identification and NTA. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
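The combination of monoisotopic mass searching and Data Source Ranking described above can be illustrated with a minimal sketch. It is not the Dashboard's implementation: the candidate records, their `sources` counts, the function names, and the 5 ppm tolerance are all assumptions made for illustration, and the mass table covers only a handful of elements.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope; illustrative subset only.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
    "P": 30.97376163,
    "S": 31.97207100,
    "Cl": 34.96885268,
}


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C8H10N4O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def rank_candidates(query_mass, candidates, ppm=5.0):
    """Keep candidates within the ppm window, most data sources first."""
    tolerance = query_mass * ppm / 1e6
    hits = [
        c for c in candidates
        if abs(monoisotopic_mass(c["formula"]) - query_mass) <= tolerance
    ]
    return sorted(hits, key=lambda c: c["sources"], reverse=True)


if __name__ == "__main__":
    # Hypothetical records; the source counts are invented for the example.
    candidates = [
        {"name": "rare isomer", "formula": "C8H10N4O2", "sources": 3},
        {"name": "caffeine", "formula": "C8H10N4O2", "sources": 120},
        {"name": "unrelated", "formula": "C10H14N2O2", "sources": 80},
    ]
    for hit in rank_candidates(194.0804, candidates):
        print(hit["name"], hit["sources"])
```

Isomers share a formula and therefore a mass, so the mass filter alone cannot separate them; sorting by the number of associated data sources is what moves the better-documented candidate to the top of the list.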

https://doi.org/10.6084/m9.figshare.6026081.v1

 

Posted on March 25, 2018 in ACS Meetings

 

PRESENTATION ACS SPRING 2018: Adding Complex Expert Knowledge into Chemical Databases: Transforming Surfactants in Wastewater

Adding Complex Expert Knowledge into Chemical Databases: Transforming Surfactants in Wastewater

PRESENTED by Emma Schymanski

The increasing popularity of high mass accuracy non-target mass spectrometry methods has yielded extensive identification efforts based on chemical compound databases. Candidate structures are often retrieved with either exact mass or molecular formula from large resources such as PubChem, ChemSpider or the EPA CompTox Chemistry Dashboard. Additional data (e.g. fragmentation, physicochemical properties, reference and data source information) are then used to select potential candidates, depending on the experimental context. However, these strategies require the presence of substances of interest in these compound databases, which is often not the case as no database can be fully inclusive. Surfactants are a prominent example with clear data gaps: they are used in many products in our daily lives, yet are often absent as discrete structures in compound databases. Linear alkylbenzene sulfonates (LAS) are a common, high use and high priority surfactant class that has highly complex transformation behaviour in wastewater. Despite extensive reports in the environmental literature, few of the LAS and none of the related transformation products were reported in any compound databases during an investigation into Swiss wastewater effluents, despite these forming the most intense signals. The LAS surfactant class will be used to demonstrate how the coupling of environmental observations with high resolution mass spectrometry and detailed literature data (expert knowledge) on the transformation of these species can be used to progressively “fill the gaps” in compound databases. The LAS and their transformation products have been added to the CompTox Chemistry Dashboard (https://comptox.epa.gov/) using a combination of “representative structures” and “related structures” starting from the structural information contained in the literature. By adding this information into a centralized open resource, future environmental investigations can now profit from the expert knowledge previously scattered throughout the literature. Note: This abstract does not reflect US EPA policy.
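To make the idea of a surfactant class as a homologous series concrete, the sketch below enumerates LAS homologues and the m/z each would show as a deprotonated ion. It rests on assumptions not stated in the abstract: the free acid form is taken as CnH(2n+1)-C6H4-SO3H, the chain range C10 to C13 is chosen as a typical commercial range, detection is assumed to be in negative mode as [M-H]-, and the function names are hypothetical. Transformation products are not covered.

```python
# Monoisotopic masses (u); PROTON is the mass removed on deprotonation.
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "S": 31.97207100}
PROTON = 1.00727647


def las_element_counts(chain_length):
    """Element counts for the LAS free acid, CnH(2n+1)-C6H4-SO3H (assumed form)."""
    n = chain_length
    return {"C": n + 6, "H": 2 * n + 6, "O": 3, "S": 1}


def mz_deprotonated(counts):
    """m/z of the singly charged [M-H]- ion for the given element counts."""
    neutral = sum(MASS[element] * k for element, k in counts.items())
    return neutral - PROTON


if __name__ == "__main__":
    # Assumed chain range; adjust to the homologues relevant to the sample.
    for n in range(10, 14):
        counts = las_element_counts(n)
        formula = "".join(f"{e}{k}" for e, k in counts.items())
        print(f"C{n}-LAS  {formula}  [M-H]- = {mz_deprotonated(counts):.4f}")
```

Each added CH2 unit shifts the ion by the same mass increment, which is why a class absent from a compound database can still be recognised as a regular ladder of signals in the spectra.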

https://doi.org/10.6084/m9.figshare.6025826.v1

 

 
 

PRESENTATION ACS Spring 2018: Curating “Suspect Lists” for International Non-target Screening Efforts

Curating “Suspect Lists” for International Non-target Screening Efforts

Emma L. Schymanski (1), Reza Aalizadeh (2), Nikolaos S. Thomaidis (2), Juliane Hollender (3), Jaroslav Slobodnik (4), Antony J. Williams (5)

(1) Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, Luxembourg.
(2) National and Kapodistrian University of Athens, Department of Chemistry, Panepistimiopolis Zografou, 157 71 Athens, Greece.
(3) Eawag: Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland.
(4) Environmental Institute, Okružná 784/42, 972 41 Koš, Slovak Republic.
(5) National Center for Computational Toxicology, US EPA, Research Triangle Park, Durham, NC, USA.

PRESENTED by Emma Schymanski

The NORMAN Network (www.norman-network.com) is a unique network of reference laboratories, research centres and related organisations for monitoring of emerging environmental substances, throughout Europe and across the world. Key activities of the network include prioritization of emerging substances and non-target screening. A recent collaborative trial revealed that suspect screening (using specific lists of chemicals to find “known unknowns”) was a very common and efficient way to expedite non-target screening (Schymanski et al. 2015, DOI: 10.1007/s00216-015-8681-7). As a result, the NORMAN Suspect Exchange was founded (http://www.norman-network.com/?q=node/236) and members were encouraged to submit their suspect lists. To date 20 lists of highly varying substance numbers (between 52 and 30,418), quality and information content have been uploaded, including valuable information previously unavailable to the public. All preparation and curation was done within the network using open access cheminformatics toolkits. Additionally, members expressed a desire for one merged list (“SusDat”). However, as a small network with very limited resources (member contributions only), the burden of curating and merging these lists into a high quality, curated dataset went beyond the capacity and expertise of the network. In 2017 the NORMAN Suspect Exchange and US EPA CompTox Chemistry Dashboard (https://comptox.epa.gov/) pooled resources in curating and uploading these lists to the Dashboard (https://comptox.epa.gov/dashboard/chemical_lists). This talk will cover the curation and annotation of the lists with unique identifiers (known as DTXSIDs), plus the advantages and drawbacks of these for NORMAN (e.g. creating a registration/resource inter-dependence). It will cover the use of “MS-ready structure forms” with chemical substances provided in the form observed by the mass spectrometer (e.g. desalted, as separate components of mixtures) and how these efforts will support other NORMAN activities. Finally, limitations of existing cheminformatics approaches and future ideas for extending this work will be covered. Note: This abstract does not reflect US EPA policy.
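The “MS-ready” idea (desalted, mixtures separated, stereochemistry removed) can be sketched with plain string handling on SMILES. This is a deliberately crude illustration rather than the workflow used for the Dashboard: the counterion list and the function name are hypothetical, charges are not neutralised, and a real pipeline would use a cheminformatics toolkit instead of string edits.

```python
# Hypothetical short list of salt and solvent fragments to discard.
COUNTERIONS = {"[Na+]", "[K+]", "[Cl-]", "[Br-]", "Cl", "Br", "O"}


def ms_ready(smiles):
    """Return de-salted, stereo-flattened components of a SMILES string.

    Components are separated on '.', listed counterions are dropped, and
    the stereo markers '@', '/' and '\\' are removed. Charged components
    are returned as-is (no neutralisation is attempted).
    """
    forms = []
    for component in smiles.split("."):
        if component in COUNTERIONS:
            continue
        flat = component.replace("@", "").replace("/", "").replace("\\", "")
        if flat not in forms:
            forms.append(flat)
    return forms


if __name__ == "__main__":
    print(ms_ready("C[C@H](N)C(=O)O.Cl"))  # hydrochloride salt of a chiral amine
    print(ms_ready("C/C=C/C.CCO"))         # two-component mixture
```

Linking every registered substance to one or more such flattened components is what lets a formula or mass observed by the instrument map back to salts, mixtures and stereoisomers that share the same detected form.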

https://doi.org/10.6084/m9.figshare.6025799.v1

 

 
 

PRESENTATION ACS Spring 2018: Curating and Sharing Structures and Spectra for the Environmental Community

Curating and sharing structures and spectra for the environmental community

Presented by Emma Schymanski

The increasing popularity of high mass accuracy non-target mass spectrometry methods has yielded extensive identification efforts based on spectral and chemical compound databases in the environmental community and beyond. Increasingly, new methods are relying on open data resources. Candidate structures are often retrieved with either exact mass or molecular formula from large resources such as PubChem, ChemSpider or the EPA CompTox Chemistry Dashboard. Smaller, selective lists of chemicals (also called “suspect lists”) can be used to perform more efficient annotation. Mass spectral libraries can then be used to increase the confidence in tentative identification. Additional metadata (e.g. exposure and hazard information, reference and data source information) can be extremely useful to prioritize substances of high environmental interest. Exchanging information and “sharing structural linkages” between these resources requires extensive curation to ensure that the correct information is shared correctly, yet many valuable datasets arise from scientists and regulators with little official cheminformatics training. This talk will cover curation efforts undertaken to map spectral libraries (e.g. MassBank.EU, mzCloud) and suspect lists from the NORMAN Suspect Exchange (http://www.norman-network.com/?q=node/236) to unique chemical identifiers associated with the US EPA CompTox Chemistry Dashboard. The curation workflow takes advantage of years of experience, as well as contact with the original data providers, to enable open access to valuable, curated datasets to support environmental scientists and the broader research community (e.g. https://comptox.epa.gov/dashboard/chemical_lists).  Note: This abstract does not reflect US EPA policy.

https://doi.org/10.6084/m9.figshare.6025778.v1

 

 

PRESENTATION ACS Spring 2018: Automated Structure Annotation and Curation for MassBank: Potential and Pitfalls

Automated Structure Annotation and Curation for MassBank: Potential and Pitfalls 

Presented by Emma Schymanski

The European MassBank server (www.massbank.eu) was founded in 2012 by the NORMAN Network (www.norman-network.net) to provide open access to mass spectra of substances of environmental interest contributed by NORMAN members. The automated workflow RMassBank was developed as a part of this effort (https://github.com/MassBank/RMassBank/). This workflow included automated processing of the mass spectral data, as well as automated annotation using the SMILES, Names and CAS numbers provided by the user. Cheminformatics toolkits (e.g. Open Babel, rcdk) and web services (e.g. the CACTUS Chemical Identifier Resolver, Chemical Translation Services (CTS), ChemSpider, PubChem) were then used to convert and/or retrieve the remaining information for completion of the MassBank records (additional names, InChIs, InChIKeys, several database identifiers, mol files), to avoid excessive burden on the users and reduce the chance of errors. To date, approximately 16,000 MS/MS spectra (61 % of all open data as of Nov. 2016) corresponding with 1,269 (18 %) unique chemicals have been uploaded to MassBank.EU via RMassBank. Curating the MassBank.EU records, as part of efforts to provide EPA CompTox Dashboard identifiers (DTXSIDs) for each record, revealed several conflicts in the chemical metadata arising from varying sources. In addition, the representation of “ambiguous substances”, for example complex surfactant mixtures of various chain lengths and branching or incompletely-defined structures of transformation products, is an ongoing challenge. In this work, we report on proof-of-concept solutions for “ambiguous structure” representation, currently unavailable in the majority of cheminformatics tools. This presentation reflects on the effectiveness of the original RMassBank concept, identifies its pitfalls, and considers the potential that automated structure annotation with open resources offers to streamline spectra contributions from external laboratories and users with widely ranging cheminformatics experience. Note: this work does not necessarily reflect U.S. EPA policy.
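One simple kind of metadata check that can surface such conflicts is sketched below: validating the check digit of a CAS number and flagging structures that carry more than one CAS number across records. This is an illustration only, not part of RMassBank or the curation workflow described here; the record layout (`inchikey`, `cas` keys) and the function names are assumptions.

```python
import re
from collections import defaultdict


def cas_is_valid(cas):
    """Check the format and check digit of a CAS Registry Number.

    The check digit equals the weighted sum of the preceding digits
    (weights 1, 2, 3, ... counted from the right) modulo 10.
    """
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def find_cas_conflicts(records):
    """Map each InChIKey to its CAS numbers where more than one is reported."""
    seen = defaultdict(set)
    for record in records:
        seen[record["inchikey"]].add(record["cas"])
    return {key: values for key, values in seen.items() if len(values) > 1}


if __name__ == "__main__":
    print(cas_is_valid("58-08-2"))   # well-formed, check digit matches
    print(cas_is_valid("58-08-3"))   # check digit does not match
```

A failed check digit points to a typing error at submission, while two valid CAS numbers attached to one structure usually need a curator's judgement rather than an automatic fix.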

https://doi.org/10.6084/m9.figshare.6025769.v1

 

 
 