Category Archives: ACS Meetings

PRESENTATION ACS Spring 2018: Curating “Suspect Lists” for International Non-target Screening Efforts

Curating “Suspect Lists” for International Non-target Screening Efforts

Emma L. Schymanski1, Reza Aalizadeh2, Nikolaos S. Thomaidis2, Juliane Hollender3, Jaroslav Slobodnik4, Antony J. Williams5

1Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, Luxembourg.
2National and Kapodistrian University of Athens, Department of Chemistry, Panepistimiopolis Zografou, 157 71 Athens, Greece.
3Eawag: Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland.
4Environmental Institute, Okružná 784/42, 972 41 Koš, Slovak Republic.
5National Center for Computational Toxicology, US EPA, Research Triangle Park, Durham, NC, USA.

PRESENTED by Emma Schymanski

The NORMAN Network is a unique network of reference laboratories, research centres and related organisations for monitoring emerging environmental substances, throughout Europe and across the world. Key activities of the network include prioritization of emerging substances and non-target screening. A recent collaborative trial revealed that suspect screening (using specific lists of chemicals to find “known unknowns”) was a very common and efficient way to expedite non-target screening (Schymanski et al. 2015, DOI: 10.1007/s00216-015-8681-7). As a result, the NORMAN Suspect Exchange was founded and members were encouraged to submit their suspect lists. To date, 20 lists of highly varying substance numbers (between 52 and 30,418), quality and information content have been uploaded, including valuable information previously unavailable to the public. All preparation and curation was done within the network using open access cheminformatics toolkits. Additionally, members expressed a desire for one merged list (“SusDat”). However, as a small network with very limited resources (member contributions only), the burden of curating and merging these lists into a high quality, curated dataset went beyond the capacity and expertise of the network. In 2017 the NORMAN Suspect Exchange and US EPA CompTox Chemistry Dashboard pooled resources in curating and uploading these lists to the Dashboard. This talk will cover the curation and annotation of the lists with unique identifiers (known as DTXSIDs), plus the advantages and drawbacks of these for NORMAN (e.g. creating a registration/resource inter-dependence). It will cover the use of “MS-ready structure forms”, with chemical substances provided in the form observed by the mass spectrometer (e.g. desalted, as separate components of mixtures), and how these efforts will support other NORMAN activities. Finally, limitations of existing cheminformatics approaches and future ideas for extending this work will be covered.
Note: This abstract does not reflect US EPA policy.
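
To make the idea of one merged list more concrete, here is a minimal sketch of combining several suspect lists while keeping track of which list each substance came from. It is an illustration only, not the curation workflow used by NORMAN or the Dashboard: it assumes every entry already carries an InChIKey, and the function name, list names and InChIKeys shown are made-up placeholders.

```python
from collections import defaultdict


def merge_suspect_lists(lists):
    """Merge suspect lists into one record per InChIKey.

    `lists` maps a list name to an iterable of (inchikey, name) pairs.
    Returns {inchikey: {"names": [...], "sources": [...]}}, both sorted.
    Assumption: the InChIKey is a good enough key for merging; real
    curation also has to handle entries with missing or wrong structures.
    """
    merged = defaultdict(lambda: {"names": set(), "sources": set()})
    for list_name, entries in lists.items():
        for inchikey, name in entries:
            key = inchikey.strip().upper()  # tolerate stray whitespace/case
            merged[key]["names"].add(name.strip())
            merged[key]["sources"].add(list_name)
    return {
        key: {"names": sorted(rec["names"]), "sources": sorted(rec["sources"])}
        for key, rec in merged.items()
    }


# Placeholder data: these InChIKeys are NOT real compounds.
example_lists = {
    "list_a": [("AAAAAAAAAAAAAA-AAAAAAAAAA-N", "substance one")],
    "list_b": [
        ("aaaaaaaaaaaaaa-aaaaaaaaaa-n ", "Substance 1"),
        ("BBBBBBBBBBBBBB-BBBBBBBBBB-N", "substance two"),
    ],
}
merged_example = merge_suspect_lists(example_lists)
```

Keeping the source list names on each merged record preserves provenance, which matters when the contributing lists differ as much in quality as described above.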


Posted on March 25, 2018 in ACS Meetings


PRESENTATION ACS Spring 2018: Curating and Sharing Structures and Spectra for the Environmental Community

Curating and sharing structures and spectra for the environmental community

Presented by Emma Schymanski

The increasing popularity of high mass accuracy non-target mass spectrometry methods has yielded extensive identification efforts based on spectral and chemical compound databases in the environmental community and beyond. Increasingly, new methods are relying on open data resources. Candidate structures are often retrieved with either exact mass or molecular formula from large resources such as PubChem, ChemSpider or the EPA CompTox Chemistry Dashboard. Smaller, selective lists of chemicals (also called “suspect lists”) can be used to perform more efficient annotation. Mass spectral libraries can then be used to increase the confidence in tentative identification. Additional metadata (e.g. exposure and hazard information, reference and data source information) can be extremely useful to prioritize substances of high environmental interest. Exchanging information and “sharing structural linkages” between these resources requires extensive curation to ensure that the correct information is shared, yet many valuable datasets arise from scientists and regulators with little formal cheminformatics training. This talk will cover curation efforts undertaken to map spectral libraries (e.g. MassBank.EU, mzCloud) and suspect lists from the NORMAN Suspect Exchange to unique chemical identifiers associated with the US EPA CompTox Chemistry Dashboard. The curation workflow takes advantage of years of experience, as well as contact with the original data providers, to enable open access to valuable, curated datasets to support environmental scientists and the broader research community. Note: This abstract does not reflect US EPA policy.



PRESENTATION ACS Spring 2018: Automated Structure Annotation and Curation for MassBank: Potential and Pitfalls

Automated Structure Annotation and Curation for MassBank: Potential and Pitfalls 

Presented by Emma Schymanski

The European MassBank server was founded in 2012 by the NORMAN Network to provide open access to mass spectra of substances of environmental interest contributed by NORMAN members. The automated workflow RMassBank was developed as a part of this effort. This workflow included automated processing of the mass spectral data, as well as automated annotation using the SMILES, names and CAS numbers provided by the user. Cheminformatics toolkits (e.g. Open Babel, rcdk) and web services (e.g. the CACTUS Chemical Identifier Resolver, Chemical Translation Services (CTS), ChemSpider, PubChem) were then used to convert and/or retrieve the remaining information for completion of the MassBank records (additional names, InChIs, InChIKeys, several database identifiers, mol files), to avoid excessive burden on the users and reduce the chance of errors. To date, approximately 16,000 MS/MS spectra (61 % of all open data as of Nov. 2016) corresponding to 1,269 (18 %) unique chemicals have been uploaded to MassBank.EU via RMassBank. Curating the MassBank.EU records, as part of efforts to provide EPA CompTox Dashboard identifiers (DTXSIDs) for each record, revealed several conflicts in the chemical metadata arising from varying sources. In addition, the representation of “ambiguous substances”, for example complex surfactant mixtures of various chain lengths and branching or incompletely-defined structures of transformation products, is an ongoing challenge. In this work, we report on proof-of-concept solutions for “ambiguous structure” representation, currently unavailable in the majority of cheminformatics tools. This presentation reflects on the effectiveness of the original RMassBank concept, and on both the potential and the pitfalls of using automated structure annotation with open resources to streamline spectra contributions from external laboratories and users with widely ranging cheminformatics experience. Note: this work does not necessarily reflect U.S. EPA policy.
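
The kind of metadata conflict mentioned in the abstract can be illustrated with a small sketch. It is not RMassBank code (RMassBank is an R workflow) and it does not call any of the web services named above. It only assumes that the values retrieved for one record have already been collected per source, and the source names and the deliberately wrong CAS number are hypothetical.

```python
def find_metadata_conflicts(values_by_source):
    """Report fields for which different sources give different values.

    `values_by_source` maps a source name to a {field: value} dict.
    Returns {field: {value: [sources reporting it]}} for conflicting fields.
    Empty or missing values are ignored rather than counted as conflicts.
    """
    seen = {}
    for source, fields in values_by_source.items():
        for field, value in fields.items():
            if value is None or not str(value).strip():
                continue
            seen.setdefault(field, {}).setdefault(str(value).strip(), []).append(source)
    # A field is in conflict when more than one distinct value was reported.
    return {field: values for field, values in seen.items() if len(values) > 1}


# Hypothetical sources for a caffeine record; "58-08-3" is a made-up
# erroneous CAS number used to trigger a conflict.
caffeine_record = {
    "user_input": {"cas": "58-08-2", "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
    "service_a": {"cas": "58-08-2", "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
    "service_b": {"cas": "58-08-3", "inchikey": ""},
}
conflicts = find_metadata_conflicts(caffeine_record)
```

A report like this only flags disagreement. Deciding which value is right still needs a curator, which is the point the abstract makes about automated annotation.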



Spring ACS Meeting San Francisco, April 2017

The Spring ACS Meeting is coming, and it’s coming quickly. Every time the New Year starts I think I have a long time before I have to assemble posters and write talks for the ACS Meeting. When I worked at the RSC it was easier in some ways as NO ONE reviewed them, no one gave comments on them and there was no clearance process involved. Mostly I was writing the talks on the flight out to the ACS or, more commonly, was writing them the evening before or morning of the presentations. There have been days when I got up in the morning at 4am to write two talks on the day I presented. Quite exhausting but at least I got to show the latest and greatest capabilities.

As an employee at the EPA there are different expectations, especially with regard to the clearance process, where the presentations are reviewed and signed off, pushed through our internal repository and, post-presentation, released to the community via Science Inventory. Some, not all, of the presentations and papers I have been involved with since joining EPA are here.

I will be going to the ACS meeting with a number of colleagues and chairing a session on Thursday, all day, with Chris Grulke for the Division of Environmental Chemistry. I will be presenting a number of posters and presentations as listed below. A number of my colleagues will also be presenting. Andrew McEachran, a recent postdoc with the center, will be presenting on a lot of the work that has been done in terms of the use of the Chemistry Dashboard to facilitate structure identification. The recent publication “Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard” reported on a comparison of the dashboard versus ChemSpider. Since then we have rolled out a lot of new functionality to support structure identification and Andrew will report on that.

PAPER ID: 2624963
PAPER TITLE: Twenty five years in cheminformatics: A career path through a diverse series of roles and responsibilities

DIVISION: Division of Chemical Information
SESSION: Careers in Chemical Information
DAY & HALF DAY OF PRESENTATION: Sunday, April 2, 2017 – AM

PAPER ID: 2616719
PAPER TITLE: Evaluating suspect screening and non-targeted analysis approaches using a collaborative research trial at the US EPA

DIVISION: Division of Analytical Chemistry
SESSION: Analytical Division Poster Session
DAY & HALF DAY OF PRESENTATION: Sunday, April 2, 2017 – EVE

PAPER ID: 2624980
PAPER TITLE: EPA CompTox chemistry dashboard: An online resource for environmental chemists

DIVISION: Division of Chemical Health and Safety
SESSION: Information Flow in Environmental Health & Safety
DAY & HALF DAY OF PRESENTATION: Tuesday, April 4, 2017 – PM

PAPER ID: 2624984
PAPER TITLE: Delivering an informational hub for data at the National Center for Computational Toxicology

DIVISION: Division of Environmental Chemistry
SESSION: Applications of Cheminformatics & Computational Chemistry in Environmental Health
DAY & HALF DAY OF PRESENTATION: Wednesday, April 5, 2017 – EVE

Looking forward to seeing you at ACS!



PRESENTATION: Building an Online Profile Using Social Networking and Amplification Tools for Scientists

This presentation was given as a two-hour hands-on training course at the Frontier Building in the Research Triangle Park in NC, funded by an Industry Award Grant from the ACS and matching financial support from the Research Triangle Institute.

Abstract: “Many of us nowadays invest significant amounts of time in sharing our activities and opinions with friends and family via social networking tools such as Facebook, Twitter or other related websites. However, despite the availability of many platforms for scientists to connect and share with their peers in the scientific community, the majority do not make use of these tools, regardless of their promise and potential impact and influence on our careers. We are already being indexed and exposed on the internet via our publications, presentations and data, and new “AltMetric scores” are being assigned to scientific publications as measures of popularity and, supposedly, of impact. We now have even more ways to contribute to science, to annotate and curate data, to “publish” in new ways, and many of these activities are part of a growing crowdsourcing network. This presentation provides an overview of the various types of networking and collaborative sites available to scientists and ways to expose your scientific activities online. It will discuss the new world of AltMetrics, which is in an explosive growth curve, and will help you understand how to influence and leverage some of these new measures. Whether you participate online simply for career advancement or for wider exposure of your research, there is now a series of web applications that can provide a great opportunity to develop a scientific profile within the community.”



Programmatic conversion of crystal structures into 3D printable files using Jmol

A new paper that came out of a collaboration initiated at an ACS Meeting, maybe three years ago, has finally gone online. My recollection is that at an ACS CINF reception I started chatting with Vincent Scalfani. At that time I was involved with ChemSpider and he bounced an idea off me about 3D printing of crystal structures. I reported that we were going to host the crystal structures on ChemSpider (here) and Vincent even presented on it at the ACS (here, with >2000 views). But, as happened on a fairly regular basis, a great idea never came to fruition: the data were not put onto ChemSpider, and I left to join the EPA over eighteen months ago.

But it was still great work, and when it became clear that the data would not see the light of day, the original article, written two years ago give or take, was adjusted to simply communicate that the data were available on Figshare. The peer review process gave good feedback and pretty much asked “Why aren’t they on a searchable database?” Well, we tried, but Bob Hanson, Jmol hero, got to work and produced this site in a few days! Bob is incredibly productive.

Well then the paper was accepted, all is good, the data are open and the world has access to tens of thousands of crystal structures ready for printing.

The paper is available here: “Programmatic conversion of crystal structures into 3D printable files using Jmol”

Posted on November 25, 2016 in ACS Meetings


The EPA iCSS Chemistry Dashboard to Support Compound Identification Using High Resolution Mass Spectrometry Data

Presentation given at ACS Meeting in Philadelphia in August 2016

There is a growing need for rapid chemical screening and prioritization to inform regulatory decision-making on thousands of chemicals in the environment. We have previously used high-resolution mass spectrometry to examine household vacuum dust samples using liquid chromatography time-of-flight mass spectrometry (LC-TOF/MS). Using a combination of exact mass, isotope distribution, and isotope spacing, molecular features were matched with a list of chemical formulas from the EPA’s Distributed Structure-Searchable Toxicity (DSSTox) database. This has further developed our understanding of how openly available chemical databases, together with the appropriate searches, could be used for the purpose of compound identification. We report here on the utility of the EPA’s iCSS Chemistry Dashboard for the purpose of compound identification using searches against a database of over 720,000 chemicals. We also examine the benefits of QSAR prediction for the purpose of retention time prediction to allow for alignment of both chromatographic and mass spectral properties. This abstract does not reflect U.S. EPA policy.
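
As a rough illustration of the exact-mass part of this matching, the sketch below computes monoisotopic masses for a short list of candidate formulas and keeps those within a ppm tolerance of an observed mass. It is a simplification, not the workflow used in the study: it ignores isotope distribution and isotope spacing, assumes the observed value has already been converted to a neutral monoisotopic mass, only knows the few elements in its table, and does not handle parentheses in formulas. The function names and the 5 ppm default are assumptions.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of a few elements.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146,
    "P": 30.9737620, "S": 31.9720707, "Cl": 34.9688527,
}


def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C8H10N4O2'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in MONOISOTOPIC:
            raise ValueError(f"element not in table: {element}")
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def match_candidates(observed_mass, formulas, tol_ppm=5.0):
    """Return (formula, ppm_error) pairs within tol_ppm, best match first."""
    hits = []
    for formula in formulas:
        theoretical = monoisotopic_mass(formula)
        ppm_error = (observed_mass - theoretical) / theoretical * 1e6
        if abs(ppm_error) <= tol_ppm:
            hits.append((formula, ppm_error))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Caffeine (C8H10N4O2) and atrazine (C8H14ClN5) as candidate formulas.
hits = match_candidates(194.0805, ["C8H10N4O2", "C8H14ClN5"])
```

In practice several formulas often fall inside the tolerance window, which is why isotope information and, as discussed in the abstract, retention time prediction are used to narrow the candidates further.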

This work is relevant to the article: “Linking high resolution mass spectrometry data with exposure and toxicity forecasts to advance high-throughput environmental monitoring”

My NC-ACS Distinguished Speaker Award Presentation

Last night I was honored to receive an award from the North Carolina Local Section of the American Chemical Society. I had the chance to review the past 20 years of my career with the attendees. I assembled a slide deck from about ten years of slides stored on Slideshare (I am glad I have been storing them there as it’s a great online storage place!). I appreciate the recognition from the Local Section. THANKS!


Call For Papers: Applications of Cheminformatics and Computational Chemistry in Environmental Health


Applications of Cheminformatics and Computational Chemistry in Environmental Health

253rd American Chemical Society National Meeting & Exposition

“Advanced Materials, Technologies, Systems & Processes”

San Francisco, California, April 2-6, 2017

Abstract Deadline: October 2016


Cheminformatics and computational chemistry have had an enormous impact in providing environmental chemists and toxicologists with access to data, information and knowledge. With an overwhelming array of online resources and an increasingly rich collection of software tools, the ability to source information continues to expand. Scientists typically seek chemical data in the form of chemical properties, their function and use, as well as information regarding their exposure potential, persistence in the environment and their transformation in environmental and biological systems. Commonly, the most pressing concern regarding chemicals is their potential as environmental toxicants. The increasing rate of production and release of new chemicals into commerce requires improved access to historical data and information to assist in hazard and risk assessment. High-throughput in vitro and in silico analyses are increasingly being brought to bear to rapidly screen chemicals for their potential impacts. Interweaving this information with more traditional in vivo toxicity data and exposure estimation, to provide integrated insight into chemical risk, is a burgeoning frontier at the intersection of cheminformatics and environmental sciences.

This symposium will bring together a series of talks to provide an overview of the present state of data, tools, databases and approaches available to environmental chemists. The session will include the various modeling approaches and platforms, will examine the issues of data quality and curation, and intends to provide the attendees with details regarding availability, utility and applications of these systems. We will focus especially on the availability of Open systems, data and code to ensure no limitations to access and reuse.

Topics covered in this session include, but are not limited to:

  • Environmental chemistry databases
  • Data: Quality, Modeling and Delivery
  • Computational hazard and risk assessment
  • Prioritizing environmental chemicals using screening and predictive computational tools
  • Standards for data exchange and integration in environmental chemistry
  • Implementations of Read-across prediction
  • Adverse Outcome Pathway data and delivery


Please submit your abstracts using the ACS Meeting Abstracts Programming System (MAPS). General information about the conference is available from the ACS. Any other inquiries should be directed to the symposium organizers:

Antony J. Williams and Chris Grulke, National Center for Computational Toxicology, Environmental Protection Agency, Research Triangle Park, Durham, NC

Posted on September 12, 2016 in ACS Meetings, Chemicals and our Health


The EPA Online Prediction Physicochemical Prediction Platform to Support Environmental Scientists

This poster was presented at the American Chemical Society meeting in Philadelphia in August 2016, at the Sci-Mix gathering and at the ENVR section on Wednesday.

August 22, 2016 from 8:00 PM to 10:00 PM


SESSION TIME: Wednesday, August 24, 2016, 6:00 PM – 8:00 PM
Hall D – Pennsylvania Convention Center

Poster Title: The EPA Online Prediction Physicochemical Prediction Platform to Support Environmental Scientists

As part of our efforts to develop a public platform to provide access to predictive models, we have attempted to disentangle the influence of the quality versus quantity of data available to develop and validate QSAR models. Using a thorough manual review of the data underlying the well-known EPI Suite software, we developed automated processes for the validation of the data using a KNIME workflow. This includes approaches to validate different chemical structure representations (e.g. molfile and SMILES) and identifiers (chemical names and registry numbers), and methods to standardize the data into QSAR-consumable formats for modeling. Our efforts to quantify and segregate data into various quality categories have allowed us to thoroughly investigate the resulting models developed from these data slices, and to examine whether or not efforts put into the development of large high-quality datasets have the expected pay-off in terms of prediction performance. Machine-learning approaches have been applied to create a series of models that have been used to generate predicted physicochemical and environmental parameters for over 700,000 chemicals. These data are available online via the EPA’s iCSS Chemistry Dashboard. This abstract does not reflect U.S. EPA policy.
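
One simple identifier check of the kind described above is the CAS Registry Number check digit. The sketch below is a stand-alone Python illustration, not a node from the KNIME workflow mentioned in the abstract, and the function name is an assumption. The check digit rule itself is the standard one: multiply the digits before the check digit by 1, 2, 3, ... counting from the right, sum the products, and take the result modulo 10.

```python
import re

# CAS RN layout: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
_CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas):
    """Return True if `cas` has a valid CAS RN format and check digit.

    This only tests internal consistency of the number. It cannot tell
    whether the number is actually assigned or belongs to the structure
    it is listed with.
    """
    match = _CAS_PATTERN.match(cas.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the rightmost digit by 1, the next by 2, and so on.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit
```

For example, `is_valid_cas("7732-18-5")` (water) passes, while a single-digit typo such as `"58-08-3"` in place of caffeine's `"58-08-2"` fails, which makes this a cheap first filter before any structure-based validation.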