I have the pleasure of collaborating with Emma Schymanski and we are literally in daily contact bouncing ideas regarding how to improve the state-of-the-science and informatics for Mass Spectrometry Non-Target Screening. We are both actively out at conferences representing the effort and are iteratively moving things forward (with so many other colleagues we get to work with) so that each presentation reports on the latest developments. Emma presented in Rome this week at the SETAC Europe 28th Annual Meeting and had the chance to show the work that has been going on to integrate the CompTox Chemistry Dashboard and MetFrag. More on that will be reported in detail soon but for now her slides from the meeting are available on SlideShare and embedded here.
I am happy to announce the publication of an article on “Open Science for Identifying “Known Unknown” Chemicals” at http://dx.doi.org/10.1021/acs.est.7b01908. I have been involved with two other articles about the identification of “Known Unknowns”.
The first one was a ChemSpider article: “Identification of “known unknowns” utilizing accurate mass data and ChemSpider”. Journal of The American Society for Mass Spectrometry. 23: 179–185. doi:10.1007/s13361-011-0265-y.
The second one was a recent article from the EPA: “Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard”. Analytical and Bioanalytical Chemistry. 409: 1729–1735. doi:10.1007/s00216-016-0139-z.
The most recent publication was a collaboration with Emma Schymanski from Eawag and it was a real pleasure to write this together. If you are interested in how Open Science can contribute to the challenges associated with the identification of known unknowns check out our latest publication!
In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning
Recently we published on the curation of physicochemical data sets that were then made available as Open Data. The work was reported in:
“An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modeling, K. Mansouri, C. Grulke, R. Judson and A.J. Williams, SAR and QSAR in Environmental Research, Volume 27, 2016, Issue 11, Pages 911-937 http://dx.doi.org/10.1080/1062936X.2016.1253611”
The data have since been modeled using a different approach from the one we used, and that work is now reported in http://dx.doi.org/10.1021/acs.jcim.6b00625.
“In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning, Q. Zang, K. Mansouri, A.J. Williams, R.S. Judson, D.G. Allen, W.M. Casey, and N.C. Kleinstreuer, J. Chem. Inf. Model., 2017, 57 (1), pp 36–49”
The abstract for the article is below:
There are little available toxicity data on the vast majority of chemicals in commerce. High-throughput screening (HTS) studies, such as those being carried out by the U.S. Environmental Protection Agency (EPA) ToxCast program in partnership with the federal Tox21 research program, can generate biological data to inform models for predicting potential toxicity. However, physicochemical properties are also needed to model environmental fate and transport, as well as exposure potential. The purpose of the present study was to generate an open-source quantitative structure–property relationship (QSPR) workflow to predict a variety of physicochemical properties that would have cross-platform compatibility to integrate into existing cheminformatics workflows. In this effort, decades-old experimental property data sets available within the EPA EPI Suite were reanalyzed using modern cheminformatics workflows to develop updated QSPR models capable of supplying computationally efficient, open, and transparent HTS property predictions in support of environmental modeling efforts. Models were built using updated EPI Suite data sets for the prediction of six physicochemical properties: octanol–water partition coefficient (logP), water solubility (logS), boiling point (BP), melting point (MP), vapor pressure (logVP), and bioconcentration factor (logBCF). The coefficient of determination (R2) between the estimated values and experimental data for the six predicted properties ranged from 0.826 (MP) to 0.965 (BP), with model performance for five of the six properties exceeding those from the original EPI Suite models. The newly derived models can be employed for rapid estimation of physicochemical properties within an open-source HTS workflow to inform fate and toxicity prediction models of environmental chemicals.
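For readers less familiar with the statistic quoted in the abstract, here is a minimal sketch of how a coefficient of determination can be computed from experimental and predicted values. It assumes the common 1 - SS_res/SS_tot definition (the paper may use a different variant, such as a squared correlation coefficient), and the logP numbers are made up purely for illustration. They are not data from the article, and this is not the authors' workflow.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assuming R2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: how far predictions are from experiment.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the experimental values themselves.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all identical; R2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical logP values, for illustration only.
    experimental_logp = [0.5, 1.2, 2.3, 3.1, 4.0]
    predicted_logp = [0.7, 1.0, 2.5, 2.9, 4.2]
    print(f"R2 = {r_squared(experimental_logp, predicted_logp):.3f}")
```

A value of 1 means the predictions match experiment exactly, while 0 means the model does no better than always predicting the mean of the experimental values.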
The Spring ACS Meeting is coming, and it’s coming quickly. Every time the New Year starts I think I have a long time before I have to assemble posters and write talks for the ACS Meeting. When I worked at the RSC it was easier in some ways as NO ONE reviewed them, no one gave comments on them and there was no clearance process involved. Mostly I was writing the talks on the flight out to the ACS or, more commonly, was writing them the evening before or morning of the presentations. There have been days when I got up in the morning at 4am to write two talks on the day I presented. Quite exhausting but at least I got to show the latest and greatest capabilities.
As an employee at the EPA I face different expectations, especially with regard to the clearance process, in which the presentations are reviewed and signed off, pushed through our internal repository and, post-presentation, released to the community via Science Inventory. Some, but not all, of the presentations and papers I have been involved with since joining EPA are here.
I will be going to the ACS meeting with a number of colleagues and chairing an all-day session on Thursday with Chris Grulke for the Division of Environmental Chemistry. I will be presenting a number of posters and presentations as listed below. A number of my colleagues will also be presenting. Andrew McEachran, a recent postdoc with the center, will be presenting on a lot of the work that has been done on using the Chemistry Dashboard to facilitate structure identification. The recent publication “Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard” (http://link.springer.com/article/10.1007%2Fs00216-016-0139-z) reported on a comparison of the dashboard versus ChemSpider. Since then we have rolled out a lot of new functionality to support structure identification and Andrew will report on that.
PAPER ID: 2624963
PAPER TITLE: Twenty five years in cheminformatics: A career path through a diverse series of roles and responsibilities
DIVISION: Division of Chemical Information
SESSION: Careers in Chemical Information
PRESENTATION FORMAT: Oral
DAY & HALF DAY OF PRESENTATION: Sunday, April, 02, 2017 – AM
PAPER ID: 2616719
PAPER TITLE: Evaluating suspect screening and non-targeted analysis approaches using a collaborative research trial at the US EPA
DIVISION: Division of Analytical Chemistry
SESSION: Analytical Division Poster Session
PRESENTATION FORMAT: Poster
DAY & HALF DAY OF PRESENTATION: Sunday, April, 02, 2017 – EVE
PAPER ID: 2624980
PAPER TITLE: EPA CompTox chemistry dashboard: An online resource for environmental chemists
DIVISION: Division of Chemical Health and Safety
SESSION: Information Flow in Environmental Health & Safety
PRESENTATION FORMAT: Oral
DAY & HALF DAY OF PRESENTATION: Tuesday, April, 04, 2017 – PM
PAPER ID: 2624984
PAPER TITLE: Delivering an informational hub for data at the National Center for Computational Toxicology
DIVISION: Division of Environmental Chemistry
SESSION: Applications of Cheminformatics & Computational Chemistry in Environmental Health
PRESENTATION FORMAT: Poster
DAY & HALF DAY OF PRESENTATION: Wednesday, April, 05, 2017 – EVE
Looking forward to seeing you at ACS!
Next Tuesday, November 29th, I am leading a two hour workshop as described here:
“The NC-ACS together with RTI International is excited to provide dinner and a workshop titled “Building an Online Profile Using Social Networking and Amplification Tools for Scientists”!
DATE AND TIME: Tue, November 29, 2016, 6:00 PM – 9:00 PM EST
LOCATION: The Frontier, 800 Park Offices Drive, Triangle, NC 27709
The event includes dinner from The Farmery starting at 6PM! The workshop will begin promptly at 6:30PM.
Please note to bring your computer and let our Speaker, Antony Williams, help you build your online profile!
Space is limited! Please register here: https://ncacssocialnetworking.eventbrite.com”
In advance of that gathering I was fortunate to have two papers published last week and I wanted to show how I could use Social Media to drive attention, views, downloads and altmetrics to those papers. They are:
Programmatic conversion of crystal structures into 3D printable files using Jmol at http://dx.doi.org/10.1186/s13321-016-0181-z
An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling at http://dx.doi.org/10.1080/1062936x.2016.1253611
I started pushing the 3D printing article out on Friday morning and noticed a surge in attention early in the day that continued throughout the day. I kept promoting it throughout the weekend and saw less attention. While it is possible that I saturated my network of connections, I think it is more likely that people are simply away from their computers at the weekend, so Twitter gets less attention from the overall network. That’s my hypothesis, yet to be proven. It SHOULD be noted that the initial surge in AltMetrics came from the publisher themselves when they pushed it out for us as authors. See https://twitter.com/jcheminf/status/802078618629373952. I suggest making sure your PUBLISHER is pushing out your article via Twitter as part of their service. And BOOK PUBLISHERS should be using Twitter in the same way.
As for the automated curation procedure paper, I FOUND it online at about midnight on Friday night, as I kept checking back to see when it was finally published. (Emails to authors would be a good idea, don’t you think?) I pushed that out after midnight on Friday, and the attention and corresponding AltMetrics are way lower than for the 3D article. Maybe it’s because the article is less interesting (but I don’t agree with that for my network). Maybe, and more likely I think, a release on Friday night and through Saturday simply gets less overall Twitter attention (see original hypothesis). But it could be that I simply saturated the network with my first 3D printing posting. It’s not possible to tease this out with this one experiment, so there will be others. Maybe the study has already been done?
In any case the 3D printing one has good altmetric scores now (40 as of 12:50pm on Sunday) and the QSAR modeling paper is lagging (a score of 4). I think a big contribution to the lagging altmetrics for the QSAR modeling paper is the fact that SAR and QSAR in Environmental Research from Taylor and Francis may not have much of a following and may not tweet out the article directly (the last comments I saw about SAR and QSAR on Twitter were mostly in 2013). One other MAJOR contributing factor may be that JChemInf is FULLY Open Access and our 3D article is fully Open. The SAR and QSAR article in Taylor and Francis has an Open Access option and we haven’t used it yet. Again, just hypotheses.
Thanks to @JChemInf for doing their job well re. pushing it out to Twitter. I think it helped….
This presentation will be given at the Janelia Farm Research Campus, a research campus of the Howard Hughes Medical Institute. The presentation abstract is below.
Many platforms are available for scientists to connect and share with their peers in the scientific community, yet the majority do not make use of these tools, despite their promise and potential impact and influence on our careers. We are already being indexed and exposed on the internet via our publications, presentations and data, and new “AltMetric scores” are being assigned to scientific publications as measures of popularity and, supposedly, of impact. We now have even more ways to contribute to science, to annotate and curate data, to “publish” in new ways, and many of these activities are part of a growing crowdsourcing network. This presentation provides an overview of the various types of networking and collaborative sites available to scientists and ways to expose your scientific activities online. It will discuss the new world of AltMetrics that is in an explosive growth curve and will help you understand how to influence and leverage some of these new measures. Whether you participate online simply for career advancement or for wider exposure of your research, there is now a series of web applications that can provide a great opportunity to develop a scientific profile within the community.
Last night I was honored to receive an award from the North Carolina Local Section of the American Chemical Society. I had the chance to review the past 20 years of my career with the attendees. I assembled a slide deck from about ten years of slides stored on Slideshare (I am glad I have been storing them there as it’s a great online storage place!). I appreciate the recognition from the Local Section. THANKS!
Investigating Impact Metrics for Performance for the US-EPA National Center for Computational Toxicology
This presentation was given at the American Chemical Society meeting in Philadelphia in August 2016
DAY & TIME OF PRESENTATION: Sunday, August, 21, 2016 from 4:10 PM – 4:30 PM
ROOM & LOCATION: Room 112B – Pennsylvania Convention Center
Title: Investigating Impact Metrics for Performance for the US-EPA National Center for Computational Toxicology
The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program integrates advances in biology, chemistry, and computer science to help prioritize chemicals for further research based on potential human health risks. This work involves computational and data driven approaches that integrate chemistry, exposure and biological data. We have delivered public access to terabytes of open data, as well as to a large number of publicly accessible databases and applications, to support the research efforts for a large community of scientists. Many of our contributions to science are summarily described in research papers but to date we have not optimized our contributions to inform altmetrics statistics associated with our work. Critically missing from altmetrics is access to our numerous software applications and web service accesses, as well as the growing importance of our experimental data and models (e.g. ToxCast, ExpoCast, DSSTox and others) to the scientific and regulatory communities. This presentation will provide an overview of our efforts to more fully understand, and quantify, our impact on the environmental sciences using a combination of our measurement approaches and available altmetrics tools. This abstract does not reflect U.S. EPA policy.
Structure Identification Using High Resolution Mass Spectrometry Data and the EPA’s Chemistry Dashboard
This presentation was given at the American Chemical Society meeting in Philadelphia in August 2016
DAY & TIME OF PRESENTATION: Sunday, August, 21, 2016 from 1:10 PM – 1:35 PM
ROOM & LOCATION: Room 105A – Pennsylvania Convention Center
Title: Structure Identification Using High Resolution Mass Spectrometry Data and the EPA’s Chemistry Dashboard
The iCSS Chemistry Dashboard is a publicly accessible dashboard provided by the National Center for Computational Toxicology at the US-EPA. It serves a number of purposes, including providing a chemistry database underpinning many of our public-facing projects (e.g. ToxCast and ExpoCast). The available data and searches provide a valuable path to structure identification using mass spectrometry as the source data. With an underlying database of over 720,000 chemicals, the dashboard has already been used to assist in identifying chemicals present in house dust. However, it can also be applied to many other purposes, e.g., the identification of agrochemicals in waste streams. This presentation will provide a review of the EPA’s platform and underlying algorithms used for the purpose of compound identification using high-resolution mass spectrometry data. We will also discuss progress towards a high-throughput non-targeted analysis platform for use by the mass spectrometry community. This abstract does not reflect U.S. EPA policy.
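To give a flavor of what a mass-based candidate search involves, here is a minimal sketch of matching a measured accurate mass against a small chemical list within a parts-per-million window. It is not the dashboard's actual algorithm or API. The `Candidate` class, the `search_by_mass` function, the 5 ppm default tolerance and the four-entry library are all assumptions made for illustration. The masses are approximate monoisotopic values, and the measured value is assumed to be a neutral monoisotopic mass with any adduct already accounted for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    monoisotopic_mass: float  # neutral monoisotopic mass in Da (approximate)


# Tiny illustrative library; a real search would run against a full database.
LIBRARY = [
    Candidate("caffeine", 194.0804),
    Candidate("atrazine", 215.0938),
    Candidate("bisphenol A", 228.1150),
    Candidate("carbamazepine", 236.0950),
]


def ppm_error(measured_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (measured_mass - theoretical_mass) / theoretical_mass * 1e6


def search_by_mass(measured_mass, candidates, tolerance_ppm=5.0):
    """Return (candidate, ppm error) pairs inside the window, closest first."""
    hits = []
    for candidate in candidates:
        error = ppm_error(measured_mass, candidate.monoisotopic_mass)
        if abs(error) <= tolerance_ppm:
            hits.append((candidate, error))
    # Rank purely by closeness in mass; this is an assumption of the sketch.
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    for candidate, error in search_by_mass(194.0806, LIBRARY):
        print(f"{candidate.name}: {error:+.2f} ppm")
```

With a large database a single mass often returns many candidates, so a real workflow needs some way of ranking them beyond mass error alone. This sketch deliberately leaves that out.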