Archive for category Publications and Presentations

In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning

Recently we published on the curation of physicochemical data sets that were then made available as Open Data. The work was reported in:

“An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modeling”, K. Mansouri, C. Grulke, R. Judson and A.J. Williams, SAR and QSAR in Environmental Research, Volume 27, 2016, Issue 11, Pages 911-937, http://dx.doi.org/10.1080/1062936X.2016.1253611

The data have since been modeled using a different approach from the one we used, and that work is now reported in http://dx.doi.org/10.1021/acs.jcim.6b00625.

 

“In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning”, Q. Zang, K. Mansouri, A.J. Williams, R.S. Judson, D.G. Allen, W.M. Casey, and N.C. Kleinstreuer, J. Chem. Inf. Model., 2017, 57 (1), pp 36–49

The abstract for the article is below:

ABSTRACT

There are little available toxicity data on the vast majority of chemicals in commerce. High-throughput screening (HTS) studies, such as those being carried out by the U.S. Environmental Protection Agency (EPA) ToxCast program in partnership with the federal Tox21 research program, can generate biological data to inform models for predicting potential toxicity. However, physicochemical properties are also needed to model environmental fate and transport, as well as exposure potential. The purpose of the present study was to generate an open-source quantitative structure–property relationship (QSPR) workflow to predict a variety of physicochemical properties that would have cross-platform compatibility to integrate into existing cheminformatics workflows. In this effort, decades-old experimental property data sets available within the EPA EPI Suite were reanalyzed using modern cheminformatics workflows to develop updated QSPR models capable of supplying computationally efficient, open, and transparent HTS property predictions in support of environmental modeling efforts. Models were built using updated EPI Suite data sets for the prediction of six physicochemical properties: octanol–water partition coefficient (logP), water solubility (logS), boiling point (BP), melting point (MP), vapor pressure (logVP), and bioconcentration factor (logBCF). The coefficient of determination (R2) between the estimated values and experimental data for the six predicted properties ranged from 0.826 (MP) to 0.965 (BP), with model performance for five of the six properties exceeding those from the original EPI Suite models. The newly derived models can be employed for rapid estimation of physicochemical properties within an open-source HTS workflow to inform fate and toxicity prediction models of environmental chemicals.
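For readers less familiar with the R2 figures quoted in the abstract, the snippet below is a minimal sketch of how a coefficient of determination can be computed from paired experimental and estimated values. It assumes the common definition 1 - SS_res/SS_tot; the paper may use a different variant (for example the squared correlation coefficient), and the function name `r_squared` and the example numbers are purely illustrative, not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assuming R2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally sized lists with at least two values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: how far the estimates are from experiment.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the experimental values themselves.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical logP values, for illustration only.
experimental = [1.2, 2.5, 3.1, 0.4]
estimated = [1.0, 2.7, 3.0, 0.6]
print(round(r_squared(experimental, estimated), 3))
```

A value of 1 means the estimates reproduce the experimental data exactly, while a value of 0 means they do no better than always predicting the experimental mean.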


Spring ACS Meeting San Francisco, April 2017

The Spring ACS Meeting is coming, and it’s coming quickly. Every time the New Year starts I think I have a long time before I have to assemble posters and write talks for the ACS Meeting. When I worked at the RSC it was easier in some ways as NO ONE reviewed them, no one gave comments on them and there was no clearance process involved. Mostly I was writing the talks on the flight out to the ACS or, more commonly, was writing them the evening before or morning of the presentations. There have been days when I got up in the morning at 4am to write two talks on the day I presented. Quite exhausting but at least I got to show the latest and greatest capabilities.

As an employee at the EPA there are different expectations, especially with regard to the clearance process, where the presentations are reviewed and signed off, pushed through our internal repository and, post-presentation, released to the community via Science Inventory. Some, not all, of the presentations and papers I have been involved with since joining EPA are here.

I will be going to the ACS meeting with a number of colleagues and chairing an all-day session on Thursday with Chris Grulke for the Division of Environmental Chemistry. I will be presenting a number of posters and presentations as listed below. A number of my colleagues will also be presenting. Andrew McEachran, a recent postdoc with the center, will be presenting on a lot of the work that has been done on using the Chemistry Dashboard to facilitate structure identification. The recent publication “Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard” (http://link.springer.com/article/10.1007%2Fs00216-016-0139-z) reported on a comparison of the dashboard versus ChemSpider. Since then we have rolled out a lot of new functionality to support structure identification and Andrew will report on that.

PAPER ID: 2624963
PAPER TITLE: Twenty five years in cheminformatics: A career path through a diverse series of roles and responsibilities

DIVISION: Division of Chemical Information
SESSION: Careers in Chemical Information
PRESENTATION FORMAT: Oral
DAY & HALF DAY OF PRESENTATION: Sunday, April, 02, 2017 – AM

PAPER ID: 2616719
PAPER TITLE: Evaluating suspect screening and non-targeted analysis approaches using a collaborative research trial at the US EPA

DIVISION: Division of Analytical Chemistry
SESSION: Analytical Division Poster Session
PRESENTATION FORMAT: Poster
DAY & HALF DAY OF PRESENTATION: Sunday, April, 02, 2017 – EVE

PAPER ID: 2624980
PAPER TITLE: EPA CompTox chemistry dashboard: An online resource for environmental chemists

DIVISION: Division of Chemical Health and Safety
SESSION: Information Flow in Environmental Health & Safety
PRESENTATION FORMAT: Oral
DAY & HALF DAY OF PRESENTATION: Tuesday, April, 04, 2017 – PM

PAPER ID: 2624984
PAPER TITLE: Delivering an informational hub for data at the National Center for Computational Toxicology

DIVISION: Division of Environmental Chemistry
SESSION: Applications of Cheminformatics & Computational Chemistry in Environmental Health
PRESENTATION FORMAT: Poster
DAY & HALF DAY OF PRESENTATION: Wednesday, April, 05, 2017 – EVE

Looking forward to seeing you at ACS!

 


PRESENTATION: Building an Online Profile Using Social Networking and Amplification Tools for Scientists

This presentation was given as a two-hour hands-on training course at the Frontier Building in Research Triangle Park, NC, funded by an Industry Award Grant from the ACS and matching financial support from the Research Triangle Institute.

Abstract “Many of us nowadays invest significant amounts of time in sharing our activities and opinions with friends and family via social networking tools such as Facebook, Twitter or other related websites. However, despite the availability of many platforms for scientists to connect and share with their peers in the scientific community the majority do not make use of these tools, despite their promise and potential impact and influence on our careers. We are already being indexed and exposed on the internet via our publications, presentations and data and new “AltMetric scores” are being assigned to scientific publications as measures of popularity and, supposedly, of impact. We now have even more ways to contribute to science, to annotate and curate data, to “publish” in new ways, and many of these activities are as part of a growing crowdsourcing network. This presentation provides an overview of the various types of networking and collaborative sites available to scientists and ways to expose your scientific activities online. It will discuss the new world of AltMetrics that is in an explosive growth curve and will help you understand how to influence and leverage some of these new measures. Participating online, whether it be simply for career advancement or for wider exposure of your research, there are now a series of web applications that can provide a great opportunity to develop a scientific profile within the community.”


Why Have I Pushed so Much Traffic To Twitter This Weekend? GAMING or SAVVY?

Next Tuesday, November 29th, I am leading a two-hour workshop as described here:

The NC-ACS together with RTI International is excited to provide dinner and a workshop titled “Building an Online Profile Using Social Networking and Amplification Tools for Scientists”!

DATE AND TIME: Tue, November 29, 2016, 6:00 PM – 9:00 PM EST

LOCATION: The Frontier, 800 Park Offices Drive, Research Triangle Park, NC 27709

The event includes dinner from The Farmery starting at 6PM! The workshop will begin promptly at 6:30PM.

Please note to bring your computer and let our Speaker, Antony Williams, help you build your online profile!

Space is limited!  Please register here: https://ncacssocialnetworking.eventbrite.com

In advance of that gathering I was fortunate to have two papers published last week and I wanted to show how I could use Social Media to drive attention, views, downloads and altmetrics to those papers. They are:

Programmatic conversion of crystal structures into 3D printable files using Jmol at http://dx.doi.org/10.1186/s13321-016-0181-z

and

An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling at http://dx.doi.org/10.1080/1062936x.2016.1253611

I started pushing the 3D printing article out on Friday morning and noticed a surge in attention early in the day that continued throughout the day. I kept pushing throughout the weekend and saw less attention. While it is possible that I saturated my network of connections, I think it is more likely that people are simply away from their computers at the weekend, so Twitter gets less attention from the overall network. That’s my hypothesis, yet to be proven. It SHOULD be noted that the initial surge in AltMetrics came from the publisher themselves when they pushed it out for us as authors. See https://twitter.com/jcheminf/status/802078618629373952. I suggest making sure your PUBLISHER is pushing out your article via Twitter as part of their service. And BOOK PUBLISHERS should be using Twitter in the same way.

As for the automated curation and QSAR modeling paper, I FOUND it online on Friday night at about midnight, as I kept checking back to see when it was finally published. (Emails to authors would be a good idea, don’t you think?) I pushed that out after midnight on Friday, and the attention and corresponding AltMetrics are way lower than for the 3D article. Maybe it’s because the article is less interesting (but I don’t agree with that for my network). Maybe, and more likely I think, a Friday night release followed by Saturday simply gets less overall Twitter attention (see original hypothesis). But it could be I simply saturated the network with my first 3D printing posting. It’s not possible to tease this out with this one experiment so there will be others. Maybe the study has already been done???

In any case the 3D printing paper has a good altmetric score now (40 as of 12:50pm on Sunday) and the QSAR modeling paper is lagging (a score of 4). I think a big contribution to the lagging altmetrics for the QSAR modeling paper is the fact that SAR and QSAR in Environmental Research from Taylor and Francis may not have much of a following and may not tweet out the article directly (the last comments I saw about SAR and QSAR on Twitter were mostly in 2013). One other MAJOR contributing factor may be that JChemInf is FULLY Open Access and our 3D article is fully Open. The SAR and QSAR article in Taylor and Francis has an Open Access option and we didn’t use it, yet. Again, just hypotheses.

Thanks to @JChemInf for doing their job well re. pushing it out to Twitter. I think it helped….


Social Media Tools for Scientists and Building an Online Profile

This presentation will be given at the Janelia Farm Research Campus, a research campus of the Howard Hughes Medical Institute. The presentation abstract is below.

ABSTRACT
Despite the availability of many platforms for scientists to connect and share with their peers in the scientific community the majority do not make use of these tools, despite their promise and potential impact and influence on our careers. We are already being indexed and exposed on the internet via our publications, presentations and data and new “AltMetric scores” are being assigned to scientific publications as measures of popularity and, supposedly, of impact. We now have even more ways to contribute to science, to annotate and curate data, to “publish” in new ways, and many of these activities are as part of a growing crowdsourcing network. This presentation provides an overview of the various types of networking and collaborative sites available to scientists and ways to expose your scientific activities online. It will discuss the new world of AltMetrics that is in an explosive growth curve and will help you understand how to influence and leverage some of these new measures. Participating online, whether it be simply for career advancement or for wider exposure of your research, there are now a series of web applications that can provide a great opportunity to develop a scientific profile within the community.


My NC-ACS Distinguished Speaker Award Presentation

Last night I was honored to receive an award from the North Carolina Local Section of the American Chemical Society. I had the chance to review the past 20 years of my career with the attendees. I assembled a slide deck from about ten years of slides stored on Slideshare (I am glad I have been storing them there as it’s a great online storage place!). I appreciate the recognition from the Local Division. THANKS!


Investigating Impact Metrics for Performance for the US-EPA National Center for Computational Toxicology

This presentation was given at the American Chemical Society meeting in Philadelphia in August 2016.

DAY & TIME OF PRESENTATION: Sunday, August, 21, 2016 from 4:10 PM – 4:30 PM
ROOM & LOCATION: Room 112B – Pennsylvania Convention Center

Title: Investigating Impact Metrics for Performance for the US-EPA National Center for Computational Toxicology

The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program integrates advances in biology, chemistry, and computer science to help prioritize chemicals for further research based on potential human health risks. This work involves computational and data driven approaches that integrate chemistry, exposure and biological data. We have delivered public access to terabytes of open data, as well as to a large number of publicly accessible databases and applications, to support the research efforts for a large community of scientists. Many of our contributions to science are summarily described in research papers but to date we have not optimized our contributions to inform altmetrics statistics associated with our work. Critically missing from altmetrics is access to our numerous software applications and web service accesses, as well as the growing importance of our experimental data and models (e.g. ToxCast, ExpoCast, DSSTox and others) to the scientific and regulatory communities. This presentation will provide an overview of our efforts to more fully understand, and quantify, our impact on the environmental sciences using a combination of our measurement approaches and available altmetrics tools. This abstract does not reflect U.S. EPA policy.


Structure Identification Using High Resolution Mass Spectrometry Data and the EPA’s Chemistry Dashboard

This presentation was given at the American Chemical Society meeting in Philadelphia in August 2016.

DAY & TIME OF PRESENTATION: Sunday, August, 21, 2016 from 1:10 PM – 1:35 PM
ROOM & LOCATION: Room 105A – Pennsylvania Convention Center

Title: Structure Identification Using High Resolution Mass Spectrometry Data and the EPA’s Chemistry Dashboard

The iCSS Chemistry Dashboard is a publicly accessible dashboard provided by the National Center for Computational Toxicology at the US-EPA. It serves a number of purposes, including providing a chemistry database underpinning many of our public-facing projects (e.g. ToxCast and ExpoCast). The available data and searches provide a valuable path to structure identification using mass spectrometry as the source data. With an underlying database of over 720,000 chemicals, the dashboard has already been used to assist in identifying chemicals present in house dust. However, it can also be applied to many other purposes, e.g., the identification of agrochemicals in waste streams. This presentation will provide a review of the EPA’s platform and underlying algorithms used for the purpose of compound identification using high-resolution mass spectrometry data. We will also discuss progress towards a high-throughput non-targeted analysis platform for use by the mass spectrometry community. This abstract does not reflect U.S. EPA policy.

 


Presentations and Posters at #ACSPhiladelphia August 2016

I will be delivering five presentations and a poster (twice) at the ACS Meeting in Philadelphia this week. These presentations will introduce the latest version of our CompTox Dashboard, renamed from the iCSS Chemistry Dashboard because now we are offering way more than just a large set of chemical structures! I look forward to introducing attendees to the latest and greatest.

DAY & TIME OF PRESENTATION: Sunday, August, 21, 2016 from 1:10 PM – 1:35 PM
ROOM & LOCATION: Room 105A – Pennsylvania Convention Center

Title: Structure Identification Using High Resolution Mass Spectrometry Data and the EPA’s Chemistry Dashboard

The iCSS Chemistry Dashboard is a publicly accessible dashboard provided by the National Center for Computational Toxicology at the US-EPA. It serves a number of purposes, including providing a chemistry database underpinning many of our public-facing projects (e.g. ToxCast and ExpoCast). The available data and searches provide a valuable path to structure identification using mass spectrometry as the source data. With an underlying database of over 720,000 chemicals, the dashboard has already been used to assist in identifying chemicals present in house dust. However, it can also be applied to many other purposes, e.g., the identification of agrochemicals in waste streams. This presentation will provide a review of the EPA’s platform and underlying algorithms used for the purpose of compound identification using high-resolution mass spectrometry data. We will also discuss progress towards a high-throughput non-targeted analysis platform for use by the mass spectrometry community. This abstract does not reflect U.S. EPA policy.

 

DAY & TIME OF PRESENTATION: Sunday, August, 21, 2016 from 4:10 PM – 4:30 PM
ROOM & LOCATION: Room 112B – Pennsylvania Convention Center

Title: Investigating Impact Metrics for Performance for the US-EPA National Center for Computational Toxicology

The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program integrates advances in biology, chemistry, and computer science to help prioritize chemicals for further research based on potential human health risks. This work involves computational and data driven approaches that integrate chemistry, exposure and biological data. We have delivered public access to terabytes of open data, as well as to a large number of publicly accessible databases and applications, to support the research efforts for a large community of scientists. Many of our contributions to science are summarily described in research papers but to date we have not optimized our contributions to inform altmetrics statistics associated with our work. Critically missing from altmetrics is access to our numerous software applications and web service accesses, as well as the growing importance of our experimental data and models (e.g. ToxCast, ExpoCast, DSSTox and others) to the scientific and regulatory communities. This presentation will provide an overview of our efforts to more fully understand, and quantify, our impact on the environmental sciences using a combination of our measurement approaches and available altmetrics tools. This abstract does not reflect U.S. EPA policy.

DAY & TIME OF PRESENTATION: Wednesday, August, 24, 2016 from 9:40 AM – 10:00 AM
ROOM & LOCATION: Juniper’s Ballroom – Philadelphia Downtown Courtyard by Marriott

Title: Delivering The Benefits of Chemical-Biological Integration in Computational Toxicology at the EPA

Abstract: Researchers at the EPA’s National Center for Computational Toxicology integrate advances in biology, chemistry, and computer science to examine the toxicity of chemicals and help prioritize chemicals for further research based on potential human health risks. The intention of this research program is to quickly evaluate thousands of chemicals for potential risk but with much reduced cost relative to historical approaches. This work involves computational and data driven approaches including high-throughput screening, modeling, text-mining and the integration of chemistry, exposure and biological data. We have developed a number of databases and applications that are delivering on the vision of developing a deeper understanding of chemicals and their effects on exposure and biological processes that are supporting a large community of scientists in their research efforts. This presentation will provide an overview of our work to bring together diverse large scale data from the chemical and biological domains, our approaches to integrate and disseminate these data, and the delivery of models supporting computational toxicology. This abstract does not reflect U.S. EPA policy.

 

DAY & TIME OF PRESENTATION: Wednesday, August, 24, 2016 from 11:10 AM – 11:40 AM
ROOM & LOCATION: Ormandy East – DoubleTree by Hilton Hotel Philadelphia Center City

Title: Data Aggregation, Curation and Modeling Approaches to Deliver Prediction Models to Support Computational Toxicology at the EPA

The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program develops and utilizes QSAR modeling approaches across a broad range of applications. In terms of physical chemistry we have a particular interest in the prediction of basic physicochemical parameters such as logP, aqueous solubility, vapor pressure and other parameters to invoke in our exposure models or for the purpose of modeling environmental toxicity. We are also interested in the development of models related to environmental fate. As a result of our efforts we have assembled and curated data sets for various physicochemical properties and, utilizing modern machine-learning modeling approaches, have developed a number of high performing models that we are now delivering to the public. Our website, the iCSS Chemistry Dashboard, provides access to data predicted for over 700,000 chemical compounds. The original training data are available for review and the details of prediction for each endpoint include the domain of applicability as well as a measure of performance accuracy.  This presentation will provide an overview of the existing aggregated data, our approaches to data curation and our progress towards an interactive environment for prediction of physicochemical and environmental fate parameters. The utilization of these parameters to support read-across approaches will also be discussed. This abstract does not reflect U.S. EPA policy.

 

DAY & TIME OF PRESENTATION: Thursday, August, 25, 2016 from 3:00 PM – 3:20 PM
ROOM & LOCATION: Room 104A – Pennsylvania Convention Center

Title: The EPA iCSS Chemistry Dashboard to Support Compound Identification Using High Resolution Mass Spectrometry Data

There is a growing need for rapid chemical screening and prioritization to inform regulatory decision-making on thousands of chemicals in the environment. We have previously used high-resolution mass spectrometry to examine household vacuum dust samples using liquid chromatography time-of-flight mass spectrometry (LC-TOF/MS). Using a combination of exact mass, isotope distribution, and isotope spacing, molecular features were matched with a list of chemical formulas from the EPA’s Distributed Structure-Searchable Toxicity (DSSTox) database. This has further developed our understanding of how openly available chemical databases, together with the appropriate searches, could be used for the purpose of compound identification. We report here on the utility of the EPA’s iCSS Chemistry Dashboard for the purpose of compound identification using searches against a database of over 720,000 chemicals. We also examine the benefits of QSAR prediction for the purpose of retention time prediction to allow for alignment of both chromatographic and mass spectral properties. This abstract does not reflect U.S. EPA policy.
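The formula-matching step described in this abstract can be illustrated with a small sketch. It is not the Dashboard's actual search code: the function names, the candidate list and the 5 ppm tolerance are assumptions made for illustration, and the example assumes the measured value has already been converted to a neutral monoisotopic mass (no adduct handling, no isotope pattern scoring).

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of a few common elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
    "P": 30.97376163,
    "Cl": 34.96885268,
}


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in MONOISOTOPIC:
            raise ValueError(f"no mass stored for element {element!r}")
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def match_candidates(measured_mass, formulas, ppm_tol=5.0):
    """Return (formula, ppm error) pairs within the tolerance, best match first."""
    hits = []
    for formula in formulas:
        theoretical = monoisotopic_mass(formula)
        ppm_error = (measured_mass - theoretical) / theoretical * 1e6
        if abs(ppm_error) <= ppm_tol:
            hits.append((formula, ppm_error))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical candidate list; a real search would query a formula database.
candidates = ["C8H10N4O2", "C6H12O6", "C9H8O4"]
for formula, ppm in match_candidates(194.0804, candidates):
    print(formula, f"{ppm:+.2f} ppm")
```

In practice an exact-mass hit like this only narrows the candidates; the abstract notes that isotope distribution and isotope spacing were used alongside exact mass to match molecular features.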

 

SESSION: Sci-Mix
SESSION TIME: August 22, 2016 from 8:00 PM to 10:00 PM

and

SESSION TIME: Wednesday, August, 24, 2016, 6:00 PM – 8:00 PM
ROOM & LOCATION: Hall D – Pennsylvania Convention Center

Poster Title: The EPA Online Prediction Physicochemical Prediction Platform to Support Environmental Scientists

As part of our efforts to develop a public platform to provide access to predictive models we have attempted to disentangle the influence of the quality versus quantity of data available to develop and validate QSAR models. Using a thorough manual review of the data underlying the well-known EPI Suite software, we developed automated processes for the validation of the data using a KNIME workflow. This includes approaches to validate different chemical structure representations (e.g. molfile and SMILES), identifiers (chemical names and registry numbers), and methods to standardize the data into QSAR-consumable formats for modeling. Our efforts to quantify and segregate data into various quality categories have allowed us to thoroughly investigate the resulting models developed from these data slices, as well as allowing us to examine whether or not efforts put into the development of large high-quality datasets have the expected pay-off in terms of prediction performance. Machine-learning approaches have been applied to create a series of models that have been used to generate predicted physicochemical and environmental parameters for over 700,000 chemicals. These data are available online via the EPA’s iCSS Chemistry Dashboard. This abstract does not reflect U.S. EPA policy.

 


Our dire need to mandate data standards and expectations for scientific publishing

This is a presentation that I delivered at the ACS Division of Chemical Information meeting regarding “Reproducibility, Reporting, Sharing & Plagiarism” at ACS Denver on 23rd March 2015.

I took the opportunity to take off my hat as the VP of Strategic Development at RSC and as a member of the cheminformatics group that built ChemSpider and works on other RSC projects related to it. Instead I presented on how a LACK OF MANDATES from publishers regarding the submission of data accompanying the articles I am involved in writing is actually weakening my scientific record, because data are not getting shared in the most useful forms possible for the benefit of the community. I think there would be benefits if publishers started pushing me for MORE data, in fairly general standards, and allowed me (and others) to download the data in the form of molecules (and collections), spectral data, CSV files etc.

 

