In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning

Recently we published on the curation of physicochemical data sets that were then made available as Open Data. The work was reported in:

“An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modeling”, K. Mansouri, C. Grulke, R. Judson and A.J. Williams, SAR and QSAR in Environmental Research, Volume 27, 2016, Issue 11, Pages 911-937, http://dx.doi.org/10.1080/1062936X.2016.1253611

The data have since been modeled using an alternative approach to the one we used, and that work is now reported in http://dx.doi.org/10.1021/acs.jcim.6b00625.

“In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning”, Q. Zang, K. Mansouri, A.J. Williams, R.S. Judson, D.G. Allen, W.M. Casey, and N.C. Kleinstreuer, J. Chem. Inf. Model., 2017, 57 (1), pp 36–49

The abstract for the article is below.

ABSTRACT

There are little available toxicity data on the vast majority of chemicals in commerce. High-throughput screening (HTS) studies, such as those being carried out by the U.S. Environmental Protection Agency (EPA) ToxCast program in partnership with the federal Tox21 research program, can generate biological data to inform models for predicting potential toxicity. However, physicochemical properties are also needed to model environmental fate and transport, as well as exposure potential. The purpose of the present study was to generate an open-source quantitative structure–property relationship (QSPR) workflow to predict a variety of physicochemical properties that would have cross-platform compatibility to integrate into existing cheminformatics workflows. In this effort, decades-old experimental property data sets available within the EPA EPI Suite were reanalyzed using modern cheminformatics workflows to develop updated QSPR models capable of supplying computationally efficient, open, and transparent HTS property predictions in support of environmental modeling efforts. Models were built using updated EPI Suite data sets for the prediction of six physicochemical properties: octanol–water partition coefficient (logP), water solubility (logS), boiling point (BP), melting point (MP), vapor pressure (logVP), and bioconcentration factor (logBCF). The coefficient of determination (R2) between the estimated values and experimental data for the six predicted properties ranged from 0.826 (MP) to 0.965 (BP), with model performance for five of the six properties exceeding those from the original EPI Suite models. The newly derived models can be employed for rapid estimation of physicochemical properties within an open-source HTS workflow to inform fate and toxicity prediction models of environmental chemicals.
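The coefficient of determination quoted in the abstract can be illustrated with a short sketch. This assumes the common definition R2 = 1 - SS_res / SS_tot; the paper may compute it differently (for example as a squared correlation), and the numbers below are made up purely for illustration and are not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assuming R2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: how far the estimates are from the experiments.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the experimental values around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical logP-like values, NOT data from the paper.
experimental = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(experimental, estimated), 3))
```

A value close to 1 means the estimated values track the experimental ones closely, which is the sense in which the abstract compares the six property models.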


How Poor Altmetrics are for my old articles…

In preparation for a talk later this week I have been investigating adding Altmetric and Plum analytics scores into my online CV, as well as Kudos Resources. I would expect that Altmetric scores would be VERY low for old articles as they were published way before the social networking tools existed. However, the Plum Widget should be useful in terms of showing citations, views, downloads, etc. The Kudos resources will be meaningful since I have been working SLOWLY through my articles with the latest first.

I think the Altmetric scores shown below bear out my opinion since MOST don’t have any score whatsoever. However, this blog post should lift a number of them over the next few days.


ARTICLES

1989
1. F.L. Lee, K.F. Preston, A.J. Williams, L.H. Sutcliffe, A.J. Banister, S.T. Wait, A single-crystal electron paramagnetic resonance study of the 4-phenyl-1,2,3,5-dithiadiazolyl radical, Magn. Reson. Chem. 27, 1161-1165 (1989). Link
AltMetrics Analytics

PLUMX Analytics

Kudos Resources

__________________________________________

1990
2. D.G. Gillies, S.J. Matthews, L.H. Sutcliffe and A.J. Williams, The Evaluation of Two Correlation Times for Methyl Groups from Carbon-13 Spin-lattice Relaxation Times and nOe Data, J. Magn. Reson., 86, 371 (1990) Link
AltMetrics Analytics

PLUMX Analytics

Kudos Resources

__________________________________________

3. P.J. Bratt, D.G. Gillies, L.H. Sutcliffe and A.J. Williams, NMR Relaxation Studies of Internal Motions – A Comparison between Micelles and Related Systems, J. Phys. Chem., 94(7), 2727 (1990) Link
AltMetrics Analytics

PLUMX Analytics

Kudos Resources

__________________________________________

4. R.C. Hynes, J.R. Morton, J.A. Hriljac, Y. LePage, K.F. Preston, A.J. Williams, F. Evans, M.C. Grossel and L.H. Sutcliffe, Isolated Free Radical Pairs in Rb+TCNQ- 18-crown-6 Single Crystals, J. Chem. Soc., Chem. Commun., 5, 439 (1990) Link
AltMetrics Analytics

PLUMX Analytics

Kudos Resources

__________________________________________

5. P.J. Krusic, J.R. Morton, K.F. Preston, A.J. Williams and F. Lee, EPR Spectrum of the Fe2(CO)8- Radical Trapped in Single Crystals of PPN+HFe2(CO)8- , Organometallics 9, 697 (1990). Link
AltMetrics Analytics

PLUMX Analytics

Kudos Resources

__________________________________________

6. R. Hynes, K.F. Preston, J.J. Springs, and A.J. Williams, Single-crystal EPR Study of Radical Pairs in [Fe(mesitylene)22+] {C3[C(CN)2]3-}2, J. Chem. Phys. 93(4), 2222, 1990 Link
AltMetrics Analytics

PLUMX Analytics

Kudos Resources

__________________________________________

7. R. Hynes, K.F. Preston, J.J. Springs, and A.J. Williams, EPR Studies of Radical Pairs [M(CO)5]2 (M = Cr, Mo, W) Trapped in Single Crystals of PPN+ HM(CO)5-, Organometallics, 9, 2298 (1990) Link
AltMetrics Analytics

PLUMX Analytics

Kudos Resources

__________________________________________

8. R. Hynes, K.F. Preston, J.J. Springs, and A.J. Williams, Electron paramagnetic resonance study of the tetracarbonyl(trimethylphosphite)tungstate(1-) radical anion trapped in a single crystal of [N(PPh3)2][W(CO)4H{P(OMe)3}], Journal of the Chemical Society, Dalton Transactions: Inorganic Chemistry (1972-1999), 12, 3655-61 (1990) Link
AltMetrics Analytics

PLUMX Analytics

Kudos Resources

__________________________________________

1991
9. R. Hynes, K.F. Preston, J.J. Springs, J. Tse and A.J. Williams, EPR Studies of M(CO)5-  Radicals (M = Cr, Mo, W) Trapped in Single Crystals of PPh4+ HM(CO)5- , J. Chem. Soc. Faraday Trans., 87(19), 3121 (1991) Link
AltMetrics Analytics

PLUMX Analytics

Kudos Resources

__________________________________________

10. R.C. Hynes, J.R. Morton, K.F. Preston, A.J. Williams, F. Evans, M.C. Grossel, L.H. Sutcliffe, and S.C. Weston, An EPR Study of Isolated Free Radical Pairs in M+ 18-Crown-6 TCNQ-  salts (TCNQ:7,7,8,8-tetracyanoquinodimethane; M=K, Rb), J. Chem. Soc. Faraday Trans., 87(14), 2229 (1991) Link
AltMetrics Analytics

PLUMX Analytics

Kudos Resources

__________________________________________

To show what it looked like when I posted this blog entry, the attached image shows a small number of the articles with zero scores.

altmetric scores


Add Altmetric and PlumX scores and Kudos Resources to your online CV

Over the weekend I spent a little time working to integrate Altmetric and PlumX scores into my online CV here on my blog. I also integrated the Kudos resources associated with each article directly into the CV. It’s a breeze and requires only that you have DOIs for your articles. See below for how ONE article in my CV is represented.

154. Programmatic Conversion of Crystal Structures into 3D Printable Files, V.F. Scalfani, <strong>A.J. Williams</strong>, V. Tkachenko, K. Karapetyan, A. Pshenichnov, R.M. Hanson, J.M. Liddie and J.E. Bara, Journal of Cheminformatics, 2016, 8:66 Article Type: Methodology <a href="http://jcheminf.springeropen.com/articles/10.1186/s13321-016-0181-z"><strong>Link</strong> </a>
<strong>AltMetrics Analytics</strong>
<div class="altmetric-embed" data-badge-type="medium-donut" data-badge-details="right" data-doi="10.1186/s13321-016-0181-z"></div>
<strong>PLUMX Analytics</strong>
<a href="https://plu.mx/plum/a?doi=10.1186/s13321-016-0181-z" class="plumx-plum-print-popup"></a>
<strong>Kudos Resources</strong>
<script src="//api.growkudos.com/widgets/resources/10.1186/s13321-016-0181-z"></script>

Literally all you have to do is copy these few lines and swap out the DOI, and the scores and Kudos resources will show up in your CV. Simple.
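If you have a long publication list, swapping the DOI by hand gets tedious. Here is a minimal sketch of how the same lines could be generated for any DOI. The function name `cv_widgets` is my own invention for illustration, the markup is copied from the example above, and it assumes that whatever loader scripts the providers require are already on the page.

```python
from html import escape


def cv_widgets(doi):
    """Return the widget lines from the example above for one DOI.

    Hypothetical helper: only the DOI is substituted, everything else
    is the markup shown in the post.
    """
    safe = escape(doi, quote=True)  # guard against stray quotes in the input
    return "\n".join([
        "<strong>AltMetrics Analytics</strong>",
        f'<div class="altmetric-embed" data-badge-type="medium-donut" '
        f'data-badge-details="right" data-doi="{safe}"></div>',
        "<strong>PLUMX Analytics</strong>",
        f'<a href="https://plu.mx/plum/a?doi={safe}" '
        f'class="plumx-plum-print-popup"></a>',
        "<strong>Kudos Resources</strong>",
        f'<script src="//api.growkudos.com/widgets/resources/{safe}"></script>',
    ])


# Example: the two papers mentioned on this blog.
for doi in ["10.1186/s13321-016-0181-z", "10.1080/1062936X.2016.1253611"]:
    print(cv_widgets(doi))
    print()
```

The output can be pasted under each CV entry; the citation text itself still has to be written by hand.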

Altmetric, PlumX and Kudos Embedded widgets


Comparing the EPA CompTox Dashboard with ChemSpider for MS-based Structure Identification

It’s almost ten years, this April, since ChemSpider was released to the public at the 233rd ACS meeting in Chicago. For two years, prior to being acquired by RSC in May 2009, we worked very closely with a number of mass spectrometry vendors including Waters (Micromass), Thermo and Agilent. I always believed that the work we did with ChemSpider could be highly valued by the mass spectrometry community. This was especially true after we published the work on the identification of known unknowns with James Little (http://link.springer.com/article/10.1007/s13361-011-0265-y). Certainly ChemSpider has become highly recognized, and used, by an increasing number of mass spectrometry vendors (through the ChemSpider Web Services).

A few months ago Andrew McEachran joined our team as a postdoc. By combining my experience of bringing ChemSpider to bear on structure identification, his mass spectrometry skills and experience, and the work of our tremendous development team on the CompTox Chemistry Dashboard, we were able to make some further advances in the “identification of known unknowns”. Our efforts were recently reported in the publication “Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard” (http://link.springer.com/article/10.1007%2Fs00216-016-0139-z). Readers are pointed to the summary tables in the article (results), which demonstrate the improved performance of the CompTox Chemistry Dashboard based on high quality data sources and new approaches to rank ordering results from formula and mass searching.

We recently rolled out new functionality and “MS-Ready structure batch-based searching” to offer even greater support for MS-structure identification. We will report on further extensions to this work at the Spring ACS Meeting.
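To give a feel for what formula and mass searching with rank ordering involves, here is a rough sketch. It is NOT the dashboard's implementation: the function names, the 5 ppm tolerance, the `source_count` field and its values are all assumptions made up for illustration, and it works on neutral monoisotopic masses only (no adducts).

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes of a few elements.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C8H10N4O2'.

    Elements missing from MONOISOTOPIC raise KeyError.
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[symbol] * (int(count) if count else 1)
    return total


def ppm_error(measured, theoretical):
    """Mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def rank_candidates(measured_mass, candidates, tolerance_ppm=5.0):
    """Keep candidates within the mass tolerance, most data sources first.

    Ranking by a per-candidate source count is an assumption for this sketch.
    """
    hits = [
        c for c in candidates
        if abs(ppm_error(measured_mass, monoisotopic_mass(c["formula"]))) <= tolerance_ppm
    ]
    return sorted(hits, key=lambda c: c["source_count"], reverse=True)


# Hypothetical candidate list; names and source counts are invented.
candidates = [
    {"name": "candidate A", "formula": "C8H10N4O2", "source_count": 3},
    {"name": "candidate B", "formula": "C8H10N4O2", "source_count": 40},
    {"name": "candidate C", "formula": "C10H14N2O2", "source_count": 12},
]
for hit in rank_candidates(194.0805, candidates):
    print(hit["name"], hit["source_count"])
```

In this toy case the two isomers with formula C8H10N4O2 fall within tolerance and the one with more sources is listed first, while candidate C is filtered out on mass alone.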

 
The AltMetrics for the Article are shown below


Spring ACS Meeting San Francisco, April 2017

The Spring ACS Meeting is coming, and it’s coming quickly. Every time the New Year starts I think I have a long time before I have to assemble posters and write talks for the ACS Meeting. When I worked at the RSC it was easier in some ways as NO ONE reviewed them, no one gave comments on them and there was no clearance process involved. Mostly I was writing the talks on the flight out to the ACS or, more commonly, was writing them the evening before or morning of the presentations. There have been days when I got up in the morning at 4am to write two talks on the day I presented. Quite exhausting but at least I got to show the latest and greatest capabilities.

As an employee at the EPA there are different expectations, especially with regard to the clearance process, in which the presentations are reviewed and signed off, pushed through our internal repository and, post-presentation, released to the community via Science Inventory. Some, but not all, of the presentations and papers I have been involved with since joining EPA are here.

I will be going to the ACS meeting with a number of colleagues and chairing an all-day session on Thursday with Chris Grulke for the Division of Environmental Chemistry. I will be presenting a number of posters and talks as listed below. A number of my colleagues will also be presenting. Andrew McEachran, a recent postdoc with the center, will be presenting on a lot of the work that has been done on the use of the Chemistry Dashboard to facilitate structure identification. The recent publication “Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard” (http://link.springer.com/article/10.1007%2Fs00216-016-0139-z) reported on a comparison of the dashboard with ChemSpider. Since then we have rolled out a lot of new functionality to support structure identification and Andrew will report on that.

PAPER ID: 2624963
PAPER TITLE: Twenty five years in cheminformatics: A career path through a diverse series of roles and responsibilities

DIVISION: Division of Chemical Information
SESSION: Careers in Chemical Information
PRESENTATION FORMAT: Oral
DAY & HALF DAY OF PRESENTATION: Sunday, April, 02, 2017 – AM

PAPER ID: 2616719
PAPER TITLE: Evaluating suspect screening and non-targeted analysis approaches using a collaborative research trial at the US EPA

DIVISION: Division of Analytical Chemistry
SESSION: Analytical Division Poster Session
PRESENTATION FORMAT: Poster
DAY & HALF DAY OF PRESENTATION: Sunday, April, 02, 2017 – EVE

PAPER ID: 2624980
PAPER TITLE: EPA CompTox chemistry dashboard: An online resource for environmental chemists

DIVISION: Division of Chemical Health and Safety
SESSION: Information Flow in Environmental Health & Safety
PRESENTATION FORMAT: Oral
DAY & HALF DAY OF PRESENTATION: Tuesday, April, 04, 2017 – PM

PAPER ID: 2624984
PAPER TITLE: Delivering an informational hub for data at the National Center for Computational Toxicology

DIVISION: Division of Environmental Chemistry
SESSION: Applications of Cheminformatics & Computational Chemistry in Environmental Health
PRESENTATION FORMAT: Poster
DAY & HALF DAY OF PRESENTATION: Wednesday, April, 05, 2017 – EVE

Looking forward to seeing you at ACS!

 


Where did all of these Articles Associated With Me Come From on Mendeley?

Recently I posted that Google must have changed their algorithm and, as a result, automagically introduced a lot of new articles to my profile that had nothing to do with me. It took work to prune them off and hopefully they do not reappear. Tonight I went through the process of adding the past few months of publications to get my Mendeley profile up to date and, lo and behold, there was a whole series of new publications that were NOT there the last time I checked Mendeley. Interestingly, they were all articles about superconducting materials, as were many of those that had appeared on my Google profile. Is it possible that Elsevier is somehow sourcing the information from Scholar? Or is Elsevier sourcing these articles from within its own library? Of course the articles all have an author “A. Williams” associated with them. I have already started the process of pruning them out. Not happy…

Articles associated with A. Williams on Mendeley


Mendeley Expanding my Worldwide Followers in a Big Way

I adopted Mendeley very early and was a defender of their decision to join Elsevier. I didn’t beat them up in the mediasphere for moving from the Open start-up to the publisher’s corporate mode. I did the same myself when ChemSpider was acquired by the Royal Society of Chemistry (RSC is a charity but is also a publisher).

Over the past few weeks I have noticed new followers showing up on my profile. In the first couple of years most of my Mendeley followers were actually names I recognized from my domains of experience of cheminformatics and Nuclear Magnetic Resonance. Most of the followers were scientists whose papers I had read and whose work I was aware of. But things are now different.

I have pasted a picture below of the past month or so of new followers. I don’t recognize any of them at all and, as far as I can see from drilling down into their profiles, they are not from my domain. I cannot figure out whether these are just random followers or not, but I guess I should appreciate Mendeley and Elsevier for exposing my work, and publications, to a worldwide community of new followers. I am surprised by the new international exposure! THANKS

The past few days of new Mendeley followers


PRESENTATION: Building an Online Profile Using Social Networking and Amplification Tools for Scientists

This presentation was given as a two-hour hands-on training course at the Frontier Building in the Research Triangle Park in NC, funded by an Industry Award Grant from the ACS and matching financial support from the Research Triangle Institute.

Abstract “Many of us nowadays invest significant amounts of time in sharing our activities and opinions with friends and family via social networking tools such as Facebook, Twitter or other related websites. However, despite the availability of many platforms for scientists to connect and share with their peers in the scientific community the majority do not make use of these tools, despite their promise and potential impact and influence on our careers. We are already being indexed and exposed on the internet via our publications, presentations and data and new “AltMetric scores” are being assigned to scientific publications as measures of popularity and, supposedly, of impact. We now have even more ways to contribute to science, to annotate and curate data, to “publish” in new ways, and many of these activities are as part of a growing crowdsourcing network. This presentation provides an overview of the various types of networking and collaborative sites available to scientists and ways to expose your scientific activities online. It will discuss the new world of AltMetrics that is in an explosive growth curve and will help you understand how to influence and leverage some of these new measures. Participating online, whether it be simply for career advancement or for wider exposure of your research, there are now a series of web applications that can provide a great opportunity to develop a scientific profile within the community.”


Why Have I Pushed so Much Traffic To Twitter This Weekend? GAMING or SAVVY?

Next Tuesday, November 29th, I am leading a two-hour workshop as described here:

The NC-ACS together with RTI International is excited to provide dinner and a workshop titled “Building an Online Profile Using Social Networking and Amplification Tools for Scientists”!

DATE AND TIME: Tue, November 29, 2016, 6:00 PM – 9:00 PM EST

LOCATION: The Frontier, 800 Park Offices Drive, Triangle, NC 27709

The event includes dinner from The Farmery starting at 6PM! The workshop will begin promptly at 6:30PM.

Please note to bring your computer and let our Speaker, Antony Williams, help you build your online profile!

Space is limited!  Please register here: https://ncacssocialnetworking.eventbrite.com

In advance of that gathering I was fortunate to have two papers published last week and I wanted to show how I could use Social Media to drive attention, views, downloads and altmetrics to those papers. They are:

Programmatic conversion of crystal structures into 3D printable files using Jmol at http://dx.doi.org/10.1186/s13321-016-0181-z

and

An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling at http://dx.doi.org/10.1080/1062936x.2016.1253611

I started pushing the 3D printing article out on Friday morning and noticed a surge in attention early in the day that continued throughout the day. I kept pushing throughout the weekend and saw less attention. While it is possible that I saturated my network of connections, I think it is more likely that people are simply away from their computers at the weekend, so Twitter gets less attention from the overall network. That’s my hypothesis, yet to be proven. It SHOULD be noted that the initial surge in AltMetrics came from the publisher themselves when they pushed it out for us as authors. See https://twitter.com/jcheminf/status/802078618629373952. I suggest making sure your PUBLISHER is pushing out your article via Twitter as part of their service. And BOOK PUBLISHERS should be using Twitter in the same way.

As for the automated curation and QSAR modeling paper, I FOUND that it had been published on Friday night at about midnight, as I kept checking back to see when it was finally published. (Emails to authors would be a good idea, don’t you think?) I pushed that out after midnight on Friday and the attention, and corresponding AltMetrics, are way lower than for the 3D article. Maybe it’s because the article is less interesting (but I don’t agree with that for my network). Maybe, and more likely I think, a Friday night release followed by Saturday simply gets less overall Twitter attention (see original hypothesis). But it could be that I simply saturated the network with my first 3D printing posting. It’s not possible to tease this out with this one experiment so there will be others. Maybe the study has already been done???

In any case the 3D printing one has good altmetric scores now (40 as of 12:50pm on Sunday) and the QSAR modeling paper is lagging (a score of 4). I think a big contribution to the lagging altmetrics for the QSAR modeling paper is the fact that SAR and QSAR in Environmental Research from Taylor and Francis may not have much of a following and may not tweet out the article directly (the last comments I saw about SAR and QSAR on Twitter were mostly in 2013). One other MAJOR contributing factor may be that JChemInf is FULLY Open Access and our 3D article is fully Open. The SAR and QSAR article with Taylor and Francis has an Open Access option and we haven’t used it, yet. Again, just hypotheses.

Thanks to @JChemInf for doing their job well re. pushing it out to Twitter. I think it helped…


Programmatic conversion of crystal structures into 3D printable files using Jmol

A new paper that came out of a collaboration initiated at an ACS Meeting, maybe three years ago, has finally gone online. My recollection is that at an ACS CINF reception I started chatting with Vincent Scalfani. At that time I was involved with ChemSpider and he bounced an idea off me about 3D printing of crystal structures. I reported that we were going to host the Crystal Structures on ChemSpider (here) and Vincent even presented on it at the ACS (here, with >2000 views). But, as happened on a fairly regular basis, a great idea never came to fruition: the data were not put onto ChemSpider, and I left to join the EPA over eighteen months ago.

But it was still great work, and when it became clear that the data would not see the light of day, the original article, written two years ago give or take, was adjusted to simply communicate that the data were available on Figshare here (https://dx.doi.org/10.6084/m9.figshare.c.3302859.v6). The peer review process gave good feedback and pretty much said “Why aren’t they on a searchable database?” Well, we tried, but Bob Hanson, Jmol hero, got to work and produced this site in a few days! Bob is incredibly productive.

Well then the paper was accepted, all is good, the data are open and the world has access to tens of thousands of crystal structures ready for printing.

The paper is available here: “Programmatic conversion of crystal structures into 3D printable files using Jmol” at http://jcheminf.springeropen.com/articles/10.1186/s13321-016-0181-z
