Predicting organ toxicity using in vitro bioactivity data and chemical structure

I get to work with some great scientists in my job, on projects that a couple of years ago were way out of my depth. Let’s be honest: I have no formal training as a toxicologist. I trained as an analytical scientist, became a cheminformatician, moved into publishing and informatics, and now work in the National Center for Computational Toxicology. I didn’t realize that the trial by fire would be so stimulating and fun, but working at EPA is great. So many people make flippant comments about working for the government, leaving early, etc. We work HARD and are productive and, for me at least, I feel we are doing important work and making real contributions. The latest paper I am involved with is “Predicting organ toxicity using in vitro bioactivity data and chemical structure”. The abstract is listed below.

“Animal testing alone cannot practically evaluate the health hazard posed by tens of thousands of environmental chemicals. Computational approaches making use of high-throughput experimental data may provide more efficient means to predict chemical toxicity. Here, we use a supervised machine learning strategy to systematically investigate the relative importance of study type, machine learning algorithm, and type of descriptor on predicting in vivo repeat-dose toxicity at the organ-level. A total of 985 compounds were represented using chemical structural descriptors, ToxPrint chemotype descriptors, and bioactivity descriptors from ToxCast in vitro high-throughput screening assays. Using ToxRefDB, a total of 35 target organ outcomes were identified that contained at least 100 chemicals (50 positive and 50 negative). Supervised machine learning was performed using Naïve Bayes, k-nearest neighbor, random forest, classification and regression trees, and support vector classification approaches. Model performance was assessed based on F1 scores using five-fold cross-validation with balanced bootstrap replicates. Fixed effects modeling showed the variance in F1 scores was explained mostly by target organ outcome, followed by descriptor type, machine learning algorithm, and interactions between these three factors. A combination of bioactivity and chemical structure or chemotype descriptors were the most predictive. Model performance improved with more chemicals (up to a maximum of 24%) and these gains were correlated (ρ= 0.92) with the number of chemicals. Overall, the results demonstrate that a combination of bioactivity and chemical descriptors can accurately predict a range of target organ toxicity outcomes in repeat-dose studies, but specific experimental and methodologic improvements may increase predictivity.”
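For readers who have not met the evaluation terms in the abstract, here is a minimal sketch of three of them: the F1 score, a balanced bootstrap replicate, and a five-fold split. It is not the code used in the paper. The function names (`f1_score`, `balanced_bootstrap`, `five_fold_indices`) and the toy labels are my own assumptions for illustration, and the paper's actual pipeline may differ in its details.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if that denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_bootstrap(labels, n_per_class, rng):
    """Sample indices with replacement, n_per_class positives and negatives.

    Hypothetical helper: one way to build a class-balanced replicate.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return ([rng.choice(pos) for _ in range(n_per_class)]
            + [rng.choice(neg) for _ in range(n_per_class)])


def five_fold_indices(n, rng, k=5):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Toy data only: 1 = toxic outcome observed, 0 = not observed.
rng = random.Random(0)
toy_labels = [1, 1, 0, 0, 0]
replicate = balanced_bootstrap(toy_labels, n_per_class=3, rng=rng)
folds = five_fold_indices(10, rng)
```

In a full workflow each fold would take a turn as the held-out test set, a classifier would be trained on the other four, and the F1 scores would be averaged across folds and replicates.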
