Predicting organ toxicity using in vitro bioactivity data and chemical structure

06 Aug

I get to work with some great scientists in my job, and on projects that a couple of years ago were way out of my depth. Let’s be honest: I have no formal training as a toxicologist. I trained as an analytical scientist, became a cheminformatician, moved into publishing and informatics, and am now at the National Center for Computational Toxicology. I didn’t realize that the trial by fire would be so stimulating and fun, but working at EPA is great. So many people make flippant comments about working for the government, leaving early, etc. We work HARD and are productive and, for me at least, I feel we are doing important work and making real contributions. The latest paper I am involved with is “Predicting organ toxicity using in vitro bioactivity data and chemical structure”. The abstract is listed below…

“Animal testing alone cannot practically evaluate the health hazard posed by tens of thousands of environmental chemicals. Computational approaches making use of high-throughput experimental data may provide more efficient means to predict chemical toxicity. Here, we use a supervised machine learning strategy to systematically investigate the relative importance of study type, machine learning algorithm, and type of descriptor on predicting in vivo repeat-dose toxicity at the organ-level. A total of 985 compounds were represented using chemical structural descriptors, ToxPrint chemotype descriptors, and bioactivity descriptors from ToxCast in vitro high-throughput screening assays. Using ToxRefDB, a total of 35 target organ outcomes were identified that contained at least 100 chemicals (50 positive and 50 negative). Supervised machine learning was performed using Naïve Bayes, k-nearest neighbor, random forest, classification and regression trees, and support vector classification approaches. Model performance was assessed based on F1 scores using five-fold cross-validation with balanced bootstrap replicates. Fixed effects modeling showed the variance in F1 scores was explained mostly by target organ outcome, followed by descriptor type, machine learning algorithm, and interactions between these three factors. A combination of bioactivity and chemical structure or chemotype descriptors were the most predictive. Model performance improved with more chemicals (up to a maximum of 24%) and these gains were correlated (ρ= 0.92) with the number of chemicals. Overall, the results demonstrate that a combination of bioactivity and chemical descriptors can accurately predict a range of target organ toxicity outcomes in repeat-dose studies, but specific experimental and methodologic improvements may increase predictivity.”
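For readers who want a feel for the evaluation setup the abstract describes, here is a minimal sketch in plain Python. It is not the code from the paper. It assumes binary descriptors, a toy k-nearest neighbor classifier using Hamming distance, and one plausible reading of "five-fold cross-validation with balanced bootstrap replicates" (a single balanced bootstrap of the training chemicals in each fold). The toy data and all function names (`has_enough_chemicals`, `balanced_bootstrap`, `cross_validated_f1`, `make_toy_data`) are hypothetical and only there for illustration.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def has_enough_chemicals(labels, min_per_class=50):
    """Outcome filter from the abstract: at least 50 positives and 50 negatives."""
    positives = sum(labels)
    return positives >= min_per_class and len(labels) - positives >= min_per_class


def balanced_bootstrap(labels, n_per_class, rng):
    """Sample indices with replacement, the same number from each class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return rng.choices(pos, k=n_per_class) + rng.choices(neg, k=n_per_class)


def knn_predict(train_X, train_y, x, k=3):
    """Toy k-nearest neighbor vote using Hamming distance on binary descriptors."""
    def distance(i):
        return sum(a != b for a, b in zip(train_X[i], x))
    nearest = sorted(range(len(train_X)), key=distance)[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def cross_validated_f1(X, y, n_folds=5, n_per_class=20, seed=0):
    """Mean F1 over folds; each fold trains on a balanced bootstrap sample."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        picks = balanced_bootstrap([y[i] for i in train], n_per_class, rng)
        boot = [train[j] for j in picks]
        train_X = [X[i] for i in boot]
        train_y = [y[i] for i in boot]
        preds = [knn_predict(train_X, train_y, X[i]) for i in held_out]
        scores.append(f1_score([y[i] for i in held_out], preds))
    return sum(scores) / len(scores)


def make_toy_data(n_per_class=50, n_bits=8, seed=1):
    """Synthetic stand-in for descriptors: positives tend to have more bits set."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, 0):
        p_on = 0.8 if label == 1 else 0.2
        for _ in range(n_per_class):
            X.append([1 if rng.random() < p_on else 0 for _ in range(n_bits)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    if has_enough_chemicals(y):
        print("mean F1 over 5 folds:", round(cross_validated_f1(X, y), 3))
```

In the study itself this kind of loop would be repeated for each target organ outcome, descriptor type and learning algorithm, which is what allows the variance in F1 scores to be attributed to those factors.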


About tony

Founder of ChemZoo Inc., the host of ChemSpider. ChemSpider is an open access online database of chemical structures and property transaction based services that enables chemists around the world to data mine chemistry databases. The Royal Society of Chemistry acquired ChemSpider in May 2009. Presently working as a consortium member of the OpenPHACTS IMI project. This focuses on how drug discovery can utilize semantic technologies to improve decision making and brings together 22 European team members to develop an infrastructure to link together public and private data for the drug discovery community. I am also involved with the PharmaSea FP7 project, which is trying to identify new classes of marine natural products with potential pharmacological activity. I am also one of the hosts for three wikis for Science: ScientistsDB, SciMobileApps and SciDBs. Over the past decade I have held many responsibilities, including directing the development of scientific software applications for spectroscopy and general chemistry, and directing marketing efforts, sales and business development collaborations for the company. Eight years' experience of analytical laboratory leadership and management. Experienced in experimental techniques, implementation of new NMR technologies, walk-up facility management, research and development, manufacturing support and teaching. Ability to provide situation analysis and creative solutions and to establish good working relationships. Prolific author with over 150 peer-reviewed scientific publications, 3 patents and over 300 public presentations. Specialties: leadership in the domain of free access chemistry, product and project management, organizational and leadership development, competitive analysis and business development, entrepreneurship.

Posted on August 6, 2017 in EPA Presentations

