Category Archives: Publications and Presentations

Presentations and Posters at #ACSPhiladelphia August 2016

I will be delivering five presentations and a poster (twice) at the ACS Meeting in Philadelphia this week. These presentations will introduce the latest version of our CompTox Dashboard, renamed from the iCSS Chemistry Dashboard because now we are offering way more than just a large set of chemical structures! I look forward to introducing attendees to the latest and greatest.

DAY & TIME OF PRESENTATION: Sunday, August 21, 2016 from 1:10 PM – 1:35 PM
ROOM & LOCATION: Room 105A – Pennsylvania Convention Center

Title: Structure Identification Using High Resolution Mass Spectrometry Data and the EPA’s Chemistry Dashboard

The iCSS Chemistry Dashboard is a publicly accessible dashboard provided by the National Center for Computational Toxicology at the US-EPA. It serves a number of purposes, including providing a chemistry database underpinning many of our public-facing projects (e.g. ToxCast and ExpoCast). The available data and searches provide a valuable path to structure identification using mass spectrometry as the source data. With an underlying database of over 720,000 chemicals, the dashboard has already been used to assist in identifying chemicals present in house dust. However, it can also be applied to many other purposes, e.g., the identification of agrochemicals in waste streams. This presentation will provide a review of the EPA’s platform and underlying algorithms used for the purpose of compound identification using high-resolution mass spectrometry data. We will also discuss progress towards a high-throughput non-targeted analysis platform for use by the mass spectrometry community. This abstract does not reflect U.S. EPA policy.

 

DAY & TIME OF PRESENTATION: Sunday, August 21, 2016 from 4:10 PM – 4:30 PM
ROOM & LOCATION: Room 112B – Pennsylvania Convention Center

Title: Investigating Impact Metrics for Performance for the US-EPA National Center for Computational Toxicology

The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program integrates advances in biology, chemistry, and computer science to help prioritize chemicals for further research based on potential human health risks. This work involves computational and data driven approaches that integrate chemistry, exposure and biological data. We have delivered public access to terabytes of open data, as well as to a large number of publicly accessible databases and applications, to support the research efforts of a large community of scientists. Many of our contributions to science are summarily described in research papers, but to date we have not optimized our contributions to inform altmetrics statistics associated with our work. Critically missing from altmetrics is access to our numerous software applications and web service accesses, as well as the growing importance of our experimental data and models (e.g. ToxCast, ExpoCast, DSSTox and others) to the scientific and regulatory communities. This presentation will provide an overview of our efforts to more fully understand, and quantify, our impact on the environmental sciences using a combination of our measurement approaches and available altmetrics tools. This abstract does not reflect U.S. EPA policy.

DAY & TIME OF PRESENTATION: Wednesday, August 24, 2016 from 9:40 AM – 10:00 AM
ROOM & LOCATION: Juniper’s Ballroom – Philadelphia Downtown Courtyard by Marriott

Title: Delivering The Benefits of Chemical-Biological Integration in Computational Toxicology at the EPA

Abstract: Researchers at the EPA’s National Center for Computational Toxicology integrate advances in biology, chemistry, and computer science to examine the toxicity of chemicals and help prioritize chemicals for further research based on potential human health risks. The intention of this research program is to quickly evaluate thousands of chemicals for potential risk but with much reduced cost relative to historical approaches. This work involves computational and data driven approaches including high-throughput screening, modeling, text-mining and the integration of chemistry, exposure and biological data. We have developed a number of databases and applications that are delivering on the vision of developing a deeper understanding of chemicals and their effects on exposure and biological processes that are supporting a large community of scientists in their research efforts. This presentation will provide an overview of our work to bring together diverse large scale data from the chemical and biological domains, our approaches to integrate and disseminate these data, and the delivery of models supporting computational toxicology. This abstract does not reflect U.S. EPA policy.

 

DAY & TIME OF PRESENTATION: Wednesday, August 24, 2016 from 11:10 AM – 11:40 AM
ROOM & LOCATION: Ormandy East – DoubleTree by Hilton Hotel Philadelphia Center City

Title: Data Aggregation, Curation and Modeling Approaches to Deliver Prediction Models to Support Computational Toxicology at the EPA

The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program develops and utilizes QSAR modeling approaches across a broad range of applications. In terms of physical chemistry we have a particular interest in the prediction of basic physicochemical parameters such as logP, aqueous solubility, vapor pressure and other parameters to invoke in our exposure models or for the purpose of modeling environmental toxicity. We are also interested in the development of models related to environmental fate. As a result of our efforts we have assembled and curated data sets for various physicochemical properties and, utilizing modern machine-learning modeling approaches, have developed a number of high performing models that we are now delivering to the public. Our website, the iCSS Chemistry Dashboard, provides access to data predicted for over 700,000 chemical compounds. The original training data are available for review and the details of prediction for each endpoint include the domain of applicability as well as a measure of performance accuracy.  This presentation will provide an overview of the existing aggregated data, our approaches to data curation and our progress towards an interactive environment for prediction of physicochemical and environmental fate parameters. The utilization of these parameters to support read-across approaches will also be discussed. This abstract does not reflect U.S. EPA policy.

 

DAY & TIME OF PRESENTATION: Thursday, August 25, 2016 from 3:00 PM – 3:20 PM
ROOM & LOCATION: Room 104A – Pennsylvania Convention Center

Title: The EPA iCSS Chemistry Dashboard to Support Compound Identification Using High Resolution Mass Spectrometry Data

There is a growing need for rapid chemical screening and prioritization to inform regulatory decision-making on thousands of chemicals in the environment. We have previously used high-resolution mass spectrometry to examine household vacuum dust samples using liquid chromatography time-of-flight mass spectrometry (LC-TOF/MS). Using a combination of exact mass, isotope distribution, and isotope spacing, molecular features were matched with a list of chemical formulas from the EPA’s Distributed Structure-Searchable Toxicity (DSSTox) database. This has further developed our understanding of how openly available chemical databases, together with the appropriate searches, could be used for the purpose of compound identification. We report here on the utility of the EPA’s iCSS Chemistry Dashboard for the purpose of compound identification using searches against a database of over 720,000 chemicals. We also examine the benefits of QSAR prediction for the purpose of retention time prediction to allow for alignment of both chromatographic and mass spectral properties. This abstract does not reflect U.S. EPA policy.

 

SESSION: Sci-Mix
SESSION TIME: August 22, 2016 from 8:00 PM to 10:00 PM

and

SESSION TIME: Wednesday, August 24, 2016, 6:00 PM – 8:00 PM
ROOM & LOCATION: Hall D – Pennsylvania Convention Center

Poster Title: The EPA Online Prediction Physicochemical Prediction Platform to Support Environmental Scientists

As part of our efforts to develop a public platform to provide access to predictive models we have attempted to disentangle the influence of the quality versus quantity of data available to develop and validate QSAR models. Using a thorough manual review of the data underlying the well-known EPI Suite software, we developed automated processes for the validation of the data using a KNIME workflow. This includes approaches to validate different chemical structure representations (e.g. molfile and SMILES) and identifiers (chemical names and registry numbers), and methods to standardize the data into QSAR-consumable formats for modeling. Our efforts to quantify and segregate data into various quality categories have allowed us to thoroughly investigate the resulting models developed from these data slices, as well as allowing us to examine whether or not efforts into the development of large high-quality datasets have the expected pay-off in terms of prediction performance. Machine-learning approaches have been applied to create a series of models that have been used to generate predicted physicochemical and environmental parameters for over 700,000 chemicals. These data are available online via the EPA’s iCSS Chemistry Dashboard. This abstract does not reflect U.S. EPA policy.
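The abstract does not say how registry numbers were checked inside the KNIME workflow. One widely used sanity check, sketched here in Python as an illustration rather than as the workflow itself, is the CAS Registry Number check digit: the digits before the final one are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 must equal the final digit. The function name `is_valid_casrn` is hypothetical.

```python
import re


def is_valid_casrn(casrn: str) -> bool:
    """Return True if the string has CAS RN format and a correct check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


if __name__ == "__main__":
    # Water, caffeine, and a deliberately corrupted copy of the water number.
    for candidate in ["7732-18-5", "58-08-2", "7732-18-4"]:
        print(candidate, is_valid_casrn(candidate))
```

A check like this only catches transcription errors in the number itself. It says nothing about whether the registry number, name and structure on a record actually agree with each other, which is the harder part of the curation described above.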

 

 

Our dire need to mandate data standards and expectations for scientific publishing

This is a presentation that I delivered at the ACS Division of Chemical Information meeting regarding “Reproducibility, Reporting, Sharing & Plagiarism” at ACS Denver on 23rd March 2015.

I took the opportunity to take off my hat as VP of Strategic Development at RSC, and as a member of the cheminformatics group that built ChemSpider and works on other RSC projects related to it. Instead I presented on how a LACK OF MANDATES from publishers, in terms of submitting data to accompany the articles I am involved with writing, is actually weakening my scientific record, because data is not getting shared in the forms most useful to the community. I think there would be benefits if publishers started pushing me for MORE data, in fairly general standards, and allowed me (and others) to download the data in the form of molecules (and collections), spectral data, CSV files etc.

 

 

Providing Access to a Million NMR Spectra via the web

This presentation was given at the ACS Denver meeting on March 22nd 2015 in a CHED Division symposium

Providing Access to a Million NMR Spectra via the web

Antony Williams, Alexey Pshenichnov, Peter Corbett, Daniel Lowe, Carlos Coba

Large-scale collections of NMR spectral data can serve a number of purposes in teaching spectroscopy to students. The data can be used for teaching purposes in lectures, as training data sets for spectral interpretation and structure elucidation, and to underpin educational resources such as the Royal Society of Chemistry’s Learn Chemistry. These resources have been available for a number of years but have been limited to rather small collections of spectral data, specifically only about 3000 spectra. In order to expand the data collection and provide richer resources for the community we have been gathering data from various laboratories and, as part of a research project, we have used text-mining approaches to extract spectral data from articles and patents in the form of textual strings and utilized algorithms to convert the data into spectral representations. While these spectra are reconstructions of text representations of the original spectral data, we are investigating their value for the purpose of structure identification. This presentation will report on the processes of extracting structure-spectral pairs from text, approaches to performing automated spectral verification and our intention to assemble a spectral collection of a million NMR spectra and make them available online.

 

 

Give me kudos for taking responsibility for self-marketing my scientific publications and increase impact

This presentation was given at the ACS Denver meeting on March 22nd 2015 in a CINF Division symposium

Give me kudos for taking responsibility for self-marketing my scientific publications and increase impact.

Antony Williams, Will Russell, Melinda Kenneway and Louise Peck

The authoring of a scientific publication can represent the culmination of many tens if not hundreds of hours of data collection and analysis. The authoring and peer-review process itself often represents a major undertaking in terms of assembling the publication and passing through review. Considering the amount of work invested in the production of a scientific article it is therefore quite surprising that authors, post-publication, invest very little effort in communicating the value and potential impact of their article to the community. Social networking has clearly demonstrated the ability to self-market and drive attention. At the same time, the increasing volume of literature (over a million new articles are published every year) requires authors to take on a more direct role in ensuring their work gets read and cited. This requirement may grow with the emergence of a range of metrics at the article level, shifting attention away from where a researcher publishes to the performance of their individual articles. Therefore, a separate platform to facilitate social networking and other discovery tools to communicate the value of published science to the community would be of value. In parallel, the possibility to enhance an article by linking to additional information (presentations, videos, blog posts etc.) allows for enrichment of the article post-publication, a capability not available via the publisher’s platform. This presentation will provide a personal overview of the experiences of using the Kudos Platform and how it ultimately benefits my ability to communicate an integrated view of my research to the community.

 

 

PITTCON poster: Dealing with the complex challenge of managing diverse analytical chemistry data online

This is a talk I presented at Pittcon on Wednesday March 13th, 2015

Dealing with the complex challenge of managing diverse analytical chemistry data online

The Royal Society of Chemistry provides open access to data associated with tens of millions of chemical compounds. The richness and complexity of the data has continued to expand dramatically and the original vision for providing an integrated hub for structure-centric data has been delivered across the world to hundreds of thousands of users. With an intention of expanding the reach to cover more diverse aspects of chemistry-related data including compounds, reactions and analytical data, to name just a few data-types, we are in the process of delivering a Chemistry Data Repository. The data repository will manage the challenges of associated metadata, the various levels of required security (private, shared and public) and exposing the data as appropriate using semantic web technologies. Ultimately this platform will become the host for all chemicals, reactions and analytical data contained within RSC publications and specifically supplementary information. This presentation will report on the challenges of managing “Big Data” for chemists around the world and providing access to tools for structure dereplication, spectral database searching and the crowdsourcing of the world’s largest spectral database.

 

 

PITTCON Poster: Using an online database of chemical compounds for the purpose of structure identification

This is a poster I presented at Pittcon on Wednesday March 9th, 2015

Using an online database of chemical compounds for the purpose of structure identification

Online databases can be used for the purposes of structure identification. The Royal Society of Chemistry provides access to an online database containing tens of millions of compounds and this has been shown to be a very effective platform for the development of tools for structure identification. Since in many cases an unknown to an investigator is known in the chemical literature or a reference database, these “known unknowns” are commonly available now on aggregated internet resources. These types of compounds in commercial, environmental, forensic, and natural product samples can be identified by searching against these large aggregated databases, querying by either elemental composition or monoisotopic mass. Searching by elemental composition is the preferred approach, but it is often difficult to determine a unique elemental composition for compounds with molecular weights greater than 600 Da. In these cases, searching by the monoisotopic mass is advantageous. In either case, the search results can be refined by appropriate filtering to identify the compounds. We will report on integrated filtering and search approaches on our aggregated compound database for the purpose of structure identification and review our progress in using the platform for natural product dereplication purposes.
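To make the monoisotopic mass search concrete, here is a minimal Python sketch. It is not the ChemSpider implementation: the four-compound candidate list stands in for the real database, the function names are hypothetical, and the observed value is assumed to be a neutral monoisotopic mass with any adduct already removed.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of a few elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
}

# Tiny illustrative candidate list; a real search runs against millions of records.
CANDIDATES = [
    ("caffeine", "C8H10N4O2"),
    ("theophylline", "C7H8N4O2"),
    ("glucose", "C6H12O6"),
    ("fructose", "C6H12O6"),
]


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic element masses for a simple formula such as C8H10N4O2."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def search_by_mass(observed: float, candidates, tolerance_ppm: float):
    """Return candidates whose mass is within tolerance_ppm of the observed mass."""
    hits = []
    for name, formula in candidates:
        mass = monoisotopic_mass(formula)
        error_ppm = (observed - mass) / mass * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append((name, formula, round(error_ppm, 2)))
    return hits


if __name__ == "__main__":
    for hit in search_by_mass(180.0634, CANDIDATES, tolerance_ppm=5.0):
        print(hit)
```

With a 5 ppm window the query separates C6H12O6 from theophylline, whose mass is about 7 ppm away, but it still returns both glucose and fructose. Isomers share a mass and a formula, which is why the abstract stresses filtering the hit list after the search.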

 

 

PITTCON Poster: ChemSpider – building an online database of open spectra

This is a poster I presented at Pittcon on Wednesday March 11th, 2015

ChemSpider – building an online database of open spectra

ChemSpider is an online database of over 30 million chemical compounds sourced from over 500 different sources including government laboratories, chemical vendors, public resources and publications. Developed with the intention of building community for chemists, ChemSpider allows its users to deposit data including structures, properties, links to external resources and various forms of spectral data. Over the past few years ChemSpider has aggregated almost 20,000 high quality NMR and IR spectra and continues to expand as the community deposits additional types of data. The majority of spectral data is licensed as Open Data, allowing it to be downloaded and reused in presentations, lesson plans and for teaching purposes. This poster will present our existing technology and our plans to host a million spectra in our developing online data repository.

 

A TERRIBLE implementation of Name Searching on ACS Journals

Yes, I am a Williams. And THAT is an incredibly common surname. But I am an Antony Williams, notice no H in the name, i.e. NOT Anthony. In the field of chemistry there are not many of us around…a couple I know of, but not many overall. Google Scholar does an extremely good job of automatically associating my newly published articles with my Citations profile here: https://scholar.google.com/citations?user=O2L8nh4AAAAJ

The last five articles automatically associated with my profile. I do NOT make any associations manually at this point.

I am assuming that this is done by understanding the type of work I publish on, some of the co-author network maps that have been established as my profile has developed etc. I assume that their approach is very intelligent relative to some of the more commonplace searches that have been implemented…certainly the results are GOOD.

I noticed one disastrous example today when our article “ChemTrove: Enabling a Generic ELN to Support Chemistry Through the Use of Transferable Plug-ins and Online Data Sources” was published in the Journal of Chemical Information and Modeling here. Right there to the left of the abstract is an offer to look at other content by the authors.

Look for related content by the authors on JCIM

I was interested to see what else ACS knew about my content so I clicked on my name…which performed this search: http://pubs.acs.org/action/doSearch?ContribStored=Williams%2C+A and provided me with 96 articles, mostly by Andrew Williams but also by Aaron Williams, Anthony Williams (not me) and Allan Williams (to name a few). Eventually I managed to find 3 that were associated with me by searching the list for Antony Williams, but none of those I published as Antony J. Williams were recovered.
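The query string in that URL already hints at the problem. The Python sketch below decodes it and then applies a surname-plus-first-initial match to the names mentioned above. The matching rule is an assumption that is consistent with the results I saw; it is not ACS's actual search implementation, and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlparse

SEARCH_URL = "http://pubs.acs.org/action/doSearch?ContribStored=Williams%2C+A"

# Names taken from the search results described in the post.
AUTHORS = [
    "Andrew Williams",
    "Aaron Williams",
    "Anthony Williams",
    "Allan Williams",
    "Antony Williams",
]


def contributor_query(url: str) -> str:
    """Decode the ContribStored parameter from the search URL."""
    return parse_qs(urlparse(url).query)["ContribStored"][0]


def matches_surname_initial(author: str, query: str) -> bool:
    """Assumed rule: same surname and a given name starting with the initial."""
    surname, initial = [part.strip() for part in query.split(",")]
    given, _, family = author.rpartition(" ")
    return family == surname and given.startswith(initial)


if __name__ == "__main__":
    query = contributor_query(SEARCH_URL)
    print("Decoded query:", query)
    for author in AUTHORS:
        print(author, matches_surname_initial(author, query))
```

Every name in the list passes, because the full given name is reduced to a single initial before the search runs. The sketch does not try to explain why articles published as Antony J. Williams were missed.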

Also, my colleague Valery Tkachenko is listed as an author with his name misspelled as Valery Tkachenkov. What is simply inappropriate in my opinion is how the process involved taking the list of our submitted names (copied below directly from the submitted manuscript) and changing them to their own interpretation of how we would want to see our names listed.

From this:

Aileen E. Day*†, Simon J. Coles, Colin L. Bird, Jeremy G. Frey, Richard J. Whitby, Valery E. Tkachenko§, Antony J. Williams§

To This:

Names changed from the original manuscript to those produced at submission

Notice that for Aileen and Jeremy the middle initials were expanded, Colin had his middle initial changed from L. to I., Richard, Valery and I had our middle initials dropped and Valery had a v added to his surname. Why not simply copy and paste the names from the manuscript?

I will point out that this is a “Just Accepted” manuscript and likely the changes in names will be caught and edited, especially now I have just pointed them out. “Just accepted” does have some disclaimers:

The disclaimers regarding Just Accepted manuscripts

While they can edit the names to match what we originally provided, I don’t think it will fix the issue regarding finding all of my articles on ACS journals: when I navigated to one of my other articles here, http://pubs.acs.org/doi/abs/10.1021/es0713072, and did the search from my listed name it found exactly the same 96 hits.

Maybe a thought to use my ORCID profile http://orcid.org/0000-0002-2668-4821 to look for ACS journal articles associated with my name?
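One reason an ORCID iD makes a better lookup key than a name string is that it is checkable: the final character is a check digit computed with ISO 7064 MOD 11-2 over the first fifteen digits. A minimal Python sketch of that check follows. The function names are hypothetical, and it covers validation only, not how ACS would actually query articles by ORCID.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Validate the length, digits and check character of an ORCID iD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


if __name__ == "__main__":
    # The iD quoted in the post, then the same iD with a wrong final digit.
    print(is_valid_orcid("0000-0002-2668-4821"))
    print(is_valid_orcid("0000-0002-2668-4822"))
```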

Unfortunately the data is already out in the wild as when I claimed the article on Kudos all of the name spelling issues had clearly spilled over via the DOI: https://www.growkudos.com/articles/10.1021%252Fci5005948

Names transferred via DOI to the Grow Kudos Platform

Ah…the things that surprise me….or not.

 

A chemistry data repository to serve them all

A presentation that I am giving around UK universities in September/October 2014

A chemistry data repository to serve them all

Over the past five years the Royal Society of Chemistry has become world renowned for its public domain compound database that integrates chemical structures with online resources and available data. ChemSpider regularly serves over 50,000 users per day who are seeking chemistry related data. In parallel we have used ChemSpider and available software services to underpin a number of grant-based projects that we have been involved with: Open PHACTS – a semantic web project integrating chemistry and biology data, PharmaSea – seeking out new natural products from the ocean and the National Chemical Database Service for the United Kingdom. We are presently developing a new architecture that will offer broader scope in terms of the types of chemistry data that can be hosted. This presentation will provide an overview of our Cheminformatics activities at RSC, the development of a new architecture for a data repository that will underpin a global chemistry network, and the challenges ahead, as well as our activities in releasing software and data to the chemistry community.

 

Presentations given at the ACS Meeting in San Francisco #ACSsanfran

I recently returned from the ACS meeting in San Francisco; it was a busy and very successful conference. We presented to a number of different divisions on a lot of our activities and many of our collaborators presented also. The list of talks is below and as more links become available I will update this page. What I learned is that we need to present in MANY divisions other than CINF…the attendees of the CHED and ANLY divisions for sure were interested in what we have to say. We will do more of this…

Applying Royal Society of Chemistry cheminformatics skills to support the PharmaSEA project, A.J. Williams, A. Pshenichnov, V. Tkachenko, K. Karapetyan and D. Sharpe, ACS Fall Meeting, San Francisco, August 2014 Link

How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry, A.J. Williams, V. Tkachenko and K. Karapetyan, ACS Fall Meeting, San Francisco, August 2014 (Invited Talk) Link

Dealing with the complex challenge of managing diverse chemistry data online, A.J. Williams, A. Pshenichnov, V. Tkachenko and K. Karapetyan, ACS Fall Meeting, San Francisco, August 2014 Link

Encouraging undergraduate students to participate as authors of scientific publications, A.J. Williams, ACS Fall Meeting, San Francisco, August 2014 Link

Who knew I would get here from there: How I became the ChemConnector, A.J. Williams, ACS Fall Meeting, San Francisco, August 2014 (Invited Talk) Link

Open innovation and chemistry data management contributions from the Royal Society of Chemistry resulting from the Open PHACTS project, A.J. Williams, A. Pshenichnov, J. Steele, C. Batchelor, V. Tkachenko and K. Karapetyan, ACS Fall Meeting, San Francisco, August 2014 Link

Using an online database of chemical compounds for the purpose of structure identification, A.J. Williams, A. Pshenichnov and V. Tkachenko, ACS Fall Meeting, San Francisco, August 2014

The Royal Society of Chemistry and its adoption of semantic web technologies for chemistry at the epoch of a federated world, A.J. Williams and V. Tkachenko, ACS Fall Meeting, San Francisco, August 2014 Link

Accessing 3D printable chemical structures online. V. F. Scalfani, A. J. Williams, R. M. Hanson, J. E. Bara, A. Day, V. Tkachenko, ACS Fall Meeting, San Francisco, August 2014 Link

Using the BRAIN, biorelations and intelligence network, for knowledge discovery. A. Mons, B. Mons, A. Krol, A. Baak, A.J. Williams, V. Tkachenko, ACS Fall Meeting, San Francisco, August 2014

Navigating chemistry requirements for data management and electronic notebooks: A case study. L. R. McEwen, A. J. Williams, V. Tkachenko, J. G. Frey, S. J. Coles, A. E. Day, C. Willoughby, W. R. Dichtel, ACS Fall Meeting, San Francisco, August 2014

The Chemical Analysis Metadata Platform (ChAMP): Thoughts and Ideas on the Semantic Identification of Analytical Metrics, S. Chalk, A.J. Williams, V. Tkachenko, San Francisco, August 2014 Link

Integrating Jmol/JSpecView into the Eureka Research Workbench. S. Chalk, M. Morse, I. Hurst, A.J. Williams, V. Tkachenko, A. Pshenichnov, R. Hanson, ACS Fall Meeting, San Francisco, August 2014

Clustering the Royal Society of Chemistry chemical repository to enable enhanced navigation across millions of chemicals. K. Karapetyan, V. Tkachenko, A. J. Williams, O. Kohlbacher, P. Thiel, ACS Fall Meeting, San Francisco, August 2014 Link

Experiences and adventures with noSQL and its applications to cheminformatics data. V. Tkachenko, A.J. Williams, K. Karapetyan, A. Pshenichnov, M. Rybalkin, ACS Fall Meeting, San Francisco, August 2014 Link

Faculty profiling and searching in the Eureka Research Workbench using VIVO and ScientistsDB. S. Chalk, M. Morse, I. Hurst, A.J. Williams, V. Tkachenko, A. Pshenichnov, ACS Fall Meeting, San Francisco, August 2014

Supporting the exploding dimensions of the chemical sciences via global networking. V. Tkachenko, A.J. Williams, S. Vatsadze, ACS Fall Meeting, San Francisco, August 2014 Link

Toward extracting analytical science metrics from the RSC archives. S. Chalk, A.J. Williams, V. Tkachenko, C. Batchelor, ACS Fall Meeting, San Francisco, August 2014

Dereplication applications for computer-assisted structure elucidation (CASE) and the ChemSpider database. P. Wheeler, A. Moser, J. DiMartio, M. Elyashberg, K. Blinov, S. Molodstov, A.J. Williams, ACS Fall Meeting, San Francisco, August 2014 (Invited talk)

Real structures for real natural products − really getting them right and getting them faster. P. Wheeler, A.J. Williams, M. Elyashberg, R. Pol, A. Moser, ACS Fall Meeting, San Francisco, August 2014

The increasing importance of chemical information literacy in the life of graduate students: Contributions from the ACS Division of Chemical Information (CINF). G. Baysinger, J. Currano, J. Garritano, L. R. McEwen, A. J. Williams