
Category Archives: Presentations

Chemical identification of unknowns in high resolution mass spectrometry using the EPA’s CompTox Chemicals Dashboard

I was privileged to present today at Pittcon 2019 on “Chemical identification of unknowns in high resolution mass spectrometry using the EPA’s CompTox Chemicals Dashboard”. The abstract is below.

Non-targeted and suspect screening studies using high resolution mass spectrometry (HRMS) have revolutionized the detection of chemicals in complex matrices. However, data processing remains challenging due to the vast number of chemicals detected in samples, the software and computational requirements of data processing, and the inherent uncertainty in confidently identifying chemicals from candidate lists. The US EPA has developed functionality within the CompTox Chemicals Dashboard (https://comptox.epa.gov) to address challenges related to data processing and analysis in HRMS. These tools include the generation of “MS-Ready” structures to optimize database searching, retention time prediction for candidate reduction, consensus ranking using chemical metadata, and in silico MS/MS fragmentation prediction for spectral matching. Combining these tools into a comprehensive workflow improves certainty in candidate identification. This presentation will introduce the tools and the combined workflow, including visualization and access via the CompTox Chemicals Dashboard. Delivered within an open chemistry resource, these tools, data, and visualization approaches provide a publicly available software tool to support structure identification and non-targeted analyses. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

The slide deck is available on SlideShare here:

Chemical identification of unknowns in high resolution mass spectrometry using the CompTox Chemicals Dashboard from US Environmental Protection Agency (EPA), National Center for Computational Toxicology

 

Presentations at the Spring ACS Meeting in Orlando, April 2019

I am giving a number of presentations at the ACS meeting in Orlando in April 2019. If you are interested in coming to listen, and maybe chatting afterwards, please see the list below.

1) PAPER ID: 3080890 
PAPER TITLE: Consensus ranking and fragmentation prediction for identification of unknowns in high resolution mass spectrometry (final paper number: AGFD 10)


DIVISION: Division of Agricultural and Food Chemistry
SESSION: Recent Advances in Food Fraud & Authenticity Analysis
SESSION TIME: 8:30 AM – 10:55 AM

PRESENTATION FORMAT: Oral
DAY & TIME OF PRESENTATION: Sunday, March 31, 2019 from 9:25 AM – 9:50 AM
ROOM & LOCATION: Florida Ballroom B  – Hyatt Regency Orlando 

Title: Consensus ranking and fragmentation prediction for identification of unknowns in high resolution mass spectrometry

Antony J. Williams1, Andrew McEachran2, Tommy Cathey3, Tom Transue3, Jon Sobus4

High resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) are advancing the identification of emerging contaminants in environmental and agricultural matrices. However, confidently identifying the structures of unknowns in NTA remains challenging for analytical chemists. Structure identification requires the integration of complementary data types such as reference databases, fragmentation prediction tools, and retention time prediction models. The goal of this research is to optimize and implement structure identification functionality within the US EPA’s CompTox Chemicals Dashboard, an open chemistry resource and web application containing data for ~760,000 substances. Rank-ordering chemical records within the Dashboard by the number of sources associated with them (Data Source Ranking) improves the identification of unknowns by bringing the most likely candidate structures to the top of a search results list. Incorporating additional data streams contained within the database underlying the Dashboard further enhances identifications. Integrating tandem mass spectrometry data into NTA workflows enables spectral match scores and increases confidence in structural assignments. We have generated and stored predicted MS/MS fragmentation spectra for the entire Dashboard using the in silico prediction tool CFM-ID. Predicted fragments incorporated into the identification workflow were used both as a scoring term and as a candidate threshold cutoff. Combining these steps within an open chemistry resource provides a freely available software tool for structure identification and NTA. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
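The two ranking ideas in this abstract (ordering candidates by how many data sources reference them, and using predicted fragments as a cutoff) can be illustrated with a short Python sketch. Everything in it is hypothetical: the candidate names, source counts and fragment m/z values are invented, and the 10 ppm tolerance and 0.3 cutoff are arbitrary. It is not the Dashboard's implementation and does not reproduce CFM-ID.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical candidate record: a name, the number of data sources
    # associated with the record, and predicted MS/MS fragment m/z values.
    name: str
    source_count: int
    predicted_fragments: list


def fragment_match_fraction(observed, predicted, tol_ppm=10.0):
    """Fraction of observed peaks matched by a predicted fragment within tol_ppm."""
    if not observed:
        return 0.0
    matched = 0
    for mz in observed:
        tol = mz * tol_ppm * 1e-6  # ppm tolerance converted to an absolute window
        if any(abs(mz - p) <= tol for p in predicted):
            matched += 1
    return matched / len(observed)


def rank_candidates(candidates, observed, cutoff=0.3, tol_ppm=10.0):
    """Drop candidates below the fragment-match cutoff, then sort by source
    count (descending), using the match fraction only to break ties."""
    scored = []
    for c in candidates:
        score = fragment_match_fraction(observed, c.predicted_fragments, tol_ppm)
        if score >= cutoff:
            scored.append((c, score))
    scored.sort(key=lambda cs: (cs[0].source_count, cs[1]), reverse=True)
    return [(c.name, c.source_count, round(s, 2)) for c, s in scored]


if __name__ == "__main__":
    observed_peaks = [138.0662, 110.0713, 83.0604]  # hypothetical MS/MS peaks
    candidates = [
        Candidate("candidate A", 85, [138.0662, 110.0713, 42.0338]),
        Candidate("candidate B", 120, [150.0550, 95.0491]),
        Candidate("candidate C", 12, [138.0662, 110.0713, 83.0604]),
    ]
    for row in rank_candidates(candidates, observed_peaks):
        print(row)
```

Here candidate B has the most sources but explains none of the observed peaks, so the cutoff removes it before ranking. Sorting on source count first and using the fragment fraction only as a tie-breaker is one possible design choice; the abstract does not specify how the terms are weighted.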

2) PAPER ID: 3081133 
PAPER TITLE: Applications of the US EPA’s CompTox chemicals dashboard to support structure identification and chemical forensics using mass spectrometry (final paper number: ANYL 320)


DIVISION: Division of Analytical Chemistry
SESSION: Frontiers in Forensic Mass Spectrometry
SESSION TIME: 8:00 AM – 12:10 PM

PRESENTATION FORMAT: Oral
DAY & TIME OF PRESENTATION: Tuesday, April 02, 2019 from 11:40 AM – 12:10 PM
ROOM & LOCATION: Plaza International Ballroom K  – Hyatt Regency Orlando

Title: Applications of the US EPA’s CompTox Chemicals Dashboard to support structure identification and chemical forensics using mass spectrometry

Antony J. Williams, Andrew D. McEachran, Jon R. Sobus and Emma Schymanski

High resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) are of increasing interest in chemical forensics for the identification of emerging contaminants and chemical signatures of interest. At the US Environmental Protection Agency, our research using HRMS for non-targeted and suspect screening analyses utilizes databases and cheminformatics approaches that are applicable to chemical forensics. The CompTox Chemicals Dashboard is an open chemistry resource and web-based application containing data for ~760,000 substances. Basic functionality for searching the data is provided through identifier searches, such as systematic names, trade names and CAS Registry Numbers. Advanced Search capabilities supporting mass spectrometry include mass and formula-based searches, combined substructure-mass searches and searching experimental mass spectral data against predicted fragmentation spectra. A specific type of data mapping in the underpinning database, using “MS-Ready” structures, has proven valuable for structure identification: it links the structures that can be identified via HRMS to related substances available in commerce, such as salts and other multi-component mixtures. This presentation will provide an overview of the CompTox Chemicals Dashboard and demonstrate its utility for supporting structure identification and NTA in chemical forensics. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

3) PAPER ID: 3084559 
PAPER TITLE: Antony Williams, the ChemConnector: A career path through a diverse series of roles and responsibilities (final paper number: CINF 25)

DIVISION: Division of Chemical Information
SESSION: Careers in Chemical Information
SESSION TIME: 1:30 PM – 4:25 PM

PRESENTATION FORMAT: Oral
DAY & TIME OF PRESENTATION: Sunday, March 31, 2019 from 3:05 PM – 3:25 PM
ROOM & LOCATION: West Hall B4 – Theater 11  – Orange County Convention Center

Antony Williams, the ChemConnector – a career path through a diverse series of roles and responsibilities

Authors: Antony Williams

Antony Williams is a Computational Chemist at the US Environmental Protection Agency in the National Center for Computational Toxicology. He has been involved in cheminformatics and the dissemination of chemical information for over twenty-five years. He has worked for a Fortune 500 company (Eastman Kodak), in two successful start-ups (ACD/Labs and ChemSpider), for the Royal Society of Chemistry (in publishing) and, now, at the EPA. Throughout his career path he has experienced multiple diverse work cultures and focused his efforts on understanding the needs of his employers and the often unrecognized needs of a larger community. Antony will provide a short overview of his career path and discuss the various decisions that helped motivate his change in career from professional spectroscopist to website host and innovator, to working for one of the world’s foremost scientific societies and now for one of the most impactful government organizations in the world. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

4) PAPER ID: 3084590 
PAPER TITLE: US-EPA CompTox chemicals dashboard: A web-based data integration hub for environmental chemistry data (final paper number: CINF 43)


DIVISION: Division of Chemical Information
SESSION: Web-Based Chemoinformatics Platforms
SESSION TIME: 8:00 AM – 11:50 AM

PRESENTATION FORMAT: Oral
DAY & TIME OF PRESENTATION: Monday, April 01, 2019 from 11:20 AM – 11:50 AM
ROOM & LOCATION: West Hall B4 – Theater 10  – Orange County Convention Center

The EPA CompTox Chemicals Dashboard as a Data Integration Hub for Environmental Chemistry Data

Authors: Antony Williams, Andrew McEachran, Imran Shah, Richard Judson, John Wambaugh, Nancy Baker, George Helman, Chris Grulke, Kamel Mansouri, Grace Patlewicz, Ann Richard and Jeff Edwards.

The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program integrates advances in biology, chemistry, and computer science to help prioritize chemicals for further research based on potential human health risks. This involves computational and data-driven approaches that integrate chemistry, exposure and biological data. The National Center for Computational Toxicology (NCCT) has measured, assembled and delivered an enormous quantity and diversity of data for the environmental sciences, including high-throughput in vitro screening data, in vivo and functional use data, exposure models and chemical databases with associated properties. The CompTox Chemicals Dashboard is a web-based application providing access to data associated with ~760,000 chemical substances. New data are continuously added to the database, along with registrations of new and emerging chemicals. These include data extracted from the literature, chemicals identified by our analytical labs, and other chemicals of interest to specific agency research projects. Because these data are added with their associated chemical identifiers (names and CAS Registry Numbers), the dashboard can use linking approaches to allow automated searching of PubMed, Google Scholar and an array of public databases. This presentation will provide an overview of the CompTox Chemicals Dashboard, how it has developed into an integrated data hub for environmental data, and how it can be used to analyze emerging chemicals by sourcing related chemicals of interest and deriving read-across and QSAR predictions in real time. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

5) PAPER ID: 3084575 
PAPER TITLE: EPA CompTox chemicals dashboard: An online resource for environmental chemists (final paper number: CINF 94)


DIVISION: Division of Chemical Information
SESSION: Applications of Cheminformatics to Environmental Science
SESSION TIME: 8:00 AM – 12:00 PM

PRESENTATION FORMAT: Oral
DAY & TIME OF PRESENTATION: Wednesday, April 03, 2019 from 8:25 AM – 8:45 AM

ROOM & LOCATION: West Hall B4 – Theater 10  – Orange County Convention Center 

EPA CompTox Chemicals Dashboard – an online resource for environmental chemists

Authors: Antony Williams, Chris Grulke, Jennifer Smith, Kamel Mansouri, Andrew McEachran, Kathie Dionisio, Katherine Phillips, Grace Patlewicz, Jeremy Fitzpatrick, Nancy Baker, Todd Martin, Ann Richard and Jeff Edwards

The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program integrates advances in biology, chemistry, and computer science to help prioritize chemicals for further research based on potential human health risks. This work involves computational and data-driven approaches that integrate chemistry, exposure and biological data. As an outcome of these efforts, the National Center for Computational Toxicology (NCCT) has measured, assembled and delivered an enormous quantity and diversity of data for the environmental sciences, including high-throughput in vitro screening data, in vivo and functional use data, exposure models and chemical databases with associated properties. A series of software applications and databases has been produced over the past decade to deliver these data. Recent work has focused on the development of a new architecture that assembles the resources into a single platform. With a focus on delivering access to Open Data streams, web service integration, accessibility, and a user-friendly web application, the CompTox Chemicals Dashboard provides access to data associated with ~720,000 chemical substances. These data include research data in the form of bioassay screening data associated with the ToxCast program, experimental and predicted physicochemical properties, product and functional use information and related data of value to environmental scientists. This presentation will provide an overview of the CompTox Chemicals Dashboard and its value to the community as an informational hub. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

6) PAPER ID: 3095464 
PAPER TITLE: Cheminformatics approaches to support chemical identification delivered via the EPA CompTox Chemicals Dashboard (final paper number: ENVR 173)


DIVISION: Division of Environmental Chemistry
SESSION: Accurate Mass/High Resolution Mass Spectrometry for Environmental Monitoring & Remediation
SESSION TIME: 1:00 PM – 4:10 PM

PRESENTATION FORMAT: Oral
DAY & TIME OF PRESENTATION: Monday, April 01, 2019 from 1:25 PM – 1:45 PM
ROOM & LOCATION: Valencia Ballroom B-D – Theater 8  – Orange County Convention Center

Cheminformatics approaches to support chemical identification delivered via the EPA CompTox Chemicals Dashboard

Antony J. Williams, Andrew McEachran, Chris M. Grulke, Elin M. Ulrich and Jon R. Sobus

The identification of chemicals in environmental media depends on the application of analytical methods, primarily one of several mass spectrometry techniques. Cheminformatics solutions are critical to supporting the chemical identification process. These include the assembly of large chemical substance databases, prioritization ranking of potential candidate search hits, and search approaches that support both targeted and non-targeted screening. The US Environmental Protection Agency CompTox Chemicals Dashboard is a web-based application providing access to data for over 760,000 chemical substances. This includes access to physicochemical property data, environmental fate and transport data, both human and ecological toxicity data, information regarding chemicals contained in products in commerce, and in vitro bioactivity data. Searches can be based on chemical identifiers, on products and uses, on genes and assays associated with the EPA ToxCast assays and, specifically to support mass spectrometry, on masses and formulae. These searches make use of a novel “MS-Ready structures” approach that collapses chemicals related as mixtures, salts, stereoforms and isotopomers. The dashboard supports both singleton and batch searching by accurate mass/chemical formula, supported by MS-Ready structures, and utilizes rich metadata to facilitate candidate ranking and the prioritization of chemicals of concern based on toxicity and exposure data. The dashboard also hosts tens of chemical lists that have been assembled from public databases, many of them supporting non-targeted analysis and mass spectrometry databases.

This presentation will provide an overview of the dashboard and will review our latest research into structure identification by searching experimental mass spectrometry data against predicted fragmentation spectra for LC-MS (positive and negative ion mode) and GC-MS (EI), a total of 3 million predicted spectra. We will also provide an overview of our progress supporting structure and substructure searching, using mass and formula-based filtering, and report on the latest applications of the dashboard to support structure identification projects of interest to the EPA. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
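The MS-Ready idea in this abstract, collapsing salts, mixtures, stereoforms and isotopomers onto the forms that HRMS can actually detect, amounts to a many-to-one mapping plus a reverse index. The Python sketch below assumes each substance has already been split into components and that each component carries a connectivity-only `skeleton` key; those records, the `SUBSTANCE-n` identifiers and the plain-name skeleton keys are hypothetical stand-ins for what a cheminformatics toolkit would compute.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    # Hypothetical component record. `skeleton` stands in for a
    # connectivity-only key (no stereo, isotope or charge information);
    # a production implementation would derive it with a cheminformatics toolkit.
    skeleton: str
    stereo: str = ""
    isotopes: str = ""
    is_counterion: bool = False


def ms_ready_forms(components):
    """Collapse a substance to the skeletons HRMS could detect: drop
    counterions, ignore stereo and isotope labels, split mixtures."""
    return {c.skeleton for c in components if not c.is_counterion}


def build_index(substances):
    """Map each MS-Ready skeleton back to every substance that contains it."""
    index = defaultdict(set)
    for substance_id, components in substances.items():
        for skeleton in ms_ready_forms(components):
            index[skeleton].add(substance_id)
    return index


# Hypothetical registry: stereoisomers, a salt, an isotopomer and a mixture.
substances = {
    "SUBSTANCE-1": [Component("alanine", stereo="L")],
    "SUBSTANCE-2": [Component("alanine", stereo="D")],
    "SUBSTANCE-3": [Component("alanine"), Component("HCl", is_counterion=True)],
    "SUBSTANCE-4": [Component("alanine", isotopes="13C3")],
    "SUBSTANCE-5": [Component("alanine"), Component("glycine")],
}

index = build_index(substances)
print(sorted(index["alanine"]))
print(sorted(index["glycine"]))
```

A formula or mass hit on an MS-Ready form can then be expanded through the index to every registered substance that contains it, which illustrates how a detected structure could be linked back to its salts, stereoforms and mixtures.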

7) PAPER ID: 3084594 
PAPER TITLE: US-EPA comptox chemicals dashboard: an information hub for over five thousand per- & polyfluoroalkyl chemical substances (final paper number: ENVR 217)


DIVISION: Division of Environmental Chemistry
SESSION: Per- & Polyfluoroalkyl Substances in the Environment: From Legacy To Emerging Contaminants
SESSION TIME: 8:30 AM – 12:00 PM

PRESENTATION FORMAT: Oral
DAY & TIME OF PRESENTATION: Tuesday, April 02, 2019 from 10:10 AM – 10:30 AM
ROOM & LOCATION: Valencia Ballroom B-D – Theater 10  – Orange County Convention Center

Title: The US-EPA CompTox Chemicals Dashboard – an information hub for over five thousand per- & polyfluoroalkyl chemical substances

Authors: Antony Williams, Chris Grulke, Grace Patlewicz and Ann Richard

The EPA’s CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) is a publicly accessible website providing access to data for ~770,000 chemical substances, the majority of these represented as chemical structures. The web application delivers a wide array of computed and measured physicochemical properties, in vitro high-throughput screening data and in vivo toxicity data, product use information extracted from safety data sheets, and integrated chemical linkages to a growing list of literature, toxicology, and analytical chemistry websites. The application provides access to segregated lists of chemicals that are of specific interest to relevant stakeholders, including a list of Per- & Polyfluoroalkyl Substances (PFAS) containing thousands of chemicals. A procured testing library of hundreds of PFAS chemicals annotated into chemical categories has been integrated into the dashboard with a number of resulting benefits: a searchable database of chemical properties, with hazard and exposure predictions, and links to the open literature. Several specific search types have been developed to directly support the mass spectrometry non-targeted screening community, enabling cohesive workflows to support data generation for the detection and assessment of environmental exposures to chemicals contained within DSSTox. This presentation will provide an overview of the dashboard, the ongoing expansion of the PFAS chemical library, with associated categorization, and new physicochemical property and environmental fate and transport QSAR prediction models developed for these chemicals. The application of the dashboard to support mass spectrometry non-targeted analysis studies for the identification of PFAS chemicals will also be reviewed. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

8) PAPER ID: 3084611 
PAPER TITLE: CompTox chemicals dashboard: Data and tools to support chemical and environmental risk assessment and the ENTACT project (final paper number: ENVR 648)


DIVISION: Division of Environmental Chemistry
SESSION: True Positives in EPA’S Non-Targeted Analysis Collaborative Trial (ENTACT)
SESSION TIME: 1:30 PM – 5:00 PM

PRESENTATION FORMAT: Oral
DAY & TIME OF PRESENTATION: Wednesday, April 03, 2019 from 2:15 PM – 2:35 PM
ROOM & LOCATION: Valencia Ballroom B-D – Theater 13  – Orange County Convention Center

Title: The CompTox Chemicals Dashboard: Data and Tools to Support Chemical and Environmental Risk Assessment and the ENTACT project

Authors and affiliations: Antony J. Williams1, Christopher M. Grulke1, Andrew D. McEachran2, Emma L. Schymanski3,4, Jon Sobus5, Elin Ulrich5, Ann M. Richard1, Jeremy Dunne1 and Jeff Edwards1

1 EPA, National Center for Computational Toxicology, RTP, NC, USA

2 ORISE Fellow, Oak Ridge Institute for Science and Education, Oak Ridge, TN, USA

3 Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, avenue du Swing, L-4367 Belvaux, Luxembourg

4 EPA, National Exposure Research Laboratory, RTP, NC, USA

Scientists use information and data on chemicals to evaluate potential health and ecological risks due to environmental exposures. EPA’s CompTox Chemicals Dashboard (https://comptox.epa.gov) helps evaluate the safety of chemicals by providing public access to a variety of information on over 760,000 chemicals. Within the Dashboard, users can access chemical structures, chemistry information, toxicity data, hazard data, exposure information, and additional links to relevant websites and applications. These data are compiled from EPA’s computational toxicology research databases, from public domain databases and from collaborators across the world. Chemical lists have been added that provide access to various classes of chemicals, and project-based datasets are under constant development. Specific functionality has been delivered within the Dashboard to support mass spectrometry, including “MS-ready forms” of chemical substances that would be detectable by mass spectrometry. Workflows have been developed to assist in candidate identification and have now been demonstrated in multiple published studies. An integration path between the dashboard and MetFrag has also been established, giving users the significant benefits that result from combining the two applications. The datasets underpinning the dashboard are freely available (https://comptox.epa.gov/dashboard/downloads) for integration into third-party databases. This presentation will provide an overview of the available data types and functionality of the dashboard before examining how it is developing to support mass spectrometry-based analyses within the agency and for the community in general. This will include a review of our research efforts to enhance the dashboard using in silico MS/MS fragmentation prediction for spectral matching. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

 

Spring ACS Meeting San Francisco, April 2017

The Spring ACS Meeting is coming, and it’s coming quickly. Every time the New Year starts I think I have a long time before I have to assemble posters and write talks for the ACS Meeting. When I worked at the RSC it was easier in some ways as NO ONE reviewed them, no one gave comments on them and there was no clearance process involved. Mostly I was writing the talks on the flight out to the ACS or, more commonly, was writing them the evening before or morning of the presentations. There have been days when I got up in the morning at 4am to write two talks on the day I presented. Quite exhausting but at least I got to show the latest and greatest capabilities.

As an employee at the EPA I work under different expectations, especially with regard to the clearance process, where the presentations are reviewed and signed off, pushed through our internal repository and, post-presentation, released to the community via Science Inventory. Some, but not all, of the presentations and papers I have been involved with since joining EPA are here.

I will be going to the ACS meeting with a number of colleagues and chairing a session on Thursday, all day, with Chris Grulke for the Division of Environmental Chemistry. I will be presenting a number of posters and presentations as listed below. A number of my colleagues will also be presenting. Andrew McEachran, a recent postdoc with the center, will be presenting on much of the work that has been done on using the Chemistry Dashboard to facilitate structure identification. The recent publication “Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard” (http://link.springer.com/article/10.1007%2Fs00216-016-0139-z) reported on a comparison of the dashboard versus ChemSpider. Since then we have rolled out a lot of new functionality to support structure identification, and Andrew will report on that.

PAPER ID: 2624963
PAPER TITLE: Twenty five years in cheminformatics: A career path through a diverse series of roles and responsibilities

DIVISION: Division of Chemical Information
SESSION: Careers in Chemical Information
PRESENTATION FORMAT: Oral
DAY & HALF DAY OF PRESENTATION: Sunday, April 02, 2017 – AM

PAPER ID: 2616719
PAPER TITLE: Evaluating suspect screening and non-targeted analysis approaches using a collaborative research trial at the US EPA

DIVISION: Division of Analytical Chemistry
SESSION: Analytical Division Poster Session
PRESENTATION FORMAT: Poster
DAY & HALF DAY OF PRESENTATION: Sunday, April 02, 2017 – EVE

PAPER ID: 2624980
PAPER TITLE: EPA CompTox chemistry dashboard: An online resource for environmental chemists

DIVISION: Division of Chemical Health and Safety
SESSION: Information Flow in Environmental Health & Safety
PRESENTATION FORMAT: Oral
DAY & HALF DAY OF PRESENTATION: Tuesday, April 04, 2017 – PM

PAPER ID: 2624984
PAPER TITLE: Delivering an informational hub for data at the National Center for Computational Toxicology

DIVISION: Division of Environmental Chemistry
SESSION: Applications of Cheminformatics & Computational Chemistry in Environmental Health
PRESENTATION FORMAT: Poster
DAY & HALF DAY OF PRESENTATION: Wednesday, April 05, 2017 – EVE

Looking forward to seeing you at ACS!

 

 

Social Media Tools for Scientists and Building an Online Profile

This presentation will be given at the Janelia Farm Research Campus, a research campus of the Howard Hughes Medical Institute. The presentation abstract is below.

ABSTRACT
Despite the availability of many platforms for scientists to connect and share with their peers in the scientific community, the majority of scientists do not make use of these tools, even though they hold real promise and potential impact and influence on our careers. We are already being indexed and exposed on the internet via our publications, presentations and data, and new “AltMetric scores” are being assigned to scientific publications as measures of popularity and, supposedly, of impact. We now have even more ways to contribute to science, to annotate and curate data, and to “publish” in new ways, and many of these activities are part of a growing crowdsourcing network. This presentation provides an overview of the various types of networking and collaborative sites available to scientists and ways to expose your scientific activities online. It will discuss the new world of AltMetrics, which is growing explosively, and will help you understand how to influence and leverage some of these new measures. Whether you participate online simply for career advancement or for wider exposure of your research, a series of web applications now provides a great opportunity to develop a scientific profile within the community.

 

Providing Access to a Million NMR Spectra via the web

This presentation was given at the ACS Denver meeting on March 22nd 2015 in a CHED Division symposium

Providing Access to a Million NMR Spectra via the web

Antony Williams, Alexey Pshenichnov, Peter Corbett, Daniel Lowe, Carlos Coba

Large-scale collections of NMR spectral data can serve a number of purposes in teaching spectroscopy to students. The data can be used in lectures, as training data sets for spectral interpretation and structure elucidation, and to underpin educational resources such as the Royal Society of Chemistry’s Learn Chemistry. These resources have been available for a number of years but have been limited to rather small collections of spectral data, specifically only about 3,000 spectra. In order to expand the data collection and provide richer resources for the community, we have been gathering data from various laboratories and, as part of a research project, we have used text-mining approaches to extract spectral data from articles and patents in the form of textual strings and utilized algorithms to convert the data into spectral representations. While these spectra are reconstructions from text representations of the original spectral data, we are investigating their value for structure identification. This presentation will report on the processes of extracting structure-spectral pairs from text, approaches to performing automated spectral verification, and our intention to assemble a collection of a million NMR spectra and make them available online.
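The step of converting textual spectral strings into spectral representations can be sketched in Python for the common way 1H NMR data are written in experimental sections. The regular expression, the example string and the Lorentzian line shape are assumptions made for illustration; this is not the extraction pipeline used in the project, and real text-mined data need far more robust parsing.

```python
import re

# Hypothetical pattern for 1H NMR peaks written as
# "shift (multiplicity, [J = x Hz,] nH)". Real articles and patents use many
# more variants (ranges, "br s", multiple J values), which this ignores.
PEAK = re.compile(
    r"(?P<shift>\d+(?:\.\d+)?)\s*\(\s*(?P<mult>[a-z]+)\s*,"
    r"(?:\s*J\s*=\s*(?P<j>\d+(?:\.\d+)?)\s*Hz\s*,)?"
    r"\s*(?P<nh>\d+)\s*H\s*\)"
)


def parse_h1_peaks(text):
    """Extract (shift_ppm, multiplicity, J_Hz or None, proton_count) tuples."""
    peaks = []
    for m in PEAK.finditer(text):
        j = float(m.group("j")) if m.group("j") else None
        peaks.append((float(m.group("shift")), m.group("mult"), j, int(m.group("nh"))))
    return peaks


def reconstruct_spectrum(peaks, start=0.0, stop=10.0, points=2001, width=0.01):
    """Rebuild a simple spectrum as a sum of Lorentzian lines, one per peak.

    Each line has full width `width` (ppm) and height equal to its proton
    count, so with equal widths the areas are also proportional to proton
    count. Multiplet splitting is deliberately ignored in this sketch.
    """
    half = width / 2
    step = (stop - start) / (points - 1)
    xs = [start + i * step for i in range(points)]
    ys = [
        sum(n * half**2 / ((x - shift) ** 2 + half**2) for shift, _, _, n in peaks)
        for x in xs
    ]
    return xs, ys


text = "1H NMR (400 MHz, CDCl3) δ 4.12 (q, J = 7.1 Hz, 2H), 2.05 (s, 3H), 1.26 (t, J = 7.1 Hz, 3H)."
peaks = parse_h1_peaks(text)
print(peaks)  # [(4.12, 'q', 7.1, 2), (2.05, 's', None, 3), (1.26, 't', 7.1, 3)]
xs, ys = reconstruct_spectrum(peaks)
```

Because the text gives only shifts, multiplicities, coupling constants and integrals, any spectrum rebuilt this way is an approximation of the original, which is why such reconstructions need verification before being used for structure identification.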

 

 

Give me kudos for taking responsibility for self-marketing my scientific publications and increase impact

This presentation was given at the ACS Denver meeting on March 22nd 2015 in a CINF Division symposium

Give me kudos for taking responsibility for self-marketing my scientific publications and increase impact.

Antony Williams, Will Russell, Melinda Kenneway and Louise Peck

The authoring of a scientific publication can represent the culmination of many tens, if not hundreds, of hours of data collection and analysis. The authoring and peer-review process itself often represents a major undertaking in terms of assembling the publication and passing through review. Considering the amount of work invested in producing a scientific article, it is quite surprising that authors invest very little effort, post-publication, in communicating the value and potential impact of their article to the community. Social networking has clearly demonstrated the ability to self-market and drive attention. At the same time, the increasing volume of literature (over a million new articles are published every year) requires authors to take a more direct role in ensuring their work gets read and cited. This requirement may grow with the emergence of a range of article-level metrics, shifting attention away from where a researcher publishes to the performance of their individual articles. Therefore, a separate platform to facilitate social networking and other discovery tools to communicate the value of published science to the community would be of value. In parallel, the ability to enhance an article by linking to additional information (presentations, videos, blog posts, etc.) allows the article to be enriched post-publication, a capability not available via the publisher’s platform. This presentation will provide a personal overview of my experiences using the Kudos Platform and how it ultimately benefits my ability to communicate an integrated view of my research to the community.

 

 


PITTCON poster: Dealing with the complex challenge of managing diverse analytical chemistry data online

This is a talk I presented at Pittcon on Wednesday March 13th, 2015

Dealing with the complex challenge of managing diverse analytical chemistry data online

The Royal Society of Chemistry provides open access to data associated with tens of millions of chemical compounds. The richness and complexity of the data have continued to expand dramatically, and the original vision of providing an integrated hub for structure-centric data has been delivered across the world to hundreds of thousands of users. With the intention of expanding the reach to cover more diverse aspects of chemistry-related data, including compounds, reactions and analytical data, to name just a few data types, we are in the process of delivering a Chemistry Data Repository. The data repository will manage the challenges of associated metadata and the various levels of required security (private, shared and public), and will expose the data as appropriate using semantic web technologies. Ultimately this platform will become the host for all chemicals, reactions and analytical data contained within RSC publications, specifically supplementary information. This presentation will report on the challenges of managing “Big Data” for chemists around the world and on providing access to tools for structure dereplication, spectral database searching and the crowdsourcing of the world’s largest spectral database.

PITTCON Poster: Using an online database of chemical compounds for the purpose of structure identification

This is a poster I presented at Pittcon on Wednesday March 9th, 2015

Using an online database of chemical compounds for the purpose of structure identification

Online databases can be used for the purpose of structure identification. The Royal Society of Chemistry provides access to an online database containing tens of millions of compounds, and this has been shown to be a very effective platform for developing tools for structure identification. Since in many cases a compound that is unknown to an investigator is already known in the chemical literature or in a reference database, these “known unknowns” are now commonly available in aggregated internet resources. Such compounds in commercial, environmental, forensic and natural product samples can be identified by searching these large aggregated databases by either elemental composition or monoisotopic mass. Searching by elemental composition is the preferred approach; however, it is often difficult to determine a unique elemental composition for compounds with molecular weights greater than 600 Da. In these cases, searching by monoisotopic mass is advantageous. In either case, the search results can be refined by appropriate filtering to identify the compounds. We will report on integrated filtering and search approaches on our aggregated compound database for the purpose of structure identification and review our progress in using the platform for natural product dereplication.
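To make the two search modes concrete, here is a minimal Python sketch, not the ChemSpider implementation or API. It assumes a tiny hypothetical in-memory candidate list in place of an aggregated database, handles only flat formulas (no parentheses, charges or isotope labels), and assumes the neutral monoisotopic mass has already been derived from the measured ion. Formula search compares element counts exactly; mass search accepts any candidate within a parts-per-million (ppm) tolerance.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
# Only a handful of common elements are included in this sketch.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
    "F": 18.99840322,
    "Cl": 34.96885268,
    "Br": 78.9183371,
}

FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

# Hypothetical stand-in for an aggregated compound database: (name, formula).
CANDIDATES = [
    ("caffeine", "C8H10N4O2"),
    ("theobromine", "C7H8N4O2"),
    ("paraxanthine", "C7H8N4O2"),
    ("glucose", "C6H12O6"),
]


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a flat formula such as 'C8H10N4O2' into element counts."""
    counts: dict[str, int] = {}
    consumed = 0
    for match in FORMULA_TOKEN.finditer(formula):
        element, digits = match.groups()
        if element not in MONOISOTOPIC_MASS:
            raise ValueError(f"unsupported element: {element}")
        counts[element] = counts.get(element, 0) + int(digits or 1)
        consumed += len(match.group(0))
    # Non-overlapping matches must cover the whole string, else it was malformed.
    if consumed != len(formula):
        raise ValueError(f"could not parse formula: {formula}")
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


def normalize(formula: str) -> tuple:
    """Canonical form so that 'H8C7O2N4' and 'C7H8N4O2' compare equal."""
    return tuple(sorted(parse_formula(formula).items()))


def search_by_formula(formula: str, candidates=CANDIDATES) -> list[str]:
    """Return candidates whose elemental composition matches exactly."""
    target = normalize(formula)
    return [name for name, f in candidates if normalize(f) == target]


def search_by_mass(mass: float, ppm: float = 5.0, candidates=CANDIDATES) -> list[str]:
    """Return candidates whose monoisotopic mass lies within +/- ppm of mass."""
    tolerance = mass * ppm / 1e6
    return [name for name, f in candidates
            if abs(monoisotopic_mass(f) - mass) <= tolerance]


if __name__ == "__main__":
    print(search_by_formula("C7H8N4O2"))     # ['theobromine', 'paraxanthine']
    print(search_by_mass(180.0647))          # ['theobromine', 'paraxanthine']
    print(search_by_mass(180.0647, ppm=10))  # also admits glucose (C6H12O6)
```

Both searches return isomers (theobromine and paraxanthine share C7H8N4O2), which is why the abstract stresses refining hits by further filtering. Widening the mass window from 5 to 10 ppm also admits glucose, whose monoisotopic mass differs by only about 1.3 mDa.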

PITTCON Poster: ChemSpider – building an online database of open spectra

This is a poster I presented at Pittcon on Wednesday March 11th, 2015

ChemSpider – building an online database of open spectra

ChemSpider is an online database of over 30 million chemical compounds drawn from over 500 different data sources, including government laboratories, chemical vendors, public resources and publications. Developed with the intention of building a community for chemists, ChemSpider allows its users to deposit data including structures, properties, links to external resources and various forms of spectral data. Over the past few years ChemSpider has aggregated almost 20,000 high-quality NMR and IR spectra, and the collection continues to expand as the community deposits additional types of data. Most of the spectral data is licensed as Open Data, allowing it to be downloaded and reused in presentations, lesson plans and for teaching purposes. This poster will present our existing technology and our plans to host a million spectra in our developing online data repository.

A chemistry data repository to serve them all

A presentation that I am giving around UK universities in September/October 2014

A chemistry data repository to serve them all

Over the past five years the Royal Society of Chemistry has become world-renowned for its public domain compound database, ChemSpider, which integrates chemical structures with online resources and available data. It regularly serves over 50,000 users per day who are seeking chemistry-related data. In parallel, we have used ChemSpider and available software services to underpin a number of grant-based projects we have been involved with: Open PHACTS, a semantic web project integrating chemistry and biology data; PharmaSea, which seeks out new natural products from the ocean; and the National Chemical Database Service for the United Kingdom. We are presently developing a new architecture that will offer broader scope in terms of the types of chemistry data that can be hosted. This presentation will provide an overview of our cheminformatics activities at RSC, the development of a new architecture for a data repository that will underpin a global chemistry network, and the challenges ahead, as well as our activities in releasing software and data to the chemistry community.