Having just returned from Pittcon late last night, I am now turning my attention to the next set of presentations I will give at the ACS meeting in Denver. These are listed below. If any blog readers will be at the ACS meeting, it would be great to catch up. See you there.
PAPER TITLE: Importance of data standards for large scale data integration in chemistry (final paper number: CINF 39)
DAY & TIME OF PRESENTATION: Wednesday, March 25, 2015 from 11:20 AM – 11:50 AM
ROOM & LOCATION: Room 110 – Colorado Convention Center
ABSTRACT
Online databases are increasingly being used for structure identification. In many cases, a compound that is unknown to an investigator is already reported in the chemical literature or in an online database, and these “known unknowns” are commonly available in large aggregated internet resources. Such compounds in commercial, environmental, forensic, and natural product samples can be identified by searching these aggregated databases by either elemental composition or monoisotopic mass. We will report on the search approaches that we offer on aggregated compound databases hosted by the Royal Society of Chemistry and how these resources can be used for structure identification. We will also report on our progress in hosting interactive spectral data, including assignments, on our data repository and how we are using our analytical data platform for natural product dereplication.
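To illustrate what a query by monoisotopic mass or by elemental composition involves, here is a minimal Python sketch. It is not the RSC search implementation. It assumes a tiny hypothetical in-memory candidate list in place of an aggregated database, a hand-entered table of monoisotopic masses for a few elements, simple formulas without brackets or charges, and a 5 ppm default tolerance. The function names (`search_by_mass`, `search_by_formula`) are illustrative only.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
# Illustrative subset only; extend as needed.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}

# Hypothetical candidate list standing in for an aggregated compound database.
CANDIDATES = [
    ("caffeine", "C8H10N4O2"),
    ("theophylline", "C7H8N4O2"),
    ("glucose", "C6H12O6"),
]


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a simple formula such as 'C8H10N4O2' (no brackets or charges)."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


def search_by_mass(candidates, query_mass: float, tol_ppm: float = 5.0) -> list[str]:
    """Return names whose neutral monoisotopic mass lies within tol_ppm of query_mass."""
    hits = []
    for name, formula in candidates:
        mass = monoisotopic_mass(formula)
        if abs(query_mass - mass) / mass * 1e6 <= tol_ppm:
            hits.append(name)
    return hits


def search_by_formula(candidates, formula: str) -> list[str]:
    """Return names whose elemental composition matches the query formula exactly."""
    target = parse_formula(formula)
    return [name for name, f in candidates if parse_formula(f) == target]


print(search_by_mass(CANDIDATES, 180.0647))              # ['theophylline']
print(search_by_mass(CANDIDATES, 180.0647, tol_ppm=10))  # ['theophylline', 'glucose']
print(search_by_formula(CANDIDATES, "C6H12O6"))          # ['glucose']
```

Theophylline (C7H8N4O2) and glucose (C6H12O6) differ by only about 0.0013 Da, roughly 7 ppm at this mass, so the width of the tolerance window decides whether one or both are returned. This is why mass accuracy matters when searching databases of tens of millions of compounds.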
PAPER TITLE: Give me kudos for taking responsibility for self-marketing my scientific publications and increase impact (final paper number: CINF 8)
DAY & TIME OF PRESENTATION: Sunday, March 22, 2015 from 2:15 PM – 2:40 PM
ROOM & LOCATION: Room 110 – Colorado Convention Center
ABSTRACT
Authoring a scientific publication can represent the culmination of many tens, if not hundreds, of hours of data collection and analysis. Writing the article and passing it through peer review is itself often a major undertaking. Considering the amount of work invested in producing a scientific article, it is surprising that authors invest very little effort, post-publication, in communicating the value and potential impact of their article to the community. Social networking has clearly demonstrated its ability to support self-marketing and drive attention. At the same time, the increasing volume of literature (over a million new articles are published every year) requires authors to take a more direct role in ensuring their work gets read and cited. This requirement may grow with the emergence of a range of article-level metrics, which shift attention away from where a researcher publishes to how their individual articles perform. A separate platform that provides social networking and other discovery tools for communicating the value of published science to the community would therefore be valuable. In parallel, the ability to link an article to additional information (presentations, videos, blog posts, etc.) allows it to be enriched post-publication, a capability not available via the publisher's platform. This presentation will provide a personal overview of my experiences using the Kudos Platform and how it ultimately benefits my ability to communicate an integrated view of my research to the community.
PAPER TITLE: Providing access to a million NMR spectra via the web (final paper number: CHED 91)
SESSION: NMR Spectroscopy in the Undergraduate Curriculum
DAY & TIME OF PRESENTATION: Sunday, March 22, 2015 from 4:15 PM – 4:35 PM
ROOM & LOCATION: Gold – Sheraton Denver Downtown Hotel
ABSTRACT
Large-scale collections of NMR spectral data can serve a number of purposes in teaching spectroscopy to students. The data can be used in lectures, as training sets for spectral interpretation and structure elucidation, and to underpin educational resources such as the Royal Society of Chemistry’s SpectralGame (www.spectralgame.com). These resources have been available for a number of years but have been limited to rather small collections of spectral data, specifically only about 3000 spectra. In order to expand the data collection and provide richer resources for the community, we have been gathering data from various laboratories. As part of a research project, we have also used text-mining approaches to extract spectral data reported as textual strings in articles and patents, and used algorithms to convert these strings into spectral representations. While these spectra are reconstructions from text representations of the original spectral data, we are investigating their value for structure identification. This presentation will report on the processes of extracting structure-spectral pairs from text, approaches to performing automated spectral verification, and our intention to assemble a collection of a million NMR spectra and make it available online.
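The abstract does not describe the extraction and conversion algorithms themselves. As a rough illustration of the general idea only, the Python sketch below parses a 1H NMR peak list written in the common journal style ("δ 7.26 (d, J = 8.0 Hz, 2H), ...") and renders it as a simple spectrum. The regular expression, the hypothetical function names `parse_peaks` and `reconstruct_spectrum`, and the choice of Lorentzian lines with a fixed 0.01 ppm width are all assumptions for illustration, not the project's method.

```python
import re
from dataclasses import dataclass


@dataclass
class Peak:
    shift_ppm: float
    multiplicity: str
    n_protons: int


# Matches entries such as "7.26 (d, J = 8.0 Hz, 2H)" or "3.85 (s, 3H)".
# Shift ranges ("7.40-7.30") and multi-word multiplicities ("br s") are not handled.
PEAK_PATTERN = re.compile(r"(\d+\.\d+)\s*\(\s*([a-z]+)\s*,(?:[^()]*?,)?\s*(\d+)H\s*\)")


def parse_peaks(text: str) -> list[Peak]:
    """Extract (shift, multiplicity, proton count) triples from a 1H NMR text string."""
    return [Peak(float(s), m, int(n)) for s, m, n in PEAK_PATTERN.findall(text)]


def reconstruct_spectrum(peaks, start=0.0, stop=10.0, n_points=2001, fwhm=0.01):
    """Render peaks as a sum of Lorentzian lines whose heights equal the proton counts.

    Coupling is ignored: each multiplet is drawn as a single line at its reported shift.
    """
    step = (stop - start) / (n_points - 1)
    xs = [start + i * step for i in range(n_points)]
    gamma = fwhm / 2  # half width at half maximum
    ys = [
        sum(p.n_protons * gamma**2 / ((x - p.shift_ppm) ** 2 + gamma**2) for p in peaks)
        for x in xs
    ]
    return xs, ys


text = "1H NMR (400 MHz, CDCl3) δ 7.26 (d, J = 8.0 Hz, 2H), 6.90 (d, J = 8.0 Hz, 2H), 3.85 (s, 3H)."
peaks = parse_peaks(text)
xs, ys = reconstruct_spectrum(peaks)
print([(p.shift_ppm, p.multiplicity, p.n_protons) for p in peaks])
# [(7.26, 'd', 2), (6.9, 'd', 2), (3.85, 's', 3)]
```

Because coupling constants are discarded here, such a reconstruction only approximates the original spectrum; a fuller approach would expand each multiplet using its reported J values.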
PAPER TITLE: Using online chemistry databases to facilitate structure identification in mass spectral data (final paper number: ANYL 45)
SESSION: Advances in Mass Spectrometry
DAY & TIME OF PRESENTATION: Tuesday, March 24, 2015 from 8:45 AM – 9:05 AM
ROOM & LOCATION: Aspen Room A – Embassy Suites Denver – Downtown Convention Center
ABSTRACT
The Royal Society of Chemistry hosts large-scale data collections and provides the chemistry community with access to them. The largest RSC data set of broad interest to the community offers access to tens of millions of compounds. Its host platform, ChemSpider, is limited in that it is only a structure-centric hub. A new architecture, the RSC data repository, has been developed that extends support to reactions, spectral data, crystallography data and related property data. It is also the architecture underlying a series of exemplar projects for managing data for a number of diverse laboratories. The adoption of data standards for the integration and distribution of data has been essential. Specific standards include molecular structure formats such as molfiles and InChIs, and spectral data formats such as JCAMP. This presentation will report on our development of the data repository, the importance of utilizing standards for data integration, the flexible nature of the architecture in delivering solutions for various laboratories, and our efforts to develop new large data collections. These include text-mining efforts to extract large spectrum-structure collections from large corpora.
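As a small illustration of why a shared spectral format helps integration, the following Python sketch reads a JCAMP-DX file into a header dictionary and X/Y lists. It is a hypothetical, minimal reader, not the repository's code: it handles only the uncompressed (X++(Y..Y)) table written as plain space-separated numbers (AFFN), whereas real files frequently use the compressed SQZ/DIF/DUP forms, and it normalizes labels with a simple strip-and-uppercase rule. The sample file is made up for illustration.

```python
# Minimal JCAMP-DX reader: a sketch that handles only uncompressed
# (X++(Y..Y)) data written in plain AFFN (space-separated numbers).
def parse_jcamp(text: str) -> tuple[dict[str, str], list[float], list[float]]:
    header: dict[str, str] = {}
    rows: list[list[float]] = []
    in_xydata = False
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # "$$" starts a comment
        if not line:
            continue
        if line.startswith("##"):
            # Labelled data record: "##LABEL= value"
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            in_xydata = label == "XYDATA"
            if label not in ("XYDATA", "END"):
                header[label] = value.strip()
        elif in_xydata:
            rows.append([float(v) for v in line.split()])

    # FIRSTX/LASTX are actual abscissa values; XFACTOR/YFACTOR scale the table.
    xfactor = float(header.get("XFACTOR", "1"))
    yfactor = float(header.get("YFACTOR", "1"))
    npoints = int(header["NPOINTS"])
    dx = (float(header["LASTX"]) - float(header["FIRSTX"])) / (npoints - 1)

    xs: list[float] = []
    ys: list[float] = []
    for row in rows:
        x_start = row[0] * xfactor  # each line starts with the X of its first Y
        for i, y in enumerate(row[1:]):
            xs.append(x_start + i * dx)
            ys.append(y * yfactor)
    return header, xs, ys


SAMPLE = """\
##TITLE= hypothetical example
##JCAMP-DX= 4.24
##DATA TYPE= INFRARED SPECTRUM
##XUNITS= 1/CM
##YUNITS= ABSORBANCE
##XFACTOR= 1.0
##YFACTOR= 0.001
##FIRSTX= 1000
##LASTX= 1007
##NPOINTS= 8
##XYDATA= (X++(Y..Y))
1000 10 20 30 40
1004 50 60 70 80
##END=
"""

header, xs, ys = parse_jcamp(SAMPLE)
print(header["TITLE"], len(xs))  # hypothetical example 8
```

Because every JCAMP-DX file carries its units and scaling factors in labelled records, a reader like this can combine spectra from different instruments and laboratories without per-source conversion code.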