NCSU Presentation: Data integration and building a profile for yourself as an online scientist

This is a presentation I gave at North Carolina State University hosted by Denis Fourches.

Data integration and building a profile for yourself as an online scientist

Many of us nowadays invest significant amounts of time in sharing our activities and opinions with friends and family via social networking tools. However, although many platforms exist for scientists to connect and share with their peers in the scientific community, the majority do not make use of these tools despite their promise and their potential impact on our future careers. We are being indexed and exposed on the internet via our publications, presentations and data. We also have many more ways to contribute to science, to annotate and curate data, and to “publish” in new ways, and many of these activities are part of a growing crowdsourcing network. This presentation will provide an overview of the various types of networking and collaborative sites available to scientists and ways to expose your scientific activities online. Many of these can ultimately contribute to the developing measures of you as a scientist as identified in the new world of alternative metrics. Participating offers a great opportunity to develop a scientific profile within the community and may ultimately be very beneficial, especially to scientists early in their career.

No Comments

ACS Boston: Value of the mediawiki platform for providing content to the chemistry community

My talk at ACS Boston: Value of the mediawiki platform for providing content to the chemistry community

At this time, and in a culture where online access is now an imperative, Wikipedia has become the definitive encyclopedia. In terms of its support for chemistry it is rich in encyclopedic pages including named reactions, chemical and drug pages, articles about chemists, and many other forms of chemistry-related information. Wikipedia runs on MediaWiki, an open source platform that anybody can use as the basis of their own hosted content collection. MediaWiki has been used as a collaborative environment by a number of chemists to create resources that have become very popular with the chemistry community. These include VIPEr to support inorganic chemistry, ChemWiki as an online textbook and other educational resources, and a Chemical Information Wikibook. MediaWiki has also been used by the author to host open source collections of data including scientists, scientific databases and mobile apps for science: the ScientistsDB, SciDBs and SciMobileApps wikis. This presentation will provide an overview of some of the chemistry resources that presently exist and celebrate the major contributions that Wikipedia and MediaWiki have made to the collaborative dissemination of chemistry.

No Comments

ACS Boston: The driving needs for analytical data exchange standards and the potential impacts on the chemical sciences

This presentation was given at the ACS Boston meeting with the following abstract:

Analytical science underpins so many different types of chemistry that it is clearly indispensable. Nuclear Magnetic Resonance and infrared spectroscopy, mass spectrometry and chromatography, and a myriad of other forms of analytical science are easily available to scientists today, commonly in open access walk-up labs. While instrumentation is now compact and highly flexible, and the controlling software is both powerful and easy to use, significant challenges remain in terms of the management and integration of various forms of analytical data and, more importantly, the exchange of data between scientists. In general the reporting of data in peer-reviewed journals is limited to electronic supplementary information in the form of PDF files or, occasionally, in the form of webpages. Much of the strength of analytical data resides in the ability to database diverse data types and interrogate them later, performing searches based on metadata, spectral features and related chemical structure information. The need for file format export and conversion from the binary file formats associated with the majority of analytical instrumentation remains a major objective in the field. While file formats such as JCAMP and NetCDF have enabled data exchange for a number of years, the requirement for more advanced formats (such as AnIML and mzML) has continued. This presentation will review existing activities in the development of exchangeable formats and progress in utilizing existing formats for the delivery of reusable analytical data to the community.
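To make the point about metadata-based searching concrete, here is a minimal sketch of reading the labelled header records from a JCAMP-DX text file so they can be indexed and queried. It assumes only the basic `##LABEL=value` record layout with `$$` comments; the sample file content and the function names `read_jcamp_header` and `find_records` are hypothetical and purely for illustration, and the spectral data block itself is not decoded.

```python
def read_jcamp_header(text):
    """Collect ##LABEL=value records from JCAMP-DX text into a dict.

    Data lines (which do not start with ##) are skipped; parsing stops at ##END=.
    """
    header = {}
    for line in text.splitlines():
        line = line.split("$$", 1)[0].strip()  # drop trailing $$ comments
        if not line.startswith("##") or "=" not in line:
            continue
        label, value = line[2:].split("=", 1)
        label = label.strip().upper()
        if label == "END":
            break
        header[label] = value.strip()
    return header


def find_records(headers, **criteria):
    """Return the headers whose labels match every given label/value pair."""
    return [
        h for h in headers
        if all(h.get(label.replace("_", " ").upper()) == value
               for label, value in criteria.items())
    ]


# Hypothetical file content, for illustration only.
sample = """##TITLE=hypothetical sample
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE  $$ a comment
##XYDATA=(X++(Y..Y))
4000 0.98 0.97 0.95
##END=
"""

header = read_jcamp_header(sample)
infrared = find_records([header], data_type="INFRARED SPECTRUM")
print(header["TITLE"], len(infrared))
```

Once headers are held as plain dictionaries like this, they can be loaded into whatever database is at hand and searched alongside structure information.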

No Comments

Last day at the Royal Society of Chemistry – So long and thanks for all the spuds

Today is my last day of employment at the Royal Society of Chemistry. It is almost six years since I joined RSC, when ChemSpider was acquired. While ChemSpider was initially a “hobby project” and an attempt to create a disruption in terms of access to chemistry data, crowdsourced contribution and data validation, it has gone from strength to strength and now serves ca. 40,000 unique users a day from around the world. It won three awards in the first few months after we joined RSC and was catalytic in RSC winning three grants that allowed us to participate in the Open PHACTS project and the PharmaSea project and to become the host of the UK National Chemical Database Service. Based on the feedback I have received over the years, ChemSpider is much loved and appreciated as a contribution to the scientific community and is recognized as one of the key players in the free chemistry resources arena. I am proud to have been associated with it.

We also got to set up the ChemSpider SyntheticPages micropublishing site and tried to get the community sharing syntheses that would likely not make it into mainstream papers but were still of value to science.

During my six years at RSC I have been involved with many discussions regarding the following areas of work, study and research and how they would benefit publishing, the society and, of course, the chemistry community at large. The list includes, in particularly random order:

  • Chemistry databases – both commercial and free – and how best to mesh, commercialize and license data
  • Data quality in publications and databases and development of tools for data validation
  • Open Data, Open Access and Open Notebook Science
  • Text-mining of the RSC archive to extract & mark up compounds, reactions, property data and analytical data.
  • The potential of semantic web applications to scientific publishing
  • Encouraging the use of Open Identifiers – especially ORCID and InChI
  • The future of Micropublishing in the chemical sciences
  • Analytical data and building an open spectral database for the community
  • Social networking approaches to build online profiles – especially for young scientists

There are many, many more things of course, but these are the big ones and, for me, they bring clarity to what my interests are: chemistry data and making it available to the appropriate communities. It is with this in mind that I am excited to join the Environmental Protection Agency next week in the National Center for Computational Toxicology.

With every move forward into a new job we leave behind our old one. I leave RSC with some sadness at going and with excitement for the new opportunities. I have had the chance to work with so many good people at RSC and to engage with collaborators such as ACD/Labs, Mestre, NextMove, EBI, ChemAxon, Accelrys (as they were then), iChemLabs, Dotmatics and on and on. Apologies if you are not named but the list is very long. Thanks to everyone for your support, encouragement and opportunities to engage. It has been a blast.

And for everyone at RSC who catered to my strange diet of potatoes only…so long, and thanks for all the spuds.

1 Comment

Beyond the Paper CV (or how to build an online profile as a scientist)

This presentation was given at the UKICRS meeting (http://www.ukicrs.org/2015-symposium.html) on April 16th 2015 at the University of Nottingham. It was part of a workshop and focused on showing attendees in the postgraduate phases of their careers how to use online tools to start building a reputation and profile in their field. It was good to get positive feedback from some of the attendees. Generally the comments concerned the number of different online tools I highlighted that they could use, as well as their realization that they must take responsibility for their reputation and do it soon…there are benefits to starting early!

No Comments

Our dire need to mandate data standards and expectations for scientific publishing

This is a presentation that I delivered at the ACS Division of Chemical Information meeting regarding “Reproducibility, Reporting, Sharing & Plagiarism” at ACS Denver on 23rd March 2015.

I took the opportunity to take off my hat as VP of Strategic Development at RSC and as a member of the cheminformatics group that built ChemSpider and works on other RSC projects related to it. Instead I presented on how a LACK OF MANDATES from publishers regarding the submission of data accompanying the articles I am involved with writing is actually weakening my scientific record, because data is not being shared in the forms most useful to the community. I think there would be benefits if publishers started pushing me for MORE data, in fairly general standards, and allowed me (and others) to download the data in the form of molecules (and collections), spectral data, CSV files etc.

 

No Comments

Providing Access to a Million NMR Spectra via the web

This presentation was given at the ACS Denver meeting on March 22nd 2015 in a CHED Division symposium

Providing Access to a Million NMR Spectra via the web

Antony Williams, Alexey Pshenichnov, Peter Corbett, Daniel Lowe, Carlos Coba

Large-scale collections of NMR spectral data can serve a number of purposes in teaching spectroscopy to students. The data can be used in lectures, as training data sets for spectral interpretation and structure elucidation, and to underpin educational resources such as the Royal Society of Chemistry’s Learn Chemistry. These resources have been available for a number of years but have been limited to rather small collections of only about 3000 spectra. In order to expand the data collection and provide richer resources for the community we have been gathering data from various laboratories and, as part of a research project, we have used text-mining approaches to extract spectral data from articles and patents in the form of textual strings and utilized algorithms to convert the data into spectral representations. While these spectra are reconstructions of text representations of the original spectral data, we are investigating their value for the purpose of structure identification. This presentation will report on the processes of extracting structure-spectral pairs from text, approaches to performing automated spectral verification and our intention to assemble a spectral collection of a million NMR spectra and make them available online.
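As a rough illustration of turning a textual NMR string into a spectral representation, the sketch below pulls chemical shifts and proton counts out of a reported peak list and draws each peak as a Lorentzian line. It is not the algorithm used in the project: the example string, the regular expression, the line width and the function names are all assumptions, and multiplicities and coupling constants are deliberately ignored, so entries such as "(d, J = 8.0 Hz, 2H)" are not handled.

```python
import re

# Matches entries such as "7.26 (s, 1H)": shift, multiplicity, proton count.
PEAK = re.compile(r"(\d+\.\d+)\s*\(([a-z]+),\s*(\d+)H\)")


def parse_peaks(text):
    """Return (shift_ppm, multiplicity, n_protons) tuples found in the text."""
    return [(float(s), m, int(n)) for s, m, n in PEAK.findall(text)]


def lorentzian(x, centre, half_width):
    """Lorentzian line shape with unit height at its centre."""
    return half_width ** 2 / ((x - centre) ** 2 + half_width ** 2)


def reconstruct(peaks, start=0.0, stop=10.0, step=0.01, half_width=0.01):
    """Sum one Lorentzian per peak, with height scaled by the proton count."""
    n_points = int(round((stop - start) / step)) + 1
    axis = [start + i * step for i in range(n_points)]
    intensity = [
        sum(n_h * lorentzian(x, shift, half_width) for shift, _, n_h in peaks)
        for x in axis
    ]
    return axis, intensity


# Hypothetical peak list in the style found in experimental sections.
text = "1H NMR (400 MHz, CDCl3) δ 7.26 (s, 1H), 3.71 (s, 3H), 2.10 (s, 3H)"
peaks = parse_peaks(text)
axis, intensity = reconstruct(peaks)
print(peaks)
```

Because every line has the same width, peak areas stay proportional to the reported proton counts; a fuller reconstruction would split each signal according to its multiplicity and coupling constants.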

 

No Comments

Give me kudos for taking responsibility for self-marketing my scientific publications and increase impact

This presentation was given at the ACS Denver meeting on March 22nd 2015 in a CINF Division symposium

Give me kudos for taking responsibility for self-marketing my scientific publications and increase impact.

Antony Williams, Will Russell, Melinda Kenneway and Louise Peck

The authoring of a scientific publication can represent the culmination of many tens if not hundreds of hours of data collection and analysis. The authoring and peer-review process itself often represents a major undertaking in terms of assembling the publication and passing through review. Considering the amount of work invested in the production of a scientific article, it is quite surprising that authors, post-publication, invest very little effort in communicating the value and potential impact of their article to the community. Social networking has clearly demonstrated the ability to self-market and drive attention. At the same time, the increasing volume of literature (over a million new articles are published every year) requires authors to take on a more direct role in ensuring their work gets read and cited. This requirement may grow with the emergence of a range of metrics at the article level, shifting attention away from where a researcher publishes to the performance of their individual articles. Therefore, a separate platform to facilitate social networking and other discovery tools to communicate the value of published science to the community would be of value. In parallel, the possibility of enhancing an article by linking to additional information (presentations, videos, blog posts etc.) allows for enrichment of the article post-publication, a capability not available via the publisher’s platform. This presentation will provide a personal overview of the experiences of using the Kudos Platform and how it ultimately benefits my ability to communicate an integrated view of my research to the community.

 

No Comments

PITTCON poster: Dealing with the complex challenge of managing diverse analytical chemistry data online

This is a talk I presented at Pittcon on Wednesday March 13th, 2015

Dealing with the complex challenge of managing diverse analytical chemistry data online

The Royal Society of Chemistry provides open access to data associated with tens of millions of chemical compounds. The richness and complexity of the data has continued to expand dramatically and the original vision for providing an integrated hub for structure-centric data has been delivered across the world to hundreds of thousands of users. With the intention of expanding the reach to cover more diverse aspects of chemistry-related data including compounds, reactions and analytical data, to name just a few data types, we are in the process of delivering a Chemistry Data Repository. The data repository will manage the challenges of associated metadata and the various levels of required security (private, shared and public), and will expose the data as appropriate using semantic web technologies. Ultimately this platform will become the host for all chemicals, reactions and analytical data contained within RSC publications and specifically supplementary information. This presentation will report on the challenges of managing “Big Data” for chemists around the world and providing access to tools for structure dereplication, spectral database searching and the crowdsourcing of the world’s largest spectral database.

 

No Comments

PITTCON Poster: Using an online database of chemical compounds for the purpose of structure identification

This is a poster I presented at Pittcon on Wednesday March 9th, 2015

Using an online database of chemical compounds for the purpose of structure identification

Online databases can be used for the purposes of structure identification. The Royal Society of Chemistry provides access to an online database containing tens of millions of compounds and this has been shown to be a very effective platform for the development of tools for structure identification. Since in many cases an unknown to an investigator is known in the chemical literature or a reference database, these “known unknowns” are commonly available now on aggregated internet resources. These types of compounds in commercial, environmental, forensic, and natural product samples can be identified by searching against these large aggregated databases, querying by either elemental composition or monoisotopic mass. Searching by elemental composition is the preferred approach, but it is often difficult to determine a unique elemental composition for compounds with molecular weights greater than 600 Da. In these cases, searching by the monoisotopic mass is advantageous. In either case, the search results can be refined by appropriate filtering to identify the compounds. We will report on integrated filtering and search approaches on our aggregated compound database for the purpose of structure identification and review our progress in using the platform for natural product dereplication purposes.
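The two query modes described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ChemSpider implementation: the four-compound candidate list, the function names and the 5 ppm tolerance are all hypothetical, formulas are assumed to be simple element-count strings without brackets or charges, and only a handful of elements are covered.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of a few elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}


def parse_formula(formula):
    """Turn 'C8H10N4O2' into {'C': 8, 'H': 10, 'N': 4, 'O': 2}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def search_by_formula(candidates, formula):
    """Return names of candidates whose elemental composition matches exactly."""
    target = parse_formula(formula)
    return [name for name, f in candidates if parse_formula(f) == target]


def search_by_mass(candidates, mass, ppm=5.0):
    """Return names of candidates within a ppm tolerance of the measured mass."""
    tolerance = mass * ppm / 1e6
    return [name for name, f in candidates
            if abs(monoisotopic_mass(f) - mass) <= tolerance]


# Hypothetical stand-in for an aggregated compound database.
candidates = [
    ("caffeine", "C8H10N4O2"),
    ("theophylline", "C7H8N4O2"),
    ("paraxanthine", "C7H8N4O2"),
    ("glucose", "C6H12O6"),
]

print(search_by_formula(candidates, "C7H8N4O2"))
print(search_by_mass(candidates, 180.0647, ppm=5.0))
```

Either query typically returns several isomers, which is why the hit list still needs the kind of filtering the abstract mentions before a structure can be assigned.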

 

No Comments