Archive for category Chemical Database Service
Today is my last day of employment with the Royal Society of Chemistry. It has been almost six years since I joined RSC when ChemSpider was acquired. While ChemSpider was initially a “hobby project” and an attempt to create a disruption in terms of access to chemistry data, crowdsourced contribution and data validation, it has gone from strength to strength and now serves ca. 40,000 unique users a day from around the world. It won three awards in the first few months after we joined RSC and was catalytic in RSC winning three grants that allowed us to participate in the Open PHACTS project and the PharmaSea project and to become the host of the UK National Chemical Database Service. Based on the feedback I have received over the years, ChemSpider is much loved and appreciated as a contribution to the scientific community and is recognized as one of the key players in the free chemistry resources arena. I am proud to have been associated with it.
We also got to set up the ChemSpider SyntheticPages micropublishing site and tried to get the community sharing syntheses that would likely not make it into mainstream papers but were still of value to science.
During my six years at RSC I have been involved in many discussions regarding the following areas of work, study and research and how they would benefit publishing, the society and, of course, the chemistry community at large. The list includes, in no particular order:
- Chemistry databases – both commercial and free – and how to best mesh, commercialize and license data
- Data quality in publications and databases and development of tools for data validation
- Open Data, Open Access and Open Notebook Science
- Text-mining of the RSC archive to extract & mark up compounds, reactions, property data and analytical data
- The potential of semantic web applications to scientific publishing
- Encouraging the use of Open Identifiers – especially ORCID and InChI
- The future of Micropublishing in the chemical sciences
- Analytical data and building an open spectral database for the community
- Social networking approaches to build online profiles – especially for young scientists
There are many, many more things of course but these are the big ones and, for me, bring clarity to what my interests are – chemistry data and making it available to the appropriate communities. It is with this in mind that I am excited to join the Environmental Protection Agency next week in the National Center for Computational Toxicology.
With every move forward into a new job we leave behind our old one. I leave RSC with some sadness, and with excitement for the new opportunities. I have had the chance to work with so many good people at RSC, and to engage with collaborators such as ACD/Labs, Mestre, NextMove, EBI, ChemAxon, Accelrys (as they were then), iChemLabs, Dotmatics and on and on. Apologies if you are not named but the list is very long. Thanks to everyone for your support, encouragement and opportunities to engage. It has been a blast.
And for everyone at RSC who catered to my strange diet of potatoes only…so long, and thanks for all the spuds.
A presentation that I am giving around UK universities in September/October 2014
A chemistry data repository to serve them all
Over the past five years the Royal Society of Chemistry has become world renowned for its public domain compound database that integrates chemical structures with online resources and available data. ChemSpider regularly serves over 50,000 users per day who are seeking chemistry-related data. In parallel we have used ChemSpider and available software services to underpin a number of grant-based projects that we have been involved with: Open PHACTS, a semantic web project integrating chemistry and biology data; PharmaSea, seeking out new natural products from the ocean; and the National Chemical Database Service for the United Kingdom. We are presently developing a new architecture that will offer broader scope in terms of the types of chemistry data that can be hosted. This presentation will provide an overview of our Cheminformatics activities at RSC, the development of a new architecture for a data repository that will underpin a global chemistry network, and the challenges ahead, as well as our activities in releasing software and data to the chemistry community.
How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry #ACSsanfran
This is my presentation at the ACS San Francisco Fall Meeting on August 10th 2014
How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry
The Royal Society of Chemistry hosts a growing collection of online chemistry content. For much of our work the InChI identifier is an important component underpinning our projects. This enables the integration of chemical compounds with our archive of scientific publications, the delivery of a reaction database containing millions of reactions as well as a chemical validation and standardization platform developed to help improve the quality of structural representations on the internet. The InChI has been a fundamental part of each of our projects and has been pivotal in our support of international projects such as the Open PHACTS semantic web project integrating chemistry and biology data and the PharmaSea project focused on identifying novel chemical components from the ocean with the intention of identifying new antibiotics. This presentation will provide an overview of the importance of InChI in the development of many of our eScience platforms and how we have used it to provide integration across hundreds of websites and chemistry databases across the web. We will discuss how we are now expanding our efforts to develop a platform encompassing efforts in Open Source Drug Discovery and the support of data management for neglected diseases.
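To make the integration idea concrete, here is a minimal sketch of how a shared identifier such as the InChI string can act as the join key between two otherwise unrelated collections. This is an illustration only and not a description of how the RSC systems are implemented: the record layouts, the placeholder DOIs and the `link_by_inchi` helper are all hypothetical. Only the InChI strings themselves (ethanol, benzene, water) are real standard InChIs.

```python
from collections import defaultdict

# Hypothetical records from two independent sources. The field names and
# DOIs are made up for illustration; the InChI strings are standard InChIs.
compound_db = [
    {"name": "ethanol", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
    {"name": "benzene", "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"},
]
article_index = [
    {"doi": "10.0000/example.1", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
    {"doi": "10.0000/example.2", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
    {"doi": "10.0000/example.3", "inchi": "InChI=1S/H2O/h1H2"},
]


def link_by_inchi(compounds, articles):
    """Map each compound name to the DOIs of articles that share its InChI."""
    dois_by_inchi = defaultdict(list)
    for article in articles:
        dois_by_inchi[article["inchi"]].append(article["doi"])
    # Compounds with no matching article get an empty list.
    return {c["name"]: dois_by_inchi.get(c["inchi"], []) for c in compounds}


links = link_by_inchi(compound_db, article_index)
for name, dois in links.items():
    print(name, dois)
```

Because the same structure yields the same standard InChI regardless of who generated it, neither source needs to know about the other's internal identifiers for the link to work.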
The UK National Chemical Database Service as an integration of commercial and public chemistry services to support chemists in the United Kingdom
This is a presentation I gave at the ACS National Meeting in Dallas on Wednesday 19th March 2014
The UK National Chemical Database Service – an integration of commercial and public chemistry services to support chemists in the United Kingdom
At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. The challenge of combining a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., together with the confusion of licensing and embargoing, and of providing for data exchange and integration with services and platforms external to the repository, is significant. All of this comes at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Sciences Research Council (EPSRC) of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in terms of providing data to develop prediction models to further enable scientific discovery will be discussed and the potential impact on the future of scientific publishing will also be examined.
This is a presentation I gave at the ACS Meeting in Dallas, Texas on March 17th 2014
Royal Society of Chemistry Activities to Develop a Data Repository for Chemistry-Specific Data
In recent years the Royal Society of Chemistry has become known for our development of freely accessible data platforms including ChemSpider, ChemSpider Reactions and our new chemistry data repository. In order to support drug discovery RSC participates in a number of projects including the Open PHACTS semantic web project, the PharmaSea natural products discovery project and the Open Source Drug Discovery project in collaboration with a team in India. Our most recent developments include extending our efforts to support neglected diseases by the provision of high quality datasets resulting from our curation efforts to support modeling, the delivery of enhanced application programming interfaces to allow open source drug discovery teams to both source and deposit data from our chemistry databases and the provision of a micropublishing platform to report on various aspects of work supporting neglected disease drug discovery. This presentation will review our existing efforts and our plans for extended development.
I gave a presentation at the ICIC 2013 meeting in Vienna focused on the “Big data challenges associated with building a national data repository for chemistry”. The Slideshare presentation is shown below.
At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. The challenge of combining a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., together with the confusion of licensing and embargoing, and of providing for data exchange and integration with services and platforms external to the repository, is significant. All of this comes at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Sciences Research Council (EPSRC) of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in terms of providing data to develop prediction models to further enable scientific discovery will be discussed and the potential impact on the future of scientific publishing will also be examined.
The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms
This is my seventh and LAST talk at the ACS Meeting in Indianapolis:
The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms
The Royal Society of Chemistry provides access to a number of databases hosting chemicals data, reactions, spectroscopy data and prediction services. These databases and services can be accessed via web services, with queries using standard data formats such as InChI and molfiles. Data can then be downloaded in standard structure and spectral formats allowing for reuse and repurposing. The ChemSpider database integrates with a number of projects external to RSC, including Open PHACTS, which integrates chemical and biological data. This project utilizes semantic web data standards including RDF. This presentation will provide an overview of how structure and spectral data standards have been critical in allowing us to integrate many open source tools, in easing integration with a myriad of services and in underpinning many of our future developments.
The future of scientific information & communication presented at the SUNY Potsdam Academic Festival
This is a LONG presentation….I talk about the “It’s All About Me” attitude that can positively feed science….we want to share OUR science, we want people to know about our opinions, our activities, our collaborators, we want to get funding, recognition and attribution. And why not…it can all be to the benefit of science.
This presentation was given at the SUNY Potsdam Academic Festival
The future of scientific information & communication
Our access to scientific information has changed in ways that were hardly imagined even by the early pioneers of the internet. The immense quantities of data and the array of tools available to search and analyze online content continue to expand while the pace of change does not appear to be slowing. While scientists now have access to the enormous capacities and capability of the internet, the vast majority of scientific communication continues to be through peer-reviewed scientific journals. The measure of a scientist’s contribution is primarily represented by their publication profile and the citations to their published works, which offers an incomplete view of their activities. However, we are at the beginning of a new revolution where the ability to communicate offers the opportunity to embrace new forms of publishing and where scientific participation and influence will be measured in new ways. This presentation will provide an overview of our new generation of “openness” in which open source, open standards, open access and open data are proliferating. The future of scientific information and communication will be underpinned by these efforts and influenced by increasing participation from the scientific community and facilitated collaboration, and will ultimately accelerate scientific progress.
Presentation given at ACS New Orleans Spring Meeting
ChemSpider is one of the chemistry community’s primary public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to many tens of websites and software applications at this point. This presentation will provide an overview of the expanding reach of the ChemSpider platform and the nature of solutions that it helps to enable. We will also discuss some of the future directions that are envisaged for the project and how we intend to continue expanding the impact of the platform.
For whatever reason, at the end of the year I get kind of thoughtful regarding what I have done over the past year and what is coming in the year ahead. I’ll hopefully get some time to review what’s gone past in 2012 but in reality I will likely be spending a lot of my spare time over the holidays working with my colleagues on the Chemical Database Project that RSC has been funded to deliver by EPSRC.
The first page is up already here (http://cds.rsc.org) and declares what we intend to deliver on January 2nd 2013. At that time we will have been working on the initial delivery for only about 5 weeks and will, in my experience, have gone through some of the best team experiences I have had in many a year. We will have negotiated prices and reviewed contracts, integrated a series of commercial databases and some of our own resources, built out an infrastructure to host the system and navigated many challenges around timeliness, delivery of heterogeneous software platforms and databases, debugging of many tens of thousands of lines of code and working across multiple time zones within our team. I have the privilege of working with some great people committed to getting it done!
The vision of the project, as we see it now, is outlined at a very basic level on the Chemical Database Service blog so I will not reiterate it here. Suffice to say we have outlined a project and future for the CDS that was sufficient for us to be awarded the tender. It includes not only the integration of a series of commercial databases and prediction services but also the development of a data repository for UK chemists that will allow embargo-based storage of user-defined licensed data at a personal, group and institutional level. The repository, as it is built out, will include storage of chemicals, syntheses, analytical data, property data etc. The project is still being scoped out and will engage the community of users and collaborators in defining how it should be implemented and the priority of development to deliver greatest value as the project progresses. We have five years of development ahead. It’s going to be challenging, entertaining, motivating and important. It’s going to be a tiring holiday season to meet the January 2nd deadline but that is just the start. Next year the fun really starts!