I gave a presentation at the ICIC 2013 meeting in Vienna focused on the “Big data challenges associated with building a national data repository for chemistry“. The Slideshare presentation is shown below.
At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. Combining a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., together with the confusion of licensing and embargoing, and providing for data exchange and integration with services and platforms external to the repository, the challenge is significant. This all at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Science Research of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types ssociated with chemistry. The benefits of such a repository in terms of providing data to develop prediction models to further enable scientific discovery will be discussed and the potential impact on the future of scientific publishing will also be examined.