
Data enhancing the Royal Society of Chemistry publication archive

This is a presentation I gave at the ACS Dallas meeting on March 19th 2014

Data enhancing the Royal Society of Chemistry publication archive

The Royal Society of Chemistry has an archive of hundreds of thousands of published articles containing various types of chemistry related data – compounds, reactions, property data, spectral data etc. RSC has a vision of extracting as much of these data as possible and providing access via ChemSpider and its related projects. To this end we have applied a combination of text-mining extraction, image conversion and chemical validation and standardization approaches. The outcome of this project will result in new chemistry related data being added to our chemical and reaction databases and in the ability to more tightly couple web-based versions of the articles with these extracted data. The ability to search across the archive will be enhanced as a result. This presentation will report on our progress in this data extraction project and discuss how we will ultimately use similar approaches in our publishing pipeline to enhance article markup for new publications.



The UK National Chemical Database Service as an integration of commercial and public chemistry services to support chemists in the United Kingdom

This is a presentation I gave at the ACS National Meeting in Dallas on Wednesday 19th March 2014

The UK National Chemical Database Service – an integration of commercial and public chemistry services to support chemists in the United Kingdom

At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. The challenge is significant: such a repository must combine a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., deal with the confusion of licensing and embargoing, and provide for data exchange and integration with services and platforms external to the repository. All this comes at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Sciences Research Council (EPSRC) of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in terms of providing data to develop prediction models to further enable scientific discovery will be discussed and the potential impact on the future of scientific publishing will also be examined.



The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms

This is my seventh and LAST talk at the ACS Meeting in Indianapolis:

The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms

The Royal Society of Chemistry provides access to a number of databases hosting chemicals data, reactions, spectroscopy data and prediction services. These databases and services can be accessed via web services, with queries expressed in standard data formats such as InChI and molfiles. Data can then be downloaded in standard structure and spectral formats, allowing for reuse and repurposing. The ChemSpider database integrates with a number of projects external to RSC, including Open PHACTS, which integrates chemical and biological data. This project utilizes semantic web data standards including RDF. This presentation will provide an overview of how structure and spectral data standards have been critical in allowing us to integrate many open source tools, in easing integration with a myriad of services and in underpinning many of our future developments.


The Social Profile of a Chemist Online

This is my fifth talk at the ACS Indianapolis Conference:

The Social Profile of a Chemist Online – The Potential Profits of Participation

Unless a scientist is limited by their employer from exposing their scientific activities through publications and presentations, their future impact, whether expected to be at a bench, in front of an instrument or surrounded by robotics, will largely be represented online through their published works, their citation profile and other forms of recognition of their work by their peers. Search engines are already harvesting information about scientists and aggregating it into profiles such as those offered by Google Scholar Citations and Microsoft Academic Search. Rather than be limited to the online representation provided by such services, students are encouraged to participate in the creation of their online profile and to architect the representation of themselves online to as large a degree as possible for future employers and collaborators. This presentation will give an overview of potential approaches to participating in the development of their online persona.



Personal experiences in participating in the expanding social networks for science

This is the third presentation I gave at the ACS Meeting in Indianapolis:

Personal experiences in participating in the expanding social networks for science

The number of social networking sites available to scientists continues to grow. We are being indexed and exposed on the internet via our publications, presentations and data. We have many ways to contribute, annotate and curate, many of them as part of a growing crowdsourcing network. As one of the founders of the online ChemSpider database I was drawn into the world of social networking to participate in the discussions that were underway regarding our developing resource. As a result of my experiences in blogging, and as a result of developing collaborations and engagement with a large community of scientists, I have become very immersed in the expanding social networks for science. This presentation will provide an overview of the various types of networking and collaborative sites available to scientists and ways that I expose my scientific activities online. Many of these activities will ultimately contribute to the developing measures of me as a scientist as identified in the new world of alternative metrics.


Accessing chemical health and safety data online using Royal Society of Chemistry resources

This is the second presentation I gave at the ACS Meeting in Indianapolis

Accessing chemical health and safety data online using Royal Society of Chemistry resources

The internet has opened up access to large amounts of chemistry related data that can be harvested and assembled into rich resources of value to chemists. The Royal Society of Chemistry’s ChemSpider database has assembled an electronic collection of over 28 million chemicals from over 400 data sources, and some of the assembled data is certainly of value to those searching for chemical health and safety information. Since ChemSpider is a text and structure searchable database, chemists are able to find relevant information using both of their general search approaches. This presentation will provide an overview of the types of chemical health and safety data and information made available via ChemSpider and discuss how the data are sourced, aggregated and validated. We will examine how the data can be made available via mobile devices and examine the issue of data quality and its potential impacts on such a database.



The future of scientific information & communication presented at the SUNY Potsdam Academic Festival

This is a LONG presentation….I talk about the “It’s All About Me” attitude that can positively feed science….we want to share OUR science, we want people to know about our opinions, our activities, our collaborators, we want to get funding, recognition and attribution. And why not…it can all be to the benefit of science.

This presentation was given at the SUNY Potsdam Academic Festival

The future of scientific information & communication

Our access to scientific information has changed in ways that were hardly imagined even by the early pioneers of the internet. The immense quantities of data and the array of tools available to search and analyze online content continue to expand, and the pace of change does not appear to be slowing. While scientists now have access to the enormous capacities and capability of the internet, the vast majority of scientific communication continues to be through peer-reviewed scientific journals. The measure of a scientist’s contribution is primarily represented by their publication profile and the citations to their published works, which offers an incomplete view of their activities. However, we are at the beginning of a new revolution where the ability to communicate offers the opportunity to embrace new forms of publishing and where scientific participation and influence will be measured in new ways. This presentation will provide an overview of our new generation of “openness” in which open source, open standards, open access and open data are proliferating. The future of scientific information and communication will be underpinned by these efforts and influenced by increasing participation from the scientific community and facilitated collaboration, and will ultimately accelerate scientific progress.


Olympicene Now Has a Wikipedia Article

The story about Olympicene was released earlier this week to great fanfare online. I discussed the details here. There has been so much press, with comments made online on the websites of Popular Science, Scientific American, BBC News, the Huffington Post and many others, that I wondered whether it would be appropriate to suggest an article get written for Wikipedia.

Now, I am VERY concerned with notability on Wikipedia, as evidenced by my post here about notability. I think Olympicene is “notable”. I am also concerned with being flagged with conflict of interest as I was involved with the Olympicene project. My intention was to ask the community to participate in writing the article. However, after checking Wikipedia I was happy to find that the community already got to it. In two days 9 authors had already worked on the article! I checked the View History logs…I don’t know who ANY of them are….so I am not in conflict of interest there either by asking someone to write it!

The Wikipedia article on Olympicene is here. It would be ideal to get the color image up there for dramatic effect as well as the original concept details when Olympicene was introduced, as well as at least some representatives of the synthetic path on ChemSpider SyntheticPages. I can’t add them…I’ll get flagged probably. I’d also link to the ScientistsDB article about Graham Richards as it is much richer than the one on Wikipedia.

Either way…from release to Wikipedia same day and nine authors in two days. Now that’s community collaboration!


The Story of Olympicene from Concept to Completion

The story of Olympicene, and our intention to try and get it synthesized and analyzed, was first reported in August 2011 here. The original conversation was between Prof Graham Richards and me over a drink in Belgium at the RSC Editors Symposium in March 2010. The concept of having someone synthesize a small organic molecule that would be a molecular representation of a famous symbol of sport was a fascinating challenge. And, always one for a challenge, it was one that was pursued with great gusto!

Since we had started the ChemSpider SyntheticPages (CSSP) platform recently I thought it was appropriate to kick off a grand vision discussion with Peter Scott, one of the editors of CSSP. My original idea that I bounced off of Peter was a big one…an international competition exposed to the chemistry community. Encourage chemistry labs around the world to submit their step-by-step syntheses to CSSP. We would be able to collect and expose all of this work to the entire chemistry community. We would set up a voting scheme for the community to give their input on what was the most elegant synthesis, the greenest, what had the best analytical data, what had the best write-up. Not all categories were detailed at that time and would come later, but the concept of bronze, silver and gold medal winners in an international chemistry competition made sense. We were really excited by the possibilities but for many reasons (read that as many distractions) we rolled the announcement out as a smaller announcement and encouraged participation as best as we could with a small engagement profile via this blog. It did seem to garner a lot of attention but, as is common with such projects, the participation was not as high as we expected. Nevertheless one lab did step up to participate in the project, the David Fox Group at the University of Warwick. David is a colleague of Peter Scott’s…small world…

David had one of his students pursue the synthesis, not only because the olympicene molecule might be an elegant piece of synthetic work, but also because some of the envisaged properties could well be of value (more on that later!). Anish started publishing his syntheses to CSSP in November of last year as listed here. You can see the Olympicene compound coming together step by step and yes, the final step is not yet reported! Once the compound was made then the possibilities of having it analyzed seemed rather interesting, especially having seen the work reported by IBM in 2009 regarding the single molecule imaging of pentacene. Also, I had followed the work of Marcel Jaspars, who I had known during my time working at ACD/Labs when I was working on Computer-Assisted Structure Elucidation [1,2]. Marcel had recently worked on an NMR and microscopy imaging project to confirm a chemical compound structure. Again, small world. I asked Marcel for an intro to the researchers at IBM and we started a dialogue. Researchers at University of Warwick had already applied Scanning Tunnelling Microscopy (Dr Giovanni Costantini and Ben Moreton at Warwick) and they then connected with Leo Gross with the idea of using the noncontact atomic force microscopy approach.

Within a fairly short period of time IBM had performed the very elegant work of imaging olympicene…just one of the images is shown below but there are others shown on the Flickr account.

A single olympicene molecule is just 1.2 nanometres in width, about 100,000 times thinner than a human hair. This is beautiful! For whatever reason it looks like a molecule with a smile at the success of the work too!

The story of the work is described in this video below.

The work is not over yet! There is a research paper to come from the University of Warwick and IBM Research labs as there is definitely unique science that has come out of this work and definitely needs to be reported. That molecule, as it were, is “NOT just a pretty face”. We will submit all the appropriate images and available analytical data onto ChemSpider and CSSP as time allows.

For now I simply smile at the story of a concept discussion between Graham and me that was taken into the hands of superb scientists and brought to fruition. Congratulations to ALL of those who worked on the project in David Fox’s and Leo Gross’s labs. Thanks to the marketing people at IBM, RSC and Warwick for bringing together all of the materials in a tight time frame to tell the story. My thanks to my colleagues at RSC who believed in the potential of this project and especially to Peter Scott for seeing the potential and willingly participating! This project is a great example of international collaboration and pushing science to its extremes. It was a pleasure to be involved if only at a concept level and HOPEFULLY I will get to meet the scientists who did the work sometime!



Staying Informed About my Citations Using Google Scholar Citations

I think the Google Scholar Citations resource is excellent. I was one of the fortunate ones that managed to get onto the system early and I signed on immediately and used it to aggregate my papers very easily and quickly as represented here. One of my favorite aspects of the system is how it keeps me informed, by emails direct to my inbox, that other papers are referencing papers for which I am an author. Today the email that hit me listed four such papers. A free service, regular updates and, as best as I can tell, working as advertised for me at least.

Scholar Alert: New citations to my articles

The Chemical Ecology of Soil Organic Matter Molecular Constituents

MJ Simpson… – Journal of Chemical Ecology, 2012

Abstract Soil organic matter (OM) contains vast stores of carbon, and directly supports
microbial, plant, and animal life by retaining essential nutrients and water in the soil. Soil OM
plays important roles in biological, chemical, and physical processes within the soil, and


Bioinformatics and variability in drug response: a protein structural perspective

JL Lahti, GW Tang, E Capriotti, T Liu… – Journal of The Royal Society …, 2012

Abstract Marketed drugs frequently perform worse in clinical practice than in the clinical trials on which their approval is based. Many therapeutic compounds are ineffective for a large subpopulation of patients to whom they are prescribed; worse, a significant fraction of


From theory of spectra to standardless analysis of molecular objects

LA Gribov… – Journal of Analytical Chemistry, 2012

Abstract The authors discuss the methodology of quantitative analysis of pure substances
and mixtures by optical spectra (IR, Raman, UV, etc.) without using samples of standard
composition (standardless molecular spectral analysis). An algorithm of quantitative


Aligning chemical structure diagrams with local search

M Hilbig… – Journal of Cheminformatics, 2012

Chemists working in biomolecular application projects are usually looking at many related
molecules (eg results of a virtual screening run, lead series development or library design).
For a convenient visual analysis of this data it is essential that differences between


This Google Scholar Alert is brought to you by Google.
