
One Day I’ll Have Lunch with Egon Willighagen Too…

12 Oct

Ian Mulvany recently posted on a lunch meeting with Egon Willighagen over at Nascent. Egon is a member of our Advisory Group, is very supportive of our efforts and provides great feedback on our questions. We haven’t met yet…but I look forward to sharing lunch with him one day…

I wish I’d been at that lunch as I’d have had some comments to add. I’ve quoted passages from Ian’s post below and added my comments after each.

“One solution to marking up molecules is to use an InChi (an IUPAC International Chemical Identifier). These have been championed by Peter Murray Rust and there is an extensive InChi FAQ available. The short story is that an InCHi is a character string which uniquely describes a chemical substance. From any chemical structure you can generate an InChi.”

AW> Peter has been a great advocate and champion for InChI and has definitely evangelized its value. But we should not forget those who have pushed the development and executed on delivering it. Specifically, Steve Heller, Steve Stein, Dmitrii Tchekhovskoi (all associated with NIST) and Alan McNaught (associated with IUPAC). The InChI was originally called the IUPAC-NIST Chemical Identifier. I’ve spoken previously about heroes and these people are truly the heroes of InChI. The rest of us get to use it, talk about it and celebrate it…they had the vision AND executed on it.

“Peter has a writeup on using inCHi in blogs, and if every chemical that appeared everywhere was somehow marked up with it’s InChi, or the article referring to it tagged with them then the findability problem would be solved by simple string searching.”

AW> Yes, this is true. BUT it is limited. And people don’t appear to be talking about the limitations. Chemists don’t necessarily want to search only on an exact structure (and don’t get me started on all of the various layers that can be added to an InChIString – stereo, fixed hydrogens etc.). They may want to search on substructure and similarity of structure, and InChIs are going to have to be aggregated to allow this … I have blogged about an approach and Egon could help get us there!
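To make the exact-match limitation concrete, here is a minimal sketch of what "simple string searching" on InChIs amounts to. The article IDs are invented for illustration, the InChI strings use the standard "1S" prefix (an assumption, since it may not match the InChI version in use when this post was written), and `strip_stereo` is a hypothetical helper that only handles the main-layer stereo prefixes.

```python
from collections import defaultdict

# Hypothetical tagged articles (IDs invented for illustration).
ARTICLES = {
    "paper-1": ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"],  # ethanol
    "paper-2": ["InChI=1S/C3H7NO2/c1-2(3(5)6)4/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"],  # L-alanine
    "paper-3": ["InChI=1S/C3H7NO2/c1-2(3(5)6)4/h2H,4H2,1H3,(H,5,6)"],  # alanine, no stereo
}

# Layer prefixes that carry stereochemistry in the main InChI layers.
STEREO_PREFIXES = ("b", "t", "m", "s")


def strip_stereo(inchi):
    """Drop the stereo layers (/b, /t, /m, /s); keep version, formula, /c and /h."""
    version, formula, *layers = inchi.split("/")
    kept = [layer for layer in layers if not layer.startswith(STEREO_PREFIXES)]
    return "/".join([version, formula, *kept])


def build_index(articles, normalize=lambda s: s):
    """Map each (optionally normalized) InChI string to the articles tagged with it."""
    index = defaultdict(set)
    for article_id, inchis in articles.items():
        for inchi in inchis:
            index[normalize(inchi)].add(article_id)
    return index


exact = build_index(ARTICLES)
skeleton = build_index(ARTICLES, normalize=strip_stereo)

L_ALANINE = ARTICLES["paper-2"][0]
print(sorted(exact[L_ALANINE]))                   # ['paper-2']
print(sorted(skeleton[strip_stereo(L_ALANINE)]))  # ['paper-2', 'paper-3']
```

The exact index finds only the paper tagged with the identical string, while the stereo-stripped index also finds the paper tagged without stereochemistry. Neither index can answer a substructure or similarity query; that still needs a structure-aware search on top of the identifiers.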

“Egon suggested as a solution that journals should require papers dealing with chemicals to include InChis. He said that every tool for drawing chemicals (standard issue for anyone writing a paper on the subject) can now output the InChi with the click of a button <…> you are Nature, you can make authors do anything in order to get a paper published so why not get them to do x. Well, for a start, that’s an editorial decision,<…>

Journals are naturally shy of any step that can delay the publication time of an article, and so I am also skeptical that we would see such obligatory requirements. Better, I think, to have this step as a voluntary one. Practically all journals allow supplementary information and I am sure all of them would accept InChi as supplementary information.”

AW> I agree with Egon…I’ve written almost a dozen peer-reviewed articles this year. The instructions for authors demand systematic nomenclature, and the authors are responsible for it. Demand InChI. Alternatively, the majority of papers have structures embedded as OLE-compatible objects. Develop a tool (not difficult) to generate InChIs from them. By the way, the InChIs COULD be embedded directly inside a PDF (I managed a product that generated PDF files that were STRUCTURE-SEARCHABLE, as well as images that were structure searchable). Yes, there is work to be done BUT it can be done. The challenge, I believe, is to get the primary societies to throw down the gauntlet. RSC are already using InChIs in Project Prospect. If Chemical Abstracts Service were to utilize and index InChIs, the American Chemical Society might be very interested in requiring InChIs for their manuscripts, whether directly embedded in the documents or as supplementary information. Rich Docherty over at TotallySynthetic has started tagging his posts with InChIKeys…not InChIStrings (I’ve talked about the value of this here and here).
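As a rough illustration of why a key can be a friendlier tag than the full string, the sketch below checks whether a tag has the shape of an InChIKey and compares how a naive tokenizer treats the two forms. The pattern is the 27-character layout of the standard InChIKey, which is an assumption here and may differ from the key format available when this post was written. The tokenizer is also only an assumption about how a generic search engine might split text, not a description of any particular engine.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(tag):
    """Return True if the tag has the shape of a standard InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(tag) is not None


def naive_tokens(text):
    """Split on anything that is not a letter or digit (assumed tokenizer)."""
    return re.findall(r"[A-Za-z0-9]+", text)


ETHANOL_INCHI = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
ETHANOL_KEY = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"

print(looks_like_inchikey(ETHANOL_KEY))    # True
print(looks_like_inchikey(ETHANOL_INCHI))  # False
print(naive_tokens(ETHANOL_INCHI))  # many short fragments such as 'c1', '2', '3'
print(naive_tokens(ETHANOL_KEY))    # three long, distinctive blocks
```

The full string breaks into short fragments that many unrelated molecules share, while the key stays as a few long blocks. The trade-off is that a key is a hash: it cannot be converted back into a structure without a lookup.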

“So what can we do now to help making connections between papers and molecules? Peter Corbett, who works with Peter Murray Rust, is working on automated methods of getting computers to read chemistry papers and output semantic markup of them. “

AW> Over at ChemSpider we are working with Will Griffiths, who developed ChemRefer. We have already extracted tens of thousands of chemical names and will be linking them up to ChemSpider structures to enable Open Access papers to be structure/substructure searchable. However, we’ve hit a bit of a hurdle…more details on this will follow shortly, but we have been asked to remove from the ChemRefer index thousands of articles that were indexed according to what we believe is a standard search engine policy. During our conversation today with the publisher, the conversion of chemical names to chemical structures to provide a structure-searchable index of the articles was deemed to be “re-purposing” of the Open Access articles and is NOT allowable. Peter Corbett and Peter Murray Rust are engaged in similar activities so will likely run into the same challenges. If they manage to get around this issue with this and other publishers then they will be working in a “permissive” role where they will need to get permission from publishers to perform semantic markup. Their semantic markup is also “re-purposing”. The “permissive challenge” is far away from Peter’s stance in terms of Open Data for all.

“Egon has now created rdf pages for molecules on openmolecules.net. These pages use the InChi in their structure, and now each molecule had it’s own web page. “

AW> We are now working with Egon to RDF our own ChemSpider pages. Watch this space…

“Egon’s pages check Connotea, and pull from Connotea co-tags of InChi tags (Here is a short description of this). If we work on this a bit more we should be able to set up a system where if you tag a paper with an InChi, that paper could appear on Egon’s pages. “

AW> Not only Egon’s pages…we will index directly into ChemSpider also. The molecules will become part of a close to 20 million structure index including analytical data. It is one big web of chemistry, it is all coming together now, and Egon is a good guy to have lunch with. Wish I was there….

 

About tony

Antony (Tony) J. Williams received his BSc in 1985 from the University of Liverpool (UK) and PhD in 1988 from the University of London (UK). His PhD research interests were in studying the effects of high pressure on molecular motions within lubricant-related systems using Nuclear Magnetic Resonance. He moved to Ottawa, Canada to work for the National Research Council performing fundamental research on the electron paramagnetic resonance of radicals trapped in single crystals. Following his postdoctoral position he became the NMR Facility Manager for Ottawa University. Tony joined the Eastman Kodak Company in Rochester, New York as their NMR Technology Leader. He led the laboratory to develop quality control across multiple spectroscopy labs and helped establish walk-up laboratories providing NMR, LC-MS and other forms of spectroscopy to hundreds of chemists across multiple sites. This included the delivery of spectroscopic data to the desktop, automated processing and his initial interests in computer-assisted structure elucidation (CASE) systems. He also worked with a team to develop the world’s first web-based LIMS system, WIMS, capable of allowing chemical structure searching and spectral display. With his developing cheminformatic skills and passion for data management he left corporate America to join a small start-up company working out of Toronto, Canada. He joined ACD/Labs as their NMR Product Manager and held various roles, including Chief Science Officer, during his 10 years with the company. His responsibilities included managing over 50 products at one time prior to developing a product management team, managing sales, marketing, technical support and technical services. ACD/Labs was one of Canada’s Fast 50 Tech Companies, and Forbes Fast 500 companies in 2001. His primary passions during his tenure with ACD/Labs were the continued adoption of web-based technologies and developing automated structure verification and elucidation platforms.
While at ACD/Labs he suggested the possibility of developing a public resource for chemists attempting to integrate internet available chemical data. He finally pursued this vision with some close friends as a hobby project in the evenings and the result was the ChemSpider database (www.chemspider.com). Even while running out of a basement on hand built servers the website developed a large community following that eventually culminated in the acquisition of the website by the Royal Society of Chemistry (RSC) based in Cambridge, United Kingdom. Tony joined the organization, together with some of the other ChemSpider team, and became their Vice President of Strategic Development. At RSC he continued to develop cheminformatics tools, specifically ChemSpider, and was the technical lead for the chemistry aspects of the Open PHACTS project (http://www.openphacts.org), a project focused on the delivery of open data, open source and open systems to support the pharmaceutical sciences. He was also the technical lead for the UK National Chemical Database Service (http://cds.rsc.org/) and the RSC lead for the PharmaSea project (http://www.pharma-sea.eu/) attempting to identify novel natural products from the ocean. He left RSC in 2015 to become a Computational Chemist in the National Center of Computational Toxicology at the Environmental Protection Agency where he is bringing his skills to bear working with a team on the delivery of a new software architecture for the management and delivery of data, algorithms and visualization tools. The “Chemistry Dashboard” was released on April 1st, no fooling, at https://comptox.epa.gov, and provides access to over 700,000 chemicals, experimental and predicted properties and a developing link network to support the environmental sciences. Tony remains passionate about computer-assisted structure elucidation and verification approaches and continues to publish in this area. 
He is also passionate about teaching scientists to benefit from the developing array of social networking tools for scientists and is known as the ChemConnector on the networks. Over the years he has had adjunct roles at a number of institutions and presently enjoys working with scientists at both UNC Chapel Hill and NC State University. He is widely published with over 200 papers and book chapters and was the recipient of the Jim Gray Award for eScience in 2012. In 2016 he was awarded the North Carolina ACS Distinguished Speaker Award.

Posted on October 12, 2007 in Uncategorized

 
