
Category Archives: InChI

Using an online database of chemical compounds for the purpose of structure identification #ACSsanfran


Online databases can be used for structure identification. The Royal Society of Chemistry provides access to an online database containing tens of millions of compounds, and this has been shown to be a very effective platform for the development of tools for structure identification. In many cases a compound that is unknown to an investigator is already known in the chemical literature or in a reference database, and these “known unknowns” are now commonly available on aggregated internet resources. These types of compounds in commercial, environmental, forensic, and natural product samples can be identified by searching these large aggregated databases, querying by either elemental composition or monoisotopic mass. Searching by elemental composition is the preferred approach, but it is often difficult to determine a unique elemental composition for compounds with molecular weights greater than 600 Da. In these cases, searching by the monoisotopic mass is advantageous. In either case, the search results can be refined by appropriate filtering to identify the compounds. We will report on integrated filtering and search approaches on our aggregated compound database for the purpose of structure identification and review our progress in using the platform for natural product dereplication.
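To make the two query modes concrete, here is a minimal Python sketch. It is not the RSC platform or its API: the in-memory `RECORDS` list, the function names and the ppm tolerance are all assumptions chosen for illustration, and the isotope mass table covers only a handful of elements.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of a few elements.
# Illustrative subset only; a real tool would cover the full periodic table.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
    "P": 30.97376163,
    "S": 31.97207100,
    "Cl": 34.96885268,
}

def parse_formula(formula):
    """Turn a simple formula such as 'C8H10N4O2' into {'C': 8, 'H': 10, ...}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())

def search_by_formula(records, formula):
    """Return names whose elemental composition matches, ignoring element order."""
    wanted = parse_formula(formula)
    return [name for name, f in records if parse_formula(f) == wanted]

def search_by_mass(records, target_mass, ppm=5.0):
    """Return names whose monoisotopic mass lies within +/- ppm of the target."""
    tolerance = target_mass * ppm / 1e6
    return [name for name, f in records
            if abs(monoisotopic_mass(f) - target_mass) <= tolerance]

# Hypothetical stand-in for an aggregated compound database.
RECORDS = [
    ("caffeine", "C8H10N4O2"),
    ("theophylline", "C7H8N4O2"),
    ("theobromine", "C7H8N4O2"),
    ("glucose", "C6H12O6"),
]

if __name__ == "__main__":
    print(search_by_formula(RECORDS, "C7H8N4O2"))
    print(search_by_mass(RECORDS, 180.0647, ppm=5.0))
    print(search_by_mass(RECORDS, 180.0647, ppm=10.0))
```

Both kinds of query return the two isomers theophylline and theobromine together, and widening the mass window also pulls in glucose, which is why the hit list still needs the filtering step described above.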

 

How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry #ACSsanfran

This is my presentation at the ACS San Francisco Fall Meeting on August 10th 2014


The Royal Society of Chemistry hosts a growing collection of online chemistry content. For much of our work the InChI identifier is an important component underpinning our projects. This enables the integration of chemical compounds with our archive of scientific publications, the delivery of a reaction database containing millions of reactions as well as a chemical validation and standardization platform developed to help improve the quality of structural representations on the internet. The InChI has been a fundamental part of each of our projects and has been pivotal in our support of international projects such as the Open PHACTS semantic web project integrating chemistry and biology data and the PharmaSea project focused on identifying novel chemical components from the ocean with the intention of identifying new antibiotics. This presentation will provide an overview of the importance of InChI in the development of many of our eScience platforms and how we have used it to provide integration across hundreds of websites and chemistry databases across the web. We will discuss how we are now expanding our efforts to develop a platform encompassing efforts in Open Source Drug Discovery and the support of data management for neglected diseases.

 

Data Mining Dissertations and Adventures and Experiences in the World of Chemistry


This presentation was given at the CLIR/DLF Postdoctoral Fellowship Summer Seminar at Bryn Mawr College in Pennsylvania on July 29th 2014. The intention was to communicate what we are doing in the fields of text and data mining in the domain of chemistry, specifically around mining the RSC publication archive and chemistry dissertations and theses. How would these experiences map over to the humanities?

 


The Importance of the InChI Identifier as a Foundation Technology for eScience Platforms at the Royal Society of Chemistry

This is a presentation I gave today at Bio-IT 2014 here in Boston. I was in the company of a number of my favorite people to be on the agenda with… Steve Heller, Steve Boyer, Evan Bolton and Chris Southan.


The Royal Society of Chemistry hosts one of the largest online chemistry databases containing almost 30 million unique chemical structures. The database, ChemSpider, provides the underpinning for a series of eScience projects allowing for the integration of chemical compounds with our archive of scientific publications, the delivery of a reaction database containing millions of reactions as well as a chemical validation and standardization platform developed to help improve the quality of structural representations on the internet. The InChI has been a fundamental part of each of our projects and has been pivotal in our support of international projects such as the Open PHACTS semantic web project integrating chemistry and biology data and the PharmaSea project focused on identifying novel chemical components from the ocean with the intention of identifying new antibiotics. This presentation will provide an overview of the importance of InChI in the development of many of our eScience platforms and how we have used it specifically in the ChemSpider project to provide integration across hundreds of websites and chemistry databases across the web. We will discuss how we are now expanding our efforts to develop a Global Chemistry Network encompassing efforts in Open Source Drug Discovery and the support of data management for neglected diseases.

 

The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms

This is my seventh and LAST talk at the ACS Meeting in Indianapolis:


The Royal Society of Chemistry provides access to a number of databases hosting chemicals data, reactions, spectroscopy data and prediction services. These databases and services can be accessed via web services, with queries submitted in standard data formats such as InChI and molfiles. Data can then be downloaded in standard structure and spectral formats, allowing for reuse and repurposing. The ChemSpider database integrates with a number of projects external to RSC, including Open PHACTS, which integrates chemical and biological data. This project utilizes semantic web data standards including RDF. This presentation will provide an overview of how structure and spectral data standards have been critical in allowing us to integrate many open source tools, ease integration with a myriad of services and underpin many of our future developments.
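As a small illustration of why a standard string format is convenient for exchange, the sketch below splits an InChI into its version, formula and prefixed layers using plain string handling. It is only a sketch under simple assumptions: the function name `split_inchi` is made up for this example, it does no chemical validation, and it is not how any RSC service parses its input.

```python
def split_inchi(inchi):
    """Split an InChI string into (version, formula, [(layer_prefix, value), ...])."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len(prefix):].split("/")
    version, segments = parts[0], parts[1:]
    formula = None
    # The formula layer, when present, comes first and has no lowercase prefix letter.
    if segments and not segments[0][:1].islower():
        formula = segments[0]
        segments = segments[1:]
    # Remaining layers start with a one-letter prefix, e.g. 'c' (connections), 'h' (hydrogens).
    # A list of pairs is used because a prefix can occur more than once in some InChIs.
    layers = [(seg[0], seg[1:]) for seg in segments if seg]
    return version, formula, layers

if __name__ == "__main__":
    # Standard InChI for ethanol.
    print(split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```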

 

An Invitation to Contribute an Article to an InChI Thematic Issue

The International Chemical Identifier has been a tremendous success in such a small amount of time. It really should be celebrated for the contribution it is making to cheminformatics and, specifically, for the way it has enabled chemistry connectivity on the internet. Earlier this year we hosted a big audience at an ACS-CINF meeting in San Diego dedicated to celebrating the impact of the InChI and defining some of the path forward for the future. My talk on InChI is below.

 

Certainly the structure standard has significant momentum and is not slowing down either in development or impact.

Following on from the meeting I would like to announce a Thematic Issue in the Journal of Cheminformatics for which I will be guest editor. The Journal of Cheminformatics, under the editorial guidance of David Wild and Christoph Steinbeck, is one of our domain’s top cheminformatics journals and certainly the top Open Access journal of its type. I already have 10 authors/groups who have committed to contributing a paper and hopefully all will come in. However, this is an Open Call for authors interested in publishing a paper in this thematic issue to contact me to discuss. There will be appropriate subsidies as necessary to support authors and the InChI Trust has already committed sponsorship funding to support some articles. Please do contribute if you have been using InChI in your projects and feel that you can contribute an article. Thanks.

 

Posted on June 22, 2012 in InChI

 

Invitation to Submit Flash Talk to ACS San Diego InChI Symposium

My good friend Alex Tropsha and I will be hosting an InChI Symposium at the ACS Meeting in San Diego in Spring 2012 and have issued a number of invitations for speakers already. We are particularly interested in also having a number of “flash presentations” of a few minutes in length with potential topics of:

1) How you use InChI in your lab/platforms

2) Novel adaptations in using InChI

3) InChI Collisions and how you found them

4) The need for InChI Extensions

5) From feet to InChIs…

And so on. Very flexible and make it your own. If you are interested in participating please send me an email to tony27587ATgmailDOTcom.

 

Posted on September 6, 2011 in InChI

 

An InChIkey Collision is Discovered and NOT Based on Stereochemistry

InChI strings and InChIKeys are very much the backbone of ChemSpider and have quickly become a way by which online databases are being connected. The InChIKey is a hash of the InChI string, and when the hash was adopted it was suggested that the likelihood of a collision was very small, the estimate being, as quoted from the official InChI site:

“An example of InChI with its InChIKey equivalent is shown below. There is a finite, but very small probability of finding two structures with the same InChIKey. For duplication of only the first block of 14 characters this is 1.3% in 10^9, equivalent to a single collision in one of 75 databases of 10^9 compounds each.”
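The quoted estimate can be roughly reproduced with a standard birthday-bound calculation. The sketch below assumes that the 14-character first block carries 65 bits of hash (my recollection of the InChI technical documentation, not something stated in the quote) and that structures hash uniformly at random; the function name is made up for this example.

```python
import math

def collision_probability(n_items, hash_bits):
    """Birthday approximation: chance that at least two of n_items
    share a value when each is hashed uniformly into 2**hash_bits bins."""
    space = 2 ** hash_bits
    exponent = n_items * (n_items - 1) / (2 * space)
    # 1 - exp(-x), computed with expm1 to stay accurate for small x.
    return -math.expm1(-exponent)

if __name__ == "__main__":
    # Assumption: the first InChIKey block encodes 65 bits of hash.
    p = collision_probability(10**9, 65)
    print("P(collision in one database of 10^9 compounds) = %.4f" % p)
    print("Roughly one such database in %.0f" % (1 / p))
```

Under those assumptions the result lands a little above 1.3%, or about one database in 75, in line with the quoted figures. The estimate relies on the hash behaving like a random function over whatever structures people actually register.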

At a previous ACS Meeting Prof Jonathan Goodman from the University of Cambridge announced that he had identified a collision. The collision was for two isomers of spongistatin, a rather complex chemical structure with many stereocenters.

Jonathan has “done it again”…what a troublemaker he is (in a supremely gentlemanly way!). I was fortunate enough to receive the news about this collision from him just as I was getting on the flight from ACS Denver to home tonight and asked his permission to blog it as it is both exciting and, I believe, quite surprising news. Why? In this case the collision is for two distinctly different chemicals with totally different formulae and with NO stereochemistry! Very surprising!

As you can see in the figure below the two chemical compounds are simply long branched alkyl chains, one an alcohol and one a ketone.

In case Jonathan’s software tool that he was using to connect to the InChI generation software was doing something untoward with the molfile I confirmed the observation myself by drawing the structures in ACD/ChemSketch and generating the InChIKeys there. And, sure enough…I see exactly the same Standard InChIKeys for both molecules as shown in the movie below. VERY interesting!

 

 

Posted on September 1, 2011 in General Communications, InChI

 


Your Opinion WANTED on how the structure of Tegaserod should be drawn

Those of you who watch this blog know that many of the discussions are about chemical structures, accurate representations on databases and how to “correctly” communicate chemical structures/compounds for the users. So, this is an OPINION question…it’s not an “I have an answer” blog post.

So, Tegaserod has, according to DailyMed here, the structure below:

It can be envisaged as having a trans-orientation, but the name on DailyMed doesn’t indicate trans: “3-(5-methoxy-1H-indol-3-ylmethylene)-N-pentylcarbazimidamide”.

On Wikipedia here we see the structure below and a systematic name supporting a trans-orientation.

Now there are actually a number of ways to represent Tegaserod and, since there’s no stereochemistry to complicate the molecule and we are interested in the skeleton per se, we can search on the first part of the InChIKey on a database like ChemSpider. A search on IKBKZGMPCYNSLU, the first part of the InChIKey for the structure, gives 3 hits. Take a look. I don’t see any real reason to show the crossbonds for the NH but so be it.

Now, considering that the three hits are E-, Z- and crossbond orientations, and that their InChIKeys are as shown below, the results set is indeed expected. My question: based on the structures that you see for Tegaserod, how would you prefer to see the compound drawn, and how would you expect it to be held in the database? Think about what you would expect to happen in terms of a search. If you drew a cis-form should it retrieve cis and crossed? If you drew crossed should it retrieve cis and trans? etc. Remember, it’s an opinion so no answer is wrong…
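The skeleton search described above can be sketched in a few lines of Python. This is only an illustration, not ChemSpider's implementation: the function names and the in-memory `INDEX` are hypothetical, and the second blocks shown for the three Tegaserod entries are deliberately fake placeholders, since only the first block, IKBKZGMPCYNSLU, is quoted in this post.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")

def skeleton_block(inchikey):
    """Return the first (connectivity) block of an InChIKey."""
    match = INCHIKEY_RE.match(inchikey)
    if match is None:
        raise ValueError("not a well-formed InChIKey: %r" % inchikey)
    return match.group(1)

def search_by_skeleton(index, first_block):
    """Return the labels of all records whose InChIKey starts with first_block."""
    return [label for key, label in index if skeleton_block(key) == first_block]

# Hypothetical index. The Tegaserod second blocks are PLACEHOLDERS, not real keys.
INDEX = [
    ("IKBKZGMPCYNSLU-AAAAAAAAAA-N", "Tegaserod, E- drawing"),
    ("IKBKZGMPCYNSLU-BBBBBBBBBB-N", "Tegaserod, Z- drawing"),
    ("IKBKZGMPCYNSLU-CCCCCCCCCC-N", "Tegaserod, crossed-bond drawing"),
    ("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "caffeine"),
]

if __name__ == "__main__":
    for label in search_by_skeleton(INDEX, "IKBKZGMPCYNSLU"):
        print(label)
```

Matching on the full key instead of the first block would return only the one drawing that was submitted, which is exactly the choice the question above is asking about.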

 

Posted on February 18, 2011 in Computing, Data Quality, InChI

 

The Day I Noticed that CAS SciFinder Now Supports #InChI

Today. Today is the day I noticed that CAS SciFinder now supports InChI! Wow. Now, that may not be big news for many of you but for those of us who have supported InChI, both vocally AND in action around it, this is big news. InChI is not perfect and has areas to develop in (some later posts will cover this) but it is ALREADY extremely enabling. (For an example about InChI issues…but focused primarily on how people DRAW structures that they feed to InChI algorithms, see slides 75-84 of this presentation: http://tinyurl.com/4hhgqbd)

It is helping to link databases, enable web searching and improve communication between cheminformatics applications. ACS and CAS have been quite late in providing support for InChI and today I noticed it has arrived. This is great news for InChI and very much a blessing for InChI as a standard for interchange. Now, if we can get StdInChIKeys layered onto all ACS publications then the need for an InChI Resolver will be increased….

 

Posted on February 8, 2011 in Community Building, Data Quality, InChI

 
 