I’ve been involved with nomenclature in one way or another for well over a decade. While I’m an NMR spectroscopist by training (as evidenced by the >100 publications in this area), during my decade-long tenure at ACD/Labs I learned a lot about PhysChem parameters and their prediction, systematic nomenclature, structure drawing and databasing, chemometrics, LC-MS data analysis and so on. As the product manager for many of these products I was dropped in the deep end. Nomenclature was something I really enjoyed. While I am not a nomenclature specialist at the “generate a perfect systematic name for Taxol” level, I have a decade of experience working with nomenclature software, both for generating names from structures and for generating structures from names. Having worked with hundreds of customers and their needs, I’ve dealt with a lot of beliefs around nomenclature and perceptions of how to use the tools.
I have just spent the week at Bio-IT, where I was engaged in a number of conversations about name-to-structure conversion. It became clear that one of the prevailing beliefs among users of name-to-structure conversion packages is that spaces in systematic names can be disregarded. It appears that members of the text-mining for chemistry community are using one or more of the commercial name-to-structure software programs to convert chemical names to structures and, before feeding the algorithms, they are removing all whitespace from the names. In some cases they are doing the same with dashes. How well is that going to work? Is it safe to remove spaces from chemical names and assume this has no effect? Is more consideration being given to the accuracy of the text mining than to the nature of systematic nomenclature?
Let’s look at some examples of the result of removing spaces from chemical names. Consider the different results just from moving a space.
Single structure to separate components based on a space.
Another example of multiple to single component structure.
Another example of space-collapsing structure searching
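The same effect can be checked in code before names ever reach a converter. The sketch below is only an illustration: the function names `collapse_whitespace` and `find_collisions` are hypothetical, and the sample names are my own, not the ones shown in the examples above. It flags names that are distinct as written but become the same string once whitespace is removed. For instance, "phenyl acetate" is the phenyl ester of acetic acid, while "phenylacetate" refers to the anion of phenylacetic acid, so a pipeline that strips spaces can no longer tell them apart.

```python
import re
from collections import defaultdict


def collapse_whitespace(name: str) -> str:
    """Remove every whitespace character, as some text-mining pipelines do."""
    return re.sub(r"\s+", "", name)


def find_collisions(names):
    """Return groups of distinct names that collapse to the same string.

    Keys are the whitespace-free, lower-cased form; values are the
    original spellings that map onto that key.
    """
    groups = defaultdict(set)
    for name in names:
        groups[collapse_whitespace(name).lower()].add(name)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Illustrative names only (an assumption, not taken from the original figures).
names = [
    "phenyl acetate",   # ester of acetic acid
    "phenylacetate",    # anion of phenylacetic acid
    "methyl benzoate",  # ester of benzoic acid
    "methylbenzoate",   # methyl-substituted benzoate
    "toluene",
]

for key, variants in find_collisions(names).items():
    print(key, "<-", variants)
```

A check like this does not resolve which structure was intended. It only shows which names in a corpus would become ambiguous if spaces were discarded, so that those names can be passed to the name-to-structure software unaltered.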
Clearly, removing spaces from systematic names changes the result. The same is true of the random removal and insertion of dashes. The generation of systematic names by chemists is far from ideal, as discussed by Gernot Eller here. The mishandling of correct names when converting them back to structures adds one more layer of problems. Many of us are using text mining and name-to-structure conversion to link documents and structures. It is far from a minor undertaking.