
Category Archives: Software

Hamburger PDFs and Making Them Structure Searchable

There have been numerous conversations about “Hamburger PDFs” over recent months, the most recent being the exchange between Chris Rusbridge and Peter Murray-Rust. Another conversation I have seen has been about making Word documents structure searchable (I cannot track down the appropriate blog postings at present).

This is really just an FYI for the community, since there seems to be a general assumption that Word documents and PDFs cannot be made structure searchable. The truth is that both can. How? You need to write the correct information into the file to enable it, but it is possible. There are a number of solutions that allow structure-based searching of Word document files. I believe the first one originally came from Oxford Molecular, before it was acquired by Accelrys. I think there are now several, including, I believe, CambridgeSoft, ACD/Labs and probably others.

The only PDF structure searching capability I am aware of is that created by ACD/Labs a few years ago. Their website states “Our Search for Structure system allows you to seek out chemical structures in various file formats throughout your computer’s file systems. These formats include: SK2, MOL, SDF, SKC, CHM, CDX, RXN, and PDF (Adobe Acrobat); DOC (Microsoft Word), XLS (Microsoft Excel), and PPT (Microsoft PowerPoint), and ACD/Labs databases: CUD, HUD, CFD, NDB, ND5, and INT.”

For PDF, structure files had to be “tagged” appropriately when they were written to PDF by an embedded PDF generation capability. Since the PDF format can be extended, ACD/Labs did exactly that. If we wanted to make the majority of PDF files structure searchable, the appropriate thing to do would seem to be to extend the general PDF format for the life sciences, talk to Adobe about including the capabilities in their tools, and get the publishers to support it. OK, there are details… but why isn’t anyone talking about extending PDF to support structures in this way? It was proven years ago.

Next, structures will be embedded into Word documents and made searchable as if this were something novel. It has been done many times already. The ACD/Labs website states “Microsoft Word documents with structures created in ChemDraw or MDL ISIS can also be retrieved. Not only can you perform exact structure searches, but you can also search by substructure. Added options allow you to preview search results, open search result documents in ChemSketch as well as in other applications, and store search results for later access.” There are other products doing this too.

Strangely, people don’t seem to know about these capabilities. They will… as we move forward to index the web for structures, we hope to build the capability to search structures inside Word documents directly.


Posted by on May 3, 2008 in Computing, Software


Spaces, Dashes and Issues with Nomenclature Conversion

I’ve been involved with nomenclature in one way or another for well over a decade. While I’m an NMR spectroscopist by training (as evidenced by more than 100 publications in this area), during my decade-long tenure at ACD/Labs I learned a lot about PhysChem parameters and their prediction, systematic nomenclature, structure drawing and databasing, chemometrics, LC-MS data analysis and so on. As the product manager for many of these products, I was dropped in at the deep end. Nomenclature was something I really enjoyed. While I am not a nomenclature specialist at the level of “generate a perfect systematic name for Taxol”, I have a decade of experience working with nomenclature software, both for generating names from structures and for generating structures from names. Having worked with hundreds of customers and their needs, I’ve dealt with a lot of beliefs about nomenclature and perceptions of how to use the tools.

Having just spent the week at Bio-IT and been engaged in a number of conversations about name-to-structure conversion, I realized that one of the prevailing beliefs among users of name-to-structure conversion packages is that spaces in systematic names can be disregarded. It appears that members of the chemistry text-mining community are using one or more of the commercial name-to-structure programs to convert chemical names to structures and, before feeding the names to the algorithms, are removing all white space from them. In some cases they are doing the same with dashes. How well is that going to work? Is it safe to remove spaces from chemical names and assume this has no effect? Is more consideration being given to the accuracy of the text-mining than to the nature of systematic nomenclature?

Let’s look at some examples of the result of removing spaces from chemical names. Consider the different results just from moving a space.

[Figure: The impact of spaces on naming]

[Figure: Single structure to separate components based on a space]

[Figure: Another example of multiple to single component structure]

[Figure: Another example of space-collapsing structure searching]

Clearly, removing spaces from systematic names has an impact. The same is true of random removal and insertion of dashes. The generation of systematic names by chemists is far from ideal, as discussed by Gernot Eller here. Mishandling correct names when converting them back to structures adds one more layer of problems. Many of us use text mining and name-to-structure conversion to link documents and structures. It is far from a minor undertaking.
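One way a text-mining pipeline could at least detect the problem is to check whether its own normalization collapses distinct names into the same string. The Python sketch below is hypothetical: the helper names are illustrative, it strips whitespace and ASCII hyphens only, and the example strings are chosen just to show a collision ("ethyl benzoate" names an ester, while the unspaced "ethylbenzoate" may be read by a parser as an ethyl-substituted benzoate). It does not decide which reading is correct; it only flags where information has been thrown away.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def strip_spaces_and_dashes(name: str) -> str:
    """The lossy preprocessing described above: drop all whitespace and ASCII hyphens."""
    return re.sub(r"[\s\-]", "", name)


def find_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group distinct input names that become identical after stripping.

    Any group with more than one member marks a place where the
    preprocessing has erased a distinction that a name-to-structure
    tool might need.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        key = strip_spaces_and_dashes(name)
        if name not in groups[key]:  # ignore exact repeats of the same name
            groups[key].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    corpus = ["ethyl benzoate", "ethylbenzoate", "2-chloropropane"]
    print(find_collisions(corpus))
    # {'ethylbenzoate': ['ethyl benzoate', 'ethylbenzoate']}
```

This only catches collisions between names that actually occur together in the corpus, so it is a diagnostic rather than a fix; the safer course is simply to pass names to the converter with their spaces and dashes intact.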


XOBNI – Rev Up Microsoft Outlook to EXCELLENT EFFECT

I watch the Microsoft Chemical Team Blog and was very interested in this post about Xobni. With a little work I managed to get the beta sent to me overnight and have been playing with it. I probably haven’t found all of its benefits yet but ohmigod! I love this app. I must have saved myself at least an hour today just searching through Outlook for old information.

Xobni shows me how much email I am getting, from whom, their ranking, easy access to attachments, an email’s “network”… it’s very much like social networking in that way. I’m not going to belabor its value. All I can say is that if you are an Outlook user, this is a must-have. I am allowed to invite six people to beta test Xobni, so let me know if you want an invitation.


Posted by on May 2, 2008 in Software

