Category Archives: Computing

Mobile Chemistry and Generation App

A presentation given today at the ICIC Meeting in Barcelona #icic2011

While the internet has been revolutionizing our access to data and information via our computers, computers have been miniaturizing to the point where a smartphone offers capabilities that many desktops could not deliver less than a decade ago. Mobile browser technology and app-based delivery of software have now put further access to data into our hands via phones, pads and tablets. Whether it be in the form of chemical calculators, accessing publishers' websites or public domain databases containing millions of chemical structures, mobile chemistry is here and is expanding in capability and coverage at a dramatic rate. This presentation will review the status of mobile devices and how they are being used to enable chemists.





How Accurate was Google Scholar Citations in Detecting my Publications?

I blogged earlier this week about Google’s Brilliance with their new Google Scholar Citations. I was interested to know whether they found all of my papers so have spent a couple of hours checking. The answer? No…they missed 11 of the papers. They are listed below.

1) R.C. Hynes, J.R. Morton, J.A. Hriljac, Y. LePage, K.F. Preston, A.J. Williams, F. Evans, M.C. Grossel and L.H. Sutcliffe,  Isolated Free Radical Pairs in Rb+TCNQ- 18-crown-6 Single Crystals, J.Chem. Soc.,Chem. Commun., 5, 439 (1990)
2) R. Hynes, K.F. Preston, J.J. Springs, J. Tse and A.J. Williams, EPR Studies of M(CO)5-  Radicals (M = Cr, Mo, W) Trapped in Single Crystals of PPh4+ HM(CO)5- , J. Chem. Soc. Faraday Trans., 87(19), 3121 (1991)
3) R. Hynes, K.F. Preston, J.J. Springs, and A.J. Williams, X-Ray Crystallographic, Single-Crystal EPR, and Theoretical Study of Metal-Centred Radicals of the Type {C5R5Cr(CO)2L}
4) R. Duchateau, A.J. Williams, S. Gambarotta and M.Y.Chiang, Carbon-Carbon Double-Bond Formation in the Intermolecular Acetonitrile Reductive Coupling Promoted by a Mononuclear Titanium (II) Compound. Preparation and Characterization of Two Titanium (IV) Imido Derivatives, Inorg. Chem. 30, 4863 (1991)
5) B. Antalek, A.J. Williams, E. Garcia and J. Texter, NMR Analysis of Interfacial Structure Transitions Accompanying Electron Transfer Threshold Transitions in Reverse Microemulsions, Langmuir, 10, 4459, (1994)
6) R.Lok, R. Leone and A.J. Williams, Facile Rearrangements of Alkynylamino Heterocycles with Noble Metal Cations, Journal of Organic Chemistry 61(10), 3289 (1996)
7) D.E. Brown, A.J. Williams and D. McLaughlin, WIMS – A Web-based Information Management System, Trends in Analytical Chemistry, 16, 370 (1997)
8) A.J. Williams, Combining Sample, Structural, and Spectral Information in an Information Management System, Sci. Comput. Auto. 15, 60 (1998)
9) M.E. Elyashberg, K.A. Blinov and A.J. Williams, Computer-aided Molecular Structure Elucidation on the Basis of 1D and 2D NMR Spectra, Applied Magnetic Resonance, (May 2000)
10) G. M. Rishton, K. LaBonte, A. J. Williams, K. Kassam and E. Kolovanov.  Computational approaches to the prediction of blood-brain barrier permeability: a comparative analysis of central nervous system drugs versus secretase inhibitors for Alzheimer’s disease Current Opinion in Drug Discovery & Development, 9, 303 (2006)
11) A. J. Williams, V. Tkachenko, C. Lipinski, A. Tropsha and S. Ekins, Free Online Resources Enabling Crowdsourced Drug Discovery, Drug Discovery World Winter 2009/10, 33-39

Fortunately it is easy to add them in…and that is in process. Simply do this:

* “To add one article at a time, select the “Add” option from the Actions menu. Then, type in the title, the authors, etc., and click “Save”. Keep in mind that citations to the article you’ve just added may not appear in your profile for a few days.

* To add a group of related articles, select the “Import” option from the Actions menu. Search for your article using its title, keywords, or your name. Click “These are mine” next to the group you wish to add. If you have written articles under different names, with multiple groups of colleagues, or in different journals, you may need to select multiple groups. Your citation metrics will update right away to account for the group(s) you’ve just added.

* When you add a group of articles, we’ll also keep track of changes to this group as our search robots index the web. You can choose to have these changes automatically applied to your profile (recommended) or emailed to you for review. Select “Profile updates” under the Actions menu to configure the updates.”


What’s MORE brilliant though is that Google Scholar Citations found papers, book chapters and posters that I didn’t have in my CV. They are there now. I remain impressed.


Posted on August 5, 2011 in Computing, General Communications


Encouraging Collaboration in Washington as a Hub for Chemistry Databases

On August 25/26 I will be attending the 5th Meeting on U.S. Government Chemical Databases and Open Chemistry. I will have the opportunity to spend time with people I appreciate for the contributions they are making to chemistry: Martin Walker, JC Bradley, Andy Lang, Markus Sitzmann, Ann Richard, Frank Switzer, Evan Bolton, Marc Zimmermann, Wolf Ihlenfeldt, Steve Heller, John Overington, Noel O’Boyle, and many others. It is surely going to be an excellent meeting. The agenda is given here.

Some of the people listed above are associated with “Washington-based databases”. These are databases developed in or around Washington by government-funded organizations: the FDA, NIH, NCBI/NLM, NCI and NIST. Other government-funded databases that are not Washington-based are also represented: EPA and CDC. If you are not sure what all those acronyms stand for then here you go.

FDA – Food and Drug Administration

NIH – National Institutes of Health

NCBI/NLM – National Center for Biotechnology Information/National Library of Medicine

NCI – National Cancer Institute

EPA – Environmental Protection Agency

CDC – Centers for Disease Control and Prevention

NIST – National Institute of Standards and Technology

One organization with a chemistry database that is conspicuous by its absence is the NCGC, whose data collection is contained in the NPC Browser. I’ve written a lot about this one on this blog.

NCGC – NIH Chemical Genomics Center

I am hoping to get to talk to some members of the team if they attend the meeting though.

There will be a LOT of government databases represented at this meeting. I have experience with many of the databases provided by these institutions. The DSSTox database is one of the most highly curated databases based on my review of the data. The NCI resolver is an excellent resource with good quality data in terms of the accuracy of name-structure relationships.

The various databases are developed independently of each other. True, some of the databases contain contents from some of the other databases but, as far as I can tell, there is not much collaboration in terms of coordinated curation of data. What would it be like if each of these organizations participated in a roundtable discussion to agree on a process by which to collaboratively validate and curate the data, once and for all? Maybe this meeting can catalyze such a discussion. I would encourage the organizations to take advantage of other data sources that can share their data – ChEBI/ChEMBL is one example! If these various groups coordinate their work then the result could be a dataset of massively improved quality to share across the databases and across the community. If this work were done then the group that assembled the NPC Browser would likely have a lot less work to do in terms of assembling the data. The various database providers should certainly have provided clean, curated data for many of the top known drugs. While working on a manuscript reviewing the quality of public domain chemistry databases I assembled a table of 25 of the top selling drugs in the US and checked the data quality in the NPC Browser relative to a gold standard set. The assembly of the data will be discussed in its entirety in a later publication.

25 of the Top Selling Drugs in the USA - Data Quality in the NPC Browser

The errors listed in the table are:

1 Correct skeleton, No stereochemistry
2 Correct skeleton, Missing stereochemistry
3 Correct skeleton, Incorrect stereochemistry
4 Single component of multicomponent structure
5 Multiple components for single component structure
6 No structure returned based on Name Search
7 Incorrect skeleton
8 Multiple structures based on name search


Clearly there are a lot of errors in the structures associated with 25 of the best selling drugs on the US market. These should be the easy ones to get right as they are so well known!!! Collaboration between the domain’s top database providers would have helped, almost certainly. This would not necessarily be an issue of meshing technologies but of agreeing on a common goal to have the highest quality data available. Since the government puts so much money into the development of these databases it would be appropriate to have some oversight and to push for aligning efforts. Collaboration is essential!

With that in mind…a shameless pointer to how Sean Ekins, Maggie Hupcey and I BELIEVE in the need for collaboration…our book. If we can encourage others in the government chemistry databases to adopt active collaborative approaches wonderful things could happen.

Collaborative Computational Technologies for Biomedical Research




Announcing the SciMobileApps Wiki for Community Based Listing of Science Apps

I am sure that most of you are already smartphone or tablet users as many people visiting this blog are, like me, interested in the latest technologies. I’ve been a smartphone user for a number of years and certainly did get caught up in the iPhone and iPod wave using both mobile technologies. Now with the Android OS abounding on both phone and tablet it will be interesting to see how the next few years play out for me in terms of dedication to Apple technologies.

With the “world of Apps” came a lot of interest in how science would make use of this new technology platform. I have given presentations on “Mobilizing Chemistry” and it has had over 3000 views. I’ve written an article with Harry Pence regarding SmartPhones in the classroom and have been very passionate about making sure that ChemSpider is supported on mobile platforms with ChemMobi [Working with James Jack 1,2] as well as the Mobile Browser support work done by Sergey Shevelev in our team [3,4]. While my personal bias is chemistry, apps clearly cover all of science…and these science apps are growing in number.

I am fortunate to have worked with a terrific group of co-authors to pen an article regarding “Mobilizing Chemistry in the World of Drug Discovery”. This article, written by Sean Ekins, Alex Clark, Rich Apodaca, James Jack and myself, was submitted today. In parallel Sean and I decided that, since we had done the work to assemble a collection of apps for the article, it made total sense to keep track of this on an ongoing basis, so we’ve set up a wiki so that the community can help us track what is available. This wiki is at and offers you the opportunity to update the wiki with an overview of a scientific app or your review of an app that might already be there. Wikipedia is very cautious about having articles posted on apps as they see them primarily as advertising. We are of the opinion that this site can serve to advertise your apps if you are willing to put in the work to list them. As long as you do this in an appropriate manner, much as exemplified by Alex Clark with his MMDS, MolPrime, Yield101 and Reaction101 apps, there is no problem.

It is assumed that most of you will know how to edit in the world of MediaWiki. If not I suggest looking for basic MediaWiki instructions online (it’s the same platform as used for Wikipedia!). As you will see, SciMobileApps is NOT just for chemistry but for all forms of science as listed on the Main Page and can even be extended as the community sees fit. As time allows we’ll put together a page of help tips for you to follow and maybe a short movie. This is an after-hours project only and is aligned with the article we have submitted for publication. This is simply a community resource for scientists. Enjoy. We welcome your feedback.



Your Opinion WANTED on how the structure of Tegaserod should be drawn

Those of you who watch this blog know that many of the discussions are about chemical structures, accurate representations on databases and how to “correctly” communicate chemical structures/compounds for the users. So, this is an OPINION question…it’s not an “I have an answer” blog post.

So, Tegaserod has, according to DailyMed here, the structure below:

It can be envisaged as having a trans-orientation but the name on DailyMed doesn’t indicate trans….”3-(5-methoxy-1H-indol-3-ylmethylene)-N-pentylcarbazimidamide”

On Wikipedia here we see the structure below and a systematic name supporting a trans-orientation.

Now there are actually a number of ways to represent Tegaserod and, since there’s no stereochemistry to complicate the molecule, and we are interested in the skeleton per se, we can search on the first part of the InChIKey on a database like ChemSpider. A search on IKBKZGMPCYNSLU as the first part of the InChIKey for the structure gives 3 hits. Take a look. I don’t see any real reason to show the crossbonds for the NH but so be it.
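To make the skeleton search concrete, here is a minimal Python sketch of grouping records by the 14-letter first block of their InChIKey, which encodes connectivity only. The prefix IKBKZGMPCYNSLU is the one quoted above; everything after the first hyphen in the sample keys is a made-up placeholder, not the real Tegaserod keys, and the function names are hypothetical.

```python
from collections import defaultdict


def skeleton_block(inchikey: str) -> str:
    """Return the first 14-letter block of an InChIKey (the connectivity part)."""
    first, _, _ = inchikey.partition("-")
    if len(first) != 14 or not (first.isalpha() and first.isupper()):
        raise ValueError(f"unexpected InChIKey first block in {inchikey!r}")
    return first


def group_by_skeleton(inchikeys):
    """Group full InChIKeys under their shared first block."""
    groups = defaultdict(list)
    for key in inchikeys:
        groups[skeleton_block(key)].append(key)
    return dict(groups)


# Placeholder records: only the IKBKZGMPCYNSLU prefix comes from the post.
records = [
    "IKBKZGMPCYNSLU-AAAAAAAASA-N",  # stands in for one double-bond form
    "IKBKZGMPCYNSLU-BBBBBBBBSA-N",  # stands in for the other form
    "IKBKZGMPCYNSLU-CCCCCCCCSA-N",  # stands in for the crossed-bond form
    "ABCDEFGHIJKLMN-CCCCCCCCSA-N",  # an unrelated skeleton
]

hits = group_by_skeleton(records)["IKBKZGMPCYNSLU"]
print(len(hits), "records share the skeleton")
```

A real database would do this with an indexed prefix query rather than in memory, but the idea is the same: records that differ only in double-bond geometry collapse into one skeleton group.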

Now, consider that the three hits are E-, Z- and crossbond orientations, and their InChIKeys are as shown below, so the results set is indeed expected. My question: based on the structures that you see for Tegaserod, how would you prefer to see the compound drawn, and how would you expect it to be held in the database? Think about what you would expect to happen in terms of a search. If you drew a cis-form should it retrieve cis and crossed? If you drew crossed should it retrieve cis and trans? etc. Remember, it’s an opinion so no answer is wrong…
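One way to think through the search question is to write the candidate behaviours down as code. The sketch below is purely illustrative: the three policy names ("exact", "compatible", "skeleton") are my own labels for possible answers, not how ChemSpider or any other database actually behaves.

```python
from enum import Enum


class Geometry(Enum):
    E = "E"              # trans-like
    Z = "Z"              # cis-like
    CROSSED = "crossed"  # double-bond geometry left unspecified


def matches(query: Geometry, record: Geometry, policy: str) -> bool:
    """Decide whether a record is returned for a drawn query structure."""
    if policy == "exact":
        # Only the identical representation is returned.
        return query == record
    if policy == "compatible":
        # A crossed bond is treated as "either", on the query or the record side.
        return query == record or Geometry.CROSSED in (query, record)
    if policy == "skeleton":
        # Geometry is ignored completely.
        return True
    raise ValueError(f"unknown policy: {policy}")


def search(query: Geometry, records, policy: str):
    return [name for name, geometry in records if matches(query, geometry, policy)]


# Three hypothetical hits mirroring the E-, Z- and crossbond forms.
records = [("hit 1", Geometry.E), ("hit 2", Geometry.Z), ("hit 3", Geometry.CROSSED)]

for policy in ("exact", "compatible", "skeleton"):
    print(policy, search(Geometry.Z, records, policy))
```

Under the "compatible" policy, drawing the cis-form retrieves cis and crossed, and drawing crossed retrieves all three, which is one possible answer to the questions posed above.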


Posted on February 18, 2011 in Computing, Data Quality, InChI


How Fast is #Google Indexing? AMAZINGLY fast.

Tonight I was amazed for the first time in a long time. What amazed me was how fast the post I made to Twitter got indexed. I said:

What I meant is that the structure image on Wikipedia has no stereochemistry. See here.

About 2 minutes later I did a search on Google to see whether I could find Goserelin and compare the stereochemistry for what I believe is the structure (from ChemSpider here). What I found was a short list of hits but also this:

This was literally within a couple of minutes of me posting the Tweet.

OK…we live in an amazing world. Our networks are so interlinked at this point that the scope of what we are achieving, and will achieve as the semantic web comes to life, is, simply put, amazing. This observation impressed me. Maybe it shouldn’t but it did…is there something obvious going on here that I am missing? Should I not be so impressed?


Posted on February 17, 2011 in Computing



Who Could Participate in #NMRCAVES

I recently posted about the project that will become known as NMRCAVES, NMR Computer-Assisted Verification and Elucidation Systems. This will be a workshop to be held at SMASH. There will be no workshop without two essential ingredients: participants and data.

The participants will need to be willing to work with us, using their software, algorithms and approaches to test their systems on data. The data will be supplied by the community and provided to the participants in a blind study to test their systems.

Populating the workshop is the first challenge. If we cannot get enough participants then, even though we might get an abundance of data, there will be no workshop to hold. There are a limited number of groups/individuals working in the areas of computer-assisted structure verification and elucidation by NMR. I have listed them below. No offense meant if I have accidentally missed anyone out. Also, they are listed in alphabetical order so no favoritism either…


ACD/Labs Structure Verification with NMR Predictors

Bruker Complete Molecular Confidence

Mestre Labs MNova

Sciencesoft VerifyIt


ACD/Labs Structure Elucidator

Jmnsoft LSD

SENECA package for Computer Assisted Structure Elucidation (CASE) ported into the CDK

Sciencesoft AssembleIT

Can anyone point me to groups or software solutions  that I am missing and other potential solutions out in the community that I should approach? I will be approaching the listed groups with an invite to participate in NMRCAVES and then will be asking the community if you are willing to provide data for the project!


#NMRCAVES is NMR Computer Assisted Verification and Elucidation Systems

I am honored to have been invited to lead a workshop at the SMASH NMR conference later this year. I will be co-hosting with Michael Bernstein, someone whom I have known for many years and with whom I have spent many hours (if not days!) discussing the ins and outs of NMR prediction and structure verification by NMR.

The workshop will provide an environment for developers of software packages and associated algorithms allowing for structure verification and elucidation to engage with interested members of the NMR community attending the SMASH NMR meeting. Presenters may include both commercial and non-commercial software packages and the workshop will allow the participants to report on their respective approaches as well as report on the performance of their algorithms against a large set of data provided by the community.

The one-day workshop will be separated into Structure Verification and Structure Elucidation segments, with presentations from those who have chosen to participate in the project. We are hoping for participants from both the academic and commercial sectors.

I’ve called the workshop NMRCAVES: NMR Computer Assisted Verification and Elucidation Systems. Below is an outline to initiate a conversation with interested parties. It is a suggested outline for the project and I welcome feedback.

The data analysis components of the workshop are outlined below.
CASV: Four sets of data will be made available to the participants.
(1)    HNMR only, minimum of 25 spectra and 25 suggested structures (random distribution of correct/incorrect with at least 50% correct)
(2)    HNMR and 2D HSQC, minimum of 25 spectra and 25 suggested structures (random distribution of correct/incorrect with at least 50% correct)
(3)    HNMR only, minimum of 25 spectra and 25 sets of 3 structures (1 of each set of 3 is always the correct structure)
(4)    HNMR and 2D HSQC (preferably multiplicity-edited HSQC), minimum of 25 sets of spectra and 25 sets of 3 structures (1 of each set of 3 is always the correct structure)

The participants will receive the data via download from an FTP site with each folder numbered in an ambiguous manner. All structures will be known to only two parties: the laboratories acquiring the data and the host of the workshop (AJW). The participants will have the responsibility to provide a report identifying the correct/incorrect structures in test sets (1) and (2) and identifying the correct structure out of the combination of 3 provided in (3) and (4). When all reports have been submitted each participant will receive a report identifying the correct structures for their review and in order for them to report on their successes and to further review and report on the data during the workshop.
The overall performance statistics comparing the results of the various participants will be reviewed and presented at the workshop by the workshop host.
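As a rough illustration of how the CASV reports might be scored against the answer key, here is a minimal Python sketch. The folder labels, the report format (a mapping from folder to verdict or chosen index) and the choice of statistics are all assumptions for the sake of the example; the actual workshop format has not been defined.

```python
def score_binary(answer_key: dict[str, bool], report: dict[str, bool]) -> dict[str, float]:
    """Score test sets (1) and (2): is each suggested structure correct or not?

    answer_key maps folder -> True if the suggested structure is correct.
    report maps folder -> True if the participant accepted the structure.
    """
    if not answer_key:
        raise ValueError("answer key is empty")
    missing = set(answer_key) - set(report)
    if missing:
        raise ValueError(f"report is missing folders: {sorted(missing)}")

    tp = fp = tn = fn = 0
    for folder, is_correct in answer_key.items():
        accepted = report[folder]
        if is_correct and accepted:
            tp += 1      # correct structure accepted
        elif is_correct:
            fn += 1      # correct structure rejected
        elif accepted:
            fp += 1      # incorrect structure accepted
        else:
            tn += 1      # incorrect structure rejected

    return {
        "accuracy": (tp + tn) / len(answer_key),
        "false_accept_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_reject_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


def score_choice(answer_key: dict[str, int], report: dict[str, int]) -> float:
    """Score test sets (3) and (4): fraction of sets where the right 1-of-3 was picked."""
    if not answer_key:
        raise ValueError("answer key is empty")
    right = sum(1 for folder, index in answer_key.items() if report.get(folder) == index)
    return right / len(answer_key)


# Hypothetical four-folder example (real sets would have at least 25 entries).
key = {"F01": True, "F02": False, "F03": True, "F04": False}
submitted = {"F01": True, "F02": True, "F03": False, "F04": False}
stats = score_binary(key, submitted)
print(stats)
```

Reporting false accepts separately from false rejects matters here, since accepting a wrong structure is usually the more costly mistake in verification.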

CASE: The objective should be to test the ability of algorithms to correctly elucidate the skeletons of unknowns with the provision of “high-quality datasets” where sensitivity is deemed not to be a limitation. While it is acknowledged that sensitivity is an issue in CASE approaches, this particular hurdle should be removed when challenging the algorithms. Data will be requested from a series of laboratories. The minimum dataset should include “High-resolution MS”, 1H, COSY, HSQC/HMBC. Additional data can include TOCSY, DEPT-HSQC, HSQC-TOCSY, 1H-15N direct and long-range correlation, NOESY/ROESY.
The participants will receive the data via download from an FTP site with each folder numbered in an ambiguous manner. All structures will be known to only two parties: the laboratories acquiring the data and the host of the workshop (AJW). All elucidations will be done blind and the participants will have the responsibility to provide a report including a table of the top 3 structures for each dataset, rank-ordered if possible, from most-likely to least-likely. When all reports have been submitted each participant will receive a report containing the correct structures for their review and in order for them to report on their successes and to further review and report on the data during the workshop.
The overall performance statistics comparing the results of the various participants will be reviewed and presented at the workshop by the workshop host.
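For the CASE reports, where each participant submits up to three rank-ordered candidate structures per dataset, the comparison could be sketched as below. This assumes structures are compared by some canonical identifier string (for example an InChIKey); the dataset labels, identifiers and the two statistics chosen are hypothetical.

```python
def top_k_hit_rate(answer_key: dict[str, str], report: dict[str, list[str]], k: int = 3) -> float:
    """Fraction of datasets whose correct structure appears in the top k candidates."""
    if not answer_key:
        raise ValueError("answer key is empty")
    hits = sum(1 for dataset, truth in answer_key.items()
               if truth in report.get(dataset, [])[:k])
    return hits / len(answer_key)


def mean_reciprocal_rank(answer_key: dict[str, str], report: dict[str, list[str]]) -> float:
    """Average of 1/rank of the correct structure (0 when it is not in the top 3)."""
    if not answer_key:
        raise ValueError("answer key is empty")
    total = 0.0
    for dataset, truth in answer_key.items():
        ranked = report.get(dataset, [])[:3]
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(answer_key)


# Hypothetical identifiers standing in for canonical structure keys.
key = {"D1": "A", "D2": "B", "D3": "C", "D4": "D"}
submitted = {
    "D1": ["A", "X", "Y"],  # correct structure ranked first
    "D2": ["X", "B", "Y"],  # ranked second
    "D3": ["X", "Y", "Z"],  # not found
    "D4": ["X", "Y", "D"],  # ranked third
}

print(top_k_hit_rate(key, submitted, k=1))
print(top_k_hit_rate(key, submitted, k=3))
print(mean_reciprocal_rank(key, submitted))
```

The top-1 and top-3 rates answer "did the software find it?", while the reciprocal rank rewards systems that put the correct structure higher in their list.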

Outcome of Project
1)    A review of the state of contemporary computer-based structure verification and elucidation
2)    All data to be publicly shared and made available as Open Data for download and to become a gold standard reference set of data for the community to utilize for further testing and development
3)    All processed spectra to be uploaded and available on a public domain database (e.g. ChemSpider) and associated with the correct chemical structure
4)    A minimum of one co-authored publication reviewing the results of the workshop and associated studies

Your feedback, comments and questions are welcomed. We are especially looking for laboratories who are willing to provide sets of data for analysis during the project as well as software groups who develop algorithms for structure verification and elucidation and who wish to participate in the project.


Fail Fast Despite the Hype – A Model from Google Wave

I’ve been to SciFoo twice. Both times were great. I didn’t get to go this year…and I am sad, not angry, that I wasn’t invited. It is terrific that other people, new and old attendees, got to share in the wealth of experience that makes up SciFoo. I hope that it continues and I hope I get to go again.

The first time I went the Google Datasets project was announced. It seemed like a great offer to make to the scientific community. There clearly wasn’t enough participation for the effort as the project was promptly killed.

The next time I went back to SciFoo, Google Wave received a lot of attention. Cameron Neylon helped integrate ChemSpider into Wave with ChemSpidey and the potential of Google Wave exploded across the internet as Google’s next big win. I thought the technology was “cool”, interesting, a technology looking for a problem, and “noisy”…it was very distracting and difficult for me personally to adopt into my daily work. I did play with it, worked on a couple of projects with some colleagues and conceived of how we would use some of the functions.

And now Google Wave is winding down…and I take this comment to heart: “…despite these wins, and numerous loyal fans, Wave has not seen the user adoption we would have liked. We don’t plan to continue developing Wave as a standalone product.” Basically they have learned some lessons, probably got some very nice capabilities to plug in elsewhere later, and have decided to stop investing. I’d love to know what their process was to come up with this decision. Wave was a massive story in the media…and well executed in terms of marketing. How many companies are this clean with an announcement when killing a project of this size, making a tight post on the company blog? It’s surprising to see it happen this way, but I have to respect them for the style of pulling the plug and failing fast. There are lots of other companies who would continue to invest, fearful of the fallout of pulling the plug on a high profile project. Good for you Google…it’s a shame it didn’t work…I DID like pieces of the technology but overall I wasn’t an adopter. But thanks for this: “The central parts of the code, as well as the protocols that have driven many of Wave’s innovations, like drag-and-drop and character-by-character live typing, are already available as open source, so customers and partners can continue the innovation we began”. The community will probably take them!


Posted on August 4, 2010 in Computing, Software



How long does it take to update WordPress?

It can take a long time to update software, especially when there are processes, procedures and testing involved. I know…I was involved with ACD/Labs when they rolled out their Updater allowing people to update their ACD/Labs software…it’s great for individuals but corporations would find it dangerous. WordPress is the blogging platform for this blog and updating it takes time…how much time? Well, just a few seconds to log on, click on “WordPress 3.0.1 Update Now” and let it happen. The results were seamless, the blog didn’t break, I didn’t lose anything and it was back in production after I walked to the kitchen, grabbed a coffee and walked back. I also write some cathartic poetry (FourQuadrantsPoetry) and the adventures of an aging sportsman on Blogger. They simply update in the background and I don’t even know about it. I install Windows updates all the time and, though it hasn’t been totally painless over the past few years, today it is mostly seamless. I must say though that the latest iPhone OS upgrade sucked and my phone has become slow as a dog to move between apps. Just horrible. But my congrats to WordPress for an update well done…seamless and fast.