Navigating scientific resources using wiki-based resources

Presentation given at ACS New Orleans Spring Meeting

There is an overwhelming number of new resources for chemistry that would likely benefit both librarians and students in terms of improving access to data and information. While commercial solutions provided by an institution may be the primary resources, there is now an enormous range of online tools, databases, resources, apps for mobile devices and, increasingly, wikis. This presentation will provide an overview of how wiki-based resources for scientists are developing and will introduce a number of developing wikis. These include wikis that are being used to teach chemistry to students as well as to source information about scientists, scientific databases and mobile apps.


Serving up and consuming community content for chemists using wikis #ACSPhilly

Unfortunately I had to leave the ACS Meeting in Philadelphia but my colleague David Sharpe will be giving this presentation for me. I have made available a copy of the presentation on Slideshare here but also uploaded a narrated version onto YouTube.

“Wikipedia has become the world’s most famous encyclopedia, using as its platform the MediaWiki open source software. The software is supported not only by the Wikimedia Foundation but by a community of developers who build widgets and add-ons to extend the capabilities. This presentation will review how MediaWiki has been used as a container for a number of resources of value to chemists, specifically SciMobileApps, SciDBs and ScientistsDB, holding content regarding mobile scientific apps, scientific databases and scientists. We will also review how chemistry content within Wikipedia has been used to enhance the content underlying the RSC ChemSpider database and how the platform supports an educational environment for chemistry students.”


Olympicene Now Has a Wikipedia Article

The story about Olympicene was released earlier this week to great fanfare online. I discussed the details here. There has been so much press, with comments made online on the websites of Popular Science, Scientific American, BBC News, the Huffington Post and many others, that I wondered whether it would be appropriate to suggest an article get written for Wikipedia.

Now, I am VERY concerned with notability on Wikipedia, as evidenced by my post here about notability. I think Olympicene is “notable”. I am also concerned with being flagged with conflict of interest as I was involved with the Olympicene project. My intention was to ask the community to participate in writing the article. However, after checking Wikipedia I was happy to find that the community had already got to it. In two days 9 authors had already worked on the article! I checked the View History logs…I don’t know who ANY of them are….so I am not in conflict of interest there either by asking someone to write it!

The Wikipedia article on Olympicene is here. It would be ideal to get the color image up there for dramatic effect as well as the original concept details when Olympicene was introduced, as well as at least some representatives of the synthetic path on ChemSpider SyntheticPages. I can’t add them…I’ll get flagged probably. I’d also link to the ScientistsDB article about Graham Richards as it is much richer than the one on Wikipedia.

Either way…from release to Wikipedia same day and nine authors in two days. Now that’s community collaboration!


The Understanding Reporter from

I get interviewed quite regularly regarding ChemSpider, my views on Open Data and data quality on the internet, as well as general comments about the chemistry data explosion online. So, when I was interviewed recently for the online article “Chemistry’s web of data expands” I was more than happy to give my thoughts regarding patent data coming online, data quality and the need for standards for handling chemistry data.

One of the parts of the conversation was regarding the work put in to clean up chemistry data on Wikipedia. What seems like an eternity ago I did “Dedicate Christmas Time to the Cause of Curating Chemistry on Wikipedia” and initiated a project to check every chemical compound on Wikipedia, bond by bond, atom by atom. However, I very quickly connected with Walkerma who then introduced me to a number of other Wikipedia Chemistry people. I started participating in IRC Chats with this group and we started exchanging comments about how we could move the project along. It was a pleasure to work with the team and while I did continue to participate it was nowhere near the level that I had contributed in the early days of the project. The project was a collaborative effort for sure, one of the best I have been involved with over the past few years.

When the original article on Nature.com was published it stated “In fact, notes Williams, Wikipedia proved the most reliable source of structure information in that experiment – largely because he had led an effort to clean up the site’s 13,000 structures.” I definitely didn’t want that statement in the article and had specifically requested that I be represented as being part of a collaborative effort. I did not lead the project…I was a part of it only. So, with a couple of email exchanges with the author of the article, Richard van Noorden, the language was changed to “In fact, notes Williams, Wikipedia proved the most reliable online source of structure information in that experiment – largely because of an effort to clean up the site’s 13,000 pages about drugs and chemicals”. It’s a subtle edit but I definitely did not want to take the credit for leading a project that was an ideal representation of crowdsourcing, collaboration and caring for chemistry on Wikipedia. And, to clarify…I know for a fact that not all pages are fully curated and validated yet…it’s a long process!!!


On the Accuracy of Chemical Structures Found on the Internet

A poster presented at the ACS Meeting in San Diego with the UNC Chapel Hill group…

The Internet has been widely lauded as a great equalizer of information access.  However, the absence of any central authority on content places the burden on the end-user to verify the quality of the information accessed.  We have examined the accuracy of the chemical structures of ca. 200 major pharmaceutical products that can be found on the internet.  We have demonstrated that while erroneous structures are commonplace, it is possible to determine the correct structures by utilizing a carefully defined structure validation workflow.  In addition, we and others have shown that the use of un-curated structures affects the accuracy of cheminformatics investigations such as QSAR modeling. Furthermore, models built for carefully curated datasets can be used to correct erroneously reported biological data.  We posit that chemical datasets must be carefully curated prior to any cheminformatics investigations.  We summarize best practices developed in our groups for data curation.



A YouTube Cartoon Movie for ScienceOnline2012

I have previously blogged on “Why are pornstars more notable than scientists on Wikipedia?”. It created a wave of comments and feedback, some on the blog, a lot more off-blog. One of the results was a Xmas project that resulted in ScientistsDB, which I discussed here.

I’ve been watching a number of amusing videos that have been showing up on YouTube of late and, as a bit of an exercise nut, I have posted a couple of funny ones to my Exercise blog [1,2]. Last night, while helping my friend Mark Jensen do some stop-motion photography while he was painting a sign, I happened upon the process by which these types of movies are made. It’s possible to make them using XtraNormal. Since it is possible to make the first one for free…I had enough credits at least to do what I wanted to do…

So, I made an XtraNormal movie about “Adult Film Stars and Scientists on Wikipedia” and the development of ScientistsDB. I kind of dedicated the movie to ScienceOnline 2012 as that is where I suggested the fictional discussion between “Sean and Tony” would take place.

I’m sure there are going to be a lot of interesting discussions at ScienceOnline2012 and I look forward to seeing everyone there. By the way, if you have been involved with any semantic web projects or projects using Linked Open Data please connect with me (@chemconnector) or @kristiholmes on Twitter as we’d like to invite you to give a short (3-4 min) talk at ScienceOnline2012.


Why are pornstars more notable than scientists on Wikipedia?

I’m a BIG Wikipedia fan. It is one of my favorite sites, our 9-year-old twins have spent many hours on the site with me, and I have personally spent a lot of time, including Christmas, curating chemistry on Wikipedia. I like what Wikipedia has achieved and have willingly contributed articles, but I also enjoy a good laugh at Wikipedia’s expense when appropriate. In the past 24 hours I’ve giggled at the latest XKCD cartoon as well as this blog post about Jimmy Wales.

Despite my affection for Wikipedia, this week I am annoyed about what’s going on for me on Wikipedia. I’ve read The Wikipedia Revolution and understand the editorial activities, and I’ve had many discussions about how authors of Wikipedia articles have been “beaten up” in a friendly way. I’ve been warned about Conflict of Interest policies and yet, because I think it’s important, have tried to navigate the complexities of contributing articles. At present, however, my contributions on Wikipedia regarding scientists and projects I know about have all been flagged, either for deletion or for “notability”.

I’ve written the bulk of these articles: Gerhard Ecker, Sean Ekins and Gary Martin. Some of the flags on the articles include “It may have been edited by a contributor who has a close connection with its subject. Tagged since November 2011.”

Gary Martin and Sean Ekins are personal friends so YES, I have close connections with the subject. And I believe I can objectively write a good article about them. Just like I wrote about the village I grew up in…Afonwen. I only spent 12 years of my life there….so have a close connection with that too. I have known Gerhard Ecker for about three years, and know about his work from reading his articles and hearing him speak, and feel it’s valid to contribute an article as I JUDGE he’s a notable scientist. Gary Martin has almost 300 publications, and an h-index of 27. In the domain of NMR anyone who is doing small molecule structure elucidation is almost certainly using technology he has contributed to. He is notable. Sean Ekins is also notable, in my opinion. And surely Wikipedia is about collective opinions.

I have tried to follow notability guidelines for academics but have clearly failed so encourage anyone reading this post to help clean up the articles. If any of you out there happen to know Gerhard, Gary or Sean DON’T contribute though…you might get flagged as being a contributor who has a close connection. It’s much better to write about people you don’t know. Clearly I understand the possible bias …

If I look at the number of chemists on Wikipedia I find the following list of about 480 chemists. That article is a list of world-famous chemists. There is also a smaller list of Russian Chemists. The end of the list looks like this:

See also

These are likely all NOTABLE chemists as I couldn’t find a single article in the list with a challenge on it…but I confess to not looking at each one individually. But that’s what we have for chemists….a list of world-famous chemists, biochemists and Russian chemists.

Many of us have heard about how “open” Wikipedia is, including many of the exchanges regarding pornography on Wikipedia. In many cases I have to simply caution “welcome to the internet”. We all know it’s out there…how could we not. There is material on Wikipedia that is shocking, but at the same time educational. But where I take issue, just for comparison purposes, is that top-notch scientists, in my opinion (and I judge that of many others), can be flagged as not notable, yet pages like those listed below for pornstars can exist without question and without flagging, and so, I have to assume, are judged both encyclopedic and notable.

Similar to the list of chemists a search on pornstars gives a full article here but then these incredibly long lists!

The last one is quite a list! I guess it’s appropriate to list pornstars by decade but scientists tend to perform better over the longer term and can have 40-50 year careers whereas I don’t even want to imagine that for the other career! I struggle to see why the list of references for Ron Jeremy is any more notable/appropriate than the list of references for Gary Martin.

What’s ridiculous is that there is even an article about pornstar pets. What??? This has more of a place on Wikipedia than some of our world’s most published scientists? Is there something wrong with this picture?

While I may not fully understand what is deemed to be appropriate in terms of notability for a scientist, and I do understand the judgment that I might be too close to the scientists to be objective (but I challenge that!), I definitely challenge the status that pornstars deserve more exposure, pardon the pun, than the world’s chemists.

Despite my rants I understand the challenges that will likely show up as comments on this blogpost. I understand that I will be pointed to WP:COI and WP:Notability. I do not get to set the rules, I need to follow them as I am a small part of a very important community of crowdsourced improvement. But, overall, I remain surprised at how there appears to be so much diligence looking at the articles of scientists rather than those of pornstars. I think scientists are generally involved in very notable activities that generally distinguish them from the bulk of the population. I think pornstars are involved in activities that are not particularly notable as the bulk of the population will do them at some point in their life….well, not ALL activities that pornstars do I’m sure…..

I believe we need a change in policy. I believe that scientists deserve more notability than pornstars and that diligence, while appropriate, should be used in a more tempered manner.

There is an alternative solution…




Fill in the #Wikipedia Survey and Help Our Community

Over on the Academic Productivity blog Dario has discussed “Why do scientists (not) contribute to Wikipedia?”. The post points to a survey that any user of Wikipedia, especially a scientist, should fill in.

“A survey has been launched by the Wikimedia Research Committee to understand why scientists, academics and other experts do (or do not) contribute to Wikipedia, and whether individual motivation aligns with shared perceptions of Wikipedia within different communities of experts. The survey is anonymous and takes about 20 min to complete. Whether you are an active Wikipedia contributor or not, you can take the survey and help Wikipedia think of ways around barriers to expert participation.”

Please do participate!

Presentation at European Bioinformatics Institute

Last week was quite the trip to the United Kingdom…I was hit by the flu, which put me into bed without a voice for an entire day, and then gave the rescheduled talk the next day feeling a little beaten up. The talk discussed the recently conducted survey of public domain databases that I initiated last week (results embedded in the talk) as well as some of the observations comparing data for 10 drugs across a series of public domain databases. The meeting was a good chance to meet some of the hosts of some of the databases including PubChem, DrugBank, ChEBI/ChEMBL and SureChem. I’m sorry I missed the first day…


Finding the Structure of Vitamin K1 Online

You would think that finding the correct structure of Vitamin K1 online in public domain resources would be an easy exercise. But not so fast. Using the assertion that the chemical structure is correct in the Merck Index, and then wandering through CAS’s Common Chemistry to validate this assumption, this short movie takes us through Wikipedia, Wolfram Alpha, KEGG, DrugBank, PubChem and other online resources to show how complex and impure the public domain databases are in terms of providing good quality name-structure associations for chemicals. Vitamin K1 is actually a rather simple chemical structure. Finding the correct chemical structure online…not so simple.
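
For anyone who wants to repeat this kind of check on their own compound, here is a minimal sketch in Python of the comparison step only. It assumes the structure from each source has already been reduced, by hand or with a cheminformatics toolkit, to a comparable string identifier (a standard InChIKey would be one option). The function names, the source names and the identifier strings are all hypothetical placeholders made up for illustration; they are not real data and say nothing about which database is right or wrong.

```python
from collections import defaultdict


def normalise(identifier):
    """Strip whitespace and upper-case so formatting differences are ignored."""
    return identifier.strip().upper()


def group_sources_by_structure(records):
    """Map each distinct identifier to the sorted list of sources reporting it."""
    groups = defaultdict(list)
    for source, identifier in records.items():
        groups[normalise(identifier)].append(source)
    return {key: sorted(sources) for key, sources in groups.items()}


def sources_disagreeing_with(records, reference_source):
    """Return the sources whose identifier differs from the trusted reference."""
    reference = normalise(records[reference_source])
    return sorted(
        source
        for source, identifier in records.items()
        if normalise(identifier) != reference
    )


# Placeholder identifiers (NOT real InChIKeys, NOT real database results).
records = {
    "reference": "STRUCTURE-A",
    "source_1": "STRUCTURE-A",
    "source_2": "STRUCTURE-B",
    "source_3": "structure-a ",  # same structure, sloppy formatting
    "source_4": "STRUCTURE-C",
}

if __name__ == "__main__":
    for identifier, sources in group_sources_by_structure(records).items():
        print(identifier, "<-", ", ".join(sources))
    print("Disagree with reference:", sources_disagreeing_with(records, "reference"))
```

The sketch trusts one source as the reference, in the same spirit as starting from the Merck Index in the movie. A string comparison like this only flags that two sources differ; working out why (stereochemistry, a double bond geometry, a different chain length) still means looking at the structures themselves.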


