Archive Info

You are currently browsing The ChemConnector Blog weblog archives for January 2008.

Taking a Break From Wikipedia Curation

I blogged previously about curating Wikipedia chemistry pages… specifically with a focus on chemical structures and the quality of systematic names, trade names, structure images and outlinks to other sites. This project has progressed quite well… with a LOT of eyeballing into the early hours. I am taking a break for the next couple of weeks (at least) to catch up with some other work. So far I have made my first pass down to the letter P (having already done X, Y and Z). There are 1100 links left for me to review: links to pages that I need to click on, open up, check whether it is a structure page, and then curate and validate.

I think what’s been done to date has been of value to the WP:CHEM team and to the overall quality of what’s on there. I had questioned in my own head how important and valuable the effort was. Thanks to Walkerma, who pointed out this facility today, it is clear that the chemistry pages are getting a lot of visits… over 100 per day in many cases. A report on my progress is posted online here.

It’s been a work of passion to this point. Now, the reality is that it is just work. I am tired of looking at Wikipedia pages (no insult to WP, but I have tired eyes). It will get finished, and I hope by the end of the month… I won’t be rushing it, since that would impact the quality, but I will be glad when it’s over :-)

My friend “An American Citizen”

My friend “Halbstein” has started his own blog, American Citizen. Recently he and I sat down for lunch, talked about the politics of health care in the United States and reviewed a very interesting article together. He has commented on this on his blog, and I recommend that people interested in the costs of health care in the USA browse it. Very revealing… go to his blog for info.

In response to his post I waxed lyrical about the movie Field of Dreams, Burt Lancaster and my doctor when I was growing up. Halbstein took it one step further… and does it in a way that might stimulate you all to remember what medicine used to be like. While technology and medicine have advanced, I have to ask whether patient care and doctor-patient relationships have balanced that by going the other way. Read about Dr Lipmann.

Does the Power of Marketing Equate to the Stupidity of the Public?

I am an iPod user. I couldn’t wait to see DVDs in Blu-Ray format. I believed (twice… mistakenly) that German cars would be better than Japanese ones (I was wrong!). The majority of us are powerfully influenced by marketing. Specifically, it is impossible to miss the latest “bandwagon jumpers” from the food companies whenever yet another way to manipulate the public becomes available to them. How many unhealthy foods do you see labeled as “cholesterol-free, sugar-free, fat-free, blah-blah-blah”? OK, so a food can be cholesterol-free, but does that mean it’s good for you? Fat-free… whoo-hoo… balance that with “stocked with a gazillion sugar calories” and who really cares. It shouldn’t be that difficult to have a gut-level instinct about what’s good and what’s bad to put in your mouth and down into your stomach. I DO put bad stuff in there… chocolate, french fries etc., but I am not under any illusion that they are good for me… I know they are bad… but moderation and life balance take care of that.

Why the rant? Trans-fat. TRANS-FAT!!! OK, so there’s science behind the outcry to remove it from food. Personally I now prefer butter over margarine, despite the “butter is bad for you… eat this pot of chemistry called margarine” pitches over the years. And yes… I listened and ate chemistry for a long time. It’s not the science behind trans-fat that worries me… it’s the vampire marketers using it to their advantage. Look at the image below. Why the hell are they labeling 100% sugar as trans-fat free? Doesn’t the public know that sugar isn’t fat? Does labeling it trans-fat free make a bag of pure sugar good for you? Whoo-hoo… grab a spoon. Come on people… wake up. Manipulation is the art of the marketer. What’s next… a bottle of water labeled as trans-fat free, sugar free and cholesterol free? Maybe it IS appropriate to label a GLASS BOTTLE of water as “plasticizer free”… take a whiff of a PLASTIC bottle of water after it’s sat in the car for a week. Let’s not be sheeple to the marketers…

[Image: sugar packaging labeled as trans-fat free]

Sign up to Receive ChemConnector Via Feedburner

I am finally getting back into blogging after a Christmas season spent doing Wikipedia curation and meeting grant application deadlines. So both this blog and the ChemSpider blog are going to become a little more active. Since this is a new blog site, if you are interested in receiving the posts by email, simply fill in your email address in the box on the right that looks like this:

[Image: Feedburner email subscription box]

Dedicating Christmas Time to the Cause of Curating Wikipedia

I’ll confess that despite the lure of Christmas candy, repeats of oldie-but-goodie movies and the urge to go hack down a Xmas tree, I found it difficult to stay away from my computer over the holidays. While I stayed silent in the blogosphere, I probably spent more nighttime hours with my laptop than I have in the past few months. I had a conversation with Walkerma from the Wikipedia Chemistry group in December and confessed my interest in curating Wikipedia chemical structures. Those of you who read the ChemSpider blog will know I have rather a passion for curation. I’d done a significant amount of it on ChemSpider and also, of late, on Wikipedia… see the taxol and diazonamide stories.

We recently announced our intention to roll out WiChempedia over on the ChemSpider blog. Before we go and grab the chemistry content, I wanted to make sure that we could grab “clean” data. In keeping with the structure-centric nature of the system we want to build, my first task was to check/validate/curate the structure-name pairs on Wikipedia. Using some CSV files provided to me by Martin Walker, I went to work. First of all, those CSV files were dirty… the word Ethanol shows up in some obscure places. With the assistance of a good action movie, a glass of wine, some basic text queries and removals, and some delete-delete-delete keystrokes, I had removed the majority of the “no way it could be a chemical” text strings. Then I imported the list of chemical names into a desktop chemical structure databasing tool (more on the tools in a separate post) and went to work. There were a few little tricks to make the whole process easier, but those will be detailed elsewhere. In general I could check a structure in about 2 minutes. In some cases I had to redraw structures (some took a LONG time). I wandered between PubChem, ChemSpider, Chemrefer and Google looking for confirmation of structures and registry numbers.
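The post doesn’t say which text queries were used to clean the CSV files, so the following is only a hypothetical sketch of that kind of first-pass filter. It assumes a CSV with a `title` column of Wikipedia page titles and uses a handful of made-up patterns (lists, categories, disambiguation pages, “history of…” pages) to drop strings that are clearly not a single chemical, while de-duplicating the rest in their original order.

```python
import csv
import io
import re

# Hypothetical rules: page titles that are very unlikely to name a single chemical.
NON_CHEMICAL_PATTERNS = [
    re.compile(r"^(List of|Category:|Template:|Talk:)"),
    re.compile(r"\(disambiguation\)$"),
    re.compile(r"\b(history|industry|economy)\b", re.IGNORECASE),
]


def looks_like_chemical(title: str) -> bool:
    """Return True unless the (stripped) title is empty or matches a rejection rule."""
    title = title.strip()
    return bool(title) and not any(p.search(title) for p in NON_CHEMICAL_PATTERNS)


def filter_titles(csv_text: str, column: str = "title") -> list[str]:
    """Keep plausible chemical names from a CSV, de-duplicated in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = (row[column].strip() for row in reader if looks_like_chemical(row[column]))
    return list(dict.fromkeys(kept))


sample = "title\nEthanol\nList of alcohols\nEthanol (disambiguation)\nHistory of ethanol\nEthanol\nDiazonamide A\n"
print(filter_titles(sample))  # ['Ethanol', 'Diazonamide A']
```

A filter like this only removes the obvious noise; every surviving name still needs the manual structure check described above.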

I’ve made many edits to the Wikipedia entries already… you can see my contributions since Dec 15th online. As a result of a conversation with Walkerma, I recently started to keep a more detailed report of the mistakes/suggestions/comments I have made on Wikipedia structures. The latest report is here. Walkerma is posting a version of this online for members of WP:CHEM to comment on.

My overall conclusion so far… I estimate that about 2-3% of the structure records online have errors. What counts as an error?

1) The structure does not match what it “should be” based on review of many other sources.

2) Systematic nomenclature can be poor… if the name displayed on Wikipedia is converted to a structure, the result is sometimes inconsistent with the structure actually displayed.

3) Sometimes the formula or mass displayed in the ChemBox is inconsistent with the actual formula or mass of the structure displayed.

4) The SMILES or InChI string associated with the structure can produce a different structure when converted.

5) The registry number matches either a different structure or a different “form” of the structure. For example, the structure shown is a neutral form of the compound but the registry number is for the salt.

There are other issues, but the ones above are the most common.
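One small part of check 3 can be automated without a chemistry toolkit: confirming that the molar mass shown in a ChemBox agrees with the formula shown next to it. The sketch below is a hypothetical illustration, not the tool used for this curation. It assumes flat formulas such as `C2H6O` (no parentheses, hydrates or charges), a deliberately small table of standard atomic weights, and an arbitrary tolerance of 0.05 g/mol. It only catches disagreement between the two displayed fields; comparing them against the drawn structure still needs a cheminformatics package.

```python
import re

# Standard atomic weights (g/mol) for a small subset of elements only;
# a real checker would need the full periodic table.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "Na": 22.990, "P": 30.974, "S": 32.06, "Cl": 35.45, "K": 39.098,
    "Br": 79.904, "I": 126.904,
}

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a flat formula such as 'C2H6O' into element counts.

    Parentheses, hydrates and charges are not handled and raise ValueError.
    """
    counts: dict[str, int] = {}
    consumed = 0
    for match in TOKEN.finditer(formula):
        if match.start() != consumed:  # something unparseable sits before this token
            raise ValueError(f"unparseable formula: {formula!r}")
        element, digits = match.groups()
        if element not in ATOMIC_WEIGHTS:
            raise ValueError(f"unknown element {element!r} in {formula!r}")
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
        consumed = match.end()
    if consumed != len(formula) or not counts:  # trailing junk or empty input
        raise ValueError(f"unparseable formula: {formula!r}")
    return counts


def molar_mass(formula: str) -> float:
    """Average molar mass computed from the displayed formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


def mass_matches(formula: str, displayed_mass: float, tolerance: float = 0.05) -> bool:
    """True if the displayed mass agrees with the displayed formula within tolerance."""
    return abs(molar_mass(formula) - displayed_mass) <= tolerance


print(mass_matches("C2H6O", 46.07))   # True  (ethanol, computed about 46.069)
print(mass_matches("C2H6O", 60.10))   # False (a mass that belongs to another formula)
```

A mismatch flagged this way does not say which field is wrong; it just marks the ChemBox for the kind of manual review described above.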

It turns out that Peter Murray-Rust and his group have been doing similar work, according to his post here. I appreciate his comment: “We are very grateful for this work. We are also doing similar things and we’d be delighted to coordinate”.

While this is not exactly Open Notebook Science, as I curate Wikipedia records I am keeping notes and putting them online for others to check and comment on, so this is Collaborative Science through curation. It IS actually having an impact on the Wikipedia records every 24 hours at present. Not only am “I” editing records as I find errors, but when I open the conversation to others for their comments, they make decisions and the appropriate edits. You can see WP users making edits according to my comments; see here for example. I’m interested to see the similar contributions from Peter’s team.

There is expected to be an IRC chat with the WP:CHEM team in the near future, and hopefully a chance to compare notes, processes and the path forward for curation. I’m looking forward to the opportunity to hear about Peter’s team’s approach to curating the data and to identify how we differ and how we can mesh our efforts. It would be good if PMR’s group could adopt an Open Notebook Science approach to the Wikipedia analysis, as he did recently with the NMR analysis. That way we’ll be able to jointly track our efforts as we work together to help the Wikipedia team. (Peter, if you are reading, can you share your experiences of curating Wikipedia and what your team is observing? Can we do Collaborative Science on this project together?)