Tag Archives: Wikipedia

Why are pornstars more notable than scientists on Wikipedia?

I’m a BIG Wikipedia fan. It is one of my favorite sites, our 9-year-old twins have spent many hours on the site with me, and I have personally spent a lot of time, including Christmas, curating chemistry on Wikipedia. I like what Wikipedia has achieved and have willingly contributed articles, but I also enjoy a good laugh at Wikipedia’s expense when appropriate. In the past 24 hours I’ve giggled at the latest XKCD cartoon as well as this blog post about Jimmy Wales.

Despite my affection for Wikipedia, this week I am annoyed about what is happening to my contributions there. I’ve read The Wikipedia Revolution, I understand the editorial process, and I’ve had many discussions about how authors of Wikipedia articles get “beaten up” in a friendly way. I’ve been warned about the Conflict of Interest policies and yet, because I think it’s important, I have tried to navigate the complexities of contributing articles. At present, however, my contributions on Wikipedia about scientists and projects I know have all been flagged, either for deletion or for “notability”.

I’ve written the bulk of these articles: Gerhard Ecker, Sean Ekins and Gary Martin. Some of the flags on the articles include “It may have been edited by a contributor who has a close connection with its subject. Tagged since November 2011.”

Gary Martin and Sean Ekins are personal friends, so YES, I have close connections with the subject. And I believe I can objectively write a good article about them, just as I wrote about the village I grew up in… Afonwen. I only spent 12 years of my life there… so I have a close connection with that too. I have known Gerhard Ecker for about three years, I know about his work from reading his articles and hearing him speak, and I feel it’s valid to contribute an article as I JUDGE he’s a notable scientist. Gary Martin has almost 300 publications and an h-index of 27. In the domain of NMR, anyone doing small molecule structure elucidation is almost certainly using technology he has contributed to. He is notable. Sean Ekins is also notable, in my opinion. And surely Wikipedia is about collective opinions.

I have tried to follow the notability guidelines for academics but have clearly failed, so I encourage anyone reading this post to help clean up the articles. If any of you out there happen to know Gerhard, Gary or Sean, DON’T contribute though… you might get flagged as a contributor who has a close connection. It’s much better to write about people you don’t know. Clearly I understand the possible bias…

If I look for chemists on Wikipedia, I find the following list of about 480 chemists. That article is a list of world-famous chemists. There is also a smaller list of Russian chemists. The end of the list looks like this:

[Screenshot: the end of the list of chemists, showing its “See also” section]

These are likely all NOTABLE chemists, as I couldn’t find a single article in the list with a challenge on it… but I confess to not looking at each one in turn. But that’s what we have for chemists… a list of world-famous chemists, biochemists and Russian chemists.

Many of us have heard about how “open” Wikipedia is, including many of the exchanges regarding pornography on Wikipedia. In many cases I simply have to caution “welcome to the internet”. We all know it’s out there… how could we not? There is material on Wikipedia that is shocking but at the same time educational. Where I take issue, just for comparison purposes, is that top-notch scientists, in my opinion (and, I judge, in the opinion of many others), can be flagged as not notable, yet pages like those listed below for pornstars can exist without question and without flagging and, I have to assume, are both encyclopedic and notable.

As with the list of chemists, a search on pornstars gives a full article here, followed by these incredibly long lists!

The last one is quite a list! I guess it’s appropriate to list pornstars by decade, but scientists tend to perform better over the longer term and can have 40-50 year careers, whereas I don’t even want to imagine that for the other career! I struggle to see why the list of references for Ron Jeremy is any more notable or appropriate than the list of references for Gary Martin.

What’s ridiculous is that there is even an article about pornstar pets. What??? This has more of a place on Wikipedia than some of our world’s most published scientists? Is there something wrong with this picture?

While I may not fully understand what is deemed appropriate in terms of notability for a scientist, and I do understand the judgment that I might be too close to the scientists to be objective (but I challenge that!), I definitely challenge the position that pornstars deserve more exposure, pardon the pun, than the world’s chemists.

Despite my rants, I understand the challenges that will likely show up as comments on this blog post. I understand that I will be pointed to WP:COI and WP:Notability. I do not get to set the rules; I need to follow them, as I am a small part of a very important community of crowdsourced improvement. But, overall, I remain surprised at how much more diligence appears to be applied to the articles of scientists than to those of pornstars. I think scientists are generally involved in very notable activities that distinguish them from the bulk of the population. I think pornstars are involved in activities that are not particularly notable, as the bulk of the population will do them at some point in their lives… well, not ALL the activities that pornstars do, I’m sure…

I believe we need a change in policy. I believe that scientists deserve more notability than pornstars and that diligence, while appropriate, should be used in a more tempered manner.

There is an alternative solution…

Community Views and Trust in Public Domain Chemistry Resources

Over the past 4 weeks I have been working with some new and old friends in the world of chemistry to initiate an analysis of “quality” in public chemistry resources. This is a work in progress and involves 3 separate groups (let’s call them labs) looking at various resources. Here’s a short description of the project.

The questions we are attempting to answer are:

Core question: What is the quality of data online in public chemistry databases? How accurate and unambiguous is the representation of chemical structures and their measured properties in public chemistry databases?

How capable are the present cheminformatics tools of handling the complexities of structure representations (limited to “small” organic molecules)?
How hard is it to generate a reference set of highly curated, “gold standard” data (chemical structure and activity) for a database of “known drugs”?

We’ve started with the top 200 selling drugs. The three labs had to agree on which of the top 200 drugs were small molecules (many of the top 200 are monoclonal antibodies or polymers, for example). We then had to decide whether we would deal with mixtures and combination drugs. Even identifying the list of NAMES of the drugs we wanted to deal with was iterative and a negotiation.

We then decided that each lab would work independently, with at least two members of each lab working on the same problem independently. This gives us both intra-lab and inter-lab comparisons. We decided to start on a set of 10 drug names, using the GENERIC name as the name to work from. I started my part of the work just before I had to give a presentation at the EBI last week and was able to gather a lot of the data before the talk.
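The intra-lab and inter-lab comparison design can be illustrated with a short Python sketch. This is a hypothetical illustration, not the project’s actual tooling: it assumes each curator’s result is reduced to one structure identifier per drug name (an InChIKey would be a natural choice), and the lab names, curator names and “KEY-…” values are placeholders rather than real data.

```python
from itertools import combinations

def agreement(a, b):
    """Fraction of the drug names asserted by both curators on which they agree."""
    shared = a.keys() & b.keys()
    if not shared:
        return None  # nothing in common to compare
    return sum(a[drug] == b[drug] for drug in shared) / len(shared)

def compare(assertions):
    """Split every curator pair into intra-lab and inter-lab comparisons.

    assertions maps lab -> curator -> {drug name: asserted structure ID}.
    Returns two lists of ((curator, curator), agreement) tuples.
    """
    curators = [(lab, name, data)
                for lab, members in assertions.items()
                for name, data in members.items()]
    intra, inter = [], []
    for (lab1, name1, data1), (lab2, name2, data2) in combinations(curators, 2):
        bucket = intra if lab1 == lab2 else inter
        bucket.append(((name1, name2), agreement(data1, data2)))
    return intra, inter

# Hypothetical placeholder assertions; "KEY-..." stands in for e.g. an InChIKey.
assertions = {
    "lab1": {"curatorA": {"drug1": "KEY-1", "drug2": "KEY-2"},
             "curatorB": {"drug1": "KEY-1", "drug2": "KEY-9"}},
    "lab2": {"curatorC": {"drug1": "KEY-1", "drug2": "KEY-2"},
             "curatorD": {"drug1": "KEY-1", "drug2": "KEY-2"}},
}
intra, inter = compare(assertions)
```

In this toy example curatorB’s different answer for drug2 shows up as 0.5 agreement both inside lab1 and against both lab2 curators, which is the kind of collision that would send a drug name back for another look.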

Starting with a chemical name, how do you determine what the “correct” structure for that drug is? Think it’s easy? Try it! Where would you look? How would you confirm it? What would the iterative loop look like for YOU to assert the chemical structure(s) for the drug “Vytorin”?

For me the process looks something like this. Use a level of “trust and experience” with previously used resources as a starting point and declare “This is the structure of X based on searching on the drug name for X”. Then cross-reference, iterate and reiterate, finding consistencies and collisions, in order to come up with a final assertion: a list of consistent structures and their associated sources, plus a set of other resources with inconsistent structures and the reasons why they differ. Where possible, and if necessary, make edits to correct the information (e.g. on ChemSpider and Wikipedia). You can see an example of this for Vitamin K in the talk I gave at EBI here. For the ten drugs I made a number of observations. The screenshot below summarizes some of the results.

The table represents the following information, which was true at the time of the search and may already be out of date:

1) A search for thalidomide in ChEBI gives no hits

2) The structures of Zocor and Crestor on Drugbank are incorrect

3) There are no hits for Voglibose and Crestor on Common Chemistry

4) There were 3 incorrect structures on ChemSpider (now edited of course)

5) For most searches on a drug name on PubChem there are MULTIPLE hits and, for the set examined, the correct structure is in the results set. For example, a search for Taxol retrieves 44 structures, and the one I assert to be correct is among them.

6) There were two incorrect structures on Wikipedia and one drug without an associated structure.
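The cross-referencing loop described earlier (find consistencies and collisions, then assert a structure) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method used to build the table: each source’s name search is reduced to the set of structure identifiers it returned (InChIKeys would be a natural choice), the candidate is simply the identifier returned by the most sources, and the source names and “KEY-…” values are hypothetical placeholders, not real search results. The final assertion remains a human judgment.

```python
from collections import Counter

def cross_reference(hits_by_source):
    """Propose a candidate structure and sort sources by how they compare to it.

    hits_by_source maps a source name to the set of structure identifiers
    its name search returned. The candidate is the identifier returned by
    the most sources (a simple vote); it still needs manual confirmation.
    """
    votes = Counter(key for hits in hits_by_source.values() for key in set(hits))
    candidate = votes.most_common(1)[0][0] if votes else None
    report = {"consistent": [], "inconsistent": [], "no_hits": []}
    for source, hits in hits_by_source.items():
        if not hits:
            report["no_hits"].append(source)
        elif candidate in hits:
            report["consistent"].append(source)    # candidate is among the hits
        else:
            report["inconsistent"].append(source)  # a collision to explain
    return candidate, report

# Hypothetical placeholder results for a single drug name.
hits = {
    "Source1": {"KEY-A"},
    "Source2": {"KEY-A"},
    "Source3": {"KEY-B"},
    "Source4": {"KEY-A", "KEY-B", "KEY-C"},
    "Source5": set(),
}
candidate, report = cross_reference(hits)
```

Source4 is reported as consistent because the candidate is among its hits even though it also returned other structures, which is the multiple-hit situation described for PubChem name searches; a stricter version could flag such sources separately.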

When I started the work I had a “trust” level for a number of the databases. My basic position at that time was as follows. I could rarely find the correct structure for a drug based on a text-based search of PubChem; I would generally find a set of hits, and it was a lot of work to determine which was correct. Common Chemistry was excellent… but limited. DailyMed was generally good, but its structure representations could be abysmal. ChEBI, DrugBank, ChemIDPlus and Wikipedia were generally VERY good. Of all the sources I used, despite the rich data on PubChem, this was the resource where I struggled most to find the correct structure. The results began to challenge these trust perceptions.

In parallel with preparing this small dataset for the presentation at EBI, I decided it was appropriate to ask the community for their views on some of the databases I was looking at in this work… specifically, how much they “trusted” each resource. Trust means different things to different people. The word, and the question I was asking in the survey, would be interpreted in different ways. But that’s the way we work… so why fight it? The survey is online here… and if you haven’t filled it in, PLEASE DO!

The answers to date, from the 46 respondents, are below:

There are some very interesting results in here… and, I willingly admit, some I find VERY surprising. One person has no experience with Wikipedia? Wow. The majority of people have not heard of PDSP, ChemIDPlus, DailyMed or DrugBank… but without knowing who is providing the feedback, of course, I should not be shocked. Most of these resources are not for chemists per se but for life scientists. The number of votes for “Always Trust” for ChemSpider and PubChem is very high and, one might say, a compliment. The results are clearly ChemSpider-biased since I asked the question of my social network. The difference between the number of people who Always Trust PubChem and those who Commonly Trust PubChem is only one person. This is wildly different from my own views. I have heard people say that PubChem is the equivalent of CAS in quality except that it’s free. Sorry folks… afraid not! (I have since heard at the EBI meeting, from one of the people from PubChem, that it is possible to search in certain ways that limit the hits. It should be noted that this does not guarantee that the correct structure is retrieved.) On the flip side, the number of people who rarely trust PubChem is also quite high, so someone has had some interesting experiences!

There are a small number of people who NEVER trust the resources, and early on one person declared they trusted none of them. I trusted my own judgment enough to tell a colleague who that would be… “Egon Willighagen”, and this was later confirmed in his blog post “Trust has No Place in Science”. That may be true, and is the topic of a separate post, but my judgment is pretty good!

How would I fill in the questionnaire? I would NEVER select “Always Trust” for any of the databases. I would be able to rank the databases according to my perceptions and experience of the quality of the results I would find. My answers WOULD be different before I conducted the work on the first 10 drugs compared with now, after that pre-work.

As the host of ChemSpider, I would prefer that no one “Always Trusts” the resource, as that would stop people from taking care with the data and so remove the value of their curating it. However, I am more than happy to have it Commonly Trusted, and we have been working hard to gain the community’s trust in this area.

This work has triggered a number of responses….I’ll make my own comments on their positions separately… but their opinions are worth reading:

Egon Willighagen: Trust has no place in Science

Egon Willighagen: Trust has no place in Science Part 2

Christina Pikas: The role of trust in science. Christina comments: “I think that Anthony (sp.) could have chosen a better word than trust in his survey. ‘which of these have you evaluated and decided you could use? which of these would you prefer to use based on your evaluations of their merit?’” Christina is right. I could have chosen a different word, but I judge (chosen carefully!) that the responses would not have differed much.

There is also a healthy exchange happening on FriendFeed.

This work has only just started. An examination of >150 “small molecule drugs” by three labs is going to provide a lot of data. The work isn’t over and we have much to do. We’re learning a lot in the process about assertional loops, iterative process, collaboration and agreement. It’s a great adventure.

Posted by on December 11, 2010 in Community Building, Data Quality

The Curation of Almost 5000 Structures on Wikipedia

I recently commented on the statement made by Eric Shively of CAS about the CAS Validation Project going on at Wikipedia. The basic premise of the work is the need to validate CAS numbers, ensuring that the CAS number listed in a chemical box is associated with the structure shown in that box. So, if the structure has stereochemistry, make sure that the CAS number is for the form of the structure with that stereochemistry. If the CAS number is for a neutral compound, then the structure displayed should not be the salt. And so on. There are many sources of CAS numbers online, and many places to search for them to confirm. Search for “CAS Number search” online and you’ll find a lot of hits, though admittedly not all of them are related to Chemical Abstracts Service.
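One cheap check that can sit alongside these lookups is the CAS Registry Number check digit. The Python sketch below is illustrative and not part of the CAS Validation Project: it only catches mistyped or transposed digits, and a number that passes can still belong to the wrong stereoisomer, or to the salt rather than the neutral compound, which is exactly what has to be checked against a registry. The function name `cas_checksum_ok` is hypothetical.

```python
import re

# Three hyphen-separated groups: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")

def cas_checksum_ok(cas):
    """Return True if a CAS Registry Number is well formed and its check digit matches.

    The check digit is the sum of the other digits, each multiplied by its
    position counted from the right (1, 2, 3, ...), taken modulo 10.
    """
    match = CAS_PATTERN.match(cas.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))
```

For water, 7732-18-5, the weighted sum is 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5 matches the check digit, so `cas_checksum_ok("7732-18-5")` returns True.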

Some examples of online CAS number searches are excellent. In the order that I see them in my search:

The NIST Webbook – much loved by many scientists and very useful.

ChemIndustry – An excellent resource for chemists and, I believe, gaining a good following in the market

ChemFinder – CambridgeSoft’s online search system

A Buyers Guide – A German Chemical Buyer’s Guide.

Pennsylvania Department of Environmental Protection

California Department of Pesticide Regulation

And on and on. There are likely legal reasons for a number of these databases to have CAS numbers. As I continued to peruse the list, I was more than impressed by the number of databases serving up CAS numbers online and, I believe, a number of them contain over 10,000 numbers which, as I have commented before, is rather a magic number. Given the other issues being discussed now, should Wikipedia be concerned about the 10,000 CAS number issue?

PMR recently commented on my blogpost here. He said “PMR: Wikipedia has between 1000 and 2000 chemical substances (ca 0.01% of the total number of substances in CAS).”

The number of chemical substances in Wikipedia is actually MUCH higher than that… I know, since I’ve been looking at them in detail as described here. To clarify, I am building an SDF file from the chemicals on Wikipedia so that it can be deposited on ChemSpider and linked back to Wikipedia. This was done earlier by linking up chemical names, but that was far from perfect, so we are doing it in this more “curated” manner. The outcome of the work, helped by multiple other sets of eyes from WP:CHEM, will be a curated SDF file. I will return the SDF file with the following fields generated: SMILES string, Systematic Name, InChI string and InChIKey. These can then be used to homogenize the available fields in the Chemical Boxes etc.
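For readers unfamiliar with the SDF format: each record is a MOL block followed by named data items and a `$$$$` separator, which is what lets generated fields such as SMILES, InChI and InChIKey travel with each structure. The Python sketch below writes one such record. It is an illustration under assumptions, not the script used for this deposition: the field names (including a hypothetical WIKIPEDIA_TITLE field for linking back) are placeholders, and in practice the MOL block and identifiers would come from a cheminformatics toolkit rather than being typed in by hand.

```python
def sd_record(molblock, fields):
    """Return one SD-file record: a MOL block, its data items, then "$$$$"."""
    lines = [molblock.rstrip("\n")]
    for name, value in fields.items():
        lines.append(f">  <{name}>")  # data header line
        lines.append(str(value))      # single-line value
        lines.append("")              # blank line ends the data item
    lines.append("$$$$")              # record separator
    return "\n".join(lines) + "\n"

# Hand-written V2000 MOL block for methane (heavy atom only), used as a placeholder.
METHANE_MOLBLOCK = "\n".join([
    "methane",
    "",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])

# Hypothetical field names; WIKIPEDIA_TITLE illustrates the link back to the page.
record = sd_record(METHANE_MOLBLOCK, {
    "WIKIPEDIA_TITLE": "Methane",
    "SYSTEMATIC_NAME": "methane",
    "SMILES": "C",
    "INCHI": "InChI=1S/CH4/h1H4",
    "INCHIKEY": "VNWKTOKETHGBQD-UHFFFAOYSA-N",
})
```

Concatenating such records gives a multi-structure SDF file; keeping the identifiers as data items rather than in the MOL block header is what makes them easy to map back onto chembox fields.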

In doing the work (I have already worked through the whole alphabet) I have curated over 4900 compounds at a first level. I have disregarded the majority of inorganics and organometallics for this pass; ca. 5000 organics manually curated is ENOUGH of a challenge. I estimate the number of chemical compounds to be about 6500-7000, and it’s growing. So it is about a factor of 3-4 bigger than PMR’s estimate, even against his upper figure of 2000. The vast majority do have CAS numbers. While it hasn’t hit 10,000 yet… it’s coming.

Posted by on March 8, 2008 in Uncategorized

Taking a Break From Wikipedia Curation

I blogged previously about curating Wikipedia chemistry pages… specifically focusing on chemical structures and the quality of systematic names, trade names, structure images and outlinks to other sites. This project has progressed quite well… a LOT of eyeballing into the early hours. I am taking a break for the next couple of weeks (at least) to catch up with some other work. So far I have made my first pass through to the letter P (having already done X, Y and Z). There are 1100 links left for me to review: links to pages that I need to click on, open up, check whether it is a structure page, and then curate and validate.
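The first step of that per-link routine (decide whether a page is a structure page before curating it) can be sketched as a simple filter. This is a hypothetical illustration, not the tooling used for this project: it assumes the page wikitext is already in hand and treats a page as a structure page when it carries a Chembox or Drugbox infobox template; the page titles and texts are placeholders.

```python
import re

# Assumption: a chemical "structure page" is one carrying a Chembox or Drugbox infobox.
INFOBOX = re.compile(r"\{\{\s*(chembox|drugbox)\b", re.IGNORECASE)

def triage(pages):
    """Split {title: wikitext} into titles to curate and titles to skip."""
    to_curate, skipped = [], []
    for title, wikitext in pages.items():
        (to_curate if INFOBOX.search(wikitext) else skipped).append(title)
    return to_curate, skipped

# Hypothetical placeholder pages.
pages = {
    "Page one": "{{Chembox\n| ImageFile = placeholder.png\n}}",
    "Page two": "{{Drugbox\n| IUPAC_name = placeholder\n}}",
    "Page three": "A list article with no infobox.",
}
to_curate, skipped = triage(pages)
```

A filter like this only narrows the queue; the curation and validation of names, images and identifiers on each remaining page is still the manual part.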

I think what’s been done to date has been of value to the WP:CHEM team and to the overall quality of what’s on there. I had questioned in my own head how important and valuable the effort was. Thanks to Walkerma, who pointed out this facility today, it is clear that the chemistry pages are getting a lot of visits… over 100 per day in many cases. A report on my progress is posted online here.

It’s been a work of passion to this point. Now, the reality is that it is just work. I am tired of looking at Wikipedia pages (no insult to WP, but I have tired eyes). It will get finished, and I hope by the end of the month… I won’t be rushing it, since that would impact the quality, but I will be glad when it’s over 🙂

Posted by on January 19, 2008 in Wikipedia Chemistry