Archive Info

You are currently browsing the ChemConnector Blog by Antony Williams weblog archives for the 'Community Building' category.

Presentation at the BAGIM Meeting in Boston

Tonight I gave a presentation at the BAGIM meeting in Boston. The abstract is below, together with the embedded presentation from Slideshare.

ChemSpider – Is This The Future of Linked Chemistry on the Internet?
ChemSpider was developed with the intention of aggregating and indexing available sources of chemical structures and their associated information into a single searchable repository and making it available to everybody, at no charge. There are now hundreds of chemical structure databases covering literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data etc. and no single way to search across them. Despite the diversity of databases available online, their inherent quality, accuracy and completeness are lacking in many regards. ChemSpider was established to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online and expanding the information available to include data such as reaction syntheses, analytical data and experimental properties. ChemSpider has now grown into a database of almost 25 million chemical substances, grows daily, and is integrated with over 400 sources, many of these directly supporting the Life Sciences. This presentation will provide an overview of our efforts to improve the quality of data online, to provide a foundation for a linked web for chemistry and to provide access to a set of online tools and services to support access to these data.

American Chemical Society Loses the Appeal Against the Leadscope Case

The American Chemical Society is going to take a pretty significant hit in its most recent iteration of the ACS vs Leadscope case…to the sum total of $40 million PLUS costs. Ow.

ACS, through CAS generally, have had a number of very high profile collisions over the past few years but this has to be the most costly.

1) In 2004 ACS went up against Google for infringement against “Scholar” as a trademark. “The ACS complaint contends that Google’s use of the word scholar infringes on ACS’s SciFinder Scholar and Scholar trademarks and constitutes unfair competition.” No one “lost” and it was settled out of court with the statement from the ACS that “The settlement includes a confidentiality clause and as such the ACS will have no further comment.” Not sure how much it cost but I don’t personally know any cheap lawyers. And if you’re up against Google lawyers they are not going to be cheap lawyers!

2) In 2005 the ACS opposed the creation of PubChem stating “The ACS believes strongly that the Federal Government should not seek to become a taxpayer supported publisher. By collecting, organizing, and disseminating small molecule information whose creation it has not funded and which duplicates CAS services, NIH has started ominously, down the path to unfettered scientific publishing…”. This one was a very public battle with a very significant public outcry. There were discussions on multiple blogs, letters to C&E News and a number of people I know personally gave up their ACS membership in disgust. Wikipedia has some interesting reports about some of the costs involved: “The ACS has a strong financial interest in the issue since the Chemical Abstracts Service generates a large percentage of the society’s revenue. To advocate their position against the PubChem database, ACS has actively lobbied the US Congress. They are reported to have paid the lobbying firm Hicks Partners LLC at least $100,000 in 2005 to try to persuade congressional members, the NIH, and the Office of Management and Budget (OMB) against establishing a publicly funded database. They also were reported to have spent $180,000 to hire Wexler & Walker Public Policy Associates to promote the ‘use of [a] commercial database.’” In the same article Wikipedia reports on the ACS stance against Open Access: “The journal Nature reported that ACS had hired a public relations firm, Dezenhall Resources, to try to halt the open access movement. Scientific American later reported that ACS had spent over $200,000 to hire Wexler & Walker Public Policy Associates to lobby against open access.”

3) In 2002 ACS sued Leadscope and for the past eight years Leadscope and the founding scientists Paul Blower, Glenn Myatt and Wayne Johnson have been battling the charge of trade secret misappropriation. The ACS claimed that the three scientists had stolen trade secrets by patenting a software program for pharma companies that shortens the process to develop new drugs. The case was finally tried in 2008 and, following an eight-week trial, the jury found no evidence of misappropriation. They determined that the ACS had brought its claim in bad faith and awarded Leadscope $27 million in damages on its countersuit for defamation, unfair competition and tortious interference.

In closing arguments, Leadscope’s attorney argued that ACS “destroyed the reputations of three dedicated scientists…They have ruined the financial position of LeadScope…These scientists did their own work. They didn’t take anything from [ACS]”. Much of the case focused on expert analysis of Leadscope’s source code. Leadscope presented expert testimony that the source code of their own product was NOT copied.

The C&E News report of the result is here.

ACS appealed the result of the case and has been fighting it for the past couple of years. They lost the appeal and the damages are now up to $40 million PLUS costs. Ow.

I’ve been an ACS member for well over a decade. I’ve been an RSC (Royal Society of Chemistry) employee for just over a year. All my comments are made as an ACS member and not an RSC employee…it’s why I am making the comments here and not on the ChemSpider blog.

1) Summing up the amount of money that has gone into litigation, lawyers’ fees and settlements, how much money has been drained from the coffers of the ACS in the past decade? With the impending $40 million damages and the other legal wranglings it has to be over $50 million. Surely that money would be put to better use subsidizing a conference, keeping membership fees down or even investing it to supply materials or support to schools and colleges with needs around chemistry? How many other legal wranglings are waiting in the wings to further draw down the coffers?

2) How many not-for-profits engage themselves in such regular legal wranglings? In 1990 a lawsuit was brought against the ACS threatening its not-for-profit status. The discussions regarding ACS/CAS having not-for-profit status continue to be a talking point in a number of circles and dinner conversations I have sat in on. As Jeffrey Rich commented, “CAS is in no way related to the Boy Scouts of America or the United Way. What they do is no different from what a big computer business or publishing company does. That’s not a sign of a charitable organization, but of an intellectual business organization in business to make a bundle.” Another ACS, the American Cancer Society, has similar questions hanging over it.

3) What is the reputation cost of these legal cases for the ACS? I know a number of people who have left the ACS because of the PubChem challenges made by ACS. The blogosphere lit up when these challenges were happening and yet, as far as I can tell, no efforts are being undertaken to defuse or participate in the discussions. The statements are legal only and are so succinct that they hardly explain the mindset behind the challenges. A town meeting allowing a dialog would be very beneficial. I look forward to sitting in on such a discussion regarding Leadscope at the next ACS meeting. Will it happen? Was there a town meeting at ACS/CAS regarding the latest legal conclusion and how it will impact the organization?

I enjoy ACS meetings. I read C&E News every week but admit that I find RSC’s Chemistry World a more entertaining read. I know a lot of people at CAS and ACS and they are great people.

I hope that more consideration is given before the next legal case is brought against an individual or organization. It costs reputation and money and will only add to the growing concern that the ACS is focused on business rather than acting as a nonprofit.

Optical Structure Recognition, Solubility Prediction and Neutral Parties

There are a few areas of cheminformatics that I watch out of professional interest but more out of passion if the truth be known. As an NMR spectroscopist I still watch NMR processing and prediction software, CASE systems (Computer assisted structure elucidation), structure drawing and databasing, and, in regards to our recent interest over at ChemSpider regarding chemical name and structure image recognition, I watch OSR software developments. OSR is Optical Structure Recognition, the equivalent of OCR for chemical structure images. (Egon and I are both interested in OSR it seems…)

Probably the best known OSR system on the market for the past few years is CLiDE and I have had a chance to work with it as discussed here. There are now others available on the market though, specifically ChemOCR from the Fraunhofer Institute. There is also OSRA from the National Cancer Institute and ChemReader from the University of Michigan. I can’t find it now but there was also Kekule, also funded by the NCI.

As with all software focusing on a particular problem, these packages share the same intention…convert structure images into machine readable chemical structure formats. The technology approaches are similar but differ, of course, in their implementation. This blog isn’t about those differences, it is about how they can be compared.

Recently a gauntlet was thrown down in regards to solubility prediction. The question asked was “Can You Predict Solubilities of Thirty-Two Molecules Using a Database of One Hundred Reliable Measurements?”. The details of the challenge are here. What was nice about this is the fact that the results could be judged by independent parties. What was objective, at least from where I’m sitting, is that experts in the field got to review the data and comment. This is very different from chemistry software vendors comparing each other’s products and standing with their own opinions. I’ve been involved with this myself in terms of NMR prediction comparisons and these discussions can get rather heated. There was similar “warmth” in the air about a year ago in the OSR domain as discussed here.

So, with so many efforts in the area of OSR, how can we get independent testing of multiple OSR packages and get a true representation of the performance characteristics of these packages? Since some packages are commercial while others are Open Source we would need to separate the distinctions of “packaging” from performance. We would need a set of objective criteria separating usability, workflows and interface from algorithms. This doesn’t mean that the former are not important, nay, critical to the success of a software package BUT the algorithms, the science, the technology should be the focus of the study.

I suggest taking 100-200 images from different sources and applying the various software packages to validate performance in a neutral way. The study should be conducted by neutral parties…not so neutral that they don’t care about the work but neutral in the sense that they are wed only to the outcome of an objective comparison of the OSR algorithms. I have an interest in this so will throw my hat in the ring…I have already done some work on CLiDE and OSRA (1, 2, 3, 4). Who else would be interested?
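To make the idea a little more concrete, here is a rough sketch of how such a comparison might be tallied. It is only an illustration under my own assumptions, not an agreed protocol: it assumes the panel has a curated reference identifier (for example a standard InChI string) for every test image, that each package's output has already been converted to the same identifier type outside this script, and that an exact string match counts as a correct recognition. The names `PackageScore`, `score_packages`, `package_a` and `package_b` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PackageScore:
    package: str
    attempted: int  # images for which the package returned any structure
    correct: int    # images whose output exactly matched the reference
    total: int      # images in the benchmark set

    @property
    def accuracy(self) -> float:
        # Fraction of ALL benchmark images recognized correctly.
        return self.correct / self.total if self.total else 0.0


def score_packages(reference, predictions):
    """Tally exact-match recognition rates per package.

    reference:   {image_id: identifier} curated by the neutral panel (assumed).
    predictions: {package_name: {image_id: identifier or None}}.
    Returns a list of PackageScore, best accuracy first.
    """
    scores = []
    for package, outputs in predictions.items():
        attempted = sum(1 for img in reference if outputs.get(img))
        correct = sum(
            1 for img, ref in reference.items() if outputs.get(img) == ref
        )
        scores.append(PackageScore(package, attempted, correct, len(reference)))
    return sorted(scores, key=lambda s: s.accuracy, reverse=True)


if __name__ == "__main__":
    # Toy data: three images with reference InChI strings.
    reference = {
        "img1": "InChI=1S/CH4/h1H4",
        "img2": "InChI=1S/H2O/h1H2",
        "img3": "InChI=1S/C2H6/c1-2/h1-2H3",
    }
    predictions = {
        "package_a": {"img1": "InChI=1S/CH4/h1H4", "img2": "InChI=1S/H2O/h1H2", "img3": None},
        "package_b": {"img1": "InChI=1S/CH4/h1H4", "img2": "InChI=1S/CH4/h1H4"},
    }
    for s in score_packages(reference, predictions):
        print(s.package, s.attempted, s.correct, round(s.accuracy, 2))
```

Exact matching is deliberately strict. A real study would probably also want partial credit for near misses (a wrong stereocentre, a missed charge), and deciding how to score those is exactly the kind of criterion the panel would need to agree on up front.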

The challenges…there are a few:

1) Would all of the OSR producers share their software packages with a neutral panel of reviewers?

2) Who would fund the work? The Solubility Challenge appears to have been funded by Pfizer. How quickly would it get done without funding…everyone’s busy.

3) How would the panel be selected?

4) Would the work be conducted without all OSR producers participating?

5) About a dozen more concerns….probably Jonathan Goodman, Robert Glen and John Mitchell could give some great advice based on their experience with the Solubility Challenge.

I think this type of comparison needs doing…you?

Chem4Word Project from Microsoft and Murray-Rust

Following on from my presentation regarding text-mining and document mark-up at the ACS meeting in Philly it was interesting to see the announcement about the Chem4Word project from Microsoft. The project is a collaboration with the Unilever School of Informatics at Cambridge University, specifically working with Peter Murray-Rust and some of his team. The website announcement states: “Microsoft Research is investigating the introduction of chemistry-related features in Microsoft Office Word, including authoring and semantic annotations. Our approach to chemistry authoring will be modeled after the mathematic equation authoring in Word 2007 and will leverage many of the user-interface and XML extensibility options that are provided by Office 2007.

The goal of the Chem4Word project is to enable similar authoring, display, and mining scenarios for chemistry-related information within Office Word. Specifically, we aim to:

  • Provide easy authoring of chemical information within Microsoft Office Word 2007 documents
  • Allow end-user denotation of inline “chemical zones”
  • Render high-quality, print-ready visual depictions of chemical structures
  • Store and expose chemical information in a semantically rich manner to support publishing and mining scenarios, for authors, readers, publishers, and other vendors across the broad chemical information community”

This will be very useful in terms of supporting our efforts to enable the publication process for chemists and we will be watching this project with interest and hope to be engaged in early testing if we are invited.

The Network of Antony Williams ChemSpiderman

As with most people in the blogosphere I am happily dabbling with different social networking tools and specifically those enabling the scientific community. I have a LinkedIn profile, am starting to participate on ResearchGate and SciLink, as well as others. Tonight I looked at BiomedExperts and created my profile. It’s a very easy to use site, it was simple to bring together my publication history from the past few years and I enjoyed the visualization tools enabling me to see my network (an example shown below). Check it out.

Invited Symposium Speaker at a Fortune 500 Company

I’m excited to speak next week at a “by invitation only” symposium at one of the top Fortune 500 companies. The focus of the gathering for the 350 attendees will be “Networks” and I will be speaking about “Crowd-sourcing to Build A Structure-centric Community for Chemists”. I will of course talk about ChemSpider but also about my experiences with Wikipedia Chemistry and other general and scientific networks I have become involved with over the years. I will be speaking alongside invited speakers from organizations such as Yahoo, MIT, General Electric, Brookhaven, Harvard University etc. so I am quite humbled not only by the invitation but also by the chance to network (appropriate for a gathering about “networks”) with such a diverse group of people. I’m not sure what the situation is regarding releasing the presentation publicly after the gathering but will do so following discussions with the organizers. I’m sure it will be acceptable.