Category Archives: Vision

We have great hopes for ChemSpider. This blog area will be a place for us to expand on our vision and garner your feedback.

Linking ChemSpider to Patent Searches – A Collaboration with ReelTwo

For those of you watching the progress of ChemSpider since its initial exposure in March of this year, we have been incrementally adding new features, specifically integration with other rich sources of information. We have delivered integration with multiple data sources (click on the Data Sources checkbox under the Advanced Search for the list) as well as text-based searching of 50,000 Open Access articles via the ChemRefer service. Now we have extended this to include patents.

In collaboration with Reel Two, we now provide structure and substructure searching across millions of chemical structures linked to patents from the US, European and Asian patent offices via their SureChem Portal. Following a search, simply click through to the Detailed Results page for a particular structure and look in the Data Sources list for the word SureChem. See the example below; note SureChem boxed in red.

Surechem Link

Clicking on any of the names in the Data Sources list opens a new browser window containing the External Substances links, as shown below.

links to Surechem Data Sources

Clicking on any of the External Links will take you to the actual patent, hosted on the Patent Analysis website and identified via the SureChem query. For example, see here.

We have a number of ideas to enhance the delivery of patent information via ChemSpider, but for the time being we believe that the ChemSpider and Reel Two SureChem integration offers a powerful means by which a chemist can navigate from a chemical structure to a patent. We welcome your feedback.


A Plea to Academia to Help Design a Lesson Plan Using ChemSpider

Over the past 48 hours there has been an interesting discussion on CHMINF about how to teach a large class of students about literature searching, structure searching, property searching and so on. The tools are out there to perform such searches and to help students learn about the types of resources they will need to access if and when they enter industry. The premise of the exchange was that some of the gold standard resources, while excellent, are commonly not affordable at the level necessary to train large classes of students. Below is a posting I placed back onto CHMINF. My question to you readers is: “Is there an academic who would like to work with me on a Lesson Plan involving ChemSpider?” If so, please contact me.

The exchange follows. Lines marked > are comments made by one of the respondents to the original post; I used them as the basis for my own feedback, marked AJW>.


I wonder whether it might be possible to use the ChemSpider service as one of the resources for the classes. For example, relative to some of the comments made below, it is possible to perform the majority of searches at – this includes structure searches, property searches and name searches, as well as LITERATURE searches of Open Access articles. See details below…

>1. Ability to use a chemical drawing program to insert chemical drawing in a lab report.

AJW> On ChemSpider …refer to

>2. Ability to identify a compound by multiple methods, such as CAS registry number, IUPAC or CAS index name, common name.

AJW> For searching by numeric identifiers, systematic names or common names use the search page at
and review the comments made at and

>3. Ability to locate basic property information on a given compound in standard sources such as CRC Handbook, Lange’s, Merck, Dictionary of Organic Compounds, MSDS (whatever basic reference tools you have).

AJW> I am not suggesting that ChemSpider is a reference tool as yet…but in terms of searching on basic property information use the Advanced Search at

Select the appropriate check box to perform searches by structure and substructure, intrinsic properties, predicted properties, identifiers and data source. The nice thing about this approach is that students will find the links into reference sources such as the NIST WebBook, PubChem, Wikipedia and other rich sources of information.

>4. Ability to locate 3-5 articles about a topic related to a specific compound.

AJW> Use the ability to search >50,000 Open Access chemistry articles by text. We are presently adding another 60,000 Open Access articles. Perform the search here:

For example, search for dithiazoles and get this result set:

>5. Ability to identify the parts of a research paper and to summarize the relevance of the paper.
>6. Ability to cite articles using a standard citation style, such as ACS.

>… An assignment might be to locate information (defined by you and the lab director) about an organic compound of interest to them – why the molecule is of interest to them, some basic properties, locate 3 current articles on the compound – summarize relevance of one article in 2-3 sentences, cite all three articles according to a preferred style.

>It’s tempting to throw every possible nuance into such an assignment, but I’d stick to basics: compounds have names and properties, and you can find current literature about compounds by searching relevant article databases.

AJW> And bring it all together using a system like ChemSpider (and I should think a PubChem and PubMed combination would do the same): you would be able to interrogate structures, articles, properties and even spectra (scroll to the bottom of to see an example of spectra…more examples will show up shortly).

I am VERY interested in working with someone, hopefully from this list, to potentially develop a lesson plan that could be posted on ChemSpider for others to use as a skeleton to build on. If anyone is interested in doing this, please contact me directly. Thanks.


Posted by on August 3, 2007 in Vision


Wiki Enabling ChemSpider and an Intro to “Wikinomics”

You may have noticed me talking recently on this blog about books I have read. My friends and colleagues call me a “connector” (see Malcolm Gladwell’s The Tipping Point for what that means). I generally get “connected” to books by other connectors. My most recent read, and one directly connected to the future intentions for ChemSpider, is “Wikinomics – How Mass Collaboration Changes Everything”.

For those of you who have not managed to keep up with the blog postings, I want to reiterate part of the future mission for ChemSpider. Our intention is to wiki-enable ChemSpider and allow people to add information to each and every chemical structure in the database. We have already enabled curation of the data on ChemSpider, as blogged previously (1,2,3). This is one level of community participation…check out the results here. Please keep curating!

What I like about Wikinomics is the historical overview (of a VERY SHORT history) of how mass collaboration has affected the way we share information, how collaboration has impacted software development (and how corporations are benefiting from the effort) and how technology in the time of Web 2.0 allows news to spread quickly and help people. This collaboration has brought many benefits to research, pharmaceuticals and even mining, a fascinating story. I preferred the first half of the book; it did get a little preachy and repetitive, but it is nevertheless a great read. By the way…one of the authors is Anthony D. Williams….no relation to the author of this blog…that would be Antony J. Williams. But hey, if people want to talk about me as a supporter of the world of Wiki then go for it…

At present the registration and structure deposition systems on ChemSpider are being completed and, fingers crossed, you’ll see them very shortly. The question then is what mass collaboration could mean for extending the information associated with the chemical structures in the database. Imagine the addition of reaction details, images, connections to other websites, etc. Just imagine…and we’ll see if we can make it happen.


Posted by on July 18, 2007 in Vision


SeaDragon, Photosynth and a Mind-Blowing Presentation

I get a lot of “stuff” sent to me in a day. Other than the usual >150 work emails, the inbox is peppered with absurd photos, chain letters, a little spam now that the Bayesian filter is trained and, once in a while, something that is truly visionary in nature.

I was blown away by this example of the potential of the semantic web for knitting together images using two technologies, Seadragon and Photosynth. Watch this presentation; take the 7 minutes. Shoot me for saying it, and I know many who criticize Microsoft at every chance in favor of the world of Open Source, but I’m one of those who believe that Microsoft and Open Source can absolutely co-exist. When I look at what came out of Microsoft Live Labs here with Photosynth, all I can say is a big Hoo-haa….this is great stuff! Check out the blog too…keep watching…


Posted by on June 26, 2007 in Vision


ChemSpider Moves Further Towards Web 2.0

Ah…the cathartic nature of being back on the blog…family vacations and work travel are very distracting….

I’ve blogged previously about the question “What is Web 2.0?”. In Wikipedia’s list of what it takes to be Web 2.0, I noted that one of the criteria is “A rich, interactive, user-friendly interface based on Ajax or similar frameworks.”

If you’ve been using ChemSpider in the past couple of days, you will notice improved usability on the Search screen and the Services screen. Why? Ajax. With literally a couple of hours of work these screens were ajaxified (if it doesn’t exist, I’m using it in Scrabble and demanding it gets included in Webster’s!) and the flow of using the screens improved significantly as ChemSpider took on more of a “desktop feel”. It feels good to have made one more step towards “Web 2.0 compatibility”. The real excitement, though, is the social networking system under development now: when completed, it will extend the curation aspects of the database and specifically allow users to add their own data to the system. This should be unveiled in its first form within the month…hopefully sooner.

Back to Ajax and a new feature, “ChemSpider Suggest”. For those of you using the system by typing in a text string to locate a record, we have noticed that spelling errors abound. Some of these are subtle, such as asprin instead of aspirin (phonetically correct some would say), while others are dramatic, mostly driven by linguistic differences…when your first language is not English all spellings are phonetic in nature, and the phonetic result is based on how you pronounce things in your own language. A tough situation to deal with. With ChemSpider Suggest you can start typing the first few letters of the word you are interested in searching for and it will give you a list of candidates, as shown below. Imagine not being sure how to spell prostaglandin or erythromycin…such a tool makes it much easier to find the right word to search for and improves the chances of ChemSpider finding what you’re interested in. There are two parameters we can tune at present and we’d like your input: the number of letters to type before a suggestion shows up and the number of rows to suggest. Let us know your thoughts on the blog or directly at Enjoy!

ChemSpider Suggest
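The post doesn’t say how ChemSpider Suggest works internally. As a minimal sketch, assuming suggestions come from simple prefix matching over a sorted list of names, the two tunable parameters described above (letters typed before suggestions appear, and rows shown) might look like this. The class and parameter names (`PrefixSuggester`, `min_chars`, `max_rows`) and the sample names are hypothetical.

```python
import bisect


class PrefixSuggester:
    """Hypothetical prefix-based name suggester (not ChemSpider's actual implementation)."""

    def __init__(self, names, min_chars=3, max_rows=10):
        # Lowercase, de-duplicate and sort so names sharing a prefix sit next to each other.
        self._names = sorted({name.lower() for name in names})
        self.min_chars = min_chars  # letters to type before any suggestion appears
        self.max_rows = max_rows    # maximum number of suggestions returned

    def suggest(self, typed):
        prefix = typed.strip().lower()
        if len(prefix) < self.min_chars:
            return []  # too few letters typed so far
        # Binary search for the first name >= prefix; all matches follow contiguously.
        start = bisect.bisect_left(self._names, prefix)
        results = []
        for name in self._names[start:]:
            if len(results) == self.max_rows or not name.startswith(prefix):
                break
            results.append(name)
        return results


if __name__ == "__main__":
    suggester = PrefixSuggester(
        ["aspirin", "ascorbic acid", "Prostaglandin E2", "prostaglandin F2alpha", "erythromycin"],
        min_chars=3,
        max_rows=2,
    )
    print(suggester.suggest("pr"))   # []
    print(suggester.suggest("pro"))  # ['prostaglandin e2', 'prostaglandin f2alpha']
```

Plain prefix matching only helps while the first letters are right: typing “asp” offers aspirin, but the misspelling “asprin” stops matching once the fourth letter diverges, so catching phonetic errors would need something fuzzier (edit-distance or phonetic matching, for example) layered on top.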


Posted by on June 23, 2007 in ChemSpider Services, Vision


An Inquiry Regarding How ChemSpider is Used…and is it Providing Value?

For those of you frequenting this blog, you will have seen a number of comments regarding the suggested failures of ChemSpider…many of these have concerned either inorganic compounds or organometallic complexes. Each of these issues has been addressed on this blog in detail.

What I am interested in hearing about is the other side of the coin. How are you using ChemSpider? How are you deriving value from it? Are you focused more on the searching aspects of ChemSpider or all of the services we provide? Is the speed of searching sufficient for your needs? Do you prefer to use the Browser add-ins, the ChemSketch integration or the structure drawing applet on the site? Are you curating data….if not, why not?

ChemSpider presently has more than 600 people per day on average using the site and this is growing. Judging by the successes we see in the transaction log shown below (one page of MANY), users are getting value. So, let us, and others, know whether ChemSpider is living up to your expectations! If not, what can we do to improve it?


Posted by on May 13, 2007 in Vision


ChemSpider as a part of Web 2.0 – and what is that Web 2.0 anyways?

In this post I am going to excerpt from another blog (excerpts bolded to identify them) regarding ChemSpider and its non-Web 2.0 status, since pages from the ChemSpider blog are being excerpted in the same way (based on my previous post, it’s the way of the blogosphere).

The question I posted for ChemSpider blog readers was whether or not the curation of data should be supported by the community. Whatever the answer, the reality is that curation is already underway and continues. Here I share comments posted elsewhere, with parts of the material extracted for discussion.

To the question “Should the curation of data on ChemSpider be supported by the community?” the comments made were:

“…only if the community has time on its hands and wants to donate significant goods to commercial organisation(s) who will then own and control the content. (People already do this, of course – they are called scientists and as authors they donate their goods to commercial publishers.) Put simply Chemspider is Web 1.0; The chemical blogosphere, Pubchem, Blue Obelisk, CrystalEye is Web 2.0. Chemspider’s business model was fine for the early web. No public content, significant effort to extract it, few alternative sites.”

So, some comments.

Yes, we scientists do donate our goods to commercial publishers. Over the past 12 months I’ve been author/co-author of almost a dozen peer-reviewed publications in some of the top journals in the world from some of the top publishers (ACS, Wiley, Elsevier for example). Some of the review processes have been slower than hoped, and I do take issue with situations where editors receive two “Publish as is” reviews and hold the manuscript up for months for one reviewer who comments “It’s too long.” The articles, when published, have been exposed to many people and resulted in follow-up from many scientists. I like the results and feel that the publishers do a stellar job of creating quality output through a generally seamless process. I’m not going to comment on profit margins for the publishers…you can find those rants elsewhere. By contrast, publications we have submitted to Open Access journals have produced no interest, yet the work was of similar caliber. The time of Open Access journal exposure is here, though, and I judge there will be increasing interest. I believe ChemSpider will help this and will explain why in a later post.

Web 2.0. I’ve asked people what it is and they generally all point to “community web”. Asked for examples, they talk about reviews on Amazon, voting on eBay, Flickr, YouTube, blogs, Wikipedia and so on. I’m sure you can add a few of your own “Web 2.0 definitions”. The general feeling is that Web 2.0 is about building community.

From the comments above about “The chemical blogosphere, Pubchem, Blue Obelisk, CrystalEye is Web 2.0” I have to assume that the intent here is to identify Web 2.0 as being connected to Open Source, downloadable content and integration.

With MySpace, YouTube and Flickr as the poster children of Web 2.0, I’m not sure how this matches up with that intent. Certainly these sites are big business. They are not Open Source to the best of my knowledge. As for downloadable content, I don’t think it’s possible to download their databases. But these sites are major contributors to community building on Web 2.0.

I turned to Wikipedia for a more formal definition, excerpted below. From Wikipedia, the definition of Web 2.0 is given as:

1) The transition of web sites from isolated information silos to sources of content and functionality, thus becoming computing platforms serving web applications to end-users
2) A social phenomenon embracing an approach to generating and distributing Web content itself, characterized by open communication, decentralization of authority, freedom to share and re-use, and “the market as a conversation”
3) Enhanced organization and categorization of content, emphasizing deep linking

Relative to these definitions, ChemSpider delivers. Specifically:

1) ChemSpider INTEGRATES information silos. We connect containers of content via the indexed chemical structures and associated identifiers leading into the silos of information. It is this integration that has encouraged data providers to look favorably on our activities. A search of ChemSpider leads scientists to their content and we do NOT replicate it except at the chemical structure and link level. We serve web applications to end-users…visit our services page
2) We are becoming a social environment for chemists…and we have only just started. Six weeks into our beta release, we have openly communicated our intentions and continue this pattern. The decentralization of authority will come as we allow peer-reviewed curation of the data. This is NOT complete at the site yet. As declared previously, we will enable a wiki-like environment for chemists to contribute to and edit the database. The freedom to share and re-use will be enabled shortly; the level at which this will happen is under discussion. For many it will suffice; for some it will likely be a cause for discussion.
3) In our opinion we are enhancing the organization of data and enabling deep linking to an individual structure; for example, the 10 millionth structure is labeled as Click on any structure on the Spinneret webzine as an example.

Also extracted from the Wikipedia article: in the opening talk of the first Web 2.0 conference, Tim O’Reilly and John Battelle summarized what they saw as key principles of Web 2.0 applications. Some are excerpted below:

1) the web as a platform
2) data as the driving force
3) network effects created by an architecture of participation
4) innovation in assembly of systems and sites composed by pulling together features from distributed, independent developers (a kind of “open source” development)
5) lightweight business models enabled by content and service syndication
6) the end of the software adoption cycle (“the perpetual beta”)
7) software above the level of a single device leveraging the power of The Long Tail.
8) ease of picking-up by early adopters

Against these principles too, we believe ChemSpider delivers on many. We certainly LIVE number 6 above.

From Wikipedia again “While interested parties continue to debate the definition of a Web 2.0 application, a Web 2.0 web-site may exhibit some basic common characteristics. These might include:
1) “Network as platform” — delivering (and allowing users to use) applications entirely through a browser. See also Web operating system.
2) Users owning the data on the site and exercising control over that data.
3) An architecture of participation and democracy that encourages users to add value to the application as they use it. This stands in sharp contrast to hierarchical access-control in applications, in which systems categorize users into roles with varying levels of functionality.
4) A rich, interactive, user-friendly interface based on Ajax or similar frameworks.
5) Some social-networking aspects.
6) Enhanced graphical interfaces such as gradients and rounded corners (absent in the so-called Web 1.0 era). “

Again, we deliver on the majority of these at present. Relative to 2) we do NOT have permission from our collaborators to hand over their data. Please don’t thrash us over their decisions to contribute and not share. Relative to 4) Ajax is NOT yet implemented at the site..but will be…watch this space. Relative to 6)…we have ROUNDED CORNERS…ooohhhh.

Other comments include “I see very little difference between Chemfinder and Chemspider. They are both closed, proprietary, do not expose data, or metadata, or algorithms; have closed code, do not allow downloads or re-use. They lose metadata in their aggregation process. I have nothing personal against Chemspider (or, if they are associated, ACDLabs) – I just think the Web 1.0 model is out of date for chemistry.”

To respond…yes, the code is proprietary and closed…we don’t know of any Open Source code that would quickly search >10 million structures by structure and substructure (that will be covered in a separate post, as I have the utmost respect for the commercial entities that do this well! It’s DIFFICULT.) Oh…but Open Source isn’t part of the Web 2.0 definition. We don’t expose algorithms…correct…many are provided by collaborators and we do not have the right to expose their code. But that isn’t part of Web 2.0 either.

And next…the beloved “metadata” term. What exactly IS metadata? Let’s refer again to our web-friendly Wikipedia regarding metadata. In brief, it’s “data about data”, and a good example is an XML schema relative to the XML documents it describes: the schema is metadata. According to my interpretation, this means InChI and SMILES are not metadata, since these data can be interchanged with the structure itself. I may be wrong. Metadata would be the entity describing what data can be bound to a structure: not data somehow related to the structure, but more general data describing the data model. The source of the data, for example, IS metadata. ChemSpider doesn’t lose the metadata…we retain the only metadata currently available, the data source, and use it as our link out to the provider. Our primary role, again, for now, is to connect silos of information via chemical structures.

In a related vein, ChemSpider just published data to PubChem and the same occurred: metadata was purposely removed. Regardless of what is uploaded to PubChem in the SDF files, all except a very small number of data fields are removed and the structure record is then filled with properties calculated by commercial software, CACTVS and OpenEye. By the definition of “losing metadata in the aggregation process”, PubChem is part of the Web 1.0 model. It’s no issue for us…we’re proud to be working under the same model, as both efforts provide value. If there is interest we can certainly publish our data model, and likely will in the near future when we submit a publication about ChemSpider.

For details about PubChem’s use of CACTVS, see slide 28 of this presentation; on OpenEye, see Richard Apodaca’s comments. I quote: “Why did PubChem, the granddaddy of all open chemistry databases, choose a closed, proprietary toolkit for its software infrastructure?”

Continuing with the comments…“99% of Chemspider’s data appears to come from Pubchem. If so, surely it is better to curate Pubchem directly. There are mechanisms for this and as Pubchem is effectively the normalised source it gives less problems for maintenance. “

Yes, what is posted on the beta version is primarily PubChem: 96% of it. This was made clear at the beta release. It’s the largest publicly available database. However, we now have over 7 million additional structures to index and deduplicate. That will reduce the contribution of PubChem significantly…but PubChem IS growing daily (we just contributed data), so we also need to download the new data. Our estimates are that by the end of June PubChem will make up about 60% of the ChemSpider database.
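To illustrate the indexing-and-deduplication step, and why the data source survives it, here is a minimal sketch assuming each incoming record already carries a canonical structure key (a standard InChI string is used here) plus the name of its data source. The record layout, the sample records and the functions `deduplicate` and `source_share` are hypothetical, not ChemSpider’s code; `source_share` shows just one possible way to define a figure like “PubChem’s share of the database”.

```python
from collections import defaultdict


def deduplicate(records):
    """Merge records with the same canonical structure key, keeping every data source.

    Hypothetical record layout: {"inchi": <canonical key>, "source": <provider name>}.
    """
    sources_by_structure = defaultdict(set)
    for record in records:
        # Identical structures collapse onto one key; the set of sources (metadata)
        # is retained so each provider can still be linked out to.
        sources_by_structure[record["inchi"]].add(record["source"])
    return dict(sources_by_structure)


def source_share(merged, source):
    """Fraction of unique structures with at least one record from `source`."""
    if not merged:
        return 0.0
    hits = sum(1 for sources in merged.values() if source in sources)
    return hits / len(merged)


if __name__ == "__main__":
    incoming = [
        {"inchi": "InChI=1S/CH4/h1H4", "source": "PubChem"},        # methane
        {"inchi": "InChI=1S/CH4/h1H4", "source": "NIST"},           # same structure, second source
        {"inchi": "InChI=1S/H2O/h1H2", "source": "PubChem"},        # water
        {"inchi": "InChI=1S/CH4O/c1-2/h2H,1H3", "source": "NIST"},  # methanol
    ]
    merged = deduplicate(incoming)
    print(len(merged))                                # 3 unique structures from 4 records
    print(round(source_share(merged, "PubChem"), 2))  # 0.67
```

Counting “share” as structures with at least one record from a given source means a structure deposited by several providers counts toward each of them; a real system might weight this differently.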

The conclusion of the post was “They will own the results and the results will not be made Openly available but served through their gateway. You are invited to contribute. The Web 2.0 community will use a different mechanism.”

Yes, we will own the results. But we have committed to return all data provided by public sources to the providers so that entities such as PubChem can update and provide to the community. We will also provide feedback to all contributors. It is their choice to provide access.

I have to wonder why PubChem is being declared Web 2.0…I don’t care, I just have to wonder. PubChem certainly has downloadable content…it is an incredible data collection. Integration is clear…it’s excellent. But what else makes it Web 2.0? It’s not Open Source to the best of my knowledge: it uses components from CACTVS and OpenEye, both commercial concerns as far as I know. (This makes PubChem just as dependent as others on the longevity of the providers, by the way.) Where is the social environment? I’m not aware of Ajax on PubChem, at least not yet. It has been stated that the PubChem Sketcher is Ajax, but an email exchange with the author suggests otherwise. So, is it the provision of downloadable data that makes it Web 2.0? They DO have some rounded corners!

I note that eMolecules declares itself Web 2.0 on its home page: “eMolecules is bringing the power of Web 2.0 to chemists around the world.” Clearly they have identified what Web 2.0 is. But I don’t see it. How? Where is the community building? That said, they also state that they are “the world’s largest repository of publicly accessible chemistry information”, but I believe that particular accolade belongs to PubChem at present. Oh…major faux pas…NO rounded corners! It’s a public blog so maybe they will tell us? Maybe a tickle of Ajax on the site?

We are not declaring ChemSpider to be Web 2.0, though it seems generally compatible based on the definitions…I’ll go more for Web 1.72.

We’re very sensitive to a statement made on the Wikipedia article “…when a website is proclaimed “Web 2.0” for the use of some trivial feature (such as blogs or gradient-boxes) observers may generally consider it more an attempt at promotion than an actual endorsement of the ideas behind Web 2.0. “Web 2.0” in such circumstances has sometimes sunk simply to the status of a marketing buzzword“. Yup, I can see that happening.

We’re busy delivering a functional system..we’ll let the community judge us on our compatibility as we creep from Web 1.72 to Web 2.0. For some reason I think this particular blog posting will be judged, again. I just hope this time it isn’t copied and posted elsewhere. I think we’re saying important things here too!

One comment from the Wikipedia definition of Web 2.0 that resonates with me is: “The impossibility of excluding group-members who don’t contribute to the provision of goods from sharing profits gives rise to the possibility that rational members will prefer to withhold their contribution of effort and free-ride on the contribution of others.” It would be a shame if people who see issues on ChemSpider regarding performance or content did not curate the data for others to benefit from, or at least direct their comments to us so that we can resolve them.

Readers…ChemSpider is still in beta and will be for the foreseeable future (that’s so Web 2.0…remember that Tim O’Reilly listed among its principles “the end of the software adoption cycle (‘the perpetual beta’)”). Unlike some, we chose to “go big or go home” and went live with the beta…and then got pounded, not once but twice. A great introduction to the power of the blogosphere and the catalyst for putting up our own blog.

We got into this discussion about Web 2.0 as a result of the question “Should the curation of data on ChemSpider be supported by the community?” Whatever the answer, the reality is that curation is already underway and continues unabated. Thanks all.

By the way, we did try to validate ourselves against the Web 2.0 Validator…we didn’t score very well (9/66), but it did say we were Web 3.0 compliant! WWMM, PubChem and eMolecules all got 4/66. We’re glad to be at 9/66, but read the full story first….There’s fun stuff out there…


Posted by on May 10, 2007 in Vision


Who is ChemZoo relative to ChemSpider? Will it stay free?

A blogger recently identified ChemSpider as a commercial entity. I’d like to clear that up. Yes, it is. It’s a company….a corporation in fact. Why? Because we have wives and families to take care of. We live in a litigious society and we are protecting our families. We know that many commercial organizations will not take kindly to what we are trying to do with ChemSpider. In fact, as will be evidenced by some posts to appear here in the next few days, some members of the Open Source community don’t seem to support what we’re trying to do.


ChemZoo is a corporation. ChemSpider is free. FREE. There are no charges to use any services. ChemSpider is self-financed. All hardware, software and costs associated with running ChemSpider come from the bank accounts of the ChemSpider team. No venture capital. No sponsorship. No Google Adwords. Will it stay this way?


Fact: ChemSpider is getting hundreds of unique visitors per day. Actually, it went over 1,000 visitors per day following a posting on another blog.


Fact: It is performing thousands of transactions and searches per day. In less than one month we have received blessings from a number of users and ire from others. But there is no such thing as bad press, and our “Google Exposure” stands at over 110,000 hits as of today (some do use this as a measure of success).


Fact: We opened ChemSpider based on the PubChem data source. This was declared on the website the day we went live. In less than one month we have taken delivery of close to a million new structures to post on ChemSpider. We do not yet know how many will be unique until de-duplicated. Some of these collaborators have required that we sign contracts …yes, ChemZoo is indeed a business.


So, will ChemSpider always be free? The intention is a resounding yes. However, it is clear we may have to seek sponsorship. Our data collection is growing quickly. We WILL have to add an additional server shortly. No choice really if we don’t want to disappoint the chemistry community. We have had companies approach us about integrating paid services to ChemSpider. That might happen. But, the basic capabilities of ChemSpider, and the layering of tools to encourage the growth of a chemical community around a structure centric database, will remain free for everyone. Unless there is a need to charge to help pay off legal bills in this, our litigious society.


Posted by on April 27, 2007 in Vision


Welcome to the new Blog

We have launched the ChemSpider Blog for a number of reasons. Specifically:

1) We believe it gives us a good opportunity to discuss our vision of where we are going and some of the struggles to get there. Speaking openly and honestly about the passion we have for ChemSpider is a great opportunity for us to express our creativity and post for comments and feedback. 

2) We do not have clear decisions made in certain areas and we will encourage the feedback of the chemistry community to help in these decisions.

3) Blogging is certainly a high-profile way to engage an audience and express, warts and all, challenges and opportunities. For an example of the advantage of honesty, read Wired’s Naked CEO article.

4) It offers a path to respond to other blogs. Experience shows that responses to criticism on other blogs are edited prior to posting. We may have to take advantage of that capability ourselves, of course!

5) It’s fun.


Posted by on in Vision