Category Archives: Vision

We have great hopes for ChemSpider. This blog area will be a place for us to expand on our vision and garner your feedback.

The involvement of RSC with PharmaSea and a new antibiotics search to focus on the sea bed

A nice article went out today on the BBC News site regarding the work that the PharmaSea project would be undertaking…to find new classes of antibiotics deep in the ocean.


The RSC is involved in the project because of our skills in hosting chemicals in a publicly accessible database and in integrating data. ChemSpider already has a rich collection of natural products in the database, and we are developing approaches to segregate that collection for use by the project. We have also already integrated the RSC Natural Product Updates database with ChemSpider.

There are various other aspects of work that we will be doing to support the project, including developing approaches to perform “dereplication”: determining whether or not a particular chemical has been previously isolated, identified or elucidated, in this case by searching the ChemSpider database using spectral features (NMR shifts, multiplicities, mass, fragment ions, etc.). If the actual compound itself is not identified, dereplication approaches can still hint at a particular chemical class and at substructures. We do NOT have spectral data for the majority of compounds in ChemSpider, so spectral prediction approaches will be useful in this regard. We will be working with some very skilled scientists who have experience with the structure elucidation of novel natural products, and we will have the opportunity to collaborate with ACD/Labs, a company where I worked for over a decade on their Computer-Assisted Structure Elucidation software program, Structure Elucidator, one of the tools that will be used in this project.
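To make the dereplication idea a little more concrete, here is a minimal sketch in Python. It is not how ChemSpider or Structure Elucidator actually work; it simply assumes a small in-memory library of candidates, filters them by monoisotopic mass and then ranks the survivors by the fraction of observed 13C shifts that pair with a reference shift. The `Candidate` class, the function names, the tolerances and all numeric values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical library record; real records would carry far more detail."""
    name: str
    mono_mass: float      # monoisotopic mass in Da
    c13_shifts: tuple     # 13C chemical shifts in ppm (measured or predicted)


def shift_match_fraction(observed, reference, tol_ppm=1.0):
    """Fraction of observed shifts paired with a distinct reference shift within tol_ppm."""
    if not observed:
        return 0.0
    remaining = list(reference)
    matched = 0
    for shift in sorted(observed):
        # Pick the closest still-unused reference shift inside the tolerance window.
        in_window = [ref for ref in remaining if abs(ref - shift) <= tol_ppm]
        if in_window:
            best = min(in_window, key=lambda ref: abs(ref - shift))
            remaining.remove(best)
            matched += 1
    return matched / len(observed)


def dereplicate(observed_mass, observed_shifts, library, mass_tol=0.01, shift_tol=1.0):
    """Return (score, name) pairs for mass-compatible candidates, best match first."""
    hits = []
    for cand in library:
        if abs(cand.mono_mass - observed_mass) > mass_tol:
            continue  # the mass filter is cheap, so apply it before comparing shifts
        score = shift_match_fraction(observed_shifts, cand.c13_shifts, shift_tol)
        hits.append((score, cand.name))
    return sorted(hits, reverse=True)


# Invented example data, purely to exercise the functions above.
LIBRARY = [
    Candidate("candidate_A", 300.1000, (10.0, 50.0, 120.0, 170.0)),
    Candidate("candidate_B", 300.1005, (12.5, 55.0, 120.3, 200.0)),
    Candidate("candidate_C", 450.2000, (10.0, 50.0, 120.0, 170.0)),
]

ranked = dereplicate(300.1002, [10.2, 49.8, 120.1, 170.4], LIBRARY)
```

A partial match, such as only a quarter of the shifts agreeing, does not identify the compound, but it is the kind of signal that can hint at a shared substructure or chemical class. Where no experimental spectra exist, predicted shifts could populate `c13_shifts` instead.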

It’s going to be an exciting project. I am REALLY looking forward to it and, heck, if we can help identify new classes of antibiotics we might contribute to some of the challenges we have ahead of us!



Posted by on February 16, 2013 in Nuclear magnetic resonance, PharmaSea, Vision


More Computing on the PlayStation 3 – PS3GRID

Following my recent post on high performance computing and the Cell B.E., I saw this today regarding gamers handing over their compute cycles to PS3GRID.
I quote an excerpt here but point you to the full article for details:

“PS3GRID is coordinated by researchers at the Research Unit on Biomedical Informatics (GRIB) at the Instituto Municipal de Investigación Médica and the Universidad Pompeu Fabra in Barcelona, Spain. The distributed infrastructure enables any PS3 to do computations on atomic and molecular simulations.

“The researchers, headed by GRIB scientist Gianni De Fabritiis, chose the PS3 because it is the first consumer device to contain the IBM Cell processor. ‘The Cell,’ which is more than an order of magnitude faster than standard Intel or AMD processors, optimizes the types of computation commonly used in graphics applications. In addition, the Cell offers an inexpensive and powerful method to perform highly detailed molecular dynamics simulations of biomedical systems. Using the Cell, a PS3 has the computational power equivalent to about 20 PCs.”


Posted by on November 20, 2007 in Vision


Something New and Exciting Coming Soon…

I think the image below will tell the story of what’s coming soon to ChemSpider. As part of a collaboration with a member of our advisory group we will be unveiling this new capability for beta testing in the very near future. I’m sure some of you will see where we are going next…watch this space.


Posted by on November 16, 2007 in ChemSpider Services, Vision


High Performance Computing – What Accessibility to TFlops Can Offer Scientists: PlayStations and the Cell Broadband Engine

I subscribe to Scientific Computing so that it drops into my email inbox. This week I read Rob Farber’s article entitled “The Future Looks Bright for Teraflop Computing”. His opening question was “Wouldn’t it be great to have a teraflop of computing power sitting in your lab, desktop workstation, or remote instrument server?” What would that mean to your work?

Those of you using ChemSpider will know that we have about 20 million compounds in the database. With that many compounds, populating the database with properties such as InChI strings, InChIKeys, physchem properties and systematic names can take many days, if not weeks. We have only three computers at our disposal. One of them is a web server and one is the database server, so we are limited to a single system for calculations. Even that dual-processor system provides slow throughput. Oh, the joys of having access to teraflop processors!

In my previous post on focused libraries I commented on ongoing discussions regarding the potential to perform online docking. Evangelists such as Jean-Claude Bradley (on our advisory group) have been talking about this possibility as part of his approach to Open Notebook Science. Docking can be very time consuming, so the speed of calculations is very important. I have been working on a project regarding the value of porting docking software to the Cell Broadband Engine processor from IBM. The development of that processor is an interesting story in itself, since it was driven specifically by the needs of the gaming industry for better performance in their calculations. Now SimBioSys are porting their docking software to the Cell processor, as described in this White Paper. The improvement in performance is quite amazing!

While working for a commercial software company we saw productivity gains from moving to clusters. Dual processors in our laptops and annual performance gains from general technology shifts offer faster calculations every year. Teraflops on the desktop (and even the laptop) are likely a few years away…but GFlops are here.


Posted by on November 14, 2007 in Vision


The ChemSpider Team Chooses Our Future Platform for Collaboration – Microsoft SharePoint

When we first started the ChemSpider project we made a commitment to “Build a Structure Centric Community for Chemists”. We believe we are well on the way to facilitating that. We have talked about a “wiki” environment for collaboration. In this framework we use wiki to mean a “collaborative environment”, not necessarily adherence to a specific wiki platform. Our intention is to let users of ChemSpider collaborate in the co-management of content on the ChemSpider site. A number of our readers have taken our statements to indicate that we will be using the same wiki platform as that used by Wikipedia. We have looked at and considered a number of “wiki” tools, platforms, interfaces and user experiences. At this time we have decided to use Microsoft SharePoint as the platform on which to construct our wiki environment. With a clear commitment to Web 2.0 already declared, and our platform built on SQL Server and ASP.NET, we feel it is the appropriate platform for us to build on. We believe our technology choices have already demonstrated that we can deploy a good solution very quickly.

Now, we realize that this might result in a series of jabs about us not using Open Source solutions and so on, but we are more focused on delivering an appropriate, scalable solution than on building ChemSpider only on Open Source software. We will support anyone who wishes to do the same on Open Source though.

We will keep you informed of our progress. Now we need to migrate ourselves to .NET3, and we hope this will cause only a short-term disruption as we switch over. Watch this space.


Posted by on November 12, 2007 in How ChemSpider Runs, Vision


Who Gets to Choose Whether Data is Open or Not?

Those of you who have been watching the blog of late will be aware of the recent discussions about Open Data (1,2). We have offered submitters of spectral data the possibility to declare their data either Open or Closed. Noel posted a comment on the blog asking “Why is the default Closed? Why even offer the option of Closed?” My response is “Why not offer the option of Closed?” My opinion is that this is the submitter’s decision. It’s not our role to force “Openness” of data onto users. We are working to create an environment that provides value to ChemSpider users rather than one that forces them into a policy regarding openness. Personally, I would prefer to have access to data to help answer a question, even if they are NOT Open Data, than to not have access to those data. I have asked all of the people who have submitted data, or had me submit data, to ChemSpider whether they would like to have their data moved to Open. Three said yes; two said no. I do NOT intend to force people to make their data Open. That is their choice, not mine. We are creating a community for collaboration. There is value in having access to data whether it is Open or not. If you look at the recent conversations about RSC and their Free Access versus Open Access, we must agree that there IS value in Free Access to their articles despite the fact that they are not Open Access.

My friend Gary Martin has allowed us to deposit some of his data onto ChemSpider. He has commented twice (1,2) and I refer you to those blog postings for his opinions. They are interesting to read.

The reality is that our policies, even as they are, appear to be appropriate for people to deposit their data. We already have over 100 spectra deposited on ChemSpider, with more to come based on recent conversations. Some of these ARE Open Data and the depositors are acknowledged for this. They are sharing their data with you through us. That’s the benefit of building a community for chemists.



Presentation at the PubChem Working Group Meeting

This week I was privileged to attend a PubChem Working Group meeting in Washington and sit around the table with interested parties discussing the present and future state of PubChem. I had the opportunity to give an overview of ChemSpider, our vision of ourselves and where we are going. If you are interested in reviewing the commentary, please find a PDF file of the presentation here (shared with permission of PubChem). I welcome any comments, feedback or questions, either as a blog response or offline.


Posted by on October 3, 2007 in Vision


Seth Godin’s Big Ideas, the InChIKey and Structure Searching the Web

Seth Godin is a mentor to many marketers out there today. I’ve read a number of his books over the years and he has many comments. He is a self-professed “idea-giver” …read his latest blog posting. I specifically like his comment “ideas are easy, doing stuff is hard”. How true that is. Over the years I’ve had lots of ideas. I’ve shared many “beverage-based conversations” where big ideas have been put out. The trick is in the “money where your mouth is” execution of these ideas. Over the years I’ve had the pleasure of working with people who tend to deliver as well as talk. WAY more motivating than just listening to the promises of what could be.

A few years ago at a meeting in Washington I sat in on probably the earliest public forum discussion on the potential of InChI. As a result of excellent teamwork between NIST and IUPAC, and of doing rather than just talking, they got it done. There was some negativity expressed during the initial meetings about InChI, but it did not distract the team from producing the prototype versions, the initial release and now the latest update with InChIKey support.

Now, I’ll guarantee that Seth Godin doesn’t know what an InChIKey is (Seth, if you’re reading this, prove me wrong 🙂 ). But I want to take the position of supporting the Big Idea of structure searching the web and suggesting the InChIKey as one way to execute on this now. There is a lot of passion around doing this and it has shown up in a number of postings by Rich, by Joerg (in regards to Wikipedia in this discussion), by Egon (discussing RDF’ing molecular space) and Jim, among others.

I am reading and hearing exchanges about the web being made structure searchable and my mind drifts immediately to the “it’s not enough” stance. The InChIKey should address some of the issues seen with InChI string searches and will likely be far more popular with the search engines. As noted last night on ChemSpider News, the InChIKeys on ChemSpider now link directly to a Google search.

The challenge remains: once all of those keys are out there, how will the web be SUBstructure searchable or SIMILARITY searchable? The InChIKey cannot be reversed to the structure, so the solution would appear to be a centralized repository of structures with their associated InChI strings and InChIKeys. A centralized repository of millions of structures and their associated strings and keys could be searched by substructure or similarity and then, when one or more structures of interest are identified, the Google search on that string or key could be kicked off. Maybe the discussion regarding the creation of such a centralized repository has happened already, so I’d be interested in hearing what the path forward is. If it’s happening then the questions are: who will host it, how will it be funded, is there a timeline, and so on. If it’s not happening, or is way in the future, then I have an interest in opening the discussion about using the ChemSpider database and appropriate services (presently under development) to provide an interim service.
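As a rough illustration of that two-step flow (search a central repository first, then kick off a web search on the matching keys), here is a small Python sketch. It is only a sketch under stated assumptions: the `REPOSITORY` dictionary stands in for a database of millions of records, the Google URL pattern is my assumption, and `contains_oxygen` is a deliberately crude placeholder for a real substructure or similarity search, which would need a cheminformatics toolkit.

```python
from urllib.parse import quote_plus

# Tiny stand-in for a centralized repository: InChIKey -> InChI string.
# The key is a hash, so this lookup table is the only way back to the structure.
REPOSITORY = {
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",   # ethanol
    "UHOVQNZJYSORNB-UHFFFAOYSA-N": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",  # benzene
    "OKKJLVBELUTLKV-UHFFFAOYSA-N": "InChI=1S/CH4O/c1-2/h2H,1H3",          # methanol
}


def formula_layer(inchi):
    """Return the molecular formula layer, e.g. 'C2H6O' for ethanol."""
    return inchi.split("/")[1]


def contains_oxygen(inchi):
    """Placeholder predicate, NOT a substructure search.

    It only checks for the letter O in the formula layer, so it would also
    match elements such as Os. A real service would match on the structure.
    """
    return "O" in formula_layer(inchi)


def web_search_url(inchikey):
    """Build a quoted-phrase web search URL for a key (URL pattern is an assumption)."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{inchikey}"')


def search_then_link(predicate, repository=REPOSITORY):
    """Step 1: filter the repository by structure. Step 2: emit one web search per hit."""
    return {
        key: web_search_url(key)
        for key, inchi in repository.items()
        if predicate(inchi)
    }


links = search_then_link(contains_oxygen)
```

The point of the design is that the structure-aware step happens entirely inside the repository, while the web search engine only ever sees opaque keys it can index as plain text.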

Structure searching of the web is of course going to provide high value, but it should not stop there. Let’s have the proactive dialog now about the next phase: facilitating substructure and similarity searching. If the conversations are going on elsewhere, please post the links as comments so that readers can follow them. I’m sure that Egon, Joerg, Rich and PMR will all have thoughts about how this should look. The bottom line is that, if this is the path, the underlying system needs to be able to handle at least 25 million structures in the short term (ChemSpider has 17 million already) and be scalable to many tens of millions. There aren’t too many open platforms that can do that yet. I am aware of commercial platforms supporting many millions but no Open Source platforms yet…


Posted by on September 13, 2007 in Vision


The question is, which project will survive…Will ChemSpider Stick Around…

Recently I posted some statistics regarding traffic to the ChemSpider website, examined using various tools…our own and the Alexa Rank engine. Peter Schneider has commented on the performance of the various rank engines. He also asked an interesting question: “But the real question is: Does emolecules generate more income with an Alexa Rank of 400 000? It is not the question, if a site has more visitors or not… The question is, which project will survive…” It’s definitely worth commenting on!

I am looking into the Alexa Toolbar issue and, if Peter is correct in his judgment of its bias, we will likely take it down. What we are looking for is accurate representation. We are now tracking with Google Analytics and have signed up as he suggested, so only time will tell now.

I think Peter is right that there needs to be some standard way to compare sites. Certainly ChemSpider is not out to “beat” eMolecules or PubChem, or any of the new systems which might come online in the near future. I believe we all share the same space and bring value in our own ways. I have great respect for what Klaus and the group are up to. I collaborated with the team directly while I was at ACD/Labs: integrating ChemSketch into Chmoogle (as it was then), arranging exposure at Reactive Reports and then again with the logP donations, working with the PhysChem product manager at ACD/Labs.

Does eMolecules generate more revenue than ChemSpider with a lower Alexa rank? I would hope so…they are a business! I am not sure of their business plan but it does include exposing companies’ catalogs through their site (for revenue, I should expect; see the example with an NCH skin on top of the eMolecules engine). I have also heard that in certain cases a percentage of compound sales made via the website goes to eMolecules. I don’t know whether it is true but it is rumored to be that way. (By the way, I suggested to Klaus that we exchange our relevant structure collections, index each other’s structure collections and link between the sites, but I haven’t got a response yet. This type of exchange/integration is what Joerg is talking about here.)

ChemSpider, on the other hand, is a passion project. Until about a month ago it was non-revenue generating…more bank account draining 🙂 All computer software, hardware, ISP fees etc. were paid for out of our bank accounts. Yes, we founded a corporation to do this…we’re an overly “litigious society”.

Recently I chose a period of personal sabbatical, so now I am the non-revenue generating member of the household (but a great chauffeur for the children). I am happy to say that we now actually have sponsors for the site. We did try the AdSense approach but the $2.50 per day wasn’t worth the reputation ding and the annoying screens. We’ve added “Buy me a Coffee” to the blogs…but so far we haven’t had one. So, we are depending on the kindness of our sponsors to keep the site going at present. If you look at the home page you will note that Waters was kind enough to sponsor the site and is a gold-level sponsor based on the magnitude of their support. We have recently received support from one of our other collaborators and their logo will post soon.

I can confirm that in my downtime I am looking for additional funding to keep ChemSpider going, in whatever way it comes: sponsorship, anonymous donations, grants, collaborations, begging, borrowing (no stealing…). ChemSpider can continue to move while there are free cycles to support it and enough income (or family monies available) to keep it exposed. If there is no way to create a revenue stream from the system, its pace will certainly suffer when those of us working on it now get tired and some of us “go back to work” and have new career objectives to distract us. ChemSpider IS still a passion project. The intention is that there will always be an Open Access ChemSpider for chemists to use. I see no reason why anything you have access to now would ever be taken away. The majority of what we have in our development plans is for the good of all. I don’t know how else to commit to a deeper level of permanence for the site. We are not yet done with the conversations about Open Sourcing the code in the future.

So, thanks Peter for asking the question about “which project will survive”. If any readers have thoughts about garnering financial support for the system through sponsorship, grants, collaborative work etc., please contact me at the usual address (antony.williams AT chemspider DOT com) and open the discussion. What we want is for ChemSpider to be around for many years to come, and I believe we can make that happen even in our spare time. That said, with dedicated effort the reach of this project can be truly massive…





Posted by on September 12, 2007 in Vision


ChemSpider Usage Continues to Grow In a Linear Fashion

This past week I received some inquiries and comments regarding the traffic coming to the ChemSpider site. It was commented that it was not possible to compare eMolecules traffic and ChemSpider traffic on Compete. I confirmed this and have now registered ChemSpider so that this should be possible in the future. There are many analytics tools out there to measure traffic at a site. We use WebLog Expert as our internal analytics tool. The plot below shows fairly linear growth in the number of unique visitors to the ChemSpider site since we went live on March 27th, just in time for the Spring ACS.

WebLog Expert plot

We also use Alexa to browse our performance. The statistics are shown below for the increase in global users accessing the site, the overall traffic rank and the number of page views per user.

Alexa Rank 1

The geographical distribution of visitors is actually quite surprising. Until recently the UK was the most popular visiting country, but US visits increased dramatically when the announcement that Patent Searching had been integrated went online. What is quite surprising is the low number of visitors from Germany, China and India. Based on my previous experience in the chemoinformatics world I would expect Germany to be much HIGHER, and certainly there should be increased traffic from India. That said, India wasn’t even on the list a week ago and is growing now as the message spreads. If any of you can help spread the message outside of the USA, please do!

Alexa Rank 2

Addressing the original statement about being unable to compare stats on Compete, I’ve shown the geographical traffic ranks for eMolecules on Alexa instead. Clearly there are a lot more countries for ChemSpider to provide value to! Hopefully our penetration will increase with time.

Emolecules on Alexa

Interestingly, there are all types of rumors questioning the validity of Alexa’s statistics, but Alexa disputes them. It’s difficult to know what’s right, so what’s reported here is simply what’s given online. What we are happy to report is ongoing growth in the usage of the system. It validates our efforts.


Posted by on September 10, 2007 in Vision