Some of you might be keeping an eye on our partner blog hosted by Will Griffiths, the Open Chemistry Web. If you are, then you will be aware that we have reached an agreement with the Royal Society of Chemistry, as described here. ChemSpider is all about building community. The chemistry community is not just chemists: it is publishers, policy-makers, vendors, academics, corporations and more. Our intention is to co-exist within the community, so we navigate the challenges as best we can, always knowing that we might get the odd slap on the hand. My judgment is that if this happens (and it has), conversation and a modicum of emotional intelligence can keep us in relationship with the community and get us to mutual agreement. We have already done this in a couple of situations.
With this as a lead-in, we are presently working through a potential three-way relationship issue. I’ve posted previously about whether people would be interested in seeing us connect to CrystalEye. The response both on and off the blog suggests we should do it, so I initiated a conversation with Peter Murray-Rust (PMR) and have copied the comments below. The list of journals presently indexed is given here. You’ll quickly see the issue regarding three-way relationships.
Peter, I asked previously about how to obtain an SDF file of the structures on CrystalEye so that we could link to CrystalEye records via ChemSpider. This was based on my question to the community at
http://www.chemspider.com/blog/?p=191
Your comment was that the data was Open but that an SDF was not available and we should scrape the data. I was looking at this possibility today. I was pleasantly surprised to see a number of the journals listed included ACS journals and Elsevier journals (http://wwmm.ch.cam.ac.uk/crystaleye/summary/index.html). There has been a lot of traffic of late about their Open Access policies but now I see that they are supporting your Open Data efforts. This is excellent. I would like confirmation that they are aware of the Open Data posted from their journals before we scrape them. Are they aware? I want to make sure I am respecting all parties. Thanks
- pm286 Says:
October 26th, 2007 at 7:54 am
(1) All data come from Free sources – i.e. visible without a subscription. Some journals (Acta Crystallographica and RSC for example) do not copyright the data. Others like ACS add copyright notices. It is our contention, and Elsevier has agreed for its own material, that facts are not copyrightable. We have therefore extracted and transformed facts and mounted these. Where the original material (CIF) does not carry copyright we mount it on our pages – where it does we do not, but we have the transformed data. In those cases it would be possible to recreate the original CIF data in semantic form, but not the exact typographical layout, which contains meaningless whitespace.
I am not aware that ACS or Elsevier have ever made statements of any kind about our Open Data efforts.
You may scrape anything, but you must honour the source and the metadata and you should add the Open Data sticker. If you scrape the link (simplest) you may simply point to our site. If you scrape more data you should ensure that the integrity of the data is maintained and that if it is re-used the re-used data should still clearly show our metadata.
Our intention is to scrape the InChIs, the title of the article, the journal name, the volume and page details, and the DOI. We will de-duplicate the structures against the database, matching existing records where we can and creating new structure records where we cannot. My concern is whether or not the ACS will allow us to scrape their Open Data, so I have put the question to them directly in the message below. I am hoping for an affirmative response, and then I will move on to confirm with the other publishers.
Colleagues,
I am presently considering utilizing the data from the CrystalEye online database as I have outlined here: http://www.chemspider.com/blog/?p=191
The CrystalEye database is run from the
I am seeking confirmation that if we scrape the data from the CrystalEye database and deposit it onto ChemSpider, we will not be breaking any copyrights. I have asked Peter the question here: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=737#comment-62799 and he has answered. I am now seeking your confirmation that it is appropriate for me to access the data, since this is marked as Open Data at Peter’s site. I welcome your comments. Thank you
The list of all publishers is given below. If we can deposit the Open Data structures from CrystalEye into ChemSpider and link up to the articles using DOI lookup through Crossref, then we will be continuing our project of making articles structure-searchable. Exciting times.
- Acta Crystallographica
- American Chemical Society
- Chemical Society of Japan
- Elsevier
- Royal Society of Chemistry
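
For readers who want to picture the deposit step, here is a minimal Python sketch of de-duplicating scraped records by InChI and attaching an article link built from the DOI. It is an illustration only, not the ChemSpider implementation: the `CrystalRecord` type, the `deposit` and `doi_url` functions, and the plain dictionary standing in for the database are all hypothetical, and the public `https://doi.org/` resolver is used here in place of a real Crossref lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrystalRecord:
    """Hypothetical container for the fields we intend to scrape."""
    inchi: str
    title: str
    journal: str
    volume: str
    pages: str
    doi: str


def doi_url(doi: str) -> str:
    # Assumption: link through the public DOI resolver rather than
    # calling any Crossref service directly.
    return "https://doi.org/" + doi.strip()


def deposit(records, database):
    """Merge scraped records into `database`, a dict mapping an InChI
    string to the list of article links attached to that structure.

    Returns the number of new structure records created.
    """
    created = 0
    for rec in records:
        key = rec.inchi.strip()
        if key not in database:
            # Structure not seen before: create a new structure record.
            database[key] = []
            created += 1
        link = doi_url(rec.doi)
        if link not in database[key]:
            # Existing structure: only attach article links we do not have yet.
            database[key].append(link)
    return created
```

Calling `deposit(scraped_records, existing_structures)` leaves one entry per distinct InChI, so a structure that appears in several articles ends up as a single record with several article links.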