
Intention to Scrape CrystalEye Content and Staying in Relationship with Publishers

26 Oct

Some of you might be keeping an eye on our partner blog, the Open Chemistry Web, hosted by Will Griffiths. If so, you will be aware that we have reached an agreement with the Royal Society of Chemistry, as described here. ChemSpider is all about building community. The chemistry community is not just chemists; it includes publishers, policy-makers, vendors, academics and corporations. Our intention is to co-exist within the community, so we navigate the challenges as best we can, always knowing that we might get the odd slap on the hand. My judgment is that if this happens (and it has), conversation and a modicum of emotional intelligence can keep us in relationship with the community and get us to mutual agreement. We have done this already in a couple of situations.

With this as a lead-in we are presently working through a potential three-way relationship issue. I’ve posted previously about whether people would be interested in seeing us connect to CrystalEye. The response both on and off blog suggests we should do it so I initiated a conversation with PMR and have copied the comments below. The list of journals presently indexed is given here. You’ll quickly see the issue regarding three-way relationships.

  1. ChemSpiderMan Says:
    October 26th, 2007 at 12:01 am

Peter, I asked previously about how to obtain an SDF file of the structures on CrystalEye so that we could link to CrystalEye records via ChemSpider. This was based on my question to the community at

http://www.chemspider.com/blog/?p=191

Your comment was that the data was Open but that an SDF was not available and we should scrape the data. I was looking at this possibility today. I was pleasantly surprised to see a number of the journals listed included ACS journals and Elsevier journals (http://wwmm.ch.cam.ac.uk/crystaleye/summary/index.html). There has been a lot of traffic of late about their Open Access policies but now I see that they are supporting your Open Data efforts. This is excellent. I would like confirmation that they are aware of the Open Data posted from their journals before we scrape them. Are they aware? I want to make sure I am respecting all parties. Thanks

  1. pm286 Says:
    October 26th, 2007 at 7:54 am

(1) All data come from Free sources – i.e. visible without a subscription. Some journals (Acta Crystallographica and RSC for example) do not copyright the data. Others like ACS add copyright notices. It is our contention, and Elsevier has agreed for its own material, that facts are not copyrightable. We have therefore extracted and transformed facts and mounted these. Where the original material (CIF) does not carry copyright we mount it on our pages – where it does we do not, but we have the transformed data. In those cases it would be possible to recreate the original CIF data in semantic form, but not the exact typographical layout, which contains meaningless whitespace.

I am not aware that ACS or Elsevier have ever made statements of any kind about our Open Data efforts.

You may scrape anything, but you must honour the source and the metadata and you should add the Open Data sticker. If you scrape the link (simplest) you may simply point to our site. If you scrape more data you should ensure that the integrity of the data is maintained and that if it is re-used the re-used data should still clearly show our metadata.

Our intention is to scrape the InChIs, the title of the article, the journal name, volume and page details and the DOI number. We will de-duplicate the structures onto the database or create new structure records as appropriate. My concern is whether or not the ACS will allow us to scrape their Open Data so I have issued the direct question to them below. I am hoping for an affirmative response and then I will move on to confirm with the other publishers.

Colleagues,
I am the host of ChemSpider, an online resource for chemists. www.chemspider.com. For an overview of what we are doing please visit: http://www.chemspider.com/docs/ChemSpider_Overview_SLides_August_2007.pdf

I am presently considering utilizing the data from the CrystalEye online database as I have outlined here: http://www.chemspider.com/blog/?p=191

The CrystalEye database is run from the University of Cambridge by Professor Murray-Rust. I have looked at the sources of data populated on the database and see that there are a number of ACS journals represented there, including JACS. Please see http://wwmm.ch.cam.ac.uk/crystaleye/summary/index.html

I am seeking confirmation that if we scrape the data from the CrystalEye database and populate onto ChemSpider that we will not be breaking any copyrights. I have asked the question here to Peter: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=737#comment-62799 and he has answered. I am now seeking your confirmation that it is appropriate for me to access the data since this is marked as Open Data at Peter’s site. I welcome your comments. Thank you

The list of all Publishers is given below. If we can deposit the Open Data structures from CrystalEye into ChemSpider and link up to the articles using DOI lookup through Crossref then we will be continuing our project of making articles structure searchable. Exciting times.
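To make the deposit-and-link idea concrete, here is a minimal sketch in Python of how scraped records might be de-duplicated by InChI and tied to their articles by DOI. This is not ChemSpider's actual code or schema: the names `ArticleRef`, `StructureRecord` and `deposit` are hypothetical, the in-memory dictionary only stands in for a real database, and the DOIs in the usage note are made-up placeholders. The one external assumption is that a DOI can be turned into a link by appending it to the public `https://doi.org/` resolver.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass(frozen=True)
class ArticleRef:
    """Article metadata scraped alongside a structure (hypothetical fields)."""
    title: str
    journal: str
    volume: str
    pages: str
    doi: str

    def doi_url(self) -> str:
        # Build a resolver link; "/" is kept because it separates the DOI
        # prefix from the suffix.
        return "https://doi.org/" + quote(self.doi, safe="/")


@dataclass
class StructureRecord:
    """One structure, identified by its InChI, with every article citing it."""
    inchi: str
    articles: list = field(default_factory=list)


def deposit(database: dict, inchi: str, article: ArticleRef) -> StructureRecord:
    """Attach an article to an existing structure or create a new record."""
    key = inchi.strip()
    record = database.get(key)
    if record is None:
        # Structure not seen before: create a new structure record.
        record = StructureRecord(inchi=key)
        database[key] = record
    if article not in record.articles:
        # Avoid listing the same article twice for one structure.
        record.articles.append(article)
    return record
```

As a usage note, depositing two different articles that report the same structure should leave a single structure record carrying two article links:

```python
db = {}
benzene = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"
first = ArticleRef("Placeholder title A", "Placeholder Journal", "1", "1-10", "10.1000/example.1")
second = ArticleRef("Placeholder title B", "Placeholder Journal", "2", "11-20", "10.1000/example.2")
deposit(db, benzene, first)
deposit(db, benzene, second)
links = [a.doi_url() for a in db[benzene].articles]
```

Keying on the InChI string is the simplest possible choice here; a real pipeline would presumably also need to carry the source metadata along with each record, as Peter asks above.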

 

 

About tony

Antony (Tony) J. Williams received his BSc in 1985 from the University of Liverpool (UK) and PhD in 1988 from the University of London (UK). His PhD research interests were in studying the effects of high pressure on molecular motions within lubricant related systems using Nuclear Magnetic Resonance. He moved to Ottawa, Canada to work for the National Research Council performing fundamental research on the electron paramagnetic resonance of radicals trapped in single crystals. Following his postdoctoral position he became the NMR Facility Manager for Ottawa University. Tony joined the Eastman Kodak Company in Rochester, New York as their NMR Technology Leader. He led the laboratory to develop quality control across multiple spectroscopy labs and helped establish walk-up laboratories providing NMR, LC-MS and other forms of spectroscopy to hundreds of chemists across multiple sites. This included the delivery of spectroscopic data to the desktop, automated processing and his initial interests in computer-assisted structure elucidation (CASE) systems. He also worked with a team to develop the world’s first web-based LIMS system, WIMS, capable of allowing chemical structure searching and spectral display. With his developing cheminformatic skills and passion for data management he left corporate America to join a small start-up company working out of Toronto, Canada. He joined ACD/Labs as their NMR Product Manager and held various roles, including Chief Science Officer, during his 10 years with the company. His responsibilities included managing over 50 products at one time prior to developing a product management team, managing sales, marketing, technical support and technical services. ACD/Labs was one of Canada’s Fast 50 Tech Companies, and Forbes Fast 500 companies in 2001. His primary passions during his tenure with ACD/Labs were the continued adoption of web-based technologies and developing automated structure verification and elucidation platforms. 
While at ACD/Labs he suggested the possibility of developing a public resource for chemists attempting to integrate internet available chemical data. He finally pursued this vision with some close friends as a hobby project in the evenings and the result was the ChemSpider database (www.chemspider.com). Even while running out of a basement on hand built servers the website developed a large community following that eventually culminated in the acquisition of the website by the Royal Society of Chemistry (RSC) based in Cambridge, United Kingdom. Tony joined the organization, together with some of the other ChemSpider team, and became their Vice President of Strategic Development. At RSC he continued to develop cheminformatics tools, specifically ChemSpider, and was the technical lead for the chemistry aspects of the Open PHACTS project (http://www.openphacts.org), a project focused on the delivery of open data, open source and open systems to support the pharmaceutical sciences. He was also the technical lead for the UK National Chemical Database Service (http://cds.rsc.org/) and the RSC lead for the PharmaSea project (http://www.pharma-sea.eu/) attempting to identify novel natural products from the ocean. He left RSC in 2015 to become a Computational Chemist in the National Center of Computational Toxicology at the Environmental Protection Agency where he is bringing his skills to bear working with a team on the delivery of a new software architecture for the management and delivery of data, algorithms and visualization tools. The “Chemistry Dashboard” was released on April 1st, no fooling, at https://comptox.epa.gov, and provides access to over 700,000 chemicals, experimental and predicted properties and a developing link network to support the environmental sciences. Tony remains passionate about computer-assisted structure elucidation and verification approaches and continues to publish in this area. 
He is also passionate about teaching scientists to benefit from the developing array of social networking tools for scientists and is known as the ChemConnector on the networks. Over the years he has had adjunct roles at a number of institutions and presently enjoys working with scientists at both UNC Chapel Hill and NC State University. He is widely published with over 200 papers and book chapters and was the recipient of the Jim Gray Award for eScience in 2012. In 2016 he was awarded the North Carolina ACS Distinguished Speaker Award.

Posted by on October 26, 2007 in Community Building

 
