I have been invited to write an article on Open Access chemistry databases and am in the process of gathering information. During one of my Google searches I came across a statement I had been aware of but had forgotten. It concerns the use of CAS numbers on a website. Specifically, the CAS Information Use Policies of 2005 state:
“A User or Organization may include, without a license and without paying a fee, up to 10,000 CAS Registry Numbers or CASRNs in a catalog, website, or other product for which there is no charge. The following attribution should be referenced or appear with the use of each CASRN: CAS Registry Number® is a Registered Trademark of the American Chemical Society. CAS recommends the verification of the CASRNs through CAS Client Services℠.”
I interpret this as meaning that, above 10,000 CAS numbers, permission must be granted to the organization assembling the data collection. Based on my experience there are a LOT of situations where collections of more than 10,000 CAS numbers exist. We are presently deduplicating and indexing another million structures on the ChemSpider index. We regularly receive SDF files (are these electronic “catalogs”?) containing structures and CAS numbers, and when these contain over 10,000 CAS numbers, are they inadvertently going against CAS policy? Are all of those online databases with a large number of structures (for example ChemIDPlus, ZINC DB, eMolecules and, of course, PubChem) exposing CAS numbers with permission?
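To put a number on a file like that, a minimal sketch in Python could scan the text for strings in CASRN format, keep only those whose check digit is valid, and count the distinct ones. The function names and the 10,000 default are my own illustration, not anything CAS provides. A plain pattern scan can pick up look-alike strings, and passing the check-digit test does not prove that a number is actually assigned in the CAS Registry.

```python
import re

# CASRN format: 2 to 7 digits, a hyphen, 2 digits, a hyphen, 1 check digit.
CASRN_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")

# Threshold quoted from the 2005 CAS Information Use Policies.
FREE_USE_LIMIT = 10_000


def is_valid_casrn(candidate: str) -> bool:
    """Return True if the string has CASRN format and a correct check digit."""
    match = CASRN_PATTERN.fullmatch(candidate)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one,
    # sum the products, and compare the sum modulo 10 to the check digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


def distinct_casrns(text: str) -> set[str]:
    """Collect the distinct, checksum-valid CASRNs found anywhere in the text."""
    return {
        match.group(0)
        for match in CASRN_PATTERN.finditer(text)
        if is_valid_casrn(match.group(0))
    }


def exceeds_free_use_limit(text: str, limit: int = FREE_USE_LIMIT) -> bool:
    """Return True if the text holds more distinct valid CASRNs than the limit."""
    return len(distinct_casrns(text)) > limit
```

Reading an SDF file as text and passing it to `exceeds_free_use_limit` would give a rough first answer. Whether a given file counts as a “catalog” under the policy is a question code cannot settle.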
I can only imagine the consequences if these large collections/websites/databases do not have permission to expose over 10,000 CAS numbers. What a public relations nightmare that could open up! Since we deposited the PubChem dataset on ChemSpider, that deposit naturally includes any associated registry numbers. Since eMolecules has deposited portions (not all) of the PubChem dataset, they have also deposited the registry numbers.
I may be lighting a fire here, and might get some interesting calls as a result, but I am publicly asking the question: if you are managing a website or public data collection of over 10,000 CAS numbers (read that as any site exposing PubChem data), have you asked permission to expose the data? And did you get permission? CAS numbers are everywhere; they are “phone numbers” for chemistry. They are on cans and boxes in our kitchens and garages, and on webpages all over the place. This is a very interesting situation for “large chemistry databases”…