Archive for category RSC Publishing
Today I gave a presentation at the 2014 Allen Press Emerging Trends in Scholarly Publishing™ Seminar here in Washington DC. Over coffee I had very positive feedback about what we are doing at RSC and various comments about “real science exposed by a publisher”. The abstract and Slideshare presentation are below.
The Application of Text and Data Mining to Enhance the Royal Society of Chemistry Publication Archive
The Royal Society of Chemistry (RSC) is one of the world’s most prominent scientific societies and STM publishers. Our contributions to the scientific community include the delivery of a myriad of resources to support the chemistry community to access chemistry-related data, information and knowledge. This includes ChemSpider, a compound centric platform linking together over 30 million chemical compounds with internet-based resources. Using this compound database and its associated chemical identifiers as a basis the RSC is utilizing text and data mining approaches to data enable our published archive of scientific publications. This presentation will provide an overview of our technical approaches to text and data enable our archive of scientific articles, how we are developing an integrated database of chemical compounds, reactions, physical and analytical data and how it will be used to facilitate scientific discovery.
This is a presentation I gave at the ACS Dallas meeting on March 19th 2014
Data enhancing the Royal Society of Chemistry publication archive
The Royal Society of Chemistry has an archive of hundreds of thousands of published articles containing various types of chemistry related data – compounds, reactions, property data, spectral data etc. RSC has a vision of extracting as much of these data as possible and providing access via ChemSpider and its related projects. To this end we have applied a combination of text-mining extraction, image conversion and chemical validation and standardization approaches. The outcome of this project will result in new chemistry related data being added to our chemical and reaction databases and in the ability to more tightly couple web-based versions of the articles with these extracted data. The ability to search across the archive will be enhanced as a result. This presentation will report on our progress in this data extraction project and discuss how we will ultimately use similar approaches in our publishing pipeline to enhance article markup for new publications.
We at RSC are fully committed to a mobile vision in terms of access to articles, data, our databases, services and…well…let’s see what the future brings! I’ve been fascinated with mobile chemistry for a couple of years now and co-authored a number of relevant articles in this area…
A.J. Williams and H. Pence, Smart Phones, a Powerful Tool in the Chemistry Classroom, J. Chem. Educ. 2011, 88 (6), pp 683–686. Link
Mobilizing Chemistry in the World of Drug Discovery, A.J. Williams, S. Ekins, R. Apodaca, A. Clark and J. Jack, Drug Discovery Today, 16:928-939
Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration, S. Ekins, A.M. Clark, A.J. Williams, Molecular Informatics 31 (8), 585-597, 2012 Link
Redefining Cheminformatics with Intuitive Collaborative Mobile Apps, A.M. Clark, S. Ekins, A.J. Williams, Molecular Informatics 31 (8), 569-584, 2012 Link
Incorporating Green Chemistry Concepts into Mobile Chemistry Applications and Their Potential Uses, S. Ekins, A.M. Clark and A.J. Williams, ACS Sustainable Chem. Eng., 2013, 1 (1), pp 8–13, http://pubs.acs.org/doi/abs/10.1021/sc3000509
Cheminformatics workflows using mobile apps, A. Clark, A.J. Williams and S. Ekins, Chem-Bio Informatics Journal, Vol. 13, pp. 1-18 (2013) https://www.jstage.jst.go.jp/article/cbij/13/0/13_1/_pdf
In parallel we have been VERY active in supporting the delivery of mobile apps such as ChemSpider Mobile for BOTH iOS and Android, written by Alex Clark. We have also been working on a couple of new apps and are now releasing, for Android only at present, our new NPU Alerts application. NPU stands for Natural Product Updates, one of the RSC graphical databases as shown here: LINK.
What Dmitry Ivanov, one of our team, has produced is an Android App that displays the latest batch of structures in an “issue” of the database, produced monthly. It displays up to 200 compound structures and the links out to both ChemSpider and the relevant record on the graphical abstracts database. It is MUCH easier for a scientist to recognize structure class by looking at a structure representation compared with a chemical name like hexamethylchickenwire. A user of the app can quickly browse the chemical structures and click on the relevant compound for more information.
This is the first example of us displaying “structure flows” like this from a graphical abstract database. The first of many. It is not difficult to envisage extending this to support structure flows for each issue of a journal…right!?
Please go and try out the app and give us your feedback….it can be downloaded here: LINK
Chemistry is complex. Anybody who has been involved with the creation of electronic datafiles containing thousands of chemical compounds and associated data (chemical names, properties etc) will tell you that errors creep in. ChemSpider has >28 million unique chemical entities and these have been sourced from many different places/groups/individuals. Some of these have been deprecated as we have determined, both manually and algorithmically, that the data are in error. Over the years we have learned a lot about data quality and ways in which algorithms can be applied to data prior to deposition on ChemSpider.
Some obvious structure-based errors that can be checked for would include: hypervalency (e.g. pentavalent carbons), charge imbalance (a compound has no neutralizing counterion for example), absence of stereochemistry (e.g. a compound with 12 possible stereocenters only has one assigned). There are many other such errors that can be detected algorithmically. It’s the old adage of why apply a human to what a computer can fix. With this in mind we have been working on a system called the ChemSpider Validation and Standardization Platform (CVSP for short). This system will serve multiple purposes. It will be one of the foundation blocks for checking structure-based data for our publications (i.e. catch bad chemistry before it is published!), it will be used for validating chemistry for our databases (Natural Product Updates, Methods in Organic Synthesis and Catalysts and Catalyzed Reactions), it will be used to check and validate depositions going into ChemSpider, it will serve data related to the Open PHACTS project and it will serve the community by providing an online website where you can upload your own SDF files (and other file formats in future) to validate the structures.
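To make the three checks mentioned above a little more concrete, here is a minimal Python sketch. It is NOT how CVSP is implemented; the `Atom` and `Structure` classes, the `validate` function and the small valence table are all hypothetical and heavily simplified, purely for illustration. A real validator would work from full connection tables and charge-aware valence rules.

```python
from dataclasses import dataclass

# Simplified maximum valences for neutral atoms. This table is an
# illustrative assumption, not a complete valence model.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


@dataclass
class Atom:
    element: str
    bond_order_sum: int  # total bond order, including implicit hydrogens
    charge: int = 0


@dataclass
class Structure:
    atoms: list
    possible_stereocenters: int = 0
    assigned_stereocenters: int = 0


def validate(structure):
    """Return a list of human-readable issues found in the structure."""
    issues = []

    # Hypervalency: a neutral atom carrying more bonds than its element allows.
    # Charged atoms are skipped here because this toy table has no rules for them.
    for index, atom in enumerate(structure.atoms):
        limit = MAX_VALENCE.get(atom.element)
        if limit is not None and atom.charge == 0 and atom.bond_order_sum > limit:
            issues.append(f"hypervalent {atom.element} at atom {index}")

    # Charge imbalance: the formal charges do not sum to zero,
    # e.g. a cation recorded without its counterion.
    net_charge = sum(atom.charge for atom in structure.atoms)
    if net_charge != 0:
        issues.append(f"charge imbalance: net charge {net_charge:+d}")

    # Absent stereochemistry: fewer stereocenters assigned than are possible.
    if structure.assigned_stereocenters < structure.possible_stereocenters:
        issues.append(
            "incomplete stereochemistry: "
            f"{structure.assigned_stereocenters} of "
            f"{structure.possible_stereocenters} stereocenters assigned"
        )

    return issues


# A deliberately bad record: a pentavalent carbon, an unbalanced cation,
# and only 1 of 12 possible stereocenters assigned.
bad = Structure(
    atoms=[Atom("C", 5), Atom("N", 4, charge=1)],
    possible_stereocenters=12,
    assigned_stereocenters=1,
)
issues = validate(bad)
for issue in issues:
    print(issue)
```

Each check is independent and only reports a problem rather than trying to repair it, which keeps the sketch short; deciding which issues can be fixed automatically is a separate question.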
I won’t go into detail here about all of the functionality and capability of the system as we will discuss this in further detail on this blog. However, we will be unveiling the system in its present form at the ACS meeting in Philadelphia. Come along and meet some of the team involved in building CVSP and give us your feedback!
I have written a lot of book chapters over the years, probably about 20, and have another 4 in press. I also have 3 more waiting on me to write by end of year (agh…). I have co-authored three books over the past few years (1,2,3) but other than the first book, self-published with ACD/Labs, I was not involved with setting the price. That’s probably good as it would likely be randomly changed, as would the list of authors and the number of pages!
There has been a question on this blog about whether I think the price for the most recent book is appropriate and I will discuss that when I have more time. I would say that based on the likely number of copies that will sell for this very specialized area, the size of the book and the amount of work it took us to put together (almost 2 years of work describing about 15 years of work), that this is probably a fair price…about $220 (but with price variation to be discussed below). If you consider that our single articles can be $30-35 for ONE PDF for 48 hours of access summarizing only one point in time in our research then I do think that the price is fine. Having previously “self-published” and seen how many books can be sold in that way I’d say that price is definitely appropriate considering the quality of support we have received from the publisher, RSC, and the associated costs of set-up for printing that must be taken on. Maybe self-publishing would be better nowadays in terms of increased volume of sales, as my last experience was 10 years ago, but based on comments from people using Lulu.com (for chemistry books), sales volume is very low and for worldwide marketing to libraries a professional publisher IS necessary.
Back to the point of this blog post. Who really sets prices for a book, taking just the chemistry book I am involved with as an example? Amazon want about $220, at present, for a copy of our book. That includes a “random 7% discount” that comes from where? However, then things get interesting….
BetweenReads.com.au have the price listed in Australian dollars, add the book editor as an author, change the order of the authors and add another random discount.
PowellsBooks loses two of the authors and leaves Mikhail Elyashberg as the sole author but keeps the price as the original Amazon price, no discount.
Barnes and Noble give a 19% discount before the book is even released, not an uncommon situation of course.
In most cases the number of pages is underestimated to be 368 pages but if you consult the RSC page you will see that it is almost 500 pages and the LIST price is 146.99 UK Pounds.
Who knows where these various online book sellers get their information and how their prices get set, but clearly there are discrepancies. While this book isn’t a mainstream novel, moving the basic info out to the sites should be easy. One has to assume that the various discounts are based on either the scale of the sales operation or, it seems, more random factors. All very interesting…and no resolution from me!
I had the pleasure of co-presenting with my friend Jean-Claude Bradley today at the “3rd Annual Drug Discovery Partnership: Filling the Pipeline”. Jean-Claude gave a great talk, available on Slideshare here, and discussed the issue of data quality, how improved data gives improved models, the cross-validation of data and the proliferation of errors. My talk is on Slideshare here and embedded below. In many ways I discussed similar issues, though not focused on melting point data but rather on structures, structure-identifier relationships, the cross-linking of multiple resources on the internet and how online resources can support Open Drug Discovery Systems. In this presentation I discussed some of the work we are doing on Open PHACTS.