Archive for category RSC Publishing
Today is my last day of employment at the Royal Society of Chemistry. It is almost six years since I joined RSC when ChemSpider was acquired. While ChemSpider was initially a “hobby project” and an attempt to create a disruption in terms of access to chemistry data, crowdsourced contribution and data validation, it has gone from strength to strength and now serves ca. 40,000 unique users a day from around the world. It won three awards in the first few months after we joined RSC and was a catalyst for RSC winning three grants that allowed us to participate in the Open PHACTS project and the PharmaSea project and to become the host of the UK National Chemical Database Service. Based on the feedback I have received over the years, ChemSpider is much-loved and appreciated as a contribution to the scientific community and is recognized as one of the key players in the free chemistry resources arena. I am proud to have been associated with it.
We also got to set up the ChemSpider SyntheticPages micropublishing site and tried to get the community sharing syntheses that would likely not make it into mainstream papers but were still of value to science.
During my six years at RSC I have been involved with many discussions regarding the following areas of work, study and research and how they would benefit publishing, the society and, of course, the chemistry community at large. The list includes, in particularly random order:
- Chemistry databases – both commercial and free – and how best to mesh, commercialize and license data
- Data quality in publications and databases and development of tools for data validation
- Open Data, Open Access and Open Notebook Science
- Text-mining of the RSC archive to extract & mark up compounds, reactions, property data and analytical data
- The potential of semantic web applications to scientific publishing
- Encouraging the use of Open Identifiers – especially ORCID and InChI
- The future of Micropublishing in the chemical sciences
- Analytical data and building an open spectral database for the community
- Social networking approaches to build online profiles – especially for young scientists
There are many, many more things of course but these are the big ones and, for me, bring clarity to what my interests are – chemistry data and making it available to the appropriate communities. It is with this in mind that I am excited to join the Environmental Protection Agency next week in the National Center for Computational Toxicology.
With every move forward into a new job we leave behind our old one. I leave RSC with some sadness and with excitement for the new opportunities. I have had the chance to work with so many good people at RSC, to engage with collaborators such as ACD/Labs, Mestre, NextMove, EBI, ChemAxon, Accelrys (as they were then), iChemLabs, Dotmatics and on and on. Apologies if you are not named but the list is very long. Thanks to everyone for your support, encouragement and opportunities to engage. It has been a blast.
And for everyone at RSC who catered to my strange diet of potatoes only…so long, and thanks for all the spuds.
This is a presentation that I delivered at the ACS Division of Chemical Information meeting regarding “Reproducibility, Reporting, Sharing & Plagiarism” at ACS Denver on 23rd March 2015.
I took the opportunity to take off my hat as VP of Strategic Development at RSC and as a member of the cheminformatics group that built ChemSpider and works on other RSC projects related to it. Instead I presented on how a LACK OF MANDATES from publishers regarding the submission of data accompanying the articles I help write is actually weakening my scientific record, as the data are not getting shared in the forms most useful to the community. I think there would be benefits if publishers started pushing me for MORE data, in fairly general standards, and allowed me (and others) to download the data in the form of molecules (and collections), spectral data, CSV files etc.
Today I received notification that an app to accompany a forthcoming RSC book, “The Handbook of Medicinal Chemistry: Principles and Practice”, went live on iTunes.
“The Medicinal Chemistry Toolkit app is a suite of resources to support the day to day work of a medicinal chemist. Based on the experiences of medicinal chemistry experts, we developed otherwise difficult-to-access tools in a portable format for use in meetings, on the move and in the lab. The app is optimised for iPad and contains calculator functions designed to ease the process of calculating values of: Cheng-Prusoff; Dose to man; Gibbs free energy to binding constant; Maximum absorbable dose calculator; Potency shift due to plasma protein binding.”
If you have an iPad then you can download the app from here.
The book itself will be published in November 2014 and will provide a comprehensive, everyday resource for a practicing medicinal chemist throughout the drug development process.
The app will be updated on an ongoing basis with new algorithms and calculators so make sure you check back or update when it tells you.
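For readers curious about what two of the calculators listed above do, here is a minimal Python sketch of the textbook relationships behind “Cheng-Prusoff” and “Gibbs free energy to binding constant”. It is not the app's code. The function names, the default temperature of 298.15 K and the 1 M standard state are my own assumptions for illustration, and the Cheng-Prusoff form shown is the usual one for a competitive inhibitor.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def cheng_prusoff_ki(ic50, substrate_conc, km):
    """Estimate Ki from IC50 for a competitive inhibitor.

    Ki = IC50 / (1 + [S]/Km). All three inputs must share the same
    concentration unit; the result is returned in that unit.
    """
    return ic50 / (1.0 + substrate_conc / km)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kJ/mol from a dissociation constant.

    dG = R * T * ln(Kd), with Kd in mol/L (assumed 1 M standard state).
    """
    return R * temperature_k * math.log(kd_molar) / 1000.0


def kd_from_free_energy(delta_g_kj, temperature_k=298.15):
    """Inverse of binding_free_energy: Kd in mol/L from dG in kJ/mol."""
    return math.exp(delta_g_kj * 1000.0 / (R * temperature_k))


if __name__ == "__main__":
    # IC50 of 100 nM measured with substrate at its Km halves to Ki = 50 nM.
    print(cheng_prusoff_ki(100.0, 10.0, 10.0))
    # A 1 nM binder corresponds to roughly -51 kJ/mol at 25 degrees C.
    print(round(binding_free_energy(1e-9), 1))
```

The example in the `__main__` block shows why these are handy to have in a meeting: a tenfold change in Kd shifts the free energy by only about 5.7 kJ/mol at room temperature, which is easy to misjudge by eye.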
Data Mining Dissertations and Adventures and Experiences in the World of Chemistry
This presentation was given at the CLIR/DLF Postdoctoral Fellowship Summer Seminar at Bryn Mawr College in Pennsylvania on July 29th 2014. The intention was to communicate what we are doing in the fields of text and data mining in the domain of chemistry, specifically around mining the RSC publication archive and chemistry dissertations and theses. How would these experiences map over to the humanities?
Today I gave a presentation at the 2014 Allen Press Emerging Trends in Scholarly Publishing™ Seminar here in Washington DC. Over coffee I had very positive feedback about what we are doing at RSC and various comments about “real science exposed by a publisher”. The abstract and Slideshare presentation are below.
The Application of Text and Data Mining to Enhance the Royal Society of Chemistry Publication Archive
The Royal Society of Chemistry (RSC) is one of the world’s most prominent scientific societies and STM publishers. Our contributions to the scientific community include the delivery of a myriad of resources that help the chemistry community access chemistry-related data, information and knowledge. This includes ChemSpider, a compound-centric platform linking together over 30 million chemical compounds with internet-based resources. Using this compound database and its associated chemical identifiers as a basis, the RSC is utilizing text and data mining approaches to data-enable our archive of scientific publications. This presentation will provide an overview of our technical approaches to text- and data-enabling our archive of scientific articles, how we are developing an integrated database of chemical compounds, reactions, physical and analytical data and how it will be used to facilitate scientific discovery.
This is a presentation I gave at the ACS Dallas meeting on March 19th 2014.
Data enhancing the Royal Society of Chemistry publication archive
The Royal Society of Chemistry has an archive of hundreds of thousands of published articles containing various types of chemistry related data – compounds, reactions, property data, spectral data etc. RSC has a vision of extracting as much of these data as possible and providing access via ChemSpider and its related projects. To this end we have applied a combination of text-mining extraction, image conversion and chemical validation and standardization approaches. The outcome of this project will result in new chemistry related data being added to our chemical and reaction databases and in the ability to more tightly couple web-based versions of the articles with these extracted data. The ability to search across the archive will be enhanced as a result. This presentation will report on our progress in this data extraction project and discuss how we will ultimately use similar approaches in our publishing pipeline to enhance article markup for new publications.
We at RSC are fully committed to a mobile vision in terms of access to articles, data, our databases, services and…well…let’s see what the future brings! I’ve been fascinated with mobile chemistry for a couple of years now and co-authored a number of relevant articles in this area…
A.J. Williams and H. Pence, Smart Phones, a Powerful Tool in the Chemistry Classroom, J. Chem. Educ., 2011, 88 (6), pp 683–686
A.J. Williams, S. Ekins, R. Apodaca, A. Clark and J. Jack, Mobilizing Chemistry in the World of Drug Discovery, Drug Discovery Today, 16:928-939
S. Ekins, A.M. Clark and A.J. Williams, Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration, Molecular Informatics, 2012, 31 (8), 585-597
A.M. Clark, S. Ekins and A.J. Williams, Redefining Cheminformatics with Intuitive Collaborative Mobile Apps, Molecular Informatics, 2012, 31 (8), 569-584
S. Ekins, A.M. Clark and A.J. Williams, Incorporating Green Chemistry Concepts into Mobile Chemistry Applications and Their Potential Uses, ACS Sustainable Chem. Eng., 2013, 1 (1), pp 8–13, http://pubs.acs.org/doi/abs/10.1021/sc3000509
A. Clark, A.J. Williams and S. Ekins, Cheminformatics workflows using mobile apps, Chem-Bio Informatics Journal, 2013, Vol. 13, pp 1-18, https://www.jstage.jst.go.jp/article/cbij/13/0/13_1/_pdf
In parallel we have been VERY active in supporting the delivery of mobile apps such as ChemSpider Mobile for BOTH iOS and Android, written by Alex Clark. We have also been working on a couple of new apps and now we release, for Android only at present, our new NPU Alerts application. NPU stands for Natural Product Updates, one of the RSC graphical databases as shown here: LINK.
What Dmitry Ivanov, one of our team, has produced is an Android app that displays the latest batch of structures in an “issue” of the database, produced monthly. It displays up to 200 compound structures and links out to both ChemSpider and the relevant record on the graphical abstracts database. It is MUCH easier for a scientist to recognize structure class by looking at a structure representation than by reading a chemical name like hexamethylchickenwire. A user of the app can quickly browse the chemical structures and click on the relevant compound for more information.
This is the first example of us displaying “structure flows” like this from a graphical abstract database. The first of many. It is not difficult to envisage extending this to supporting structure flows for each issue of a journal…right!?
Please go and try out the app and give us your feedback….it can be downloaded here: LINK
Chemistry is complex. Anybody who has been involved with the creation of electronic datafiles containing thousands of chemical compounds and associated data (chemical names, properties etc) will tell you that errors creep in. ChemSpider has >28 million unique chemical entities and these have been sourced from many different places/groups/individuals. Some of these have been deprecated as we have determined, both manually and algorithmically, that the data are in error. Over the years we have learned a lot about data quality and ways in which algorithms can be applied to data prior to deposition on ChemSpider.
Some obvious structure-based errors that can be checked for would include: hypervalency (e.g. pentavalent carbons), charge imbalance (a compound has no neutralizing counterion for example), absence of stereochemistry (e.g. a compound with 12 possible stereocenters only has one assigned). There are many other such errors that can be detected algorithmically. It’s the old adage of why apply a human to what a computer can fix. With this in mind we have been working on a system called the ChemSpider Validation and Standardization Platform (CVSP for short). This system will serve multiple purposes. It will be one of the foundation blocks for checking structure-based data for our publications (i.e. catch bad chemistry before it is published!), it will be used for validating chemistry for our databases (Natural Product Updates, Methods in Organic Synthesis and Catalysts and Catalyzed Reactions), it will be used to check and validate depositions going into ChemSpider, it will serve data related to the Open PHACTS project and it will serve the community by providing an online website where you can upload your own SDF files (and other file formats in future) to validate the structures.
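To make the three example checks above concrete (hypervalency, charge imbalance and missing stereochemistry), here is a toy Python sketch. It is emphatically not how CVSP is implemented: the `Atom` and `Record` classes, the small valence table and the message wording are all hypothetical, invented purely for illustration, and a real system would work on parsed SDF/molfile structures with a proper cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import List

# Illustrative maximum valences for a few neutral elements (not exhaustive).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


@dataclass
class Atom:
    element: str
    bond_order_sum: int  # total bond order, implicit hydrogens included
    formal_charge: int = 0


@dataclass
class Record:
    atoms: List[Atom]
    possible_stereocenters: int = 0
    assigned_stereocenters: int = 0


def validate(record: Record) -> List[str]:
    """Return human-readable issues found in one structure record."""
    issues = []

    # 1. Hypervalency: a neutral atom with more bonds than its usual maximum.
    #    Charged atoms are skipped because their allowed valence differs.
    for index, atom in enumerate(record.atoms):
        limit = MAX_VALENCE.get(atom.element)
        if limit is not None and atom.formal_charge == 0 and atom.bond_order_sum > limit:
            issues.append(
                f"hypervalent {atom.element} at atom {index}: "
                f"valence {atom.bond_order_sum} exceeds {limit}"
            )

    # 2. Charge imbalance: formal charges should sum to zero across the record.
    net_charge = sum(atom.formal_charge for atom in record.atoms)
    if net_charge != 0:
        issues.append(f"charge imbalance: net charge {net_charge:+d}")

    # 3. Partial stereochemistry: fewer centers assigned than are possible.
    if record.assigned_stereocenters < record.possible_stereocenters:
        issues.append(
            f"incomplete stereochemistry: {record.assigned_stereocenters} of "
            f"{record.possible_stereocenters} stereocenters assigned"
        )
    return issues


if __name__ == "__main__":
    # A pentavalent carbon, an unbalanced anionic oxygen, 1 of 12 centers set.
    suspect = Record(
        atoms=[Atom("C", 5), Atom("O", 1, formal_charge=-1)],
        possible_stereocenters=12,
        assigned_stereocenters=1,
    )
    for issue in validate(suspect):
        print(issue)
```

Returning a list of issues rather than raising on the first problem matches the idea of a validation report: a depositor wants to see everything that is wrong with a record in one pass.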
I won’t go into detail here about all of the functionality and capability of the system as we will discuss this in further detail on this blog. However, we will be unveiling the system in its present form at the ACS meeting in Philadelphia. Come along and meet some of the team involved in building CVSP and give us your feedback!
I have written a lot of book chapters over the years, probably about 20, and have another 4 in press. I also have 3 more waiting on me to write by end of year (agh…). I have co-authored three books over the past few years (1,2,3) but other than the first book, self-published with ACD/Labs, I was not involved with setting the price. That’s probably good as it would likely be randomly changed, as would the list of authors and the number of pages!
There has been a question on this blog about whether I think the price for the most recent book is appropriate and I will discuss that when I have more time. I would say that based on the likely number of copies that will sell for this very specialized area, the size of the book and the amount of work it took us to put together (almost 2 years of work describing about 15 years of work), that this is probably a fair price…about $220 (but with price variation to be discussed below). If you consider that our single articles can be $30-35 for ONE PDF for 48 hours of access summarizing only one point in time in our research then I do think that the price is fine. Having previously “self-published” and seen how many books can be sold in that way I’d say that price is definitely appropriate considering the quality of support we have received from the publisher, RSC, and the associated costs of set-up for printing that must be taken on. Maybe self-publishing would be better nowadays in terms of increased volume of sales, as my last experience was 10 years ago, but based on comments from people using Lulu.com (for chemistry books), sales volume is very low and for worldwide marketing to libraries a professional publisher IS necessary.
Back to the point of this blog post. Who really sets prices for a book, taking just the chemistry book I am involved with as an example? Amazon want about $220, at present, for a copy of our book. That includes a “random 7% discount” that comes from where? However, then things get interesting….
BetweenReads.com.au have the price listed in Australian dollars, add the book editor as an author, change the order of the authors and add another random discount.
PowellsBooks loses two of the authors and leaves only Mikhail Elyashberg as the sole author but keeps the price as the original Amazon price, no discount.
Barnes and Noble give a 19% discount before the book is even released, not an uncommon situation of course.
In most cases the number of pages is underestimated to be 368 pages but if you consult the RSC page you will see that it is almost 500 pages and the LIST price is 146.99 UK Pounds.
Who knows where these various online book sellers get their information and how their prices get set, but clearly there are discrepancies. While this book isn’t a mainstream novel, moving the basic info out to the sites should be easy. One has to assume that the various discounts are based on either the scale of the sales operation or, it seems, more random factors. All very interesting…and no resolution from me!
I had the pleasure of co-presenting with my friend Jean-Claude Bradley today at the “3rd Annual Drug Discovery Partnership: Filling the Pipeline”. Jean-Claude gave a great talk, available on Slideshare here, and discussed the issue of data quality, how improved data gives improved models, the cross-validation of data and the proliferation of errors. My talk is on Slideshare here and embedded below. In many ways I discussed similar issues, though not focused on melting point data but rather on structures, structure-identifier relationships, the cross-linking of multiple resources on the internet and how online resources can support Open Drug Discovery Systems. In this presentation I discussed some of the work we are doing on Open PHACTS.