Dealing with the Complex Challenge of Managing Diverse Chemistry Data Online to Enable Chemistry Across the World #ACSsanfran
This is my third presentation today at the ACS meeting in San Francisco on 11th August 2014
Dealing with the Complex Challenge of Managing Diverse Chemistry Data Online to Enable Chemistry Across the World
The Royal Society of Chemistry has provided access to data associated with millions of chemical compounds via our ChemSpider database for over 5 years. During this period the richness and complexity of the data has continued to expand dramatically and the original vision for providing an integrated hub for structure-centric data has been delivered across the world to hundreds of thousands of users. With an intention of expanding the reach to cover more diverse aspects of chemistry-related data including compounds, reactions and analytical data, to name just a few data-types, we are in the process of implementing a new architecture to build a Chemistry Data Repository. The data repository will manage the challenges of associated metadata, the various levels of required security (private, shared and public) and exposing the data as appropriate using semantic web technologies. Ultimately this platform will become the host for all chemicals, reactions and analytical data contained within RSC publications and specifically supplementary information. This presentation will report on how our efforts to manage chemistry-related data have impacted chemists and projects across the world and will review specifically our contributions to projects involving natural products for collaborators in Brazil and China, for the Open Source Drug Discovery project in India, and our collaborations with scientists in Russia.
My first talk of three on August 11th 2014 at the ACS San Francisco meeting
Teaching analytical spectroscopy using online spectroscopic data
The teaching of spectroscopy can be a complex and challenging task. The Royal Society of Chemistry has been developing online resources for a number of years that provide access to analytical data as well as interactive quizzes and challenge sets. The RSC data repository houses over 250,000 spectra at this time including mass spectrometry, NMR and IR data and these are utilized to provide online games to test students' capabilities, to underpin the SpectraSchool training website and to produce source data for students and teachers alike to use in their teaching and self-training efforts. This presentation will provide an overview of RSC resources that can be used to teach spectroscopy using our online data and tools.
My second talk of three on August 11th 2014 at the ACS Meeting in San Francisco.
Encouraging students to start publishing early in their career
Many students spend enormous amounts of their time engaged with their computers, accepting of course that mobile devices are simply computers of a different form factor. Engaged with social networks and using computer platforms to source and share content of various forms, they contribute an enormous amount of “data” to the cloud and, in many cases, to a void. What community and career benefit might result from those students spending some of their time contributing chemistry-related data to the world? What challenges lie in the way of their participation, and how might participating have a positive or negative impact on their future career? The Royal Society of Chemistry hosts a number of chemistry data platforms to which students can actively contribute and for which their participation can be measured. Moreover, the RSC’s micropublishing platform allows chemists to learn how to write up their scientific work, obtain review from their peers and chemistry professors in a non-threatening environment and produce an online published work in less than a day that is both citable and available as a shared resource for the community. This presentation will demonstrate how to participate and encourage engagement from students early in their education. There are no longer any technology barriers to the sharing of the majority of chemistry-related data.
How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry #ACSsanfran
This is my presentation at the ACS San Francisco Fall Meeting on August 10th 2014
How the InChI identifier is used to underpin our online chemistry databases at the Royal Society of Chemistry
The Royal Society of Chemistry hosts a growing collection of online chemistry content. For much of our work the InChI identifier is an important component underpinning our projects. This enables the integration of chemical compounds with our archive of scientific publications, the delivery of a reaction database containing millions of reactions as well as a chemical validation and standardization platform developed to help improve the quality of structural representations on the internet. The InChI has been a fundamental part of each of our projects and has been pivotal in our support of international projects such as the Open PHACTS semantic web project integrating chemistry and biology data and the PharmaSea project focused on identifying novel chemical components from the ocean with the intention of identifying new antibiotics. This presentation will provide an overview of the importance of InChI in the development of many of our eScience platforms and how we have used it to provide integration across hundreds of websites and chemistry databases across the web. We will discuss how we are now expanding our efforts to develop a platform encompassing efforts in Open Source Drug Discovery and the support of data management for neglected diseases.
This is the first presentation I gave at the ACS meeting in San Francisco on Sunday morning (August 10th) in the CINF Natural Products session.
Applying Royal Society of Chemistry cheminformatics skills to support the PharmaSea project
The collaborative project PharmaSea brings European researchers to some of the deepest, coldest and hottest places on the planet. Scientists from the UK, Belgium, Norway, Spain, Ireland, Germany, Italy, Switzerland and Denmark are working together to collect and screen samples of mud and sediment from huge, previously untapped, oceanic trenches. The large-scale, four-year project is backed by almost 10 million euros of funding and brings together 24 partners from 13 countries from industry, academia and non-profit organisations. The PharmaSea project focuses on biodiscovery research and the development and commercialisation of new bioactive compounds from marine organisms, including deep-sea sponges and bacteria, to evaluate their potential as novel drug leads or ingredients for nutrition or cosmetic applications. The Royal Society of Chemistry is responsible for developing a number of capabilities to support the PharmaSea project including a chemical registration system for new compounds, dereplication technologies to assist in the identification of new compounds and search techniques for mass spectrometrists within the project. This presentation will provide an overview of the project and our progress in contributing chemical information technologies to support the effort.
This is a presentation I gave at the National Institute of Standards and Technology on July 30th 2014
Experiences in Hosting Big Chemistry Data Collections for the Community
Access to scientific information has changed dramatically as a result of the web and its underpinning technologies. The quantities of data, the array of tools available to search and analyze, the devices and the shift in community participation continue to expand while the pace of change does not appear to be slowing. RSC hosts a number of chemistry data resources for the community including ChemSpider, one of the community’s primary online public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to tens of thousands of chemists every day. The platform supports crowdsourcing, enabling the community to deposit and curate data. This presentation will provide an overview of the expanding reach of this cheminformatics platform and the nature of the solutions that it helps to enable, including structure validation, text mining and semantic markup. ChemSpider is limited in scope as a chemical compound database and we are presently architecting the RSC Data Repository, a platform that will enable us to extend our reach to include chemical reactions, analytical data, and diverse data depositions from chemists across various domains. We will also discuss the possibilities it offers in terms of supporting data modeling and sharing. The future of scientific information and communication will be underpinned by these efforts, influenced by increasing participation from the scientific community.
Data Mining Dissertations and Adventures and Experiences in the World of Chemistry
This presentation was given at the CLIR/DLF Postdoctoral Fellowship Summer Seminar at Bryn Mawr College in Pennsylvania on July 29th 2014. The intention was to communicate what we are doing in the fields of text and data mining in the domain of chemistry, specifically around mining the RSC publications archive and chemistry dissertations and theses. How would these experiences map over to the humanities?
I presented at the Food and Drug Administration today regarding some of our efforts to develop a research data repository for the community. The abstract and the presentation from Slideshare are below.
Current Initiatives in Developing Research Data Repositories at the Royal Society of Chemistry
Access to scientific information has changed in a manner that was likely never even imagined by the early pioneers of the internet. The quantities of data, the array of tools available to search and analyze, the devices and the shift in community participation continue to expand while the pace of change does not appear to be slowing. RSC hosts a number of chemistry data resources for the community including ChemSpider, one of the community’s primary online public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to tens of thousands of chemists every day and serves as the foundation for many important international projects to integrate chemistry and biology data, facilitate drug discovery efforts and help to identify new chemicals from under the ocean. This presentation will provide an overview of the expanding reach of this cheminformatics platform and the nature of the solutions that it helps to enable, including structure validation, text mining and semantic markup. ChemSpider is limited in scope as a chemical compound database and we are presently architecting the RSC Data Repository, a platform that will enable us to extend our reach to include chemical reactions, analytical data, and diverse data depositions from chemists across various domains. We will also discuss the possibilities it offers in terms of supporting data modeling and sharing. The future of scientific information and communication will be underpinned by these efforts, influenced by increasing participation from the scientific community.
We have been working with Vincent Scalfani from the University of Alabama towards supporting a community of 3D printing crystal structure enthusiasts. There is a listserv, [3DP-XTAL], hosted by the University of Alabama. If you would like to be added to the listserv, simply email Vincent at vfscalfaniATuaDOTedu. They are also in the process of creating a 3D printing crystal structure wiki/blog for the community.
With Vincent as the driver we are creating a public on-line repository for 3D printable structure files (.stl and .wrl). He used Jmol to prepare ~30,000 molecules and solids in .wrl and .stl format and we will be hosting them on part of our data repository. We are very excited about this project and there will be more information at the upcoming 248th American Chemical Society Meeting in San Francisco, CA. See CINF Abstract # 125.
The flier that will be distributed at the IUCr meeting in Montreal in August is available on Slideshare here:
I give a lot of presentations. A lot. Maybe too many. At the impending ACS meeting in San Francisco I am giving nine presentations. When I give a presentation I like to share it afterwards. I need the distribution method to be quick and easy to use, and ideally to let users of the platform find the talk if they are interested in it. I have used various platforms to disseminate my talks. There are really no usability issues with any of them; the various groups have done a good job building their platforms. I am a user of both Slideshare and Figshare and my accounts are here: Slideshare and Figshare. This week I received my weekly stats email and the numbers are below: >3000 views in one week and a total of 400,000 views of my talks, preprints etc.
Compare this with my Figshare stats of >6600 views ever.
The majority of talks I upload to Slideshare have about 3000 views in 2 months as shown below…some have over 25000 now.
If I compare this with Figshare the most views I have is around 500 but that was over 18 months.
Clearly my presentations on Slideshare get way higher exposure. However, the usual question of quality vs quantity comes to bear. The audience on Figshare, primarily scientists, is likely closer to my intended audience than the one on Slideshare. What I should do, though it takes time (only a few additional minutes per presentation), is post the presentation to Slideshare, to Figshare, to my Academia.edu account, to my ResearchGate account, to Vimeo, to YouTube etc. But I only have so much time and right now my easiest deposition route is Slideshare. In terms of my actual prioritization of places to deposit, based on the number of views and downloads the order is