Archive for category Open Science..all its forms
The Importance of the InChI Identifier as a Foundation Technology for eScience Platforms at the Royal Society of Chemistry
This is a presentation I gave today at Bio-IT 2014 here in Boston. I was on the agenda in the company of a number of my favorite people: Steve Heller, Steve Boyer, Evan Bolton and Chris Southan.
The Importance of the InChI Identifier as a Foundation Technology for eScience Platforms at the Royal Society of Chemistry
The Royal Society of Chemistry hosts one of the largest online chemistry databases, containing almost 30 million unique chemical structures. The database, ChemSpider, underpins a series of eScience projects: the integration of chemical compounds with our archive of scientific publications, the delivery of a reaction database containing millions of reactions, and a chemical validation and standardization platform developed to help improve the quality of structural representations on the internet. The InChI has been a fundamental part of each of our projects and has been pivotal in our support of international projects such as the Open PHACTS semantic web project, which integrates chemistry and biology data, and the PharmaSea project, which is focused on identifying novel chemical components from the ocean with the intention of discovering new antibiotics. This presentation will provide an overview of the importance of InChI in the development of many of our eScience platforms and how we have used it specifically in the ChemSpider project to integrate hundreds of websites and chemistry databases across the web. We will discuss how we are now expanding our efforts to develop a Global Chemistry Network encompassing efforts in Open Source Drug Discovery and the support of data management for neglected diseases.
The potential benefits of making yourself visible online as a scientist
This is a presentation I gave at MIT to the Boston ACS Young Chemists about how they can take advantage of online tools to spread the message about their activities and interests, get engaged with collaborative science, and participate now to gain benefits from the growing world of AltMetrics.
This is my sixth presentation at the ACS Fall Meeting in Indianapolis:
Digitizing documents to provide a public spectroscopy database
RSC hosts a number of platforms providing free access to chemistry-related data. The content includes chemical compounds and associated experimental and predicted data, chemical reactions and, increasingly, spectral data. The ChemSpider database primarily contains electronic spectral data generated at the instrument, converted into standard formats such as JCAMP, then uploaded for the community to access. As a publisher, RSC holds a rich source of spectral data within our scientific publications and associated electronic supplementary information. We have undertaken a project to Digitally Enable the RSC Archive (DERA), and as part of this project we are converting figures of spectral data into standard spectral data formats for storage in our ChemSpider database. This presentation will report on our progress in the project and some of the challenges we have faced to date.
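As a rough illustration of the kind of target format involved, here is a minimal sketch that writes a list of (x, y) points, as might be recovered by digitizing a spectral figure, into a JCAMP-DX 4.24 text block using the tabular ##XYPOINTS=(XY..XY) form. The function name `to_jcamp_dx`, the infrared-spectrum defaults and the ORIGIN/OWNER values are assumptions for illustration only; this is not the DERA conversion code.

```python
from typing import Sequence, Tuple


def to_jcamp_dx(title: str,
                points: Sequence[Tuple[float, float]],
                data_type: str = "INFRARED SPECTRUM",   # assumed default
                xunits: str = "1/CM",                   # assumed default
                yunits: str = "ABSORBANCE",             # assumed default
                origin: str = "Digitized from a published figure",
                owner: str = "Unknown") -> str:
    """Hypothetical helper: serialize digitized (x, y) points as JCAMP-DX 4.24 text."""
    if not points:
        raise ValueError("at least one (x, y) point is required")
    xs = [x for x, _ in points]
    header = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        f"##DATA TYPE={data_type}",
        f"##ORIGIN={origin}",
        f"##OWNER={owner}",
        f"##XUNITS={xunits}",
        f"##YUNITS={yunits}",
        f"##FIRSTX={xs[0]}",
        f"##LASTX={xs[-1]}",
        f"##NPOINTS={len(points)}",
        "##XYPOINTS=(XY..XY)",
    ]
    # (XY..XY) form: one explicit "x, y" pair per line, in the order digitized.
    body = [f"{x}, {y}" for x, y in points]
    return "\n".join(header + body + ["##END="]) + "\n"


if __name__ == "__main__":
    digitized = [(4000.0, 0.02), (3000.0, 0.45), (1700.0, 0.90)]
    print(to_jcamp_dx("Example digitized IR spectrum", digitized))
```

Writing explicit x, y pairs rather than the compressed (X++(Y..Y)) form keeps the output easy to check by eye, which matters when the points come from a figure rather than an instrument, at the cost of larger files.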
This is my fourth talk at the ACS Indianapolis Conference:
Practical semantics in the pharmaceutical industry – the Open PHACTS project
The information revolution has transformed many business sectors over the last decade, and the pharmaceutical industry is no exception. Developments in scientific and information technologies have unleashed an avalanche of content on research scientists, who are struggling to access and filter it efficiently. Furthermore, this domain has traditionally suffered from a lack of standards in how entities, processes and experimental results are described, making it difficult to determine whether results from two different sources can be reliably compared. The need to transform the way the life-science industry uses information has led to new thinking about how companies should work beyond their firewalls. In this talk we will provide an overview of the traditional approaches major pharmaceutical companies have taken to knowledge management and describe the business reasons why pre-competitive, cross-industry and public-private partnerships have gained so much traction in recent years. We will consider the scientific challenges of integrating biomedical knowledge, highlighting the complexities of representing everyday scientific objects in computerised form. This leads us to discuss how the semantic web might offer a long-overdue solution. The talk will be illustrated by focusing on the EU Open PHACTS initiative (openphacts.org), established to provide a unique public-private infrastructure for pharmaceutical discovery. We will describe the aims of this work and how technologies such as just-in-time identity resolution, nanopublication and interactive visualisations are helping to build a powerful software platform designed to appeal directly to scientific users across the public and private sectors.
The future of scientific information & communication presented at the SUNY Potsdam Academic Festival
This is a LONG presentation… I talk about the “It’s All About Me” attitude that can positively feed science. We want to share OUR science; we want people to know about our opinions, our activities and our collaborators; we want funding, recognition and attribution. And why not… it can all be to the benefit of science.
This presentation was given at the SUNY Potsdam Academic Festival
The future of scientific information & communication
Our access to scientific information has changed in ways that were hardly imagined even by the early pioneers of the internet. The immense quantities of data and the array of tools available to search and analyze online content continue to expand, and the pace of change does not appear to be slowing. While scientists now have access to the enormous capacity and capabilities of the internet, the vast majority of scientific communication continues to be through peer-reviewed scientific journals. A scientist’s contribution is measured primarily by their publication profile and the citations to their published works, which offers an incomplete view of their activities. However, we are at the beginning of a new revolution in which the ability to communicate offers the opportunity to embrace new forms of publishing, and in which scientific participation and influence will be measured in new ways. This presentation will provide an overview of our new generation of “openness”, in which open source, open standards, open access and open data are proliferating. The future of scientific information and communication will be underpinned by these efforts, influenced by increasing participation from the scientific community and facilitated by collaboration, and it will ultimately accelerate scientific progress.
This week/weekend I will attend the ScienceOnline2013 conference here in Raleigh, North Carolina. This is my favorite conference of the year, bar none. I feel privileged every time I attend to be surrounded by people who are challenging the status quo and are passionate about making science more available and consumable to their peers and the community. I have met some great people at this conference, and every year I walk away tired yet invigorated. I walk away feeling that my own contributions to science, especially my work to enable access to chemistry data, are in step with the efforts of many of the crowd attending this meeting. The meeting has a commitment to scientific truth, collaboration, communication and openness. YES!!!
While I am a chemist by training, what I enjoy so much about the meeting is meeting NON-chemists and learning about their world, their interests, their adventures and their challenges. By keeping my head in my own box at many other conferences, primarily chemistry of course, I limit what I can learn from experiences outside of my domain. ScienceOnline frees me from these boundaries by throwing me into a mix of wildly different engagements. It is, quite simply, a joy! And coming at the beginning of the year, it is the first conference I attend… always good!
The conference is well organized, offers wall-to-wall entertainment in various forms (including science comedians!), is socially engaging (lots of opportunities for after-hours play!) and is full of “my kind of people”. I am lucky to be so close and, this year, to be able to share space with one of my closest friends. Sean Ekins (@collabchem) and I will host a discussion on “Leading Chemists Into Openness”. Sean and I hung out at the conference last year, and it had a good impact on him, as he describes here.
If you are attending ScienceOnline2013 and are interested in Open Science and the advantages, challenges and “unknowns” of how to get there, then please come and join the conversation. We are the hosts… you define where we go! The slides below are for you to review/consider/digest in advance of the session. See you there???!!!
There are a number of people in my domain whom I greatly appreciate and enjoy working with. So, the chance to co-author a set of rules for licensing data with Sean Ekins and John Wilbanks was an opportunity too good to miss. There are a lot of opinions, rants and views on data licensing floating around the internet, discussed at conferences and over beverages. We have opinions too, and we have shared them in this perspective in PLoS Computational Biology: “Why Open Drug Discovery Needs Four Simple Rules for Licensing Data and Models”
When writing a publication, how many of us conduct complete literature searches? For those of us who do not have access to SciFinder, how are we finding our literature? Probably through Google Scholar? When I write a paper I admit that some of my searches may be less than complete, but I do try to stay informed about what is going on in my domain. VERY occasionally I get feedback from reviewers pointing me to references that they feel I either ignored or was unaware of. Many times they are co-authored by the reviewer themselves… and it is pretty easy to figure out who the reviewers are.
Today I received an email in my inbox about the latest article in the Journal of Cheminformatics. It is OMG: Open Molecule Generator. The article is here. The abstract opens with “Computer Assisted Structure Elucidation has been used for decades to discover the chemical structure of unknown compounds. In this work we introduce the first open source structure generator, Open Molecule Generator (OMG), which for a given elemental composition produces all non-isomorphic chemical structures that match that elemental composition.”
Having been involved with Computer-Assisted Structure Elucidation for many years, and having co-authored a book about it (here) and probably the definitive review article from the past 5 years (here), I would have assumed that our work would be referenced. I was surprised to see that it was not, while other CASE systems were. Articles we’ve published over the past few years are listed below. I’ve gathered them here to point the authors to in case they missed them in their literature search and want to reference any of them.
I am taking advantage of the fact that I can leave comments on the provisional manuscript here (what a great capability!!!) and will let the authors know about this list. It would be good to compare the performance of OMG with the structure generator in ACD/Structure Elucidator sometime…
1) M. E. Elyashberg, K. A. Blinov and A. J. Williams, Computer-Aided Molecular Structure Elucidation on the Basis of 1D and 2D NMR Spectra, Applied Magnetic Resonance (May 2000).
2) K. A. Blinov, M. E. Elyashberg, S. G. Molodtsov, A. J. Williams and E. R. Martirosian, An Expert System for Automated Structure Elucidation Utilizing 1H-1H, 13C-1H, and 15N-1H 2D NMR Correlations, Fresenius J. Anal. Chem., 369, 709 (2001).
3) G. E. Martin, C. E. Hadden, D. J. Russell, B. D. Kaluzny, J. E. Guido, W. K. Duholke, B. A. Stiemsma, T. J. Thamann, R. C. Crouch, K. A. Blinov, M. E. Elyashberg, E. R. Martirosian, S. G. Molodtsov, A. J. Williams and P. L. Schiff, Jr., Identification of Degradants of a Complex Alkaloid Using NMR Cryoprobe Technology and ACD/Structure Elucidator, J. Heterocyclic Chem., 39, 1241-1250 (2002).
4) M. E. Elyashberg, K. A. Blinov, A. J. Williams, E. R. Martirosian and S. G. Molodtsov, Application of a New Expert System for the Structure Elucidation of Natural Products from the 1D and 2D NMR Data, J. Nat. Prod., 65, 693 (2002).
5) K. A. Blinov, D. Carlson, M. E. Elyashberg, G. E. Martin, E. R. Martirosian, S. Molodtsov and A. J. Williams, Computer-Assisted Structure Elucidation of Natural Products with Limited 2D NMR Data: Applications of the StrucEluc System, Magn. Reson. Chem., 41, 359-372 (2003).
6) G. E. Martin, D. J. Russell, K. A. Blinov, M. E. Elyashberg and A. J. Williams, Applications and Advances in Cryogenic NMR Probes & Computer-Assisted Structure Elucidation, Ann. Magn. Reson., 2, 1-31 (2003).
7) K. Blinov, M. Elyashberg, E. R. Martirosian, S. G. Molodtsov, A. J. Williams, M. H. M. Sharaf, P. L. Schiff, Jr., R. C. Crouch, G. E. Martin, C. E. Hadden and J. E. Guido, Quindolinocryptotackieine: The Elucidation of a Novel Indoloquinoline Alkaloid Structure through the Use of Computer-Assisted Structure Elucidation and 2D-NMR, Magn. Reson. Chem., 41, 577-584 (2003).
8) M. E. Elyashberg, K. A. Blinov, E. R. Martirosian, S. G. Molodtsov, A. J. Williams and G. E. Martin, Automated Structure Elucidation – The Benefits of a Symbiotic Relationship between the Spectroscopist and the Expert System, J. Heterocyclic Chem., 40, 1017-1029 (2003).
9) M. E. Elyashberg, K. A. Blinov, A. J. Williams, S. G. Molodtsov, G. E. Martin and E. R. Martirosian, Structure Elucidator: A Versatile Expert System for Molecular Structure Elucidation from 1D and 2D NMR Data and Molecular Fragments, J. Chem. Inf. Comput. Sci., 44, 771-792 (2004).
10) S. G. Molodtsov, M. E. Elyashberg, K. A. Blinov, A. J. Williams, E. E. Martirosian, G. E. Martin and B. Lefebvre, Structure Elucidation from 2D NMR Spectra Using the StrucEluc Expert System: Detection and Removal of Contradictions in the Data, J. Chem. Inf. Comput. Sci., 44, 1737-1751 (2004).
11) G. J. Sharman, I. C. Jones, M. P. Parnell, M. C. Willis, M. F. Mahon, D. V. Carlson, A. J. Williams, M. E. Elyashberg, K. A. Blinov and S. G. Molodtsov, Automated Structure Elucidation of Two Products in a Reaction of an α,β-Unsaturated Pyruvate, Magn. Reson. Chem., 42, 567 (2004).
12) Y. D. Smurnyy, M. E. Elyashberg, K. A. Blinov, B. A. Lefebvre, G. E. Martin and A. J. Williams, Computer-Aided Determination of Relative Stereochemistry and 3D Models of Complex Organic Molecules from 2D NMR Spectra, Tetrahedron, 61, 9980-9989 (2005).
13) M. E. Elyashberg, K. A. Blinov, A. J. Williams, S. G. Molodtsov and G. E. Martin, Are Deterministic Expert Systems for Computer-Assisted Structure Elucidation Obsolete?, J. Chem. Inf. Model., 46, 1643-1656 (2006).
14) M. E. Elyashberg, K. A. Blinov, S. G. Molodtsov, A. J. Williams and G. E. Martin, Fuzzy Structure Generation: An Efficient New Tool for Computer-Aided Structure Elucidation (CASE), J. Chem. Inf. Model., 47, 1053-1066 (2007). DOI: 10.1021/ci600528g
15) M. E. Elyashberg, A. J. Williams and G. E. Martin, Computer-Assisted Structure Verification and Elucidation Tools in NMR-Based Structure Elucidation (review article), Progress in NMR Spectroscopy (2007). DOI: 10.1016/j.pnmrs.2007.04.003
16) Y. D. Smurnyy, K. A. Blinov, T. S. Churanova, M. E. Elyashberg and A. J. Williams, Toward More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comparison of Neural-Network and Least-Squares Regression Based Approaches, J. Chem. Inf. Model., 48, 128-134 (2008).
17) M. E. Elyashberg, A. J. Williams, D. C. Lankin, G. E. Martin, J. Porco, W. F. Reynolds and C. Singleton, Applying Computer-Assisted Structure Elucidation Algorithms for the Purpose of Structure Validation – Revising the NMR Assignments of Hexacyclinol, J. Nat. Prod., 71, 581-588 (2008).
18) M. E. Elyashberg, K. A. Blinov and A. J. Williams, A Systematic Approach for the Generation and Verification of Structural Hypotheses, Magn. Reson. Chem., 47, 371-389 (2009).
19) M. E. Elyashberg, A. J. Williams and K. A. Blinov, The Application of Empirical Methods of 13C NMR Chemical Shift Prediction as a Filter for Determining Possible Relative Stereochemistry, Magn. Reson. Chem., 47, 333-341 (2009).
20) Y. D. Smurnyy, K. A. Blinov, T. S. Churanova, M. E. Elyashberg and A. J. Williams, Development of a Fast and Accurate Method of 13C NMR Chemical Shift Prediction, Chemometrics and Intelligent Laboratory Systems, 97(1), 91-97 (2009).
21) M. E. Elyashberg, A. J. Williams and K. A. Blinov, Structural Revisions of Natural Products by Computer-Assisted Structure Elucidation (CASE) Systems, Nat. Prod. Rep. (2010). DOI: 10.1039/c002332a
22) A. Moser, M. E. Elyashberg, A. J. Williams, K. A. Blinov and J. C. DiMartino, Blind Trials of Computer-Assisted Structure Elucidation Software, J. Cheminform., 4(1), 5 (2012).
23) M. E. Elyashberg, K. A. Blinov, S. G. Molodtsov and A. J. Williams, Elucidating ‘Undecipherable’ Chemical Structures Using Computer-Assisted Structure Elucidation Approaches, Magn. Reson. Chem., 50(1), 22-27 (2012). DOI: 10.1002/mrc.2849
BOOK: Contemporary Computer Assisted Approaches to Molecular Structure Elucidation by Kirill Blinov, Mikhail Elyashberg and Antony J. Williams, Royal Society of Chemistry
Second talk delivered today at ACS Philadelphia…
Mining public domain data as a basis for drug repurposing
Online databases containing high-throughput screening and other property data continue to proliferate in number. Many pharmaceutical chemists will have used databases such as PubChem, ChemSpider, DrugBank, BindingDB and many others. This work will report on the potential value of these databases for providing data that can be used to repurpose drugs using cheminformatics-based approaches (e.g. docking, ligand-based machine learning methods). This work will also discuss the potentially related applications of the Open PHACTS project, a European Union Innovative Medicines Initiative project that is using semantic-web-based approaches to integrate large-scale chemical and biological data in new ways. We will report on how compound and data quality should be taken into account when using data from online databases, and how careful curation can provide high-quality data to underpin the delivery of molecular models that can in turn identify new uses for old drugs.
This is one of my presentations at the ACS meeting today in San Diego, about how to use social networking tools to expose yourself as a scientist:
Social networking tools as public representations of a scientist
The web has revolutionized the manner in which we can represent ourselves online by giving us the ability to expose our data, experiences and skills via blogs, wikis and other crowdsourcing venues. As a result, it is possible to contribute to the community while developing a social profile as a scientist. At present, many scientists are still measured by the classical method of citation statistics, and a number of free online tools are now available for scientists to manage their profiles. This presentation will provide an overview of tools including Google Scholar Citations and Microsoft Academic Search and will discuss how these and other tools, when integrated with the ORCID identifier, may more fully recognize our collective contributions to science. I will also discuss how an increasingly public view of us as scientists online will likely contribute to our reputation above and beyond citations.