Data Mining Dissertations and Adventures and Experiences in the World of Chemistry
This presentation was given at the CLIR/DLF Postdoctoral Fellowship Summer Seminar at Bryn Mawr College in Pennsylvania on July 29th, 2014. The intention was to communicate what we are doing in text and data mining in the domain of chemistry, specifically around mining the RSC publication archive and chemistry dissertations and theses. How would these experiences map over to the humanities?
Over the past few years I have learned how to use many of the social networking tools and platforms to host and share my publications (when I am allowed to), my presentations, videos etc. I have started using a new website, www.growkudos.com, to help me enrich, expose and measure my publications. This is VERY EARLY in my exposure to and usage of the platform but I am already excited by the possibilities. I applied KUDOS to one of the articles I co-authored with Sean Ekins and Joe Olechno, “Dispensing Processes Impact Apparent Biological Activity as Determined by Computational and Statistical Analyses”. With almost 10,000 views it has become a very interesting article and has been discussed many times, so there was a lot of online information to enrich the article with. The resulting KUDOS page is here: https://www.growkudos.com/articles/10.1371/journal.pone.0062325
This is a presentation I gave at the ACS Dallas meeting on March 19th 2014
Data enhancing the Royal Society of Chemistry publication archive
The Royal Society of Chemistry has an archive of hundreds of thousands of published articles containing various types of chemistry-related data: compounds, reactions, property data, spectral data etc. RSC has a vision of extracting as much of these data as possible and providing access via ChemSpider and its related projects. To this end we have applied a combination of text-mining extraction, image conversion, and chemical validation and standardization approaches. This project will add new chemistry-related data to our chemical and reaction databases and allow us to couple web-based versions of the articles more tightly with the extracted data. The ability to search across the archive will be enhanced as a result. This presentation will report on our progress in this data extraction project and discuss how we will ultimately use similar approaches in our publishing pipeline to enhance article markup for new publications.
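To make the idea of text-mining property data concrete, here is a minimal sketch in Python. It is not the RSC extraction pipeline, whose details are not described here. The pattern, the function name `extract_melting_points` and the sample sentence are all assumptions made up for illustration, and the "validation" is only a sanity check on the numeric range.

```python
import re

# Hypothetical pattern (an assumption, not the RSC's): matches phrases
# such as "mp 132-134 °C" or "m.p. 88 °C" in experimental text.
MP_PATTERN = re.compile(
    r"\bm\.?p\.?\s*(?P<low>\d+(?:\.\d+)?)"      # "mp" or "m.p." and the first value
    r"(?:\s*[-–]\s*(?P<high>\d+(?:\.\d+)?))?"   # optional upper bound of a range
    r"\s*°\s*C"                                 # unit: degrees Celsius
)

def extract_melting_points(text):
    """Return a list of (low, high) melting points in °C found in text."""
    results = []
    for match in MP_PATTERN.finditer(text):
        low = float(match.group("low"))
        high = float(match.group("high")) if match.group("high") else low
        # Simple sanity check standing in for validation: skip reversed ranges.
        if high >= low:
            results.append((low, high))
    return results

# Illustrative sentence, invented for this example.
sample = "White solid, mp 132-134 °C. The second product had m.p. 88 °C."
print(extract_melting_points(sample))
```

Real experimental sections are far messier than this (decomposition notes, literature values, other units), which is why extraction has to be paired with validation and standardization.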
The UK National Chemical Database Service as an integration of commercial and public chemistry services to support chemists in the United Kingdom
This is a presentation I gave at the ACS National Meeting in Dallas on Wednesday 19th March 2014
The UK National Chemical Database Service – an integration of commercial and public chemistry services to support chemists in the United Kingdom
At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. The challenge of combining a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., together with the confusion of licensing and embargoing, while also providing for data exchange and integration with services and platforms external to the repository, is significant. All of this comes at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Sciences Research Council (EPSRC) of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in providing data to develop prediction models that further enable scientific discovery will be discussed, and the potential impact on the future of scientific publishing will also be examined.
The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms
This is my seventh and LAST talk at the ACS Meeting in Indianapolis:
The Royal Society of Chemistry provides access to a number of databases hosting chemicals data, reactions, spectroscopy data and prediction services. These databases and services can be accessed via web services, with queries expressed in standard data formats such as InChI and molfiles. Data can then be downloaded in standard structure and spectral formats, allowing for reuse and repurposing. The ChemSpider database is integrated with a number of projects external to RSC, including Open PHACTS, which integrates chemical and biological data. That project utilizes semantic web data standards including RDF. This presentation will provide an overview of how structure and spectral data standards have been critical in allowing us to integrate many open source tools, in easing integration with a myriad of services, and in underpinning many of our future developments.
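As a small illustration of why a standard identifier such as InChI eases data exchange, the sketch below splits an InChI string into its layers using nothing but string handling. It is a simplified sketch, not part of any RSC service. The function name `parse_inchi_layers` is hypothetical, and the code assumes each prefixed layer appears only once and that the second segment is the formula, which does not hold for every InChI.

```python
def parse_inchi_layers(inchi):
    """Split an InChI string into its version, formula and prefixed layers.

    Simplifying assumptions: the second segment is the chemical formula,
    and each single-letter layer prefix (c, h, q, ...) occurs only once.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:
            # The first character names the layer; the rest is its content.
            layers[part[0]] = part[1:]
    return layers

# Standard InChI for ethanol.
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
print(parse_inchi_layers(ethanol))
```

Because the identifier is a plain, layered string, any two systems that agree on the standard can exchange and compare structures without sharing software.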
This is my fifth talk at the ACS Indianapolis Conference:
The Social Profile of a Chemist Online – The Potential Profits of Participation
Unless a scientist is limited by their employer from exposing their scientific activities through publications and presentations, their future impact, whether they expect to work at a bench, in front of an instrument or surrounded by robotics, will largely be represented online through their published works, their citation profile and other forms of recognition of their work by their peers. Search engines are already harvesting information about scientists and aggregating it into profiles such as those offered by Google Scholar Citations and Microsoft Academic Search. Rather than be limited to the online representation provided by such services, students are encouraged to participate in the creation of their online profile and to architect their own representation as far as possible for future employers and collaborators. This presentation will give an overview of potential approaches to participating in the development of an online persona.
This is the third presentation I gave at the ACS Meeting in Indianapolis:
Personal experiences in participating in the expanding social networks for science
The number of social networking sites available to scientists continues to grow. We are being indexed and exposed on the internet via our publications, presentations and data. We have many ways to contribute, annotate and curate, many of them as part of a growing crowdsourcing network. As one of the founders of the online ChemSpider database I was drawn into the world of social networking to participate in the discussions that were underway regarding our developing resource. As a result of my experiences in blogging, and as a result of developing collaborations and engagement with a large community of scientists, I have become very immersed in the expanding social networks for science. This presentation will provide an overview of the various types of networking and collaborative sites available to scientists and ways that I expose my scientific activities online. Many of these activities will ultimately contribute to the developing measures of me as a scientist as identified in the new world of alternative metrics.
This is the second presentation I gave at the ACS Meeting in Indianapolis
Accessing chemical health and safety data online using Royal Society of Chemistry resources
The internet has opened up access to large amounts of chemistry-related data that can be harvested and assembled into rich resources of value to chemists. The Royal Society of Chemistry’s ChemSpider database has assembled an electronic collection of over 28 million chemicals from over 400 data sources, and some of the assembled data is certainly of value to those searching for chemical health and safety information. Since ChemSpider is searchable by both text and structure, chemists are able to find relevant information using either of these general search approaches. This presentation will provide an overview of the types of chemical health and safety data and information made available via ChemSpider and discuss how the data are sourced, aggregated and validated. We will examine how the data can be made available via mobile devices and examine the issue of data quality and its potential impacts on such a database.
The future of scientific information & communication presented at the SUNY Potsdam Academic Festival
This is a LONG presentation….I talk about the “It’s All About Me” attitude that can positively feed science….we want to share OUR science, we want people to know about our opinions, our activities, our collaborators, we want to get funding, recognition and attribution. And why not…it can all be to the benefit of science.
This presentation was given at the SUNY Potsdam Academic Festival
The future of scientific information & communication
Our access to scientific information has changed in ways that were hardly imagined even by the early pioneers of the internet. The immense quantities of data and the array of tools available to search and analyze online content continue to expand, and the pace of change does not appear to be slowing. While scientists now have access to the enormous capacity and capabilities of the internet, the vast majority of scientific communication continues to be through peer-reviewed scientific journals. The measure of a scientist’s contribution is primarily represented by their publication profile and the citations to their published works, which offers an incomplete view of their activities. However, we are at the beginning of a new revolution where the ability to communicate offers the opportunity to embrace new forms of publishing and where scientific participation and influence will be measured in new ways. This presentation will provide an overview of our new generation of “openness” in which open source, open standards, open access and open data are proliferating. The future of scientific information and communication will be underpinned by these efforts, influenced by increasing participation from the scientific community and by facilitated collaboration, and will ultimately accelerate scientific progress.
The story about Olympicene was released earlier this week to great fanfare online. I discussed the details here. There has been so much press with comments made online on the websites of Popular Science, Scientific American, BBC News, the Huffington Post and many others that I wondered whether it would be appropriate to suggest an article get written for Wikipedia.
Now, I am VERY concerned with notability on Wikipedia, as evidenced by my post here about notability. I think Olympicene is “notable”. I am also concerned with being flagged with conflict of interest as I was involved with the Olympicene project. My intention was to ask the community to participate in writing the article. However, after checking Wikipedia I was happy to find that the community already got to it. In two days 9 authors had already worked on the article! I checked the View History logs…I don’t know who ANY of them are….so I am not in conflict of interest there either by asking someone to write it!
The Wikipedia article on Olympicene is here. It would be ideal to get the color image up there for dramatic effect as well as the original concept details when Olympicene was introduced, as well as at least some representatives of the synthetic path on ChemSpider SyntheticPages. I can’t add them…I’ll get flagged probably. I’d also link to the ScientistsDB article about Graham Richards as it is much richer than the one on Wikipedia.
Either way…from release to Wikipedia same day and nine authors in two days. Now that’s community collaboration!