My friend and frequent collaborator Emma gave a talk at Analytica Munich this week (I wish I had been there) on “Finding small molecules in big data”. I am fortunate to collaborate with Emma on many aspects of using cheminformatics approaches to interrogate, interpret and integrate data associated with mass spectrometry analyses and structure identification. It’s been an interesting year working on the challenges together.
Metabolomics and exposomics are amongst the youngest and most dynamic of the omics disciplines. While the molecules involved are smaller than those studied in proteomics and the other, larger “omics”, the challenges are in many ways greater. Elements are less constrained, there are no given “puzzle pieces” and the result is an explosion of potential chemical space. It is impossible to even enumerate all chemically possible small molecules. The challenges and complexity of identifying small molecules, even using the most advanced analytical technologies available today, are immense. Current “big data” methods for small molecules rely heavily on chemical databases, the largest of which currently contain ~100 million chemicals. Despite this large number, high resolution mass spectrometry (HR-MS) measurements contain tens of thousands of features, of which only a few percent can be annotated as “known” and confirmed as metabolites or chemicals of interest using these chemical databases. How can we find relevant small molecules in the ever-increasing data loads? How can we annotate more of the unknown features in HR-MS experiments? This talk will present European, US and worldwide initiatives to help find small molecules in big data, from chemical databases to spectral libraries and from real-time monitoring to retrospective screening. It will touch on the challenges of standardized structure representations, data curation and deposition. Finally, it will show how interdisciplinary communication, data sharing and pushing the boundaries of current capabilities can facilitate research efforts in metabolomics, exposomics and beyond. This abstract does not necessarily represent U.S. EPA policy.