Archive for category General Communications

A TERRIBLE implementation of Name Searching on ACS Journals

Yes, I am a Williams. And THAT is an incredibly common surname. But I am an Antony Williams, notice no H in the name, i.e. NOT Anthony. In the field of chemistry there are not many of us around…a couple I know of, but not many overall. Google Scholar does an extremely good job of automatically associating my newly published articles with my Citations profile here: https://scholar.google.com/citations?user=O2L8nh4AAAAJ

The last five articles automatically associated with my profile. I do NOT make any associations manually at this point.


I am assuming that this is done by understanding the type of work I publish on, some of the co-author network maps that have been established as my profile has developed, etc. I assume that their approach is very intelligent relative to some of the more commonplace searches that have been implemented…certainly the results are GOOD.

I noticed one disastrous example today when our article “ChemTrove: Enabling a Generic ELN to Support Chemistry Through the Use of Transferable Plug-ins and Online Data Sources” was published on the Journal of Chemical Information and Modeling here. Right there to the left of the abstract is an offer to look at other content by the authors.

Look for related content by the authors on JCIM


I was interested to see what else ACS knew about my content so I clicked on my name…which performed this search: http://pubs.acs.org/action/doSearch?ContribStored=Williams%2C+A  and provided me with 96 articles by Andrew Williams (mostly), by Aaron Williams, by Anthony Williams (not me) and Allan Williams (to name a few). Eventually I managed to find 3 that were associated with me by searching the list for Antony Williams but none of those I published as Antony J. Williams were recovered.
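The query string in that search URL decodes to just a surname and a single initial, which would explain the flood of hits. A minimal sketch of the effect follows; the `surname_initial_key` function is a hypothetical stand-in for whatever the ACS site really does, and the author list is simply the names mentioned above.

```python
from collections import defaultdict
from urllib.parse import unquote_plus

# The ContribStored value from the search URL, decoded.
search_key = unquote_plus("Williams%2C+A")

def surname_initial_key(full_name: str) -> str:
    """Collapse 'Given [Middle] Surname' to 'Surname, G' (hypothetical index key)."""
    parts = full_name.replace(".", "").split()
    given, surname = parts[0], parts[-1]
    return f"{surname}, {given[0].upper()}"

authors = [
    "Antony J. Williams",
    "Antony Williams",
    "Anthony Williams",
    "Andrew Williams",
    "Aaron Williams",
    "Allan Williams",
]

# Group every author under the reduced key to see who collides.
index = defaultdict(list)
for name in authors:
    index[surname_initial_key(name)].append(name)

matches = index[search_key]
print(search_key, "->", matches)
```

Under this assumed key, every one of these authors ends up in the same bucket, so a search on it cannot tell Antony from Anthony, Andrew, Aaron or Allan.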

Also, my colleague Valery Tkachenko is listed as an author with his name misspelled as Valery Tkachenkov. What is simply inappropriate, in my opinion, is that the process took the list of names we submitted (copied below directly from the submitted manuscript) and changed them to someone else’s interpretation of how we would want to see our names listed.

From this:

Aileen E. Day*†, Simon J. Coles, Colin L. Bird, Jeremy G. Frey, Richard J. Whitby, Valery E. Tkachenko§, Antony J. Williams§

To This:

Names changed from the original manuscript to those produced at submission


Notice that for Aileen and Jeremy the middle initials were expanded; Colin had his middle initial changed from L. to I.; Richard, Valery and I had our middle initials dropped; and Valery had a v added to his surname. Why not simply copy and paste the names from the manuscript?

I will point out that this is a “Just Accepted” manuscript and likely the changes in names will be caught and edited, especially now I have just pointed them out. “Just accepted” does have some disclaimers:

The disclaimers regarding Just Accepted manuscripts


While they can edit the names to match what we originally provided, I don’t think it will fix the issue of finding all of my articles on ACS journals: when I navigated to one of my other articles here, http://pubs.acs.org/doi/abs/10.1021/es0713072, and ran the search from my listed name it found exactly the same 96 hits.

Maybe a thought to use my ORCID profile http://orcid.org/0000-0002-2668-4821 to look for ACS journal articles associated with my name?
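One reason an ORCID is a better search key than a name string is that it can be checked by machine: the last character is a check digit computed from the first 15 digits (ISO 7064 MOD 11-2). The sketch below only validates the identifier; it does not query ACS or ORCID, and the function names are my own.

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """True if the iD has 16 characters and its final character matches the checksum."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]

print(is_valid_orcid("0000-0002-2668-4821"))  # the iD quoted above
print(is_valid_orcid("0000-0002-2668-4822"))  # same iD with the last digit altered
```

A mistyped ORCID fails the check, whereas a mistyped or truncated name simply matches a different set of people.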

Unfortunately the data is already out in the wild as when I claimed the article on Kudos all of the name spelling issues had clearly spilled over via the DOI: https://www.growkudos.com/articles/10.1021%252Fci5005948

Names transferred via DOI to the Grow Kudos Platform


Ah…the things that surprise me….or not.


Name disambiguation, ORCIDs and author IDs for Science Books

Those of you who follow my blog will know that I am a fan of ORCIDs and it is great to hear that there are now over a MILLION ORCIDs issued! As far as I am concerned, the sooner I can start claiming all of my books and book chapters against MY ORCID and then moving that information to other platforms, the better. My Amazon Author Page is here: http://www.amazon.com/Antony-J.-Williams/e/B004YRPRV2 and I am glad to say that, although there is a book called “I Hate Sex” by an author named Antony J. Williams, exactly the spelling of my name, it is NOT associated with me. Phew…

If we could start to make sure, somehow, that ORCIDs, or at least some form of AUTHOR IDs were utilized by all publishers and associated with books that are published (and listed on Amazon and Google Books) then maybe we wouldn’t have this problem listed below….

My GREAT FRIEND Gary Martin (and oftentimes mentor in NMR) and I are editing a two volume series with David Rovnyak. Volume 1 is listed on Amazon here and Volume 2 is here. Now then…Gary is rather well known in the world of NMR….his Wikipedia page is here. On Amazon his background is listed under “About the Author” as:

“Gary E. Martin graduated with a B.S. in Pharmacy in 1972 from the University of Pittsburgh and a Ph.D. in Pharmaceutical Sciences from the University of Kentucky in 1975, specializing in NMR spectroscopy. He was a Professor at the University of Houston from 1975 to 1989, assuming the position of Section Head responsible for US NMR spectroscopy at Burroughs Wellcome, Co. in Research Triangle Park, NC, eventually being promoted to the level of Principal Scientist. In 1996 he assumed a position at what was initially the Upjohn Company in Kalamazoo, MI and held several positions there through 2006 by which time he was a Senior Fellow at what was then Pfizer, Inc. In 2006 he assumed a position as a Distinguished Fellow at Schering-Plough responsible for the creation of the Rapid Structure Characterization Laboratory. He is presently a Distinguished Fellow at Merck Research Laboratories.”

So HOW interesting to see who Google Books thinks he is! See the link here… it reads as

“Gary Martin’s career as a freelance comic book artist spans over twenty years. He’s worked for all the major companies, including Marvel, DC, Dark Horse, Image, and Disney, and on such titles as, Spider-man, X-men, Batman, Star Wars, and Mickey Mouse. Gary is best known for his popular how-to books entitled, ‘The Art of Comic Book Inking’. Recently, Gary wrote a comic book series called ‘The Moth’, which he co-created with artist Steve Rude.”

I am not listed as an editor and for sure the information is out of date since David Rovnyak joined as an editor this year.


This is Gary Martin, the inker.

So…I am very interested in any hypotheses regarding how Google Books picked up a comic inker as an author when Amazon lists Gary as a scientist, clearly. By the way, Gary Martin, NMR spectroscopist extraordinaire is a brilliant photographer, especially of lighthouses…but manipulates light…not ink.

Imagine, if you would, the potential power of ORCIDs in keeping this clear, platform to platform, if the publisher used them, if Amazon adopted them and if Google Books used the data. With time…

 

 


Converting Crystal Structures into 3D Printable Files

We have been working with Vincent Scalfani from the University of Alabama towards supporting a community of 3D printing crystal structure enthusiasts. There is a listserv, [3DP-XTAL], hosted by the University of Alabama; if you would like to be added, simply email Vincent at vfscalfaniATuaDOTedu. They are also in the process of creating a 3D printing crystal structure wiki/blog for the community.

With Vincent as the driver we are creating a public on-line repository for 3D printable structure files (.stl and .wrl). He used Jmol to prepare ~30,000 molecules and solids in .wrl and .stl format and we will be hosting them on part of our data repository.  We are very excited about this project and there will be more information at the upcoming 248th American Chemical Society Meeting in San Francisco, CA. See CINF Abstract # 125.

The flier that will be distributed at the IUCr meeting in Montreal in August is available on Slideshare here:


What LinkedIn Contacts Think I Know…

There has been a new capability on LinkedIn for a while….the ability to add your judgments about people you are linked to in your network. What this looks like is shown below. It’s been interesting to see what I have been “endorsed” for on my profile.

 


I would agree…I am a Chemist first, then an NMR Spectroscopist but I would put cheminformatics and analytical chemistry above Drug Discovery.   I DO like this approach for “tagging” skillsets though and I can see it has a natural role in other ways…a fun project for the New Year. Watch this space.


Google Scholar Citations continues to impress

I continue to be impressed with Google Scholar Citations. I receive regular emails, similar to the one below, telling me when papers are referencing articles I have authored/co-authored. In this case the article referred to a paper that I co-authored in 1996 while I was at Kodak….regarding silver-catalyzed cyclizations. I would not have expected a paper about photographic-based organic chemistry to show up in a Toxicology journal. But thanks to Google Scholar Citations now I know…


Social networking tools as public representations of a scientist

This is one of my presentations at the ACS meeting today in San Diego regarding how to use social networking tools to expose yourself as a scientist

Social networking tools as public representations of a scientist

The web has revolutionized the manner by which we can represent ourselves online by providing us the ability to expose our data, experiences and skills online via blogs, wikis and other crowdsourcing venues. As a result it is possible to contribute to the community while developing a social profile as a scientist. At present many scientists are still measured by their contributions using the classical method of citation statistics and a number of freely available online tools are now available for scientists to manage their profile. This presentation will provide an overview of tools including Google Scholar Citations and Microsoft Academic Search and will discuss how these and other tools, when integrated with the ORCID identifier, may more fully recognize the collective contributions to science. I will also discuss how an increasingly public view of us as scientists online will likely contribute to our reputation above and beyond citations.


Adding SORD Database (Selected Organic Reactions Database) to ChemSpider

As discussed over on the ChemSpider blog we will soon be depositing data from the SORD databases (Selected Organic Reactions Database) onto ChemSpider. This will be done as two separate but related datasets under the SORD data source: Reactants and Products. If you don’t know what SORD is then who better to explain than Dick Wife, the “host” of the SORD database. Dick wrote the article below to provide an overview of what SORD is…ENJOY!

The Selected Organic Reactions (SOR) Database: capturing “Lost Chemistry”

Dick Wife, SORD B.V. The Netherlands (www.sord.nl; dick.wife@sord.nl)

A new database is capturing the 80% of Lost Chemistry from theses and dissertations which doesn’t make it into publications and chemists who contribute their data get access to the entire database for free.

SORD, an independent Dutch company, is carefully selecting the synthetic chemistry focused on Life Science research and making this chemistry available in their Selected Organic Reactions (SOR) Database. For the theses/dissertations which they select, SORD excerpts all of the reactions in the Experimental section. This means there will still be a small overlap of data with full publications. There will also be a larger overlap with publications such as Notes, Letters or Communications but these do not contain the experimental details. The SOR Database brings all this chemistry to the desktop, every last detail written by the author.

Some time back, SORD looked at around 300k interesting drug-like compounds in the literature and which countries they had come from, and the native language. The English-speaking countries accounted for only 37% of the total. German/Swiss dissertations are often written in English but this is new. The theses and dissertations in the other languages represent more than half of the total. SORD routinely translates German and French experimental texts into English. They are about to start on Chinese and Japanese translations and, if anyone can give them access to Russian theses, they will translate these as well!

A thesis or dissertation is the result of several years of hard work by a research student under the constant supervision of the research leader whose reputation is at stake if the work described is wrong or inaccurate. It is also examined by a committee who decide on awarding the degree, or not. They scrutinize closely  the Results & Discussion as well as the Experimental sections. The chemistry is reliable.

Advanced Chemistry Development, Inc (ACD/Labs) is partnering with SORD in developing this Database. The SOR Database is available for in-house use with ChemFolder Enterprise or on the Internet with ACD/Web Librarian™. This is a screen-shot of a typical SOR Database record in Web Librarian.

 

The Reaction Scheme shows every atom (there are no abbreviations). The Experimental text is edited to ASCII format and the key parameters (Reagent(s), Solvent(s), yield(s), MP(s) and Optical Rotation(s)) are displayed in separate Fields, as are the full bibliographic data, making data-mining possible. There is also a link which enables the user to bring up the PDF of each reaction, containing all of the spectral and other physical data which SORD does not excerpt. The PDF link is a powerful and unique feature of the SOR Database.

Now some explanation about SORD’s excerption rules. What they call the Reaction Scheme (A + B → C, etc.) contains only the reacting and product compound structures. A Reagent is an essential reaction component of which no part ends up in the product – if it does, it becomes a Reactant! When several reactions are performed before the product is isolated (and characterized) the Reagents and Solvents are listed in Steps. Failed reactions are not excerpted but reactions with poor yields are.

The SOR Database currently contains 170k reactions; the target is one million at the end of 2013. Even this number is a lot smaller than what you find today in the major commercial reaction databases. Back in the nineties, SORD researchers looked at one such large commercial database which then contained 9 million compounds. Sifting through the content for drug-like compounds resulted in just 450k or 5% of the records[1]. Size is one database metric; quality is much more important! In the SOR Database, you will only find characterized products – and no polymers, or compounds with no molecular structure.

Users of the SOR Database also have access to the separate databases which contain the Reagents (ca. 3,000) and Solvents (ca. 450) which have been encountered so far. Often a Reagent is a catalyst (organic/organometallic) but they can also be simple entities like bases, acids, ammonium salts, etc. or complex chiral ligands. Authors give Reagents many different names and so each Reagent (and Solvent) in the SOR Database has been assigned a unique name. This enables rapid searches using the assigned names, again a novel feature of the database. Such searches can bring you to really nice chemistry.
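The idea of one assigned name per Reagent can be sketched as a simple synonym lookup. This is only an illustration, taking the second generation Grubbs catalyst as the case: the synonym table and the `normalize_reagent` function are hypothetical, and the SOR Database’s real name list and matching rules are not reproduced here.

```python
# Hypothetical synonym table mapping author spellings to one assigned name.
CANONICAL_REAGENT_NAMES = {
    "grubbs 2 catalyst": "Grubbs 2 catalyst",
    "grubbs ii": "Grubbs 2 catalyst",
    "grubbs second generation catalyst": "Grubbs 2 catalyst",
    "2nd generation grubbs catalyst": "Grubbs 2 catalyst",
}

def normalize_reagent(author_name: str) -> str:
    """Return the assigned name for a reagent, or the input if it is unknown."""
    # Lower-case, treat hyphens as spaces and collapse repeated whitespace.
    key = " ".join(author_name.lower().replace("-", " ").split())
    return CANONICAL_REAGENT_NAMES.get(key, author_name)

for name in ["Grubbs II", "Grubbs second-generation catalyst", "Pd/C"]:
    print(name, "->", normalize_reagent(name))
```

Once every record carries the assigned name, a single exact-match query finds all the reactions however the original authors happened to write the reagent.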

As an Example, the second generation Grubbs olefin metathesis catalyst has been given the name Grubbs 2 catalyst. In the current SOR Database, there are more than 500 reactions where it has been used. Some of these are straightforward; some are not and generate novel ring systems like this one from the Martin group at North Carolina at Chapel Hill:

Searches in the Reaction Scheme, or using Reagent/Solvent names and hit refinement, bring you to new chemistry which until now was only found on a dusty shelf in a library. The pool of “Lost Chemistry” is now getting smaller as SORD carefully selects and excerpts the reactions which deserve a new life. The SOR Database is essential for novelty searches and it is a powerful supplement for the other commercial reaction databases.

Finally some more good news for academic research chemists; your data will be readily accessible to the whole chemical world who will cite your work in their publications. The chemistry which you never published may be just what others are looking for. Routinely SORD excerpts the complete collection of theses and dissertations from research supervisors; they will be more than happy to see your work appear in the next SOR Database!


[1] de Laet, A.; Hehenkamp, J. J.; Wife, R. L. Finding Drug Candidates in Lost/Emerging Chemistry. J. Heterocycl. Chem. 2000, 37, 669–674.


The long term cost of inferior database quality

Our recent Drug Discovery Today article

One of the highlights of the past year has been my continued collaborations with Sean Ekins on the issues of data quality, modeling of data and the applications of mobile technologies. Recently our commentary on the long term cost of inferior database quality was published in Drug Discovery Today and is available online here.


Why am I suddenly so popular as a potential Open Access journal editor

I have become SOOOOOOOOO popular as a journal editor for Open Access journals. In the past week I have been invited to be a journal editor for three separate Open Access journals. These are simply emails with “sign up here”, “we are a popular publisher of Open Access journals” and an “editors are encouraged to submit articles” message. My favorite invitation this week is below. Don’t forget I’m a PhD CHEMIST, NMR spectroscopist and cheminformatician….

I chose NOT to respond…

Subject: Invitation to Join the Editorial/Review Board of Journal of Communication Technology and Human Behaviors

Dear Dr,

I am writing to introduce Journal of Communication Technology and Human Behaviors to you. Journal of Communication Technology and Human Behaviors is a new journal launched recently by the Columbia International Publishing (CIP) team. CIP is committed to rapidly delivering high-quality research findings and results to the world. We aim to make all CIP journals top publications in their fields.

Based on your outstanding scientific contribution to your field, the CIP team would like to invite you to join the Editorial/Review Board of Journal of Communication Technology and Human Behaviors

Print ISSN:           2163-128X

Online ISSN:       2163-1298

Journal link:        http://uscip.org/JournalsDetail.aspx?journalID=19

 

Acceptance of submissions to Journal of Communication Technology and Human Behaviors is based solely on decisions of the Editorial Board Committee and peer reviewers. If you are interested in serving on the Editorial Board committee, please send your CV to jcthb@uscip.org and indicate the position (Editor-in-Chief, Associate Editor, Regular Editorial Board Committee member, or Reviewer) you are interested in. CIP will make a selection based on the competition. To accept an Editorial/Review board position, you are required to agree to the terms and conditions given at the end of this invitation letter. The names of Editorial Board Members will be listed online and in print copies.

 

Only with contributions from Editorial Board Committee members and peer reviewers can we make Journal of Communication Technology and Human Behaviors a top journal. If you have any questions or suggestions, please do not hesitate to contact us. We are keenly looking forward to hearing from you.

Sincerely,

Editorial Team

 

Email: jcthb@uscip.org

Phone: 1-573-886-8964

Fax: 1-573-886-8901

 

Columbia International Publishing LLC

3610 Buttonwood Drive Suite 200

Columbia, MO 65201, USA

 

 

 

______________________________________________________________

 

Terms and Conditions of Editorial/Review Board Committee positions

1.   All Editorial/Review Board members should try to promote the journal as a top publication in the field.

2.   This is a voluntary and honorary position. No payment from Columbia International Publishing is associated with this position.

3.   The term is typically two years. It is renewable with approval by the Administration Department of Columbia International Publishing.

4.   The Editor-in-Chief should send manuscripts to at least two experts in the field for review. Editorial/Review Board members should provide timely, fair, objective, and professional comments on the manuscripts assigned by the Editor-in-Chief.

5.   Editorial/Review Board members should never disclose information pertaining to any manuscript under their review.

6.   Columbia International Publishing reserves the exclusive right to change any rules and terms and conditions without prior notice.

 

 


An InChIkey Collision is Discovered and NOT Based on Stereochemistry

InChI Strings and InChIKeys are very much the backbone of ChemSpider and have quickly become a way by which databases are being connected online. The InChIKey is a hash of the InChI string and, when the hash was adopted, it was suggested that the likelihood of a collision was very small, the estimate being, as quoted from the official InChI site:

“An example of InChI with its InChIKey equivalent is shown below. There is a finite, but very small probability of finding two structures with the same InChIKey. For duplication of only the first block of 14 characters this is 1.3% in 10^9, equivalent to a single collision in one of 75 databases of 10^9 compounds each.”
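The quoted figure can be sanity-checked with the usual birthday-problem approximation. The sketch below assumes that the 14-character first block carries 65 bits of hash and that hash values are uniformly distributed; both are my assumptions rather than something stated in the quote.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound estimate of at least one collision among n_items random hashes."""
    pairs = n_items * (n_items - 1) / 2
    # 1 - exp(-pairs / 2**bits), written with expm1 for numerical accuracy.
    return -math.expm1(-pairs / 2 ** hash_bits)

# Assumed: 65 bits in the first InChIKey block, one billion compounds.
p = collision_probability(10 ** 9, 65)
print(f"probability of a first-block collision: {p:.2%}")
print(f"about one such database in {1 / p:.0f}")
```

Under those assumptions the estimate comes out at a little over 1.3%, or roughly one database in 75, in line with the quoted numbers. It says nothing about whether real chemical structures behave like random inputs, which is what makes an observed collision interesting.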

At a previous ACS Meeting Prof Jonathan Goodman from University of Cambridge announced that he had identified a collision. The collision was for two isomers of spongistatin, a rather complex chemical structure with many stereocenters.

Jonathan has “done it again”…what a troublemaker he is (in a supremely gentlemanly way!). I was fortunate enough to receive the news about this collision from him just as I was getting on the flight from ACS Denver to home tonight and asked his permission to blog it as it is both exciting and, I believe, quite surprising news. Why? In this case the collision is for two distinctly different chemicals with totally different formulae and with NO stereochemistry! Very surprising!

As you can see in the figure below the two chemical compounds are simply long branched alkyl chains, one an alcohol and one a ketone.

In case Jonathan’s software tool that he was using to connect to the InChI generation software was doing something untoward with the molfile I confirmed the observation myself by drawing the structures in ACD/ChemSketch and generating the InChIKeys there. And, sure enough…I see exactly the same Standard InChIKeys for both molecules as shown in the movie below. VERY interesting!

 
