
Hexacyclinol–The Data Debate

We’ve noticed some buzz in the blogosphere lately (here, here, here, and here) over a recent Org. Lett. paper (DOI: 10.1021/ol900164a) revisiting hexacyclinol—a natural product that got a lot of attention back in 2006. The new paper was a reminder that James J. La Clair, the controversial figure in the hexacyclinol brouhaha, had said back in 2006 that he was going to duplicate his disputed total synthesis and republish his results with more spectral data.

La Clair joined the discussion about the Org. Lett. paper over at In the Pipeline, where commenters called for him to put the debate to rest by providing additional data. That got us thinking about the role data plays in settling these kinds of debates. The amount of data that ends up in a publication has a lot to do with what a journal requires, so we set out to learn what different journals look for in characterization and supporting information, and how that has changed over time.

Angewandte Chemie—the journal that published La Clair’s hexacyclinol synthesis—states in its Guidelines for the Preparation of Manuscripts: “The identity and purity of all new compounds must be fully characterized by appropriate analytical methods (e.g. NMR spectroscopy, X-ray crystal structure analysis, elemental analysis, etc.). These data should be given in the Supporting Information in the event that they exceed the scope of the Experimental Section.” Peter Gölitz, the journal’s editor, is away this week, and ACIE’s other editors preferred not to comment on the evolution of this policy in his absence. When we hear from him, we’ll post an update.

UPDATE 3/4/09: We corresponded with Peter Gölitz via email. Three updates from our conversation with Gölitz may be found further down in the text.

We’re also in the process of tracking down paper copies of ACIE issue 1 from 2005, the year in which La Clair’s paper was submitted, and/or issue 1 from 2006, the year it was published. That’s where the journal prints its Guidelines for the Preparation of Manuscripts. We’ll post an update once we have that information.

UPDATE 3/2/09: We’ve obtained ACIE’s Notice to Authors from 2005. The instructions for ACIE communications in 2005 are identical to the most recent guidelines, except that they are missing the “The identity and purity of all new compounds” statement mentioned above. You can read the 2005 Notice to Authors at the link below.

supporting-material-requirements-2005-angew.pdf

Peter Stang, Editor of JACS, tells us, “We look for a body of information that fully supports the claims made, and that information depends on the field.” That guiding philosophy hasn’t changed with time, but the particulars have evolved. For instance, when high-resolution mass spectrometry became widely available, it gradually became understood that it would be the new standard for establishing the masses of new compounds, Stang says.

The other journals we spoke with described similar policies, and offered more examples of how they evolve. “At Nature Chemical Biology, we request that authors avoid ‘data not shown’ statements,” says Terry Sheppard, Editor of the journal. The journal published an editorial in October 2008 that explains the call for added transparency. March 2009’s Nature Chem. Bio. editorial highlights another of the journal’s policies, which states that “Authors should provide a statement confirming the source, identity and purity of known compounds that are central to the scientific study, even if they are purchased or resynthesized using published methods.”

Organic Letters strives to adopt the practices set by the other ACS journals publishing in its area, JOC and JACS, says Amos Smith, Org. Lett.’s editor. All three of those journals try to ensure that their standards are uniformly high, he adds. One practice that Org. Lett. inherited from JOC is a compound characterization checklist, which helps everyone involved with a manuscript keep track of all the NMRs, melting points, and more. The checklist isn’t required, Smith says, but “almost everybody” ends up submitting one. The data policy helps “assure as best you can ever assure that the quality of the work is at a high level,” Smith says. Easy access to reproductions of spectra and other supporting information wasn’t always available, he notes. “Over the past ten years, there has been an evolution in the types of data that people now can access to judge the quality of experimental work,” he says. In part, disclosing extensive data in a communication-length article has become routine because advances in web technology have made it easy to do so, he says.

UPDATE 3/4/09: “When I became the editor-in-chief in 1982, there was no such thing as Supporting Information,” Gölitz says. “In former times it was customary that a Full Paper follows a Communication, and experimental details were therefore expected to be delivered in the Full Paper,” he adds.

Incidentally, C&EN contacted La Clair for comment on the Org. Lett. paper, and he told us that since the 2006 controversy, he’s made some changes with regard to his data. “I now offer or upload FIDs with my manuscript submissions,” he says. “Sadly, many journals do not provide them with their online content. Perhaps that will change with the expanse of digital communications so that arguments such as this one can evolve from being matters of personal defamation to ones of scientific inquiry.”

“Before this incident, I did not store my FIDs and now I have over 25,000 of them backed up,” he added. “It’s time that spectral data become more digitally accessible. Often the PDF files are larger than the raw spectra in size. Why can we not download the FIDs and discuss the raw data? I can download HD movies from Netflix but not critical experimental data files from NIH supported research programs? In hindsight, this may be the best lesson I have learned from this saga. I do know that many other authors are afraid of what I went through – providing raw data files such as FIDs is a key next step in protecting our authors.”

Technology alone doesn’t drive the process, though. There’s a human element too. Reviewers and editors play an important role in the process of determining whether the information an author provides actually supports the claims they make, Stang says. “Suppose somebody submits a spectrum run on a 300 MHz NMR. A reviewer examining that data has the right to ask that the spectrum be run on a higher resolution NMR. The reviewer can say, ‘this isn’t clear, the author needs to better determine the coupling constants’. Likewise with mass spec.”

UPDATE 3/4/09: In addition to the characterization statement we described above, Gölitz says that ACIE will occasionally “request specific material based on the comments of a referee.”

So we’ll put it to you readers: What characterization information should be required for publication in your discipline? Does there need to be more transparency? What do you look for to indicate good data when you review a paper?

UPDATE 3/4/09: Gölitz cited a trend toward publishing fewer full papers in favor of communications. Do you agree that that’s the case in your field, and what do you think are the positive and negative effects that this trend has had on the community? Were we better off in the good old days?

For further reference:
JACS Guidelines for Authors [pdf]

Org. Lett. Guidelines [pdf]

Nature Chem. Bio. Guidelines

8 Comments

  • Feb 26th 2009 16:02
    by joel

    Similar to the idea of including a FID for NMR spectra, it’s common for crystal structure characterization to include a .cif file. This often can indicate the quality of the structural assignment.

    In the field that I work in (nanoscience/chemistry), often there is limited consensus over what constitutes full characterization of a sample. What constitutes a “nanomaterial” can vary greatly depending on surface chemistry, size distribution and synthetic method. Indeed, much of the current literature is focusing on understanding these new materials with a range of conventional characterization. In those cases, I would suggest that the authors strive to include a host of complementary characterization, keeping in mind factors such as the one listed above. For example, when the Murray group recently published a crystal structure of gold nanoparticles (doi 10.1021/ja800561b), they were building on previous work with high-res mass spec. data (doi 10.1021/ja071042r).

    Great post!

  • Feb 26th 2009 18:02
    by Paul Docherty

    I wish the ACS luck in getting anything (sensible) out of La Clair, or any reasoning for that lapse in standards at ACIEE towers, and it’s important to have a debate about data. However, it’s also important to describe a level playing field – take for example the total synthesis of Gonyautoxin 3 by Du Bois and Mulcahy. This was published in JACS last year (doi: 10.1021/ja805651g), and contains no descriptions of the experimental methods, either in the main paper or in the SI. Sure, there’s plenty of NMR data and spectra, but without the methods used to perform the synthesis, this is hugely disappointing.

    Standards are just that – but too often there seem to be ‘exceptions’. Recently, a friend had an Org. Lett. manuscript bounced because of ‘insufficient elemental analyses’ – explain that, when the aforementioned JACS paper has none? Standards are standards…

    Submitting FIDs is an invasive process, but perhaps worth the pain – after all, most researchers I know have been archiving them for the best part of a decade. However, this won’t stop fraudulent publication – take the example of the Sames retractions a few years back. In this case, the NMR *was* authentic – it’s just the means of preparation that wasn’t.

    Perhaps we should all wear CCTV cameras… that’s gonna happen in the UK soon ;)

  • Feb 26th 2009 18:02
    by Mitch

    Any scientist worth their weight in salt can fake a spectrum. Raw data solves very little; it only adds a small, time-consuming step.

  • Feb 26th 2009 19:02
    by Carmen Drahl

    @joel: I was having discussions a while back about the Protein Data Bank… it would be interesting to have a routinely used repository of that sort for FIDs.
    @Paul: Welcome. I don’t recall you commenting around these parts, though I could be wrong. It’s got to be hard to truly level anything where a human element is involved; for instance, one’s experience during quals or a defense will be committee-dependent to some extent.
    I think the best we can ask of a set of standards is that the community notices a visible improvement after they’re implemented. You’re absolutely right that FIDs won’t *stop* problems.
    @Mitch: I wonder whether asking for more raw data helps precisely because all of it has to be consistent to support a given set of conclusions. More requirements mean that mistakes are more likely to pop up if any piece of the puzzle is being faked. Incidentally, I liked your choice of words. I’ve been working on reading “Salt” for the last month.

  • Feb 26th 2009 21:02
    by James La Clair

    If one looks hard I am certain one will find that the lack of congruity in reporting data is quite a large issue that goes further than my work or any select example that one can pick.

    Of course there is no magic bullet but it seems that providing raw data files would be much more effective if the appropriate lightweight software (browser) existed that could eliminate or at least reduce the time spent.

    It may make sense one day to have online XML-type files that simply allow one to view the entire spectral data sets for materials using a plugin for one’s web browser.

    For one, it could be made so that all journals shared the same format and authors no longer needed to spend hours creating data tables or writing out …1H-NMR (500 MHz, CDCl3): δ 7.39-7.25 (m, 5 H), 4.76 (s, 2 H), etc. in their SI.

    While software like MestReNova exists for ready display, the file formats are simply too large to view online, and such “browsers” are not cheap.

    Jim La Clair

  • Feb 27th 2009 05:02
    by Mat Todd

    Uploading and viewing raw NMR data is trivial: use small JCAMP-DX files and view them with JSpecView. See here for an example:

    http://www.thesynapticleap.org/node/251

    …or any of Jean-Claude Bradley’s open notebook pages with numerous spectra at Usefulchem. The journals should embrace this and encourage authors to submit raw data. PDFs are so 2005.

  • Feb 28th 2009 16:02
    by Robert Bird

    1) The point of a journal article is to describe research so as to add to the pool of knowledge and so that others can reproduce it (if it can’t be tested, it isn’t science). Forcing full data reporting (NMR, mps, etc) and procedures has at least two uses. First, if the research is honestly presented, the data allows others to look and see if another interpretation is warranted, which its absence doesn’t allow – it allows people to test the knowledge gained. In this case, procedures ought to be mandatory, as well, because if no one can reproduce the paper, it (again) isn’t really science. If someone is dishonest, it forces them to do a lot more to maintain their illusions. Spectra can be faked (though the lack of impurity peaks, satellites, etc, make it harder to do so convincingly), but they do characterize a compound – for other data, rotations, HPLC retention times (and conditions), and mps would help, because although those numbers don’t correlate with a compound’s identity (that we know of), they provide data points to distinguish real from fake. (The Chaterjee “synthesis” was broken in part by differences in the cited mps of a key intermediate between what was originally claimed and what was found by other scientists.) If you force a liar to make his story more detailed, eventually he will make mistakes that can be found and which can be used against him. Raw data wouldn’t hurt, either. All of the data are necessary to do the reported work (and in most cases will take far less time to report than to have gotten in the first place), so I don’t really see a good excuse for not telling others of them.

    2) Hexacyclinol is probably the most blatant example of insufficient data reporting in a long time, hence the utility of its synthesis as a good example. If you can’t be a good example, at least you can be a terrible warning.

    3) I like reading total synthesis papers, but I wonder how much of a point there is in papers (or research) that are unlikely ever to be reproduced. Yields in multistep total syntheses are in some cases questionable (In the Pipeline had a post about this roughly two years ago), but who can tell whether they were right or wrong? If analogs are actually prepared, that provides corroboration for data from the original paper, but few people are likely to prepare taxol or norzoanthamine (sic?) again, and almost certainly not by the same route (though intermediates may be held in common).

    4) Paul’s been around for a while writing about total synthesis – he has a blog about it.

  • Mar 24th 2009 12:03
    by Stephen

    There is no way JJLC can have accumulated 25,000 FIDs since 2006. A highly productive graduate student will only furnish ~3,500 FIDs (1H, 13C, etc.) over 4-5 years. Our NMR technician doesn’t even come close. Unless he counts each pulse.
