Sunday, March 2, 2014

New PLOS Publishing Requirements Aim to Advance Data Sharing Practices

PLOS (the Public Library of Science) is a popular scientific publisher whose journals include PLOS Genetics, PLOS Pathogens, and, of course, PLOS ONE. What makes PLOS stand out is not that they publish great science (which they do), but rather their leadership in open access publishing (open access means that anybody can read their publications for free). Recently, PLOS announced that they will be taking their open access policies to the next level by requiring all data underlying published articles to be openly and clearly accessible to the public. Specifically, their blog stated that "authors must make all data publicly available, without restriction, immediately upon publication of the article". This has already sparked some important conversations about the feasibility of such a requirement.

Before we discuss this further, it is worth noting that publishers requiring published data to be publicly available is not a new idea. Most publishers already require clinical trial information to be publicly available online before the trial data can be published. When our research group published a paper describing a new species of bacteria, we had to deposit an isolate of the bacterium in three different archives before the paper could be published (Hannigan et al., 2012; see Works Cited below). Finally, most publications involving DNA sequencing (in my experience, these are microbiome-related studies) have their sequencing data archived and made publicly available, and this is required by granting agencies like the NIH, which requires many types of data sharing. More broadly, the general rule of thumb is that anybody should be able to contact you about your publications, whether about the methods, the results, or anything else, and you should accommodate them to the best of your ability.

So if this is already common, what is the issue? As PLOS points out in their blog summary, there are a few key concerns, including where data should be stored and whether competing groups will benefit unfairly from another research group's published dataset. The "unfair benefit" concern goes like this: one group might spend a lot of time and money generating a large dataset of something like DNA sequences, analyze the data, and publish their findings along with the dataset. Somebody else might then come along, analyze the data in a different way, and find interesting things the first group missed, thereby benefiting from data that the first group had to collect. This might leave the first group, who collected the data, feeling cheated out of a finding that should have been theirs.

I don't think this should be a concern, and I think many scientists will agree with me. Without going into too much detail (this could be a deep topic on its own), collaboration and knowledge sharing are important foundations of our scientific community, and this cooperation is what allows us to keep advancing. As scientists, we all continue to benefit from published findings, whether or not they include a large dataset. The PLOS publication requirement is definitely a step in the right direction and will help the scientific community move forward because, to paraphrase Newton, "if we are to see further, it is because we have stood on the shoulders of giants".

The other concern is about where the data should be stored. There are numerous repositories for almost any kind of data, but some datasets are especially large, and special circumstances will arise. It sounds like PLOS is willing to work with authors to find solutions to this type of problem, and perhaps they will even provide resources to authors who might otherwise not have published their data because they felt unable to store it properly. I think that this is also a step in the right direction.

Finally, I hope the new PLOS policy results in more scientists publishing their analysis scripts, such as the R scripts used to analyze large datasets. I don't see this often, and I have not even done it myself (although our scripts are freely available; just shoot us an email), but a colleague of mine (Brendan Hodkinson) is really good about doing this, and it is beneficial for two reasons. First, seeing the specific code used for an analysis provides insight into the methods beyond what you can get from the methods section of a paper; reading the code reveals the important specifics of what was actually done. Second, while you can email the authors to request the code, and they should give it to you, this process can take time (maybe the corresponding author is on vacation or out sick for a week and can't be reached), and it would have been easier if the code had been published with the paper in the first place. It is worth thinking about, and worth implementing, so I hope we see more of these kinds of practices as a result of the newest PLOS publishing policies.

Questions, comments, or concerns?  Please leave a comment or shoot me an email.  I would love to hear from you and I always want to learn more.

Works Cited

Hannigan, G., Krivogorsky, B., Fordice, D., Welch, J., & Dahl, J. (2012). Mycobacterium minnesotense sp. nov., a photochromogenic bacterium isolated from sphagnum peat bogs. International Journal of Systematic and Evolutionary Microbiology, 63(Pt 1), 124-128. DOI: 10.1099/ijs.0.037291-0
