Why Open Data Should Matter to You

We hope you’ve enjoyed this month’s posts on “Open in Action” from the Scholarly Communications Office. We’ve covered open access at Emory, the benefits of using OpenEmory, how to increase your scholarly impact, and how to manage your author rights.

Today, we close our series by turning to open data. Academic communities increasingly emphasize the principles of open data as a way to ensure the longevity of scholars’ work. The data you collect or create in the course of your research can have lasting value: for yourself, for your colleagues, or for others who may use it in ways that have yet to be conceived. Studies of data sharing have also found that researchers who make their data accessible receive more citations to their papers (Piwowar and Vision, 2013) and produce more publications (Pienta, Alter, and Lyle, 2010).

What makes data open?

The Open Data Institute defines good open data as data that:

  • can be linked to, so that it can be easily shared and talked about
  • is available in a standard, structured format, so that it can be easily processed
  • has guaranteed availability and consistency over time, so that others can rely on it
  • is traceable, through any processing, right back to where it originates, so others can work out whether to trust it

Why open data matters in the U.S.

A major impetus for opening up data is U.S. funding agencies’ response to calls for public access to research. The National Institutes of Health, the National Science Foundation, and the National Endowment for the Humanities all have requirements for grant recipients to ensure long-term access to their scholarly works, including data. In response to the 2013 memo from the White House Office of Science and Technology Policy (OSTP), several federal agencies have released public access plans and are working to strengthen their policies for making research accessible to the public that funds it. SPARC and Johns Hopkins University have developed a community resource for exploring and comparing these funder policies.

Consider also the Cancer Moonshot, launched by Vice President Joe Biden in 2016 to further U.S. efforts in cancer research. One of the initiative’s tenets is open sharing of data within the research community, breaking down the barriers that trap data in silos within institutions and labs. In June 2016, the Winship Cancer Institute convened a special summit at Emory where researchers, clinicians, and patients contributed insights and ideas to feed back to the Cancer Moonshot Task Force. As Donald Harvey, Winship Phase 1 Unit Director, put it: “If we have the ability to transfer money from our bank accounts to others via our phones, why can’t we take the same approach in data sharing for cancer care?”

How do you demonstrate your commitment to open data?

  • Make a plan. Whether or not your funder requires it, have a robust plan for where you will store, back up, and preserve your digital files and data.
  • Document your data for re-use. Consider using an accepted metadata standard for your data type, or write a README file that describes your data collection.
  • Choose non-proprietary and accessible file formats wherever possible.
  • Archive your data. There are thousands of research data repositories worldwide, and the re3data registry can help you find an appropriate one.
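For researchers who work in Python, two of the tips above (documenting your data and choosing non-proprietary formats) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a workflow endorsed by the Office: the records, column descriptions, and file and folder names (data.csv, README.txt, my_dataset) are hypothetical placeholders. It saves a small table as CSV, an open format readable by almost any tool, and writes a plain-text README that describes each column.

```python
import csv
from pathlib import Path

# Hypothetical example records; replace with your own research data.
records = [
    {"site": "A", "date": "2016-06-01", "temperature_c": 21.5},
    {"site": "B", "date": "2016-06-01", "temperature_c": 19.8},
]

# Hypothetical column descriptions used to build the README.
columns = {
    "site": "Identifier of the sampling site",
    "date": "Collection date (ISO 8601, YYYY-MM-DD)",
    "temperature_c": "Air temperature in degrees Celsius",
}


def export_dataset(out_dir: Path) -> None:
    """Write the records to a CSV file plus a README documenting each column."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # CSV is a plain-text, non-proprietary format.
    with open(out_dir / "data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(records)

    # A simple README: what the file is and what each column means.
    lines = [
        "Dataset: example measurements (hypothetical)",
        "",
        "File: data.csv",
        "",
        "Columns:",
    ]
    lines += [f"  {name}: {desc}" for name, desc in columns.items()]
    (out_dir / "README.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    export_dataset(Path("my_dataset"))
```

The resulting folder is the kind of package you could then deposit in a repository. Where your discipline has an accepted metadata standard, that standard can complement or replace a hand-written README like this one.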

In the Scholarly Communications Office, we consult with researchers who are crafting data management plans or preparing their data for archiving. If your discipline does not have a commonly used data repository, we also support deposits to the Emory Dataverse, an open repository of Emory research data.

Have questions? Email the Scholarly Communications Office at scholcomm [at] listserv [dot] cc [dot] emory [dot] edu.

References:

Piwowar HA, Vision TJ. (2013) Data reuse and the open data citation advantage. PeerJ 1:e175. https://doi.org/10.7717/peerj.175

Pienta AM, Alter GC, Lyle JA. (2010) The enduring value of social science research: The use and reuse of primary research data. http://hdl.handle.net/2027.42/78307