
Research Data Management: Data sharing & re-use

Resources for learning about best practices in research data management across a variety of disciplines.

Data citation

Data citation is an important component of data sharing and data reuse. Citing data gives data creators credit for creating and sharing their work, and creates a trail of research progress similar to the citation of articles and books.

These guidelines will also help you make sure that the data you generate and share can be cited by others.

Check with the journal you're publishing in to see if they have a data citation format recommendation. Many journals and citation styles don't specifically require you to cite research data, or they don't give you specific citation guidelines for a research data set. In this case, you should still cite data you use in your analysis and publications with these key elements:

  • Author(s)
  • Title
  • Year of publication: The date when the dataset was published or released (rather than the collection or coverage date)
  • Publisher: The data center or repository
  • Any applicable identifier (including edition or version)
  • Availability and access: URL or other location information for the data

DataCite is an international organization that helps researchers to find, access, and use data. Their recommended data citation format is:

  • Creator (PublicationYear): Title. Publisher. Identifier

It may also be desirable to include information from two optional properties, Version and ResourceType (as appropriate). If so, the recommended form is as follows:

  • Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

For citation purposes, DataCite recommends that DOI names be displayed as linkable, permanent URLs, that is, as https://doi.org/ followed by the DOI name.
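
The DataCite citation formats described above can be assembled programmatically. The following is a minimal Python sketch, not an official DataCite tool: the class name DatasetCitation, the functions doi_to_url and format_citation, and the sample dataset (including the DOI 10.1234/example) are all hypothetical and used for illustration only. It assumes that any identifier beginning with "10." is a DOI name and should be shown as a https://doi.org/ link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the DataCite citation properties."""
    creator: str
    publication_year: int
    title: str
    publisher: str
    identifier: str                      # a DOI name or any other identifier/URL
    version: Optional[str] = None        # optional property
    resource_type: Optional[str] = None  # optional property


def doi_to_url(identifier: str) -> str:
    """Show a bare DOI name as a linkable URL; leave other identifiers as-is.

    Assumption: anything starting with "10." (optionally prefixed "doi:")
    is treated as a DOI name.
    """
    cleaned = identifier.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if cleaned.startswith("10."):
        return "https://doi.org/" + cleaned
    return cleaned


def format_citation(c: DatasetCitation) -> str:
    """Build: Creator (PublicationYear): Title. [Version.] Publisher. [ResourceType.] Identifier"""
    parts = [c.title]
    if c.version:
        parts.append(c.version)
    parts.append(c.publisher)
    if c.resource_type:
        parts.append(c.resource_type)
    body = ". ".join(parts)
    return f"{c.creator} ({c.publication_year}): {body}. {doi_to_url(c.identifier)}"


# Made-up example record; none of these values refer to a real dataset.
example = DatasetCitation(
    creator="Doe, J.",
    publication_year=2020,
    title="Example Survey Data",
    publisher="Example Repository",
    identifier="10.1234/example",
)
print(format_citation(example))
```

Setting version or resource_type on the record inserts those optional properties in the positions given by the longer DataCite form; leaving them unset produces the shorter form.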

Learn more about DOI and other persistent identifiers.

Data preservation and archiving

Preservation of data is different from simple storage of data. For preservation purposes, data will be migrated from format to format as new storage models come into use, and the data's integrity will be maintained through the process. A good example of data preservation is the Inter-university Consortium for Political and Social Research (ICPSR), a social science data archive containing thousands of data sets from all over the world, dating back to the 1800s. Data in ICPSR has been and will continue to be properly managed to ensure access and usability over time.

Few individual labs are equipped to preserve data for long-term use, so domain archives like ICPSR can be a good alternative. Several journals and funding agencies require data deposit into a repository (such as GenBank) for reliable long-term preservation.

There are hundreds of domain repositories. Some accept only data funded by certain agencies, and others accept any data that fits their collection policy. re3data.org is a database of research repositories, organized by discipline.

Browse re3data.org to see whether a repository there matches the long-term home you envision for your data. Keep in mind that not every subject repository will accept your data and not every repository is suited for long-term preservation. If you need help identifying a suitable repository for your data, contact the Research Data Support Services group.

Why share research data?

Sharing data is now encouraged by major funding agencies, and many journals require it as a prerequisite for publication. The NSF specifically states:

"Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing."

In addition to meeting funder requirements, data sharing is important because it can lead to a broader impact for your research and facilitate advances in science. Depositing your data in a subject repository makes it easier for other researchers to find and re-use it.