Sharing data (with others or within a lab over time) is impossible without proper data documentation. "Metadata" is data about data: structured information that describes content and makes it easier to find or use. A metadata record can be embedded in the data or stored separately, and any data file in any format can have metadata fields. In social science, this record is often called the "codebook" or "data dictionary."
There are many metadata standards, and which one is right for your data will depend on the type, scale, and discipline of your research project.
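As one illustration of a metadata record "stored separately," the sketch below keeps a simple codebook in a JSON file alongside a CSV data file and flags any column the codebook does not document. It is a minimal sketch under stated assumptions: the file names, the `.codebook.json` suffix, and the variable names, labels, and codes are all hypothetical, and this layout is not any formal metadata standard.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical codebook for an illustrative survey file. The variable names,
# labels, and codes are invented for this example; this is not a formal standard.
CODEBOOK = {
    "dataset": "survey_2024.csv",
    "description": "Illustrative survey responses (hypothetical example)",
    "variables": {
        "resp_id": {"label": "Respondent identifier", "type": "integer"},
        "age": {"label": "Age in years at time of survey", "type": "integer"},
        "q1": {
            "label": "Satisfaction with service",
            "type": "integer",
            "codes": {"1": "Very dissatisfied", "5": "Very satisfied", "-9": "Missing"},
        },
    },
}


def write_sidecar(data_path: Path, codebook: dict) -> Path:
    """Store the codebook next to the data file as <name>.codebook.json (assumed naming)."""
    sidecar = data_path.with_name(data_path.stem + ".codebook.json")
    sidecar.write_text(json.dumps(codebook, indent=2), encoding="utf-8")
    return sidecar


def undocumented_columns(data_path: Path, codebook: dict) -> list[str]:
    """Return CSV header columns that have no entry in the codebook."""
    with data_path.open(newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    return [col for col in header if col not in codebook["variables"]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "survey_2024.csv"
        data.write_text("resp_id,age,q1,q2\n1,34,5,2\n", encoding="utf-8")
        sidecar = write_sidecar(data, CODEBOOK)
        print(sidecar.name)                          # survey_2024.codebook.json
        print(undocumented_columns(data, CODEBOOK))  # ['q2']
```

Keeping the codebook in a plain-text, machine-readable file means it travels with the data and can be checked automatically whenever columns are added.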
Some examples of metadata standards are:
For more examples, see the Research Data Alliance Metadata Directory.
If your field doesn't have a metadata standard (it may not be listed above) or if you just need a simpler system to keep track of data within your own lab, consider that there are three main types of metadata addressed by most standards:
Also consider this advice from the UK Data Archive:
Good data documentation includes information on:
At the data-level, documentation may include:
Yale is working to meet researchers' needs by offering high-performance computing, long-term storage options, and secure backup services. You may also have access to additional resources through your departmental infrastructure or through CSSSI's StatLab consulting services.
Here are some links to get started on computing, security, backup, and storage at Yale:
There are different options for different research needs. Contact your local IT specialist to learn more.
Preservation of data is different from simple storage of data. For preservation purposes, data will be migrated from format to format as new storage models come into use, and the data's integrity will be maintained throughout the process. A good example of data preservation is the Inter-university Consortium for Political and Social Research (ICPSR), a social science data archive containing thousands of data sets from around the world, dating back to the 1800s. Data in ICPSR has been, and will continue to be, properly managed to ensure access and usability over time.
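One common way to confirm integrity when files move between storage systems is a fixity check: compute a checksum before and after a copy and compare them. The sketch below assumes SHA-256 and a plain file copy; the function names are hypothetical. Note that a checksum only confirms a byte-for-byte copy. Converting data to a new file format changes the bytes, so verifying a format migration requires comparing the content itself, which this sketch does not do.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(src: Path, dest: Path) -> str:
    """Copy src to dest; raise if the copy's checksum differs from the original's."""
    before = sha256_of(src)
    shutil.copy2(src, dest)  # copies file contents plus basic metadata (timestamps)
    after = sha256_of(dest)
    if before != after:
        raise OSError(f"Fixity check failed for {dest}: {before} != {after}")
    return before  # record this value alongside the data for future checks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "original.dat"
        src.write_bytes(b"abc")
        print(copy_with_fixity(src, Path(tmp) / "copy.dat"))
```

Storing the returned checksum with the data lets anyone rerun `sha256_of` later and confirm the file has not silently changed.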
Few individual labs are equipped to preserve data for long-term use, so domain archives like ICPSR can be a good alternative. Several journals and funding agencies require that data be deposited in a repository (such as GenBank) for reliable long-term preservation.
There are hundreds of domain repositories. Some accept only data funded by particular agencies; others accept any data that fits their collection policies. re3data.org is a registry of research data repositories, searchable by discipline:
Browse these and see whether a repository there matches the long-term home you envision for your data. Keep in mind that not every subject repository will accept your data, and not every repository is suited for long-term preservation. If you need help identifying a suitable repository for your data, contact the Research Data Consultation Group.