Yale University Library Research Guides: Research Data Management: Organize & Document Data

What is Data Organization?

Data organization is how you structure the storage of your data. This can involve your filing strategy (folder / directory structure) as well as version control and management. How you organize (and later find) your data can have significant impacts on research efficiency and collaboration, and often has downstream effects on data documentation, storage, sharing, and preservation.
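As a minimal sketch of one possible filing strategy, the Python example below builds file paths that combine a folder structure with a date and version number. The folder names, the naming pattern, and the function name versioned_path are assumptions made for illustration, not a Yale or LabArchives convention; adapt them to your own project.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical stage folders; pick names that suit your own workflow.
STAGES = ("raw", "processed", "analysis")


def versioned_path(project: str, stage: str, description: str,
                   version: int, when: date, ext: str) -> PurePosixPath:
    """Build a path such as project/raw/2024-03-01_plot-survey_v02.csv."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    # Lowercase the description and join its words with hyphens.
    slug = "-".join(description.lower().split())
    # ISO dates (YYYY-MM-DD) sort chronologically when sorted by name.
    filename = f"{when.isoformat()}_{slug}_v{version:02d}.{ext}"
    return PurePosixPath(project) / stage / filename


example = versioned_path("soil-study", "raw", "Plot survey", 2,
                         date(2024, 3, 1), "csv")
print(example)
```

Putting the date first and zero-padding the version number keeps files in a predictable order in any file browser, which makes earlier versions easier to find later.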

Using LabArchives

LabArchives is a cloud-based electronic lab notebook (ELN), licensed by Yale and free for those with a Yale NetID to use.

Key features:

Store and organize your research data — up to 1TB per notebook — online.
Share notebooks across teams, even with external colleagues.
Create standard notebook formats and templates for your lab or research group.
Integrate your notebook with other services, such as Microsoft and Google products, Canvas, and many more.
Access all revisions of notebook entries.

How to get started:

Get started by logging in with your NetID.

Learn more at the LabArchives Help Pages. For Yale-specific assistance with LabArchives, email labarchives@yale.edu.

Organizations related to data management and preservation

BRDI [Board on Research Data & Information]
The Board on Research Data and Information (BRDI) maintains surveillance of the field and proposes initiatives that might be undertaken at the National Research Council (NRC), targeted at challenges of national and international significance of particular interest to the board's sponsors. The Board engages in planning, program development, and administrative oversight of projects launched under its auspices.
CASC [Coalition for Academic Scientific Computation]
CASC is dedicated to advocating the use of the most advanced computing technology to accelerate scientific discovery for national competitiveness, global security, and economic success, as well as develop a diverse and well-prepared 21st century workforce.
CODATA
The mission of CODATA is to strengthen international science for the benefit of society by promoting improved scientific and technical data management and use.
DCC [Digital Curation Centre]
The Digital Curation Centre (DCC) is a world-leading centre of expertise in digital information curation with a focus on building capacity, capability and skills for research data management across the UK's higher education research community.
The Digital Curation Centre provides expert advice and practical help to anyone in UK higher education and research wanting to store, manage, protect and share digital research data.
DDI [Data Documentation Initiative]
The Data Documentation Initiative (DDI) is an effort to create an international standard for describing data from the social, behavioral, and economic sciences. Expressed in XML, the DDI metadata specification now supports the entire research data life cycle. DDI metadata accompanies and enables data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving.
eScience Portal for New England
A centralized resource developed to meet the on-going educational, informational, and collaborative needs of New England science and biomedical librarians.
IASSIST [International Association for Social Science Information Services & Technology]
IASSIST is an international organization of professionals working in and with information technology and data services to support research and teaching in the social sciences.
Its 300 members are from a variety of workplaces, including data archives, statistical agencies, research centers, libraries, academic departments, government departments, and non-profit organizations.
RDAP [Research Data Access & Preservation]
RDAP13, the fourth annual Research Data Access and Preservation Summit, takes place April 4-5, 2013 in Baltimore, MD at the Baltimore Marriott Waterfront. RDAP13 is a two day Summit featuring expert panel presentations curated by our RDAP planning committee, an interactive poster session, and lightning talks. Themes include Institutional Repositories, Data Citation and altmetrics, Data Infrastructure, Linked Data and Metadata, and Data Use and Reuse.
Research Data Alliance
The Research Data Alliance is an organisation that aims to accelerate and facilitate research data sharing and exchange. The work of the Research Data Alliance will primarily be undertaken through its working groups. Participation in working groups, starting new working groups, and attendance at the twice-yearly plenary meetings are open to all.

What is Data Documentation?

Data documentation is how you describe your data: first for yourself and your research team, and later, more formally, for a broader community. Data documentation can be as simple as a text document, or it can involve many interwoven applications and systems. Common data documentation methods include data dictionaries, lab notebooks, and qualitative codebooks. Data documentation also often involves using standardized naming and formatting conventions as well as data and metadata standards and ontologies.

Data documentation should capture the following elements:

How data were created or obtained (e.g., the methods, instruments, units of measurement, and software used)
When, where, why, and by whom data were created
What data variables mean
How data are organized and where they are stored
Whether and how data have been transformed or altered

Data documentation is closely related to data organization, as data organization structures are often recorded in data documentation.
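The elements listed above can be captured in something as simple as a plain-text README file. The following Python sketch shows one way to do this. The class name DatasetReadme, its field names, and the sample values are all hypothetical and invented for illustration; they do not come from any standard.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetReadme:
    """Hypothetical container for the documentation elements listed above."""
    title: str
    how_created: str            # methods, instruments, units, software
    provenance: str             # when, where, why, and by whom
    variables: dict[str, str]   # variable name -> meaning
    organization: str           # folder layout and storage location
    transformations: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the documentation as plain text, one section per element."""
        lines = [self.title, ""]
        lines += ["HOW THE DATA WERE CREATED OR OBTAINED", self.how_created, ""]
        lines += ["WHEN, WHERE, WHY, AND BY WHOM", self.provenance, ""]
        lines.append("VARIABLES")
        lines += [f"  {name}: {meaning}"
                  for name, meaning in self.variables.items()]
        lines += ["", "ORGANIZATION AND STORAGE", self.organization, ""]
        lines.append("TRANSFORMATIONS")
        lines += [f"  - {step}" for step in self.transformations] or ["  None recorded"]
        return "\n".join(lines)


# Sample values are made up for the example.
readme = DatasetReadme(
    title="Stream temperature survey (example)",
    how_created="Handheld probe readings, recorded in degrees Celsius.",
    provenance="Collected in spring 2024 by the field team for a pilot study.",
    variables={"site_id": "Sampling site identifier",
               "temp_c": "Water temperature in degrees Celsius"},
    organization="One CSV file per site in the raw/ folder.",
    transformations=["Removed duplicate rows"],
)
print(readme.render())
```

Saving the rendered text as a README file next to the data keeps the documentation and the folder structure it describes in one place.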

Describing data

Sharing data (with others or within a lab over time) is impossible without proper data documentation. "Metadata" is data about data. It's structured information that describes content and makes it easier to find or use. A metadata record can be embedded in data or stored separately. Any data file in any format can have metadata fields. In social science, this record is called the "codebook" or "data dictionary."

There are many metadata standards, and which one is right for your data will depend on the type, scale, and discipline of your research project.

Some examples of metadata standards are:

Astronomy Visualization Metadata
Content Standard for Digital Geospatial Metadata (more on metadata from the FGDC)
Darwin Core
Data Documentation Initiative
Dublin Core
Ecological Metadata Language

For more examples, see the Research Data Alliance Metadata Directory.

If your field doesn't have a metadata standard (it may not be listed above) or if you just need a simpler system to keep track of data within your own lab, consider that there are three main types of metadata addressed by most standards:

descriptive: describes the resource for identification and discovery
structural: how objects are related or put together
administrative: creation date, file type, rights management
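For a simple in-lab system, the three types above could be kept together in one small metadata record stored alongside each data file. The Python sketch below groups fields under the three types and writes them out as JSON. The field names, sample values, and function names are assumptions chosen for illustration and are not drawn from any of the standards listed earlier.

```python
import json


def build_record(descriptive: dict, structural: dict,
                 administrative: dict) -> dict:
    """Group metadata fields under the three types named above."""
    return {
        "descriptive": descriptive,        # identification and discovery
        "structural": structural,          # how objects relate or fit together
        "administrative": administrative,  # creation date, file type, rights
    }


def to_sidecar_json(record: dict) -> str:
    """Serialize the record as indented JSON for a file kept beside the data."""
    return json.dumps(record, indent=2, sort_keys=True)


# Sample values are made up for the example.
record = build_record(
    descriptive={"title": "Stream temperature survey (example)",
                 "keywords": ["hydrology", "temperature"]},
    structural={"files": ["site_a.csv", "site_b.csv"],
                "relationship": "one file per sampling site"},
    administrative={"created": "2024-03-01",
                    "file_type": "text/csv",
                    "rights": "Internal use only"},
)
print(to_sidecar_json(record))
```

Because the record is plain JSON, it can later be mapped onto a formal standard if your project adopts one.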

Also consider this advice from the UK Data Archive [pdf]:

Good data documentation includes information on:

the context of data collection: project history, aim, objectives and hypotheses
data collection methods: sampling, data collection process, instruments used, hardware and software used, scale and resolution, temporal and geographic coverage and secondary data sources used
dataset structure of data files, study cases, relationships between files
data validation, checking, proofing, cleaning and quality assurance procedures carried out
changes made to data over time since their original creation and identification of different versions of data files
information on access and use conditions or data confidentiality

At the data-level, documentation may include:

names, labels and descriptions for variables, records and their values
explanation or definition of codes and classification schemes used
definitions of specialist terminology or acronyms used
codes of, and reasons for, missing values
derived data created after collection, with code, algorithm or command file
weighting and grossing variables created
data listing of annotations for cases, individuals or items
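Several of the data-level items above (variable labels, code definitions, and missing-value codes with their reasons) can be recorded in a small data dictionary. The Python sketch below is one possible layout. The variable names, codes, and the decode function are hypothetical examples, not part of the UK Data Archive guidance.

```python
# Hypothetical data dictionary: one entry per variable.
DATA_DICTIONARY = {
    "age": {
        "label": "Age of respondent in years",
        "missing_codes": {-99: "refused", -98: "not asked"},
    },
    "employment": {
        "label": "Employment status",
        "values": {1: "employed", 2: "unemployed", 3: "not in labour force"},
        "missing_codes": {9: "unknown"},
    },
}


def decode(variable: str, raw: int):
    """Return (value, missing_reason) for a raw coded value.

    Missing codes become (None, reason); coded categories are replaced
    by their labels; any other value is returned unchanged.
    """
    entry = DATA_DICTIONARY[variable]
    missing = entry.get("missing_codes", {})
    if raw in missing:
        return None, missing[raw]
    return entry.get("values", {}).get(raw, raw), None


print(decode("age", -99))
print(decode("employment", 1))
print(decode("age", 42))
```

Keeping the reason for each missing value, rather than only a blank, lets later users distinguish a refusal from a question that was never asked.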
