Research Data Management: Validate Data

Research data is loosely defined as information collected, observed, or created for purposes of analysis to produce original research. This guide provides resources for managing your research data no matter the discipline.

What is Data Validation?

Data validation is important for ensuring that your data is regularly monitored and for assuring all stakeholders that it is of high quality and reliably meets research integrity standards. It is also a crucial aspect of Yale's Research Data and Materials Policy, which states: "The University deems appropriate stewardship of research data as fundamental to both high-quality research and academic integrity and therefore seeks to attain the highest standards in the generation, management, retention, preservation, curation, and sharing of research data."

Data Validation Methods

Basic methods to ensure data quality. All researchers should follow these practices:

  • Be consistent and follow other data management best practices, such as data organization and documentation
  • Document any data inconsistencies you encounter
  • Check all datasets for duplicates and errors
  • Use data validation tools (such as those in Excel and other software) where possible
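
As an illustration of checking a dataset for duplicates and errors, here is a minimal Python sketch that uses only the standard library. The sample records, the field names (id, age, site), and the accepted age range of 0 to 120 are assumptions made for this example; they are not part of this guide or of any Yale policy. In practice the rows would more likely be read from a file, for instance with csv.DictReader.

```python
from collections import Counter

# Hypothetical sample records, invented for illustration only.
records = [
    {"id": "001", "age": "34", "site": "A"},
    {"id": "002", "age": "-5", "site": "B"},
    {"id": "001", "age": "34", "site": "A"},  # exact copy of the first row
    {"id": "003", "age": "", "site": "C"},    # age was never entered
]


def find_duplicates(rows):
    """Return one copy of every row that appears more than once."""
    # A dict is not hashable, so each row is counted as a sorted tuple of items.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return [dict(key) for key, n in counts.items() if n > 1]


def find_age_errors(rows, low=0, high=120):
    """Return (row_index, reason) pairs for missing, non-integer, or out-of-range ages."""
    problems = []
    for i, row in enumerate(rows):
        value = row.get("age", "").strip()
        if not value:
            problems.append((i, "missing age"))
            continue
        try:
            age = int(value)
        except ValueError:
            problems.append((i, "age is not an integer"))
            continue
        if not low <= age <= high:
            problems.append((i, "age out of range"))
    return problems


duplicates = find_duplicates(records)
age_errors = find_age_errors(records)
print("Duplicate rows:", duplicates)
print("Age problems:", age_errors)
```

Recording the reported row indices and reasons in your project documentation is one way to follow the practice of documenting the inconsistencies you encounter.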

Advanced methods to ensure data quality. The following methods may be useful in more computationally focused research:

  • Establish processes to routinely inspect small subsets of your data
  • Perform statistical validation using software and/or programming languages
  • Use data validation applications at point of deposit in a data repository
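
The first two methods above can be sketched in Python with the standard library alone. This is one possible approach under stated assumptions, not a prescribed procedure: the measurements are invented, the function names are hypothetical, and the interquartile range (IQR) rule with a factor of 1.5 is just one common convention for flagging unusual values. Your discipline may call for a different statistical test.

```python
import random
import statistics

# Hypothetical measurements; 250.0 stands in for a likely data-entry error.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 250.0, 10.1]


def spot_check_sample(values, k, seed=0):
    """Pick k (index, value) pairs at random for manual inspection.

    A fixed seed makes the selection reproducible, so the same rows can be
    revisited later or reviewed by a colleague.
    """
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(len(values)), k))
    return [(i, values[i]) for i in indices]


def iqr_outliers(values, factor=1.5):
    """Return (index, value) pairs lying more than factor * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


print("Rows to inspect by hand:", spot_check_sample(measurements, k=3))
print("Flagged by the IQR rule:", iqr_outliers(measurements))
```

With this sample data, the IQR rule flags only the 250.0 entry at index 8. A flagged value is a prompt to inspect the record and is not proof of an error, so it should be checked against the original source before anything is corrected or removed.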

Additional Resources for Data Validation

Data validation and quality assurance are often discipline-specific, and expectations and standards may vary. To learn more about data validation and data quality assurance, consider the information from the following U.S. government entities, which produce large amounts of public data: