
Yale Library

Resources for Text and Data Mining: Home

Finding help

Yale researchers who need help getting started with a humanities project can visit the Digital Humanities Lab during office hours.

For help with statistical analysis or projects in the sciences or social sciences, contact the StatLab at the Marx Science and Social Science Library.

Staff in these locations can work with you and a subject specialist to identify potential resources. 

Click here to browse all text data available through the library. 


About this guide

Text and data mining (TDM) are research techniques that use computational tools to identify and extract relevant information or patterns from large data sets or from text-based digital content.  

As TDM gains popularity as a research method, it raises legal, ethical, and logistical issues that researchers must consider when selecting sources of text or data for analysis. This guide was developed to help Yale researchers identify resources in our collections that may be available for TDM projects. It also includes sources that are freely available online.

Things to keep in mind

Text and data mining is highly customized work, with varying timelines from start to conclusion.  To carry out a successful project, you will need both access to data and the skills to interact with that data.  What these skills entail depends on the data and what you want to do with it.

When starting a project, you need to consider:

  • What are the goals of my project?
  • What data sources are available that meet my needs?
  • What costs might this project incur, and what funding will it need?
  • What skills are needed to carry out this project? 

Appropriate use of licensed resources:

Most of the library's electronic resources are governed by license agreements that limit use to the Yale community or to individuals who are physically present at Yale University Library facilities.

  • Each user is responsible for ensuring that they use these products solely for noncommercial, educational, scholarly, or research purposes. Systematic downloading, distribution of content to non-authorized users, or indefinite retention of substantial portions of information is strictly prohibited.
  • The use of software such as scripts, agents, or robots is generally prohibited and may result in loss of access to these resources for the entire Yale community.
  • The Yale University Appropriate Use of Technology Policy prohibits violations of these agreements.