Yale University Library Research Guides: Resources for Digital Humanities: Text Mining

Resources

TDM Research Guide
For more information on text mining, see the Library Research Guide on Text and Data Mining, in particular its page on commonly used methods in text and data mining.
Text and Data Mining Cube
After sending the DHLab an email or booking a consultation, you can reserve time in our Text and Data Mining Cube to try out text mining, with or without a project in mind.

Text Mining

Text mining is a broad category of techniques for investigating large bodies of text at once.

Data mining, from which text mining arose, was originally defined as any analysis of data that began without a hypothesis. The term has since come to mean potentially any computational analysis of data, with or without a hypothesis, and text mining has broadened in the same way.

For more information on text mining and its common methods and tools, see the Research Guide on Text and Data Mining.

Illustration

Voyant Tools is a quick, in-browser way to see frequently occurring—or even co-occurring—words and phrases in a body of text.
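Voyant does this counting for you in the browser. If you are curious about what happens underneath, here is a minimal Python sketch of the same kind of counting: word frequencies, plus co-occurrence counts for pairs of words that appear near each other. It is not Voyant's implementation; the tokenizer, the short stopword list, the window size, and the sample sentence are illustrative assumptions.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real tools ship much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "or", "for", "with"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def content_tokens(text):
    """Tokens with stopwords removed."""
    return [t for t in tokenize(text) if t not in STOPWORDS]


def word_frequencies(text):
    """Count how often each non-stopword occurs."""
    return Counter(content_tokens(text))


def cooccurrences(text, window=5):
    """Count unordered word pairs whose positions differ by less than `window`."""
    tokens = content_tokens(text)
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


sample = "Text mining and data mining: the library supports text and data mining."
print(word_frequencies(sample).most_common(3))
# [('mining', 3), ('text', 2), ('data', 2)]
print(cooccurrences(sample).most_common(2))
```

A word cloud like the one described next is essentially a picture of the frequency table: the larger a word appears, the higher its count.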

Using Voyant Tools, I generated a word cloud of the most frequently used words on this page, including "library," "mining," "text," and "data." What do you notice about the results?

Examples

Robots Reading Vogue: Topic Modeling

This project by the Yale DHLab shows the results of topic modeling (the generation of "topics," or sets of words that frequently appear together, from a large body of text) applied to more than a hundred years of Vogue magazine.
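Latent Dirichlet allocation (LDA) is one widely used topic model. The sketch below is a minimal LDA fit by collapsed Gibbs sampling in plain Python, offered only to show the idea of topics as groups of co-occurring words; it is not the code or toolkit used for Robots Reading Vogue. The function names, the hyperparameter defaults (`alpha`, `beta`, `iterations`), and the four tiny documents are hypothetical.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns one dict per topic mapping each word to the number of
    tokens of that word currently assigned to the topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_total = [0] * num_topics

    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            topic_word[k][w] += 1
            doc_topic[d][k] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token out of the counts...
                k = assignments[d][i]
                topic_word[k][w] -= 1
                doc_topic[d][k] -= 1
                topic_total[k] -= 1
                # ...score each topic by how common it is in this document
                # and how common this word is in the topic...
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # ...and resample its topic in proportion to those scores.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                topic_word[k][w] += 1
                doc_topic[d][k] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=3):
    """Return the n words with the most tokens assigned to each topic."""
    return [
        sorted((w for w, c in tw.items() if c > 0), key=tw.get, reverse=True)[:n]
        for tw in topic_word
    ]


# Four tiny, hypothetical "documents," already tokenized.
docs = [
    "silk gown lace silk gown".split(),
    "lace gown silk dress".split(),
    "war ration wartime ration".split(),
    "ration war factory war".split(),
]
topics = lda_gibbs(docs, num_topics=2)
for number, words in enumerate(top_words(topics)):
    print(f"topic {number}: {words}")
```

On a corpus this small the split into topics can change with the seed and settings; projects like the one above work with far larger collections and established topic-modeling toolkits.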

Topic Modeling Martha Ballard's Diary

Cameron Blevins, a scholar at the University of Colorado Denver, used topic modeling to better understand 27 years of daily diary entries by Martha Ballard, a midwife in the United States in the late eighteenth and early nineteenth centuries.

Contact Us

For help with any stage of a digital humanities project, with any of the methods described here, or with any other questions, feel free to reach out or book a consultation with the DHLab.

Related Machine Learning Concepts

Natural Language Processing
Many text mining methods fall under the umbrella of Natural Language Processing (NLP).
Algorithmic Bias
Like other computational methods, especially those based on machine learning, text mining methods can carry unintended algorithmic bias.
Sentiment Analysis
Sentiment analysis is a machine-learning-based text mining technique.
Topic Modeling
Topic modeling is a machine-learning-based text mining technique.
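Sentiment analysis can be done in many ways. As one simple illustration of the machine-learning approach, the sketch below trains a multinomial naive Bayes classifier on labeled example sentences and then predicts the label of new text. The class name and the four training sentences are hypothetical; a real project would need far more, and carefully checked, training data, which is also where the algorithmic bias mentioned above can creep in.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def log_scores(self, text):
        """Log prior plus log likelihood of the text's words, per label."""
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label in self.labels:
            score = math.log(self.doc_counts[label] / n_docs)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (self.total_words[label] + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        """Return the label with the highest score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data.
train_texts = [
    "a delightful, moving, beautiful story",
    "beautiful prose and a moving ending",
    "a dull and tedious story",
    "tedious, clumsy prose",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("a beautiful and moving book"))  # positive
```

The classifier only knows the words it was trained on, so "book" is ignored here and the prediction rests on "beautiful" and "moving."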
