
Resources for Digital Humanities: Text Mining

Vocabulary, tools, advice, and library resources to start your Digital Humanities project or research.


Text Mining

Text mining is a broad category of techniques for investigating large bodies of text at once.

Data mining, from which text mining arose, was originally defined as any analysis of data that began without a hypothesis. It has since come to mean almost any computational analysis of data, with or without a hypothesis. The meaning of text mining has broadened in the same way.

For more information on text mining and its common methods and tools, see the Research Guide on Text and Data Mining.


Voyant Tools is a quick, in-browser way to see frequently occurring—or even co-occurring—words and phrases in a body of text.

Using Voyant Tools, I generated a word cloud of the most frequently used words on this page, including "library," "mining," "text," and "data." What do you notice about the results?
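The counting behind a word cloud like this one can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Voyant Tools itself works: the function name `word_frequencies`, the short `STOP_WORDS` list, and the sample text are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list for illustration only;
# real tools typically use much longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "or", "is", "for", "with", "at"}

def word_frequencies(text, top_n=5):
    """Return the top_n most common non-stop-words in text as (word, count) pairs."""
    # Lowercase the text and pull out runs of letters (and apostrophes) as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Sample text adapted from the opening of this page.
sample = (
    "Text mining is a broad category of techniques for investigating "
    "large bodies of text at once. Data mining, from which text mining arose, "
    "has come to mean any computational analysis of data."
)

for word, count in word_frequencies(sample):
    print(f"{word}: {count}")
```

A word cloud then simply draws each word at a size proportional to its count. Which words end up looking important depends heavily on the stop-word list you choose.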


Robots Reading Vogue: Topic Modeling

This project by the Yale DHLab shows the results of topic modeling on more than a hundred years of Vogue magazine. Topic modeling generates "topics," or sets of words that frequently appear together, from a large body of text.
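To get a feel for what "words that frequently appear together" means, here is a minimal Python sketch that counts how often pairs of words share a document. This is plain co-occurrence counting, a much simpler stand-in and not topic modeling itself, which typically relies on probabilistic models such as latent Dirichlet allocation. It is also not the method used in the project above. The function name `cooccurrence_counts` and the toy documents are invented for this example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count how many documents each unordered word pair appears in together."""
    pair_counts = Counter()
    for doc in documents:
        # A sorted set counts each pair once per document, in a stable order.
        words = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(words, 2))
    return pair_counts

# Invented toy documents for illustration only; not drawn from any real corpus.
documents = [
    "silk gown evening",
    "tennis skirt summer",
    "evening silk gloves",
    "summer tennis hat",
]

# Print the two word pairs that share the most documents.
for pair, count in cooccurrence_counts(documents).most_common(2):
    print(pair, count)
```

Pairs with high counts hint at the kind of groupings a topic model would surface, although a real topic model works over thousands of documents and assigns each word a weight within each topic.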

Topic Modeling Martha Ballard's Diary

Cameron Blevins, a scholar at the University of Colorado Denver, used topic modeling to better understand 27 years of daily diary entries by Martha Ballard, a midwife in the United States in the late eighteenth and early nineteenth centuries.

Contact Us

For help with any stage of a digital humanities project, with any of the methods described here, or with any other questions, feel free to reach out or book a consultation with the DHLab.
