Text mining is a broad category of techniques for investigating large bodies of text at once.
Data mining, from which text mining arose, was originally defined as any analysis of data that began without a hypothesis. The term has since come to cover potentially any computational analysis of data, with or without a hypothesis, and the meaning of text mining has broadened in the same way.
For more information on text mining and its common methods and tools, see the Research Guide on Text and Data Mining.
Voyant Tools is a quick, in-browser way to see frequently occurring—or even co-occurring—words and phrases in a body of text.
Using Voyant Tools, I generated a word cloud of the most frequently used words on this page, including "library," "mining," "text," and "data." What do you notice about the results?
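For readers curious about what sits behind a word cloud, the counting involved can be approximated in a few lines of Python. This is only a rough sketch and not how Voyant Tools itself is implemented: the stopword list, the sample text, and the function names (`tokenize`, `word_frequencies`, `cooccurrences`) are all made up for illustration, and "co-occurring" is assumed here to mean "appearing in the same sentence."

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, very short stopword list; real tools use much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "or", "with"}


def tokenize(text):
    """Lowercase the text and return its words, leaving out stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def word_frequencies(text):
    """Count how often each remaining word occurs in the text."""
    return Counter(tokenize(text))


def cooccurrences(text):
    """Count pairs of distinct words that share a sentence (an assumed definition)."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text):
        unique_words = sorted(set(tokenize(sentence)))
        pairs.update(combinations(unique_words, 2))
    return pairs


# Made-up sample text, loosely echoing the wording of this page.
sample = (
    "Text mining investigates large bodies of text. "
    "Data mining is the analysis of data. "
    "Text mining arose from data mining."
)

print(word_frequencies(sample).most_common(3))
print(cooccurrences(sample).most_common(3))
```

A word cloud then simply draws the most frequent words at larger sizes. Changing the stopword list or the definition of co-occurrence (for example, words within five positions of each other) will change the results, which is worth keeping in mind when interpreting any such visualization.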
This project, by the Yale DHLab, shows the results of topic modeling on more than a hundred years of Vogue magazine. Topic modeling is the generation of "topics," or sets of words that frequently appear together, from a large body of text.
Cameron Blevins, a scholar at the University of Colorado Denver, used topic modeling to better understand 27 years of daily diary entries by Martha Ballard, a midwife in the United States in the late eighteenth and early nineteenth centuries.
For help with any stage of a digital humanities project, with any of the methods described here, or with any other questions, feel free to reach out or book a consultation with the DHLab.