Digitization is, broadly speaking, the generation of a faithful digital representation of some object or piece of information. In digital humanities terms, this most often means the scanning of archival documents.
You might digitize materials to share them more widely, to preserve ephemeral information, or to support further research with digital techniques: for example, using optical character recognition (OCR) to convert scans to machine-readable text and then applying text mining techniques to that text.
If you have physical materials you'd like to digitize, the DHLab offers a Digitization Cube with equipment for your use, including a large flatbed scanner, an overhead book scanner, and a microfilm scanner. After a consultation with DHLab staff, you can book time in the Cube.
OCR, optical character recognition, is the automated conversion of images of printed text into machine-readable, searchable text.
You might, for example, OCR scans of printed books or typed letters so you can search or analyze their content.
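As a rough illustration of what "search or analyze" can look like once OCR has produced plain text, here is a minimal Python sketch that uses only the standard library. The function names (`tokenize`, `top_words`, `find_lines`) and the sample letter are hypothetical, made up for this example. The sketch assumes the OCR output is already available as a plain-text string, and it does not perform OCR itself.

```python
import re
from collections import Counter


def tokenize(ocr_text: str) -> list[str]:
    """Split OCR output into lowercase words."""
    # Rejoin words that the scan split with a hyphen at a line break.
    # Assumption: this also merges genuine hyphenated compounds that
    # happen to break across lines, which is acceptable for a rough count.
    joined = re.sub(r"-\s*\n\s*", "", ocr_text)
    return re.findall(r"[a-z']+", joined.lower())


def top_words(ocr_text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent words with their counts."""
    return Counter(tokenize(ocr_text)).most_common(n)


def find_lines(ocr_text: str, term: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that mention term, ignoring case."""
    needle = term.lower()
    return [
        (number, line)
        for number, line in enumerate(ocr_text.splitlines(), start=1)
        if needle in line.lower()
    ]


if __name__ == "__main__":
    # Hypothetical OCR output from a typed letter.
    sample = (
        "Dear Margaret,\n"
        "The ship-\n"
        "ment of books arrived on Tuesday.\n"
        "The books are in good condition.\n"
    )
    print(top_words(sample, 2))
    print(find_lines(sample, "books"))
```

Real OCR output usually contains recognition errors, so counts and searches like these are best treated as a starting point for closer reading rather than as exact figures.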
There are two main pieces of software you might wish to use to perform OCR:
HTR, handwritten text recognition, is the automated conversion of images of handwritten manuscripts to machine-readable and -searchable text. It is a more specialized task than OCR, because handwritten letterforms vary far more than printed type.
There is software that may be able to help with HTR:
For help with any stage of a digital humanities project, with any of the methods described here, or with any other questions, feel free to reach out or book a consultation with the DHLab.