Yale Library

Digital Initiatives : Glossary

Questions? Additions?

If you see a TERM that is not clear or needs refreshing, or if you have a TERM you would like to add to the GLOSSARY, please send email to elizabeth.beaudin@yale.edu. Thanks!

Glossary

API

an application programming interface “is the interface that a computer system, library or application provides in order to allow requests for services to be made of it by other computer programs, and/or to allow data to be exchanged between them”

 

automatic document feeder

an attachable part that automatically sends multiple pages through the scanner

 

automatic document reader

a scanner that has the ability to process many documents

 

bento box 

Bento boxes are traditional Japanese lunch boxes which come in different sizes and shapes.  They have little containers that hold a variety of lunch time foods.  The organizational structure of a bento box is now used when designing a user interface on a web site, thus creating several box shapes to hold different yet related information. 

 

best practices

a method, process, or activity that is more effective at delivering a particular outcome than any other technique, method, process, etc., with fewer problems and unforeseen complications

 

born-digital

an asset that originated in digital form. Examples include websites, wikis, e-books, digital sound recordings, and email.

 

cascading stylesheets

style sheets (CSS) that describe and customize the presentation, such as colors and fonts, of a document or a web page written in HTML

 

CCITT Group IV

an image compression scheme for black-and-white (bitonal) images, originally defined for fax transmission by the CCITT ("Comité Consultatif International Téléphonique et Télégraphique"), a telecommunications standards body created in 1956

 

checksum

a value computed from data and used for validating data integrity. A hash algorithm such as MD5 (Message-Digest algorithm 5) is applied to the source (typically a file and its content, such as the image of a scanned page from a book) to generate a fixed-length value, 128 bits in the case of MD5, often called a checksum. In digital preservation processes, the checksum recorded when the content was created is compared to another checksum computed after the content has been received or stored over a period of time. If the two values match, this indicates that the data (e.g. the scanned page image) is intact and has not been altered.
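
The comparison described above can be sketched in a few lines of Python using the standard `hashlib` module. This is only an illustration: the byte strings are hypothetical stand-ins for the content of a scanned page image, and the helper name `md5_checksum` is an assumption made for this example.

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hexadecimal string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-in for the bytes of a scanned page image.
original = b"scanned page image bytes"
stored_checksum = md5_checksum(original)  # recorded when the content is created

# Later, after transfer or storage: recompute the checksum and compare.
received = b"scanned page image bytes"
is_intact = md5_checksum(received) == stored_checksum

# A single changed byte produces a different checksum.
altered = b"scanned page image bytez"
alteration_detected = md5_checksum(altered) != stored_checksum

print(stored_checksum, is_intact, alteration_detected)
```

In practice the bytes would be read from the stored file rather than written inline.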

 

consortium/consortia

group of organizations with a common purpose to meet a goal that would normally be beyond the capabilities of a single member

 

copyright

exclusive rights regulating the use of a particular expression of an idea, materials, or information. In other words, "the right to copy" an original creation.

 

copyright holders

only the copyright holder, whether the person, estate, or representative, is permitted to exercise the rights restricted by copyright; all others are prohibited from using the work or materials without the consent of the copyright holder

 

DC

Dublin Core is a metadata standard for cross-domain information resource description. It takes its name from Dublin, Ohio, where it originated at a 1995 workshop hosted by OCLC, a library consortium based there.

 

digitization

the process, often carried out by a series of software programs, of representing an object, an image, or a signal (when dealing with audio) by a discrete set of its points or samples. The result is usually called a digital image for an object and a digital form for a signal.

 

EAD

Encoded Archival Description is an XML standard for encoding archival finding aids, maintained by the Library of Congress in partnership with the Society of American Archivists.

 

fair use

the concept from United States copyright law that permits limited use of copyrighted material without requiring permission from the copyright holders, such as use for scholarly research or review.

 

Fedora

Flexible Extensible Digital Object Repository Architecture is a software framework to construct and maintain repositories of digital objects.

 

FTP

File Transfer Protocol

 

GNU

GNU is a free software project started by Richard Stallman in 1984. The GNU General Public License (GNU GPL or simply GPL), a widely used free software license, was originally written by Stallman for the GNU project; its first version was released in 1989.

 

IPR

Intellectual Property Rights – copyright information related to materials published and later held in libraries

 

ISO

International Organization for Standardization

 

JPG

Joint Photographic Experts Group – the name of the group that developed the standard.  JPG is a compression method for images.

 

JPG 2000

JPEG 2000 is a wavelet-based image compression standard. It was created by the Joint Photographic Experts Group committee in the year 2000 with the intention of superseding their original discrete cosine transform-based JPEG standard (created about 1991). The standardized filename extension is .jp2.

 

LZW

LZW is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch.
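
A minimal sketch of the LZW idea in Python may help make the definition concrete. It is a textbook-style illustration, not the variant used by any particular file format: codes are kept as plain integers rather than packed into bits, and the input is assumed to contain only characters with code points below 256. The function names are chosen for this example.

```python
def lzw_compress(text: str) -> list[int]:
    """Encode text as a list of integer codes using a growing dictionary."""
    # Start with one entry per single character (assumption: code points < 256).
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep extending the current match
        else:
            codes.append(dictionary[current])  # emit the longest known match
            dictionary[candidate] = next_code  # learn the new sequence
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the original text, reconstructing the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the entry that is about to be defined.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        pieces.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(pieces)


sample = "TOBEORNOTTOBEORTOBEORNOT"
compressed = lzw_compress(sample)
restored = lzw_decompress(compressed)
print(len(sample), len(compressed), restored == sample)
```

Because decompression rebuilds the dictionary from the codes alone, no dictionary needs to be stored with the data, and the round trip is lossless.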

 

MARCXML

MARC (MAchine-Readable Cataloging) is a standard for bibliographic and related information in machine-readable form; it was developed by Henriette Avram at the Library of Congress in the 1960s. MARCXML is the Library of Congress's XML schema for representing MARC records.

 

METS

Metadata Encoding and Transmission Standard: an encoding standard for descriptive, administrative, and structural metadata regarding objects within a digital library, using XML schema language

 

MODS

Metadata Object Description Schema: a schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications, to define elements in datasets often used in digital libraries

 

NISO

National Information Standards Organization

 

OAI

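The Open Archives Initiative develops interoperability standards for sharing metadata, notably OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). Harvesters are software programs that collect metadata from repositories that expose it in conformance with the published OAI standards.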
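
As a rough illustration of what a harvester sends, the sketch below builds an OAI-PMH request URL in Python. The `verb` and `metadataPrefix` arguments and the `ListRecords` and `oai_dc` values come from the OAI-PMH protocol; the base URL is a hypothetical placeholder and the helper name `build_oai_request` is an assumption made for this example. No network request is made.

```python
from urllib.parse import urlencode


def build_oai_request(base_url: str, verb: str, **arguments: str) -> str:
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository endpoint; a real harvester would use the
# base URL published by the repository it is harvesting.
url = build_oai_request(
    "https://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",  # request records as unqualified Dublin Core
)
print(url)
```

A harvester would fetch this URL and parse the XML response it receives from the repository.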

 

OCR

Optical Character Recognition:  computer software designed to convert images of text (usually captured by a scanner) into machine-editable text

 

PDF

Portable Document Format: a file format, created by Adobe Systems, for document exchange in a manner independent of the application software, hardware, and operating system

 

Persistent URL

a Uniform Resource Locator that remains unique and intact regardless of its object’s location or state

 

Portal

a site on the Internet, usually with specially designed features, that gives its visitors access to services from a number of different sources

 

PREMIS

Preservation Metadata: Implementation Strategies, a core set of preservation metadata, applicable across a wide range of digital preservation contexts and supported by guidelines.

 

repository

a central place where databases or files are located or distributed over a network, providing persistence of access and preservation of the digital objects

 

RFP

Request for Proposal: a request for a written bid from an outsourcing vendor

 

Sakhr

A software company based in Egypt and active in the IT industry since the early 1980s.  Sakhr Automatic Reader, produced by Sakhr Software Company, is OCR software specialized for converting Arabic text.

 

TIFF

Tagged Image File Format -- widely regarded as a preferred format for preservation and technical longevity

 

TXT

A filename extension for files consisting of text that usually contains very little formatting

Unicode

A character coding system to support the worldwide exchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world

 

union list

a unified listing of materials held distinctly or in common by a group of libraries.  Materials represented often reflect a given subject area of mutual interest to the participating institutions and to others beyond that group.

 

USB

Universal Serial Bus is a serial bus standard to interface devices, such as flash or external drives.

 

UTF-8

UTF-8 (8-bit UCS/Unicode Transformation Format) is a character encoding for Unicode that is backwards compatible with ASCII.  The encoding can represent, in email and in Internet browsers, the standard 128 ASCII characters used for English as well as Latin alphabet characters with diacritics and Greek, Cyrillic, Coptic, Armenian, Hebrew, and Arabic characters.
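
The ASCII compatibility mentioned above can be seen by encoding a few characters in Python. The sample characters are chosen only for illustration; each non-ASCII one is written as a Unicode escape so the example does not depend on how this page itself is encoded.

```python
# One illustrative character per script.
samples = {
    "ASCII": "A",
    "Latin with diacritic": "\u00e9",  # e with acute accent
    "Greek": "\u03bb",                 # lambda
    "Cyrillic": "\u0416",              # zhe
    "Hebrew": "\u05e9",                # shin
    "Arabic": "\u0639",                # ain
}

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = {label: len(ch.encode("utf-8")) for label, ch in samples.items()}

# An ASCII character has the same single byte in ASCII and in UTF-8.
ascii_compatible = "A".encode("utf-8") == "A".encode("ascii")

for label, length in byte_lengths.items():
    print(f"{label}: {length} byte(s)")
print("ASCII-compatible:", ascii_compatible)
```

The ASCII character occupies one byte, while each of the other sample characters occupies two.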

 

VERUS

OCR software specialized for converting Arabic text, produced by NovoDynamics Inc., headquartered in Ann Arbor, Michigan.

 

Web services

“a software system designed to support interoperable machine-to-machine interaction over a network”. For example, systems on the Internet that use open standards can communicate with one another much as different software programs on a single computer interact.

 

Workflow

the movement of documents and/or tasks through a process to accomplish a goal. For example, a digitization workflow involves scanning, processing, and OCR conversion; a repository workflow can include ingest, indexing, searching, retrieval, and presentation.

 

XML

Extensible Markup Language is a specification for defining customized mark-up languages.  For example, MARCXML, created by the Library of Congress, is an XML-based representation of the original LC MARC format.
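
To show what such a customized mark-up language looks like in practice, the sketch below parses a tiny MARCXML-style record with Python's standard `xml.etree.ElementTree` module. The namespace and the `record`, `datafield`, and `subfield` element names follow the MARCXML schema; the record itself and its title are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML documents.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A made-up, minimal record: field 245, subfield a, holds the title.
record_xml = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""

root = ET.fromstring(record_xml)

# Locate the title subfield by its tag and code attributes.
title = root.find("marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS)
title_text = title.text if title is not None else None
print(title_text)
```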