The Yale University Library (“the Library”) assigns structured metadata to describe our collections and make them discoverable through local catalogs and search engines such as Quicksearch, Archives at Yale, and Yale Digital Collections.
As much care as we put into our local discovery tools -- and we certainly encourage you to use them -- we recognize that the best uses of our data may come from those who remix and re-publish these data in novel and interesting ways. We therefore welcome efforts to devise new ways of exploring and accessing our collections and classification systems, including building one’s own discovery tools and services.
Toward this end, we release our metadata as openly and widely as possible. We avoid placing restrictions on reuse, except in cases of ethical, contractual or legal obligations, e.g., metadata received on condition that we not share them further, or metadata for which sharing might compromise user privacy. This service is in line with Yale’s broader support for open access, including the high resolution digital images made available through our cross-collection discovery portal, and open scholarly communication enabled through our EliScholar platform.
Most metadata generated by the Library will be open, by default, for sharing and reuse, and released under the public domain Creative Commons CC0 license. Records derived from the shared OCLC WorldCat database are made available under the Open Data Commons ODC-BY license, with credit to OCLC. In all other cases, we will attempt to negotiate terms that allow maximal sharing and reuse, as we do with our locally produced records, and label them accordingly.
The Library intends to make it as easy as possible to download or query our data, both by human agents and machines. Below, we provide brief descriptions and access options for currently available datasets.
Filenaming conventions for files in the archive directory are as follows, where yyyymmdd is the run date; type is full or incr; xxx is a file sequence number derived from the source file; and yy is a secondary file sequence number for the split output:
itm_yyyymmdd_type.tsv: combined and deduplicated SQL query output
bib_yyyymmdd_type.txt: bibliographic record identifier list from SQL query output
bib_yyyymmdd_type: directory containing output MARCXML record files
bib_yyyymmdd_type_xxx_yy.xml.gz: compressed output MARCXML record files
err_yyyymmdd_type.tsv: record identifier, record type, and error description for bibliographic and holdings records extracted from Voyager that failed while the output files were being generated from the source files
bib_yyyymmdd_type.tsv: tab-delimited file listing output filenames, file sizes, and record counts
bib_yyyymmdd_del.txt: bibliographic record identifiers to be deleted when running an update
bib_filename.tsv: tab-delimited file of all exported bibliographic record identifiers and filenames, used to determine deletes
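For those building their own tools on these files, the naming convention above can be checked programmatically. The following is a minimal sketch, not part of the Library's own tooling: the function name parse_record_filename, the RecordFile type, and the sample filename are hypothetical, and the pattern assumes that xxx and yy are runs of digits of any width, which the convention does not state.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Pattern for compressed MARCXML record files: bib_yyyymmdd_type_xxx_yy.xml.gz
# Assumption: xxx and yy are digits of unspecified width.
RECORD_FILE = re.compile(
    r"^bib_(?P<run_date>\d{8})_(?P<run_type>full|incr)"
    r"_(?P<source_seq>\d+)_(?P<split_seq>\d+)\.xml\.gz$"
)


class RecordFile(NamedTuple):
    """Parts of a record filename (hypothetical helper type)."""
    run_date: date    # yyyymmdd, the run date
    run_type: str     # "full" or "incr"
    source_seq: int   # xxx, sequence number derived from the source file
    split_seq: int    # yy, secondary sequence number for the split output


def parse_record_filename(name: str) -> Optional[RecordFile]:
    """Return the parsed parts, or None if the name does not fit the convention."""
    match = RECORD_FILE.match(name)
    if match is None:
        return None
    try:
        # Reject strings of eight digits that are not real calendar dates.
        run_date = datetime.strptime(match["run_date"], "%Y%m%d").date()
    except ValueError:
        return None
    return RecordFile(
        run_date,
        match["run_type"],
        int(match["source_seq"]),
        int(match["split_seq"]),
    )


if __name__ == "__main__":
    # Hypothetical filename, used only to illustrate the convention.
    print(parse_record_filename("bib_20240115_full_001_02.xml.gz"))
```

Returning None for non-matching names lets a script skip the other files in the archive directory (the .tsv and .txt listings) without raising an error.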
Encoded Archival Description (EAD) files are shared under the CC0 1.0 Universal license.