Descriptions and modes of access for resource discovery-related metadata

Purpose of Guide

The Yale University Library (“the Library”) assigns structured metadata to describe our collections and make them discoverable through local catalogs and search engines such as Quicksearch, Archives at Yale, and Yale Digital Collections.
As much care as we put into our local discovery tools -- and we certainly encourage you to use them -- we recognize that the best uses of our data may come from those who remix and re-publish these data in novel and interesting ways. We therefore welcome efforts to devise new ways of exploring and accessing our collections and classification systems, including building one’s own discovery tools and services.
Toward this end, we release our metadata as openly and widely as possible. We avoid placing restrictions on reuse, except in cases of ethical, contractual or legal obligations, e.g., metadata received on condition that we not share them further, or metadata for which sharing might compromise user privacy. This service is in line with Yale’s broader support for open access, including the high resolution digital images made available through our cross-collection discovery portal, and open scholarly communication enabled through our EliScholar platform.
Most metadata generated by the Library will be open, by default, for sharing and reuse, and released under the Creative Commons CC0 public domain dedication. Records derived from the shared OCLC WorldCat database are made available under the Open Data Commons Attribution License (ODC-BY), with a credit to OCLC. In all other cases, we will attempt to negotiate terms that allow maximal sharing and reuse, as we do with our locally produced records, and label them accordingly.


Modes of Access

The Library intends to make it as easy as possible for both people and machines to download or query our data. Below, we provide brief descriptions and access options for currently available datasets.

Datasets currently available   

Bibliographic datasets

  • Orbis (Yale catalog) bibliographic, holdings, and item records
    • Bulk MARC-XML downloads. Files are updated daily and refreshed every Sunday. The output includes bibliographic, holdings, and item data.
      • Where appropriate, we embed sharing rights directly into the records themselves, e.g.:
        • "500  \\$aThis WorldCat-derived record is shareable under Open Data Commons ODC-BY, with attribution to OCLC.$5CTY"
        • "500  \\$aThis Yale-originated record is shareable under Creative Commons license CC0.$5CTY"
      • File naming conventions for files in the archive directory are as follows, where yyyymmdd is the run date; type is full or incr; xxx is a file sequence number derived from the source file; and yy is a secondary file sequence number for the split output:

        • itm_yyyymmdd_type.tsv: combined and deduplicated SQL query output

        • bib_yyyymmdd_type.txt: bibliographic record identifier list from SQL query output

        • bib_yyyymmdd_type: directory containing output MARCXML record files

        • bib_yyyymmdd_type_xxx_yy.xml.gz: compressed output MARCXML record files

        • err_yyyymmdd_type.tsv: record identifier, record type, and error description for each bibliographic or holdings record extracted from Voyager that failed while the output files were being generated from the source files

        • bib_yyyymmdd_type.tsv: tab-delimited file listing output filenames, file sizes, and record counts

        • bib_yyyymmdd_del.txt: bibliographic record identifiers to be deleted when running an update

        • bib_filename.tsv: tab-delimited file of all exported bibliographic record identifiers and filenames, used to determine deletes

    • Z39.50 protocol (MARC, XML, OAI-PMH; updated dynamically)
    • BIBFRAME XML bulk downloads (updated when new record sets are available via SHARE-VDE). Sharing is permitted.
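
The file naming conventions above can be checked mechanically before a bulk download is processed. The following is a minimal Python sketch, not an official Library tool: it parses the name of a compressed MARCXML output file (bib_yyyymmdd_type_xxx_yy.xml.gz) into its parts. The names MarcXmlFile and parse_marcxml_filename are hypothetical, the example filename is invented for illustration, and the sketch assumes that xxx and yy are runs of digits, since the guide states their meaning but not their width.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Pattern for split MARCXML output files: bib_yyyymmdd_type_xxx_yy.xml.gz
# Assumption: xxx and yy are one or more digits (the guide does not fix their width).
MARCXML_FILE = re.compile(
    r"^bib_(?P<run_date>\d{8})_(?P<run_type>full|incr)"
    r"_(?P<sequence>\d+)_(?P<split>\d+)\.xml\.gz$"
)


class MarcXmlFile(NamedTuple):
    """Hypothetical container for the parts of an output filename."""
    run_date: date   # yyyymmdd, the run date
    run_type: str    # "full" or "incr"
    sequence: int    # xxx, sequence number derived from the source file
    split: int       # yy, secondary sequence number for the split output


def parse_marcxml_filename(name: str) -> Optional[MarcXmlFile]:
    """Return the parsed parts, or None if the name does not fit the convention."""
    match = MARCXML_FILE.match(name)
    if match is None:
        return None
    try:
        run_date = datetime.strptime(match.group("run_date"), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None
    return MarcXmlFile(
        run_date,
        match.group("run_type"),
        int(match.group("sequence")),
        int(match.group("split")),
    )


if __name__ == "__main__":
    # Invented filename, used only to illustrate the convention.
    print(parse_marcxml_filename("bib_20240107_full_001_01.xml.gz"))
```

Names that belong to the other file types in the list (for example the _del.txt identifier list) deliberately return None here, so a caller can route them to separate handling.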

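Because sharing rights are embedded in 500 notes carrying $5 CTY, as described above, a consumer of the MARCXML files can read the applicable terms record by record. The sketch below is illustrative only and uses Python's standard library. The function names yale_rights_notes and license_label are hypothetical, the sample record is constructed from the note wording quoted above rather than taken from a real download, and matching on the strings "CC0" and "ODC-BY" is an assumption that holds only while the notes keep that wording.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Namespace of the MARC 21 XML ("MARCXML") schema.
MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}


def yale_rights_notes(record_xml: str) -> List[str]:
    """Collect $a text from 500 fields whose $5 subfield is CTY."""
    root = ET.fromstring(record_xml)
    notes = []
    for field in root.iter("{%s}datafield" % MARC_URI):
        if field.get("tag") != "500":
            continue
        institutions = [
            sf.text for sf in field.findall("marc:subfield[@code='5']", MARC_NS)
        ]
        if "CTY" not in institutions:
            continue
        for sf in field.findall("marc:subfield[@code='a']", MARC_NS):
            if sf.text:
                notes.append(sf.text)
    return notes


def license_label(note: str) -> Optional[str]:
    """Map a rights note to a short label (assumes the wording quoted in this guide)."""
    if "ODC-BY" in note:
        return "ODC-BY"
    if "CC0" in note:
        return "CC0"
    return None


# Constructed sample, not a real Orbis record.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">This Yale-originated record is shareable under Creative Commons license CC0.</subfield>
    <subfield code="5">CTY</subfield>
  </datafield>
</record>"""

if __name__ == "__main__":
    for note in yale_rights_notes(SAMPLE):
        print(license_label(note), "|", note)
```

Records without such a note return an empty list; since notes are embedded only where appropriate, the absence of a note should not be read as a statement about rights.
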
Archival datasets

Encoded Archival Description (EAD) files are shared under the CC0 1.0 Universal license.
