Researcher-Led Web Data Archiving: Wayback Machine API

Resources and instructions for archiving web data, and accessing existing archives of web data.

Disclaimer

This is a work-in-progress guide developed in response to research concerns about at-risk data sets hosted online, with significant contributions by Yale Library's Born Digital Archives Advisory Group.

If you have further questions, please feel free to reach out or schedule a consultation.

Wayback Machine API

The Wayback Machine offers a few different APIs. These support a variety of tasks, from quickly checking whether captures exist for a long list of links to more involved querying and filtering.

Checking a List of Links

If you're looking to check the availability of more than one dataset, resource, or link, it may be fastest to check using the Wayback Machine API.

The basic API query format is https://archive.org/wayback/available?url=, to which you append the URL you're interested in. The response is in JSON format and, unless you specify a timestamp, describes the available snapshot closest to the current date (in other words, the most recent capture).

For example, the query https://archive.org/wayback/available?url=yale.edu might return:

{
  "url": "yale.edu",
  "archived_snapshots": {
    "closest": {
      "status": "200",
      "available": true,
      "url": "http://web.archive.org/web/20250705000814/https://www.yale.edu/",
      "timestamp": "20250705000814"
    }
  }
}

If there's no snapshot, "archived_snapshots" will be an empty object ({}), with no "closest" entry inside it.
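As a minimal sketch of how the two cases above might be told apart in Python, the helper below parses a response body and returns the "closest" entry, or None when "archived_snapshots" is empty. The function name closest_snapshot is hypothetical, and the sample strings are copied from the response shown above rather than fetched live.

```python
import json


def closest_snapshot(response_text):
    """Return the 'closest' snapshot dict from a response body, or None."""
    data = json.loads(response_text)
    # "archived_snapshots" is an empty object when nothing has been captured.
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


# Sample bodies: the first mirrors the yale.edu response shown above,
# the second is the "no snapshot" shape.
found = closest_snapshot(
    '{"url": "yale.edu", "archived_snapshots": {"closest": {'
    '"status": "200", "available": true, '
    '"url": "http://web.archive.org/web/20250705000814/https://www.yale.edu/", '
    '"timestamp": "20250705000814"}}}'
)
missing = closest_snapshot('{"url": "example.invalid", "archived_snapshots": {}}')

print(found["url"])
print(missing)
```

Returning None for the empty case keeps later code simple: a plain `if snapshot:` check is enough to separate archived links from unarchived ones.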

The easiest way to make more than one query like this is to write a few lines of code. In Python, you can use the requests module to send each query and the json module to read the response; in R, you may want to use the httr package and jsonlite (or a comparable JSON package).
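One possible shape for such a script is sketched below. To keep it runnable without installing anything, it uses Python's standard-library urllib in place of requests (with requests, `requests.get(query_url).json()` would play the same role as fetch_json). The names check_links and fetch_json, the one-second pause between queries, and the 30-second timeout are all assumptions made for illustration, not part of the API.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def fetch_json(query_url):
    """Send one query and decode the JSON response."""
    with urlopen(query_url, timeout=30) as resp:
        return json.load(resp)


def check_links(links, fetch=fetch_json, pause=1.0):
    """Map each link to its closest snapshot URL, or None if none exists."""
    results = {}
    for link in links:
        # urlencode escapes characters such as ':' and '/' in the link.
        query_url = API + "?" + urlencode({"url": link})
        data = fetch(query_url)
        closest = data.get("archived_snapshots", {}).get("closest")
        results[link] = closest["url"] if closest else None
        time.sleep(pause)  # assumed courtesy delay between requests
    return results
```

Calling `check_links(["yale.edu", "library.yale.edu"])` would then return a dictionary keyed by link, which can be written out to a CSV file or inspected directly. The fetch argument is there so the function can be tried out with a stand-in before any real queries are sent.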

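If you want a snapshot from around a particular time (rather than the most recent one), you can add a timestamp parameter to the end of the query. Timestamps are written in YYYYMMDDhhmmss form, and a shorter prefix also works (such as &timestamp=20120123 for the snapshot closest to January 23, 2012). The full query might look like https://archive.org/wayback/available?url=yale.edu&timestamp=20120123. The response repeats the requested timestamp and returns the nearest capture, which may fall on a different date. It could return: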

{
  "url": "yale.edu",
  "archived_snapshots": {
    "closest": {
      "status": "200",
      "available": true,
      "url": "http://web.archive.org/web/20120125004036/http://yale.edu:80/",
      "timestamp": "20120125004036"
    }
  },
  "timestamp": "20120123"
}
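As a rough sketch, the timestamped query above can be assembled from a Python date, and the timestamp in the response can be parsed back to see how far the capture is from the requested day. The helper name availability_query is hypothetical; the values are the ones from the example above.

```python
from datetime import date, datetime
from urllib.parse import urlencode


def availability_query(url, when=None):
    """Build an availability query, optionally targeting a given date."""
    params = {"url": url}
    if when is not None:
        # YYYYMMDD prefix of the YYYYMMDDhhmmss timestamp format
        params["timestamp"] = when.strftime("%Y%m%d")
    return "https://archive.org/wayback/available?" + urlencode(params)


requested = date(2012, 1, 23)
query = availability_query("yale.edu", requested)
print(query)
# https://archive.org/wayback/available?url=yale.edu&timestamp=20120123

# Parse the "timestamp" field of the "closest" entry from the sample response.
captured = datetime.strptime("20120125004036", "%Y%m%d%H%M%S")
days_off = (captured.date() - requested).days
print(days_off)  # 2
```

In this example the closest capture is two days after the requested date, so it is worth comparing the two timestamps whenever the exact date matters for your research.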
