Welcome! This guide covers intellectual property rights as they relate to web archiving. The resources on this page are formal policy statements and general guidelines published by a range of institutions.
In general, consensus seems to be forming around the following principles:
- collecting institutions explicitly acknowledge that copyright remains with the content creators
- opt-out instructions are provided
- robots.txt exclusions may be ignored, provided the institution first attempts to notify content owners
The following institutions have adopted a policy of notifying content owners and providing opt-out instructions:
- Stanford University Web Archiving Policy
- As of June 2019, Stanford’s policy is to notify content owners of its intent to archive web content and make it accessible, with a six-month embargo period following notification. The policy page provides a link for opting out. The policy also covers FERPA procedures for archiving student coursework.
- Stanford’s policy links to the resources that guided its creation: the NDSA Web Archiving Survey Reports from 2011-12 and 2013-14, the ARL Code of Best Practices for Fair Use, and the Section 108 Study Group Report.
- Columbia University Libraries Web Resources Collection Program Policies
- Columbia provides several helpful resources for those conducting web archiving activities, as well as for content owners who may have questions about the process. The program's policies define the process for selection and harvesting and the guidelines for permissions and access. The institution attempts to notify all organizations and individuals whose websites are selected for archiving and includes contact information for take-down requests on its FAQs page. Columbia also provides an information page on how website owners can optimize their sites for preservation.
- Library of Congress Web Archiving Program FAQs
- This site provides detailed information for website owners regarding crawler settings, notification, and researcher access to crawled sites, as well as opt-out instructions for content owners. The FAQs also point to the Rights and Access statements provided for each collection page and item record. To collect as much data as possible from websites, the Library notifies site owners before crawling and ignores robots.txt exclusions.
- The Library also provides Supplementary Guidelines related to Web Archiving, which outline its current web collection practices and collecting policies.
- University of Michigan Bentley Historical Library Web Archives Collection Development Policy
- The policy specifies that for website content from "private individuals, organizations, or associations, every effort will be made to inform the content owners" of the harvesting and "to inform them of their right to opt out or suppress content."
Other institutional policies do not specify prior notification to content owners but provide a mechanism to report concerns related to the collection and distribution of archived web content:
- Duke University Archives Website and Social Media Collecting Policy
- The policy outlines the principles that inform the institution's collecting decisions for website and social media content, which fit within the broader context of the Collecting Policy for Duke University Archives. For social media sites in particular, the Archives collects in compliance with the terms of service of the specific social media platforms. The policy notes that the institution may restrict access to collections or anonymize contributions for privacy reasons. It also directs those with concerns related to the collection of web and social media content to contact the University Archives office and provides a link to contact information.