Use Policy of the Web Lab
This is a preliminary version of the policy for use during the pilot release of the Web Lab. During this phase the Web Lab is available to authenticated Cornell faculty, students, and staff only, and access is provided only to the Web Lab structure database, not to the underlying Web pages.
Background
This policy governs the use of the Cornell Web Lab for research and education. All users are expected to read the policy and agree to abide by it. The objectives behind the policy are as follows.
Curatorial responsibility. Data in the lab comes from the Web Collection of the Internet Archive. The Internet Archive is a not-for-profit, 501(c)(3) organization in San Francisco, California, which offers permanent access for researchers, historians, and scholars to historical collections that exist in digital format. The Web Collection consists of snapshots of the Web, collected by crawling the open access Web at approximately two-month intervals since 1996. The curatorial goal of the Web Lab is to ensure that the data that is stored and made available accurately reflects the data that has been archived by the Internet Archive.
Computing resources. The Web Lab has limited computer power available. The collection is so large that apparently simple operations can be major computational tasks. For some large experiments, an individual researcher may schedule dedicated use of the entire system. Other users will be authorized to use the system, whenever available, but with restrictions on the resources that they can use.
Copyright. Most of the material in the Web Lab is subject to copyright. Web crawlers assume that the act of placing information on the Web includes an implied license to use it for limited purposes, including archiving and academic research, unless the owner of a Web site indicates otherwise. The Internet Archive and the Web Lab respect robots.txt exclusions and all other requests from copyright owners. Users of the Web Lab are likewise required to observe such requests.
Privacy. Research that mines the data in a manner that might identify individuals is subject to the rules that govern research involving human subjects. Adherence to these rules is an explicit requirement of the National Science Foundation. At Cornell, such research is subject to the standards and controls that apply to all academic research involving human subjects. External researchers will need to demonstrate that they have similar controls in place.
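As an illustration of the robots.txt exclusions mentioned under Copyright above, the following minimal sketch shows how a researcher might check whether a URL is excluded before using it. It relies on Python's standard urllib.robotparser module. The function name is_allowed, the user-agent string, the sample rules, and the example.org URLs are all hypothetical and chosen for illustration only; this is not the mechanism used by the Internet Archive or the Web Lab.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration; not part of any Web Lab interface.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample exclusion rules (an assumption, not taken from any real site).
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "http://example.org/public/index.html",
        "http://example.org/private/page.html",
    ):
        verdict = "allowed" if is_allowed(EXAMPLE_ROBOTS, "weblab-researcher", url) else "excluded"
        print(url, verdict)
```

A check of this kind covers only robots.txt exclusions. Other requests from copyright owners, which users are also required to observe, cannot be detected this way.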
Use of the Web Lab by Cornell faculty, staff, and students
General. All Cornell University faculty, staff, and students may use the Web Lab for any academic purpose, including research and teaching, but not for any commercial purpose. This use is covered by Cornell's usual computing policies, which are available at http://www.cit.cornell.edu/policy/responsible-use/.
Computing resources. Almost all users of the Web Lab will access it via the web interface, which is described at http://www.weblab.infosci.cornell.edu. That page has links to the various collections that are available to researchers. Users must register before using a collection. Researchers who wish to develop and run programs on the Web Lab system itself will need to request authentication by the Cornell Theory Center.
Copyright. Most of the content in the Web Lab is subject to copyright. It should be used only for academic research and teaching. Publications based on the research should follow the usual practices of fair use in quoting extracts from Web pages. If requested by the Internet Archive or by the copyright owner, the Web Lab will remove material from the collection.
Privacy. Any study of the content in the Web Lab that includes mining of data about individuals falls within the guidelines for Use of Human Subjects in Research. Cornell's procedures for such research are available at http://www.osp.cornell.edu/Compliance/UCHS/homepageUCHS.htm.
