Jonathan Band represents Internet companies and library associations with respect to intellectual property matters in Washington, D.C. He does not represent any party in connection with the Google Print Project. A version of this article first appeared in E-Commerce Law & Policy (August 2005).
On September 20, 2005, the Authors Guild and several individual authors filed a complaint in federal district court in New York alleging that Google is engaging in “massive copyright infringement” through the Google Print Library Project. The lawsuit capped months of publisher condemnation of the initiative, which involves scanning the collections of five major research libraries and making the full text of the books searchable on Google. Despite the allegations of infringement, libraries, users, and some authors have welcomed the Project, insisting that it will actually stimulate demand for books by helping readers identify books that contain the information they seek. These varying perceptions of the Print Library Project stem in part from confusion over exactly how much text will be viewable in response to a search query. Publishers and authors should carefully study precisely what Google intends to do and understand the relevant copyright issues before supporting the Authors Guild’s lawsuit.
The Google Print Project
The Google Print project has two facets: the Print Publisher Program and the Print Library Project. Under the Publisher Program, a publisher controlling the rights in a book can authorize Google to scan the full text of the book into Google’s search database. In response to a user query, the user receives bibliographic information concerning the book as well as a link to relevant text. By clicking on the link, the user can see the full page containing the search term, as well as a few pages before and after that page. Links would enable the user to purchase the book from booksellers or the publisher directly, or visit the publisher’s website. Additionally, the publisher would share in contextual advertising revenue if the publisher has agreed to have ads shown on its book pages. Publishers can remove their books from the Publisher Program at any time. The Print Publisher Program raises no copyright issues because it is conducted pursuant to an agreement between Google and the copyright holder.
Under the Print Library Project, Google plans to scan into its search database materials from the libraries of Harvard, Stanford, and Oxford Universities, the University of Michigan, and the New York Public Library. In response to search queries, users will be able to browse the full text of public domain materials, but only a few sentences of text around the search term in books still covered by copyright. This is a critical fact that bears repeating: for books still under copyright, users will be able to see only a few sentences on either side of the search term. Users will not see a few pages, as under the Publisher Program, nor the full text, as for public domain works. Indeed, a full page of the book is never seen for an in-copyright book scanned as part of the Library Project unless a publisher decides to transfer its book into its Publisher Program account, in which case it would be under the agreement between Google and the publisher. 1
Google’s August 11th Announcement
The Association of American Publishers reacted negatively to the Print Library Project. In response to the AAP’s concerns, Google announced on August 11, 2005, that if a publisher provided it with a list of its titles that it did not want Google to scan at libraries, Google would respect that request, even if the books were in the collection of one of the participating libraries. To allow publishers to determine whether they wanted to exclude any of their titles from the Library Project, Google stated that it would not scan any more copyrighted works until November.
Patricia Schroeder, AAP President, stated that “Google’s announcement does nothing to relieve the publishing industry’s concerns.” 2 She claimed that Google’s opt-out procedure “shifts the responsibility for preventing infringement to the copyright owner rather than the user, turning every principle of copyright law on its ear.” The AAP expressed continued “grave misgivings about … the Project’s unauthorized copying and distribution of copyright-protected works.” The Authors Guild went a step further, and sued Google for copyright infringement on September 20, 2005.
Analysis of the Authors Guild’s Copyright Claims
The Print Library Project involves two actions that raise copyright questions. First, Google copies the full text of books into its search database. Second, in response to user queries, Google presents users with a few sentences from the stored text. Because the amount of expression presented to the user is de minimis, this second action probably would not lead to liability. Perhaps for this reason, the Authors Guild lawsuit focuses on the first issue, Google’s copying of the full text of books into its search database. The critical question in the litigation is whether this copying is excused by the U.S. Copyright Act’s fair use privilege.
The leading decision that considered the fair use issues relating to search engine operations is Kelly v. Arriba Soft, 336 F.3d 811 (9th Cir. 2003). Arriba Soft operated a search engine for Internet images. Arriba compiled a database of images by copying pictures from websites, without the express authorization of the website operators. Arriba reduced the full size images into thumbnails, which it stored in its database. In response to a user query, the Arriba search engine displayed responsive thumbnails. If a user clicked on one of the thumbnails, she was linked to the full size image on the original website from which the image had been copied. Kelly, a photographer, discovered that some of the photographs from his website were in the Arriba search database, and he sued for copyright infringement. The lower court found that Arriba’s reproduction of the photographs was a fair use, and the Ninth Circuit affirmed.
With respect to the first factor, “the purpose and character of the use, including whether such use is of a commercial nature,” 17 U.S.C. § 107(1), the Ninth Circuit acknowledged that Arriba operated its site for commercial purposes. However, Arriba’s use of Kelly’s images
was more incidental and less exploitative in nature than more traditional types of commercial use. Arriba was neither using Kelly’s images to directly promote its web site nor trying to profit by selling Kelly’s images. Instead, Kelly’s images were among thousands of images in Arriba’s search engine database. Because the use of Kelly’s images was not highly exploitative, the commercial nature of the use weighs only slightly against a finding of fair use.
Kelly at 818.
The court then considered the transformative nature of the use – whether Arriba’s use merely superseded the object of the originals or instead added a further purpose or different character. The court concluded that “the thumbnails were much smaller, lower resolution images that served an entirely different function than Kelly’s original images.” Id. While Kelly’s “images are artistic works intended to inform and engage the viewer in an aesthetic experience,” Arriba’s search engine “functions as a tool to help index and improve access to images on the internet ….” Id. Further, users were unlikely to enlarge the thumbnails to use them for aesthetic purposes because they were of lower resolution and thus could not be enlarged without significant loss of clarity. In distinguishing other judicial decisions, the Ninth Circuit stressed that “[t]his case involves more than merely a transmission of Kelly’s images in a different medium. Arriba’s use of the images serves a different function than Kelly’s use – improving access to information on the internet versus artistic expression.” Id. at 819. The court closed its discussion of the first fair use factor by concluding that Arriba’s “use of Kelly’s images promotes the goals of the Copyright Act and the fair use exception” because the thumbnails “do not supplant the need for the originals” and they “benefit the public by enhancing information gathering techniques on the internet.” Id. at 820.
Everything the Ninth Circuit stated with respect to Arriba applies with equal force to the Print Library Project. Although Google operates the program for commercial purposes, it is not attempting to profit from the sale of a copy of any of the books scanned into its database, and thus its use is not highly exploitative. The Google search index functions as a tool that makes “the full text of all the world’s books searchable by everyone.” 3 Neither the full text copies in the index, nor the few sentences displayed to users in response to queries, will supplant the original books. Rather, they will bring the books to the user’s attention.
With respect to the second fair use factor, the nature of the copyrighted work, the Ninth Circuit observed that “[w]orks that are creative in nature are closer to the core of intended copyright protection than are more fact-based works.” Kelly at 820. Moreover, “[p]ublished works are more likely to qualify as fair use because the first appearance of the artist’s expression has already occurred.” Id. Kelly’s works were creative, but published. Accordingly, the Ninth Circuit concluded that the second factor weighed only slightly in favor of Kelly. The Print Library Project involves only published works. And while some of these works will be creative, the vast majority will be non-fiction.
The third fair use factor is “the amount and substantiality of the portion used in relation to the copyrighted work as a whole.” 17 U.S.C. § 107(3). The Ninth Circuit recognized that “copying an entire work militates against a finding of fair use.” Kelly at 820. Nonetheless, the court stated that “the extent of permissible copying varies with the purpose and character of the use.” Id. Thus, “if the secondary user only copies as much as is necessary for his or her intended use, then this factor will not weigh against him or her.” Id. at 820-21. In Kelly, this factor weighed in favor of neither party:
although Arriba did copy each of Kelly’s images as a whole, it was reasonable to do so in light of Arriba’s use of the images. It was necessary for Arriba to copy the entire image to allow users to recognize the image and decide whether to pursue more information about the image or the originating web site. If Arriba copied only part of the image, it would be more difficult to identify it, thereby reducing the usefulness and effectiveness of the visual search engine.
Kelly at 821.
In the Print Library Project, Google’s copying of entire books into its database is reasonable for the purpose of the effective operation of the search engine; searches of partial text necessarily would lead to incomplete results. Moreover, unlike Arriba, Google will not provide users with a copy of the entire work, but only with a few sentences surrounding the search term. And if a particular term appears many times in the book, the search engine will allow the user to view only three instances – thereby preventing the user from accessing too much of the book. Thus, at least with respect to the search results, the third factor weighs in favor of Google.
The Ninth Circuit decided that the fourth factor, “the effect of the use upon the potential market for or value of the copyrighted work,” 17 U.S.C. § 107(4), weighed in favor of Arriba. The court found that the Arriba “search engine would guide users to Kelly’s web site rather than away from it.” Kelly at 821. Additionally, the thumbnail images would not harm Kelly’s ability to sell or license full size images because the low resolution of the thumbnails effectively prevented their enlargement.
Without question, the Print Library Project will increase the demand for some books. The project will expose users to books containing desired information, which will lead some users to purchase the books or seek them out in libraries (which in turn may purchase more copies of books in high demand). It is hard to imagine how the Library Project could actually harm the market for certain books, given the limited amount of text a user will be able to view. To be sure, if a user could view (and print out) many pages of a book, it is conceivable that the user would rely upon the search engine rather than purchase the book. Similarly, under those circumstances, libraries might direct users to the search engine rather than purchase expensive reference materials. But when the user can access only a few sentences before and after the search term, any displacement of sales is unlikely.
The Authors Guild might argue that the Library Project restricts rightsholders’ ability to license their works to search engine providers. The existence of the Print Publisher Program, which involves licensing, demonstrates that the Library Project does not preclude lucrative licensing arrangements. By participating in the Print Publisher Program, publishers receive revenue streams not available to them under the Library Project. And Google presumably prefers that publishers participate in the Publisher Program; Google saves the cost of digitizing the content if publishers provide Google with the books in digital format.
In sum, under the Ninth Circuit’s analysis in Kelly, Google’s Print Library Project satisfies the requirements of the fair use doctrine. But the Authors Guild brought its action in the Second Circuit rather than the Ninth Circuit. This, however, should have little impact on the fair use analysis because the Ninth Circuit relied heavily on the Supreme Court’s most recent fair use decision, Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994). Thus, Kelly correctly noted that Campbell held that “[t]he more transformative the new work, the less important the other factors, including commercialism, become.” Kelly at 818, citing Campbell at 579. Likewise, Kelly cited Campbell for the proposition that “the extent of permissible copying varies with the purpose and character of the use.” Kelly at 820, citing Campbell at 586-87. And Kelly followed Campbell’s conclusion that “[a] transformative work is less likely to have an adverse impact on the market for the original than a work that merely supersedes the copyrighted work.” Kelly at 821, citing Campbell at 591.
Perhaps most importantly, Kelly repeated the Supreme Court’s articulation in Campbell and Stewart v. Abend, 495 U.S. 207, 236 (1990), of the objective of the fair use doctrine: “This exception ‘permits courts to avoid rigid application of the copyright statute when, on occasion, it would stifle the very creativity which that law is designed to foster.’” Kelly at 817. The Print Library Project is completely consistent with this objective in that it will ensure that creative accomplishments do not fade into obscurity. Because the Ninth Circuit so closely followed Campbell, and because the Second Circuit is also obligated to follow Campbell, the Second Circuit is likely to conduct a fair use analysis similar to the Ninth Circuit’s.
The Big Picture
Stepping back from the technicalities of the four fair use factors, it becomes clear that the Print Library Project is similar to the everyday activities of Internet search engines. A search engine firm sends out software “spiders” that crawl publicly accessible websites and copy vast quantities of data into the search engine’s database. As a practical matter, each of the major search engine companies copies a large (and increasing) percentage of the entire World Wide Web every few weeks to keep the database current and comprehensive. When a user issues a query, the search engine searches the websites stored in its database for relevant information. The response provided to the user typically contains links both to the original site and to the “cache” copy of the website stored in the search engine’s database.
Significantly, the search engines conduct this vast amount of copying without the express permission of the website authors. Rather, the search engine firms believe that the fair use doctrine permits their activities. In other words, the billions of dollars of market capital represented by the search engine companies are based primarily on the fair use doctrine.
In addition to fair use, search engine firms rely on the concept of implied license. Search engine firms assume that if information is posted on a website, the website operator wanted the information to be found by users, and search engines are the most efficient means for users to find the information. Thus, search engine firms assume that most website operators want their sites copied into the search engine database so that users will be able to find the site. If an operator does not want his site crawled and copied, he can use an exclusion header, a software “Do Not Enter” sign, which most search engine firms respect. But if a website operator does not use an exclusion header, a search engine will assume that the operator wants the site included in the search database.
This implied license theory has not yet been tested in court, and could actually constitute an element of a fair use defense. Courts have described fair use as an “equitable rule of reason,” Stewart v. Abend, 495 U.S. 207, 237 (1990), and industry practice is considered relevant in assessing the reasonableness of a defendant’s conduct. Accordingly, a court is likely to excuse as fair use a search engine’s copying of a website that did not use an exclusion header, provided that the search engine could show that it typically respected exclusion headers when website operators did employ them.
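The exclusion check that a well-behaved crawler performs before copying a page can be sketched briefly. This is a minimal illustration only: it assumes the “exclusion header” takes the form of a robots.txt file, which is one common convention and is not named in the discussion above. The crawler name `ExampleBot`, the `example.com` addresses, and the helper `may_crawl` are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical exclusion file: every crawler is asked to stay out of /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    # The operator posted a "Do Not Enter" sign for this path.
    print(may_crawl(EXAMPLE_ROBOTS_TXT, "ExampleBot",
                    "http://www.example.com/private/draft.html"))
    # No rule covers this page, so the crawler treats it as open.
    print(may_crawl(EXAMPLE_ROBOTS_TXT, "ExampleBot",
                    "http://www.example.com/index.html"))
    # With no exclusion rules at all, the whole site is treated as open.
    print(may_crawl("", "ExampleBot",
                    "http://www.example.com/private/draft.html"))
```

The default in this sketch is inclusion: a page is skipped only when the operator has posted a rule against it, which is the assumption the implied license argument depends on.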
In the Print Library Project, Google is relying on fair use just as it and its search engine competitors rely on fair use when they copy millions of websites every week. Additionally, by giving publishers the opportunity to opt out of the Print Library Project, Google is replicating the exclusion header feature of the Internet. Most authors want their books to be found and read. Moreover, authors are aware that an ever increasing percentage of students and businesses conduct research primarily, if not exclusively, online. Hence, if books cannot be searched online, many users will never locate them. The Print Library Project is predicated upon the assumption that authors generally want their books to be included in the search database so that readers can find them. But if a copyright owner does not want Google to scan her book, Google will honor her request.
Contrary to the AAP’s assertion, this opt-out feature does not turn “every principle of copyright law on its ear.” Rather, it is a reasonable implementation of a program based on fair use.
Fair use under the U.S. Copyright Act is generally broader and more flexible than the copyright exceptions in other countries. Thus, the scanning of a library of books might not be permitted under the copyright laws of most other countries. However, copyright law is territorial; that is, one infringes the copyright laws of a particular country only with respect to acts of infringement that occurred in that country. Since Google presumably will be scanning the in-copyright books in the United States, the only relevant law with respect to the scanning is U.S. copyright law. 4
Nonetheless, the search results will be viewable in other countries. This means that Google’s distribution of a few sentences from a book to a user in another country must be analyzed under that country’s copyright laws. (Google arguably is causing a copy of the sentences to be made in the random access memory of the user’s computer.) While the copyright laws of most countries might not be so generous as to allow the reproduction of an entire book, almost all copyright laws do permit short quotations. These exceptions for quotations should be sufficient to protect Google’s transmission of Library Project search results to users.
The Google Print Library Project will make it easier than ever before for users to locate the wealth of information buried in books. By limiting the search results to a few sentences before and after the search term, the program will not diminish demand for books. To the contrary, it will often increase demand for copyrighted works by helping users identify them. Publishers and authors should embrace the Print Library Project rather than reject it.
1 Displays of the different treatments can be found at http://print.google.com/googleprint/library.html. Google has also agreed to provide each library participating in the Program with a digital copy of all the works in that library’s collection scanned by Google. The libraries typically will keep the files of the in-copyright works as a dark archive for preservation purposes. See University of Michigan/Google Digitalization Partnership FAQ, August 2005. The fair use analysis of these preservation copies is different from that of the copies in Google’s search index, but the result is the same: both are fair uses.
2 Association of American Publishers Press Release, Google Library Project Raises Serious Questions for Publishers and Authors, August 12, 2005.
3 Official Google Blog post, Making Books Easier to Find, August 11, 2005.
4 Google reportedly will only scan public domain works at the Oxford University libraries.