
Scraping the Surface: OpenAI Sued for Data Scraping in Canada
Overview
Canadian courts are set to make another ruling on the legality of using artificial intelligence (“AI”) technology to scrape data from websites. Data scraping is the practice of automatically extracting data from online sources using software. While Canadian courts have previously determined that scraping data without permission is unlawful, the rise of AI and its growing accessibility have led to continued commercial use of AI technology to obtain data without authorization and train AI systems.
On November 29, 2024, a precedent-setting claim was brought forward by several Canadian news companies (the “Plaintiffs”) in the Ontario Superior Court of Justice against OpenAI, Inc. and its related companies, including OpenAI GP, LLC, OpenAI, LLC, OpenAI Startup Fund I, LP, OpenAI Startup Fund GP I, LLC, OpenAI Startup Fund Management, LLC, OpenAI Global, LLC, OpenAI OpCo, LLC, OAI Corporation and OpenAI Holdings, LLC that work to develop, commercialize and fund OpenAI’s AI products (collectively, the “Defendants”) for allegedly scraping copyrighted content. The Plaintiffs represent Canada’s leading news outlets that are responsible for publishing journalistic content and media across various platforms, including The Toronto Star, The Vancouver Province, The Calgary Sun, The Calgary Herald, The Daily Herald, The Edmonton Journal, The Edmonton Sun, The London Free Press, The National Post, The Ottawa Citizen, The Ottawa Sun, The Daily Observer, The Daily Press, The Winnipeg Sun, The Globe and Mail, The Canadian Press, and CBC.
The Plaintiffs are each well-known players in the Canadian media landscape and argued that the works that each Plaintiff has produced are highly valuable and a product of significant creative efforts and monetary investment. These works are widely distributed across Canada, including on websites, mobile apps and through print media. Together the Plaintiffs host millions of works across various platforms, both owned and licensed by the Plaintiffs.
The Plaintiffs alleged that the Defendants have used their intellectual property without proper authorization as a means of building a commercially successful business that has generated enormous profits through the sale of AI-powered products and services. The legal basis of the Plaintiffs’ claim is rooted in copyright infringement and breach of contract, specifically alleging that the Defendants’ use of the Plaintiffs’ works violates Canadian copyright law and amounts to breach of the Plaintiffs’ applicable terms and conditions governing the use of each respective work.
In the claim, the Plaintiffs alleged that the Defendants are liable for the following: (a) the alleged unauthorized use of the Plaintiffs’ copyrighted works by the Defendants in violation of sections 3[1] and 27[2] of the Copyright Act; (b) the alleged circumvention by the Defendants of protection measures used by the Plaintiffs to prevent unauthorized copying of and access to their works, specifically in violation of sections 41 and 41.1 of the Copyright Act;[3] (c) the Defendants’ breach of the Plaintiffs’ online terms and conditions governing their respective websites; and (d) the unjust enrichment received by the Defendants for the misappropriation of the Plaintiffs’ intellectual property.
The Plaintiffs have deployed a myriad of technical measures to restrict access to their copyrighted works on their websites, including the robot exclusion protocol used to prevent automated scraping of data. Despite this, the Plaintiffs allege that the Defendants have subverted these technical protection measures to gain access to their works and exploit them for commercial purposes.
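For readers unfamiliar with the robot exclusion protocol, the following is a minimal sketch of how a compliant crawler might consult a site's robots.txt file before fetching a page. It uses Python's standard urllib.robotparser module. The crawler name "ExampleAIBot", the domain news.example.com and the rules shown are hypothetical and chosen for illustration only; they are not taken from the claim and do not represent any Plaintiff's actual configuration or any Defendant's software.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only).
# It blocks one named crawler from the whole site and blocks every other
# crawler from a subscribers-only section.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    article = "https://news.example.com/politics/story.html"
    archive = "https://news.example.com/subscribers/archive.html"

    # The named crawler is disallowed everywhere on the site.
    print(is_allowed(rules, "ExampleAIBot", article))   # False
    # Other agents fall under the wildcard rules.
    print(is_allowed(rules, "SomeBrowser", article))    # True
    print(is_allowed(rules, "SomeBrowser", archive))    # False
```

The protocol only publishes the site operator's instructions. The check itself runs on the crawler's side, so the file has effect only to the extent that a crawler performs a check of this kind and honours the result.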
Additionally, each of the Plaintiffs endeavoured to control how users could interact with and use their works by means of various legal terms and conditions. When accessing the Plaintiffs’ works online, users must accept the applicable terms and conditions, which specify that the works are for personal, non-commercial use only and specifically prohibit the reproduction or distribution of the works without the express authorization of the Plaintiffs. The Plaintiffs asserted that, by using the Plaintiffs’ works for profit through the commercialization of products like ChatGPT Plus and ChatGPT Enterprise, the Defendants have breached the Plaintiffs’ applicable terms and conditions.
The Plaintiffs further contended that the Defendants have been, and continue to be, unjustly enriched by using the works of the Plaintiffs without their knowledge, consent or appropriate license. The Defendants have generated billions of dollars in annual revenue through the sale of their products and services. As of October 2024, the Defendants were valued at $157 billion. The Plaintiffs alleged that they have been deprived of significant potential revenue generated by their works.
The Plaintiffs sought substantial compensation from the Defendants. The order for compensation requested by the Plaintiffs includes a portion of the profits earned by the Defendants from the alleged infringement of the Plaintiffs’ copyrighted works and circumvention of protection measures, statutory damages set at CA$20,000 per work, damages for unjust enrichment and, further, punitive damages for the Defendants’ willful misconduct. In addition to the damages sought, the Plaintiffs requested both pre-judgment and post-judgment interest, along with the costs of the legal proceedings.
The Defendants have released public statements asserting that using publicly available information to train and improve their AI systems is fair, or in the public interest, based on the principle of fair use. The “fair use” of public content remains a highly debated practice in the Canadian technology sector.
In a joint statement released by a subset of the Plaintiffs, including Torstar, Postmedia, The Globe and Mail, the Canadian Press and CBC, the news media companies indicated that while they welcomed technological innovation, the scraping of journalistic content for commercial gain is illegal and not in the public’s best interest. The Plaintiffs maintained that this case is about upholding Canadian journalism and protecting the substantial investments made by organizations across the country to produce fact-checked, sourced, reliable and trusted news and information by, for and about Canadians. The rapid spread of unverified content has eroded public trust, making it essential for credible outlets to uphold rigorous standards of fact-checking, transparency and accountability. In an era where anyone can publish content, with or without assistance from an AI system, the role of professional journalists in verifying facts and maintaining ethical standards has never been more vital.
This is not the first instance of a claim being brought forward through the Canadian legal system addressing the legality of data scraping. In 2019, the Federal Court of Canada ruled on the legality of data scraping in The Toronto Real Estate Board v. Mongohouse.com, where it found that web scraping activities of the defendant were unlawful and upheld the plaintiff’s copyright in website content.
On November 4, 2024, the Canadian Legal Information Institute (“CanLII”) filed a Notice of Claim with the Supreme Court of British Columbia against 1345750 B.C. Ltd., Clearway Management Ltd., Alistair Vigier doing business as Caseway AI Legal, Caseway AI Legal Limited and John Doe Corporation. The claim alleges that the defendants violated CanLII’s terms of use, which prohibited bulk downloading and scraping of the CanLII website without express permission or a license. CanLII is also seeking an injunction against Caseway AI Legal to prohibit the use of any material obtained from its website without authorization.
The allegations contained in the claim brought forward by the Plaintiffs have not been proven in Court, and the Defendants have not yet filed their defence to the allegations made. It is fair to say that the claim brought forward by these Canadian news companies against OpenAI, Inc., among others, has generated considerable public interest in Canada, and we await further guidance from the Ontario Superior Court of Justice regarding the legality of mass data scraping by AI systems.
For more information about the legality of data scraping in Canada, please contact the authors, Lisa R. Lifshitz, Partner and Chair, or Laura Crimi, Associate in the Technology and Privacy & Data Management Groups at Torkin Manes LLP.