Written by: Charles Dresser

The purpose of this paper is to argue that the use of others’ data to train OpenAI’s ChatGPT and DeepSeek’s large language model (LLM) did not violate intellectual property (IP) law. [i]

  1. DeepSeek’s alleged use of ChatGPT outputs for training its competing LLM did not violate OpenAI’s IP, because OpenAI has no IP rights to ChatGPT’s outputs.
  2. OpenAI’s wholesale copying of the Internet and, notably, the New York Times is not copyright infringement because it qualifies as transformative fair use under current fair use jurisprudence.

1 DeepSeek’s Distillation

1.1 Background

DeepSeek has been accused of using ChatGPT’s output to efficiently train its own LLM, a process known as artificial intelligence (AI) “distillation.” ChatGPT’s outputs (e.g., text) appear as though they were generated by a human but are, in fact, machine generated.

1.2 DeepSeek’s Use of Non-IP-Protected Outputs

Assuming DeepSeek did use ChatGPT outputs to train its LLM, OpenAI owns no IP right that would have been violated by such use. ChatGPT outputs are the type of information that would most likely be protected, if at all, by trade secret or copyright law (as opposed to patent or trademark law). Trade secret law is intended to protect confidential business information from being used to unfairly benefit a competitor, and copyright law is intended to protect expressive works (e.g., textual and graphic works) from copying by others. Intuitively, the policies supporting trade secret and copyright law seem consistent with legal protection against DeepSeek’s distillation of ChatGPT outputs to compete with ChatGPT. But, for the following reasons, ChatGPT outputs are protected by neither trade secret nor copyright law.

ChatGPT outputs are not protectable as trade secrets because they are not secret.

The definition of a trade secret is largely uncontroversial around the world. For example, the Uniform Trade Secrets Act (UTSA) (adopted in 49 states), the U.S. Defend Trade Secrets Act (DTSA), and the Trade-Related Aspects of Intellectual Property Rights (TRIPS) agreement (with 164 participating countries) all require a trade secret to have the same three elements. A trade secret must (1) consist of secret information; (2) be the subject of reasonable efforts by its owner to maintain its secrecy; and (3) have independent value because it is secret.

ChatGPT outputs are publicly provided to users and, therefore, neither secret nor subject to efforts to maintain secrecy, frustrating the first and second elements of a trade secret. As ChatGPT outputs can be used by competitors to train competing LLMs, ChatGPT outputs have independent value, satisfying the third element of a trade secret. Nevertheless, ChatGPT outputs are not protectable as trade secrets because they are not secrets.

ChatGPT outputs are not protected by copyright law because they are machine generated. Art. I, Sec. 8, Cl. 8 of the U.S. Constitution grants Congress the authority to grant exclusive rights to “Authors . . . to their . . . Writings.” Accordingly, the Constitution does not grant the federal government the power to grant a copyright on a machine-generated output.

For example, the Copyright Office limited the copyright in Zarya of the Dawn, a graphic novel created by Kristina Kashtanova, to exclude copyright protection for the novel’s imagery, which was created by the AI software Midjourney (with prompts by Kashtanova). U.S. Copyright Office, Re: Zarya of the Dawn (Registration # VAu001480196) (Feb. 21, 2023). Consistent with the constitutional requirement that copyrights inure only to human creators, the Copyright Office reasoned that Kashtanova was not the creator of the imagery (and not entitled to a copyright in the imagery) because Midjourney “generat[ed] the images in an unpredictable way” that was “not controlled or guided” by Kashtanova. Id. In making its finding, the Office gave weight to the fact that Kashtanova did not start with an initial image of her own and that, instead, Midjourney created the images from whole cloth.

As ChatGPT’s outputs are machine generated (and not the work of a human author), ChatGPT’s outputs are not protected by U.S. Copyright Law.

ChatGPT’s outputs reflect significant investment, i.e., sweat of the brow, which has occasionally been protected from unfair competitor use, even without a copyright.

News itself, meaning the facts of current events rather than the expression used to report them, is not protected by copyright law. Nevertheless, in International News Service v. The Associated Press (INS v. AP), the Supreme Court affirmed a ruling that enjoined[ii] International News Service from misappropriating or using Associated Press reporting in its own news articles until after the commercial value of the news had passed away. 248 U.S. 215, 39 S. Ct. 68 (1918). The court found jurisdiction in equity[iii] and recognized that news is not copyrightable but nevertheless requires significant sweat of the brow to discover and report. The court reasoned that each of the two competing companies ought to have a quasi property right against the unfair use of its news by the other.

“Regarding the news . . . , as . . . the material out of which both parties are seeking to make profits at the same time and in the same field, we hardly can fail to recognize that for this purpose, and as between them, [news] must be regarded as quasi property.”

Id. at 72.

One may argue that (a) the original reporting of the Associated Press (AP) is like the original training of ChatGPT; and (b) the copycat reporting by International News Service (INS) is like DeepSeek’s distillation of ChatGPT outputs. In both cases, (1) the two companies are competitors in the same market; (2) the original effort—AP reporting and OpenAI training—was substantial; and (3) the derived effort—copying news stories or training with copied ChatGPT outputs—was minimal. Thus, a court following the precedent of INS v. AP might be compelled to enjoin the use of DeepSeek’s LLM that was trained with ChatGPT outputs. However, OpenAI may not be able to find a court willing to follow the precedent of INS v. AP.

In over 100 years, the Supreme Court has not extended INS v. AP, and whether the decision remains good law is questioned today. In 1991, the Supreme Court made clear that (1) the ruling in INS v. AP was not supported by copyright law; and (2) AP’s sweat of the brow journalistic efforts did not make the (otherwise uncopyrightable) news copyright protected. Feist Publ’ns, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340, 3rd footnote (1991).

DeepSeek’s use of ChatGPT likely breached OpenAI’s terms of service for use of ChatGPT.

OpenAI proscribes the use of ChatGPT outputs for the purpose of training a competing AI model in its terms of service.[iv] DeepSeek likely violated this contractual term and may be liable to OpenAI under contract law.

Accordingly, DeepSeek may have engaged in an unfair business practice and likely breached a terms of service contract. However, DeepSeek did not violate any OpenAI property rights because OpenAI has no property rights to ChatGPT outputs.

2 OpenAI’s Copying

2.1 Background

OpenAI is a defendant in a lawsuit filed by the New York Times and others. The New York Times alleges that OpenAI’s copying of the New York Times for training its LLM, ChatGPT, is copyright infringement. See 17 U.S.C. § 106(1) (granting the exclusive right to reproduce, or copy, copyrighted work to the copyright owner).

2.2 OpenAI’s Transformative Fair Use of the New York Times

OpenAI’s wholesale copying of the New York Times (the Times) for the purpose of training is transformative fair use.

Under 17 U.S.C. § 106(1), an author has the exclusive right to reproduce (copy) his copyrighted work. But otherwise infringing use of copyrighted material is considered non-infringing where it is “fair use.” 17 U.S.C. § 107. Fair use analysis requires consideration of four factors: “(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.” 17 U.S.C. § 107. While the analysis considers each factor, courts consider some factors more important in some contexts than in others. Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994).

(Fair Use Factor 1) The character of OpenAI’s use of the Times weighs in favor of fair use because it is transformative.

Under the first factor, 17 U.S.C. § 107(1), a transformative use is one that “adds something new, with a further purpose or different character.” Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994). Here, OpenAI has used the Times to train an LLM, ChatGPT. ChatGPT uses the Times and other training data to generate text responses to user prompts. These text responses are well-informed, making ChatGPT useful for a myriad of functions. ChatGPT’s functionality goes well beyond providing news and entertainment, as the Times does. Old newspapers have mundane uses, like papier-mâché and bird cage linings. But using old newspapers to create a general purpose AI is manifestly transformative.

(Fair Use Factor 2) The nature of the New York Times weighs in favor of fair use.

The nature of a newspaper is factual reporting. As discussed above in reference to INS v. AP, the facts reported by a newspaper are considered uncopyrightable, while their expression is copyrightable (i.e., a “thin copyright”). The nature of the Times, thus, weighs in favor of fair use. A court would also likely consider the value OpenAI received by copying the Times and whether that value pertains to copyrightable expression.

In Google LLC v. Oracle America, Inc., 593 U.S. 1 (2021), the Supreme Court held for fair use, in part, because of the thin copyright of the Java APIs, which Google copied to make Android. The Court reasoned that Google copied Java, not because of its copyrightable expression (i.e., “creativity, . . . beauty, or . . . purpose”), but because of its large programmer base. Id. Likewise, OpenAI used the Times, and other sources including Wikipedia and Google Patents, not for their copyrightable expression, but for their uncopyrightable syntactic associations and corresponding factual concepts. Thus, as with the Java APIs, the value OpenAI obtained from copying the Times lies in material that is uncopyrightable by nature.

The nature of the Times weighs toward fair use because newspapers are entitled to a thin copyright only covering expression and OpenAI copied the Times for its non-expressive content.

(Fair Use Factor 3) Wholesale copying of the New York Times does not preclude fair use.

OpenAI copied all available New York Times content for training ChatGPT. Such complete copying weighs against fair use. However, this factor is regularly discounted in fair use analysis when the use is considered transformative.

For example, in Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015), the Second Circuit Court of Appeals found transformative fair use, even though Google copied millions of books from library collections. The court reasoned Google’s copying was transformative because Google was making the copied text searchable. And in Campbell v. Acuff-Rose, the Supreme Court held that copying of Roy Orbison’s song Oh, Pretty Woman could qualify as transformative fair use, even though the most recognizable part, or “heart,” of the song was copied, and remanded the case for further proceedings. The Court reasoned the copying was transformative because the resulting song was a parody that was critical of Orbison’s original.[v]

(Fair Use Factor 4) Weighing most heavily for fair use, OpenAI did not divert a potential market from the Times.

The market effect of the allegedly infringing use is the most important fair use factor. Harper & Row, Publishers, Inc. v. Nation Enterprises, 471 U.S. 539 (1985). Here, ChatGPT is not a competing newspaper. In addition, OpenAI has added an opt-out feature that allows websites to opt out of being used to train ChatGPT. By opting out, the Times can ensure that ChatGPT users cannot, with careful prompting, use ChatGPT to retrieve today’s Times articles and thereby avoid the Times’s paywall. Furthermore, the Times has not argued that it considered artificial intelligence as a potential market.

OpenAI’s copying of the Times for training ChatGPT was fair use because: (1) the use was transformative; (2) the Times had value for its non-expressive (and therefore unprotected) content; (3) wholesale copying is permitted for transformative fair use; and, most importantly, (4) the use does not impact the Times’s potential market. Finally, OpenAI’s training of ChatGPT embodies the progress of science and the useful arts which, according to the U.S. Constitution, copyright law exists to promote.


Acknowledgement: this document was influenced by “AI and the Future of Law,” a class taught at the 2024 Intellectual Property Summer Institute (IPSI) at UNH Franklin Pierce School of Law.

[i] While this paper presents the opinion of its author, it is also intended to be persuasive. Opinions on these matters, especially ongoing cases, are likely to differ and change.

[ii] Injunctions are an equitable remedy, in contrast to a legal remedy like money damages.

[iii] U.S. federal courts (like the Supreme Court) are courts of law and equity. But courts of law and courts of equity originated as distinct courts, with courts of law awarding money, according to the Common Law, and courts of equity ordering that actions be taken or abstained from, according to equitable principles. The first court of equity, the Court of Chancery of England and Wales, was presided over by the Lord Chancellor (a political office) to make right unconscionable wrongs that were otherwise legal under the Common Law.

[iv] “You will not, and will not permit End Users to . . . (e) use Output (as defined below) to develop any artificial intelligence models that compete with our products and services. However, you can use Output to (i) develop artificial intelligence models primarily intended to categorize, classify, or organize data (e.g., embeddings or classifiers), as long as such models are not distributed or made commercially available to third parties and (ii) fine tune models provided as part of our Services”

[v] Familiar to many, the heart of Orbison’s song Oh, Pretty Woman goes:

Pretty woman walkin’ down the street
Pretty woman, the kind I’d like to meet
Pretty woman, I don’t believe you, you’re not the truth
No one could look as good as you
Mercy

The copying work, Pretty Woman by 2 Live Crew, includes these lyrics and concomitant melody, but parodically criticizes the theme of Orbison’s famous song by attacking the plausibility of falling in love with a prostitute.

