NOTE: I recently had the privilege of editing and writing several chapters in a textbook entitled “Intro to AI and Ethics in Higher Education”. The following is a reproduction and adaptation of my chapter “Implications of Copyright Law on the Use of GenAI Tools in Education and Workplace”, as well as a discussion of a three-way balance that will be the subject of a panel at the 2024 AECT International Convention.
Introduction
Before we begin, let me clarify that this chapter is meant for an audience in the United States. Privacy, copyright, and data laws vary throughout the world. Although the principles of United States copyright are similar to those in the United Kingdom, they are not the same. Therefore, please adapt these guidelines and examples to the applicable laws of your own jurisdiction, wherever that may be.
Jason, an undergraduate student at the College of Southern Utah, has been inspired by the many creators who are using generative AI to enhance their thoughts and productivity. He knows that his ideas are on par with theirs, and so he sets out to create an illustrated book for children, with illustrations from Ideogram and words from ChatGPT! Before long, Jason has learned how to submit a digital book for purchase and has published his first book. He is looking forward to further expanding his creations through AI! In order to protect his income, and because he does not want people to misuse his work, he submits a copyright registration application to the United States Copyright Office.
He is dismayed, therefore, to receive a letter from the U.S. Copyright Office stating that they will not accept his copyright claim for his materials! They state that since Jason used artificial intelligence to create both his text and his images, he cannot claim to be the author or creator of either product. They state that copyright claims must come from a human author, and that a person who prompts a machine to create something does not currently qualify as an author.
Jason is very frustrated with this turn of events. He put in the effort to create those materials, even if the work was more removed from his own hands. He formulated the prompts and edited the material that the tools created! Therefore, he submits a copyright claim for another children’s book. This time, he does not say anything about the artificial intelligence used to create his materials. He introduces imperfections and makes more extensive edits so the book seems as human-made as possible. The approach seems to work: he receives a letter granting him copyright registration for the book.
Soon, he has nine or ten books that he has published using generative AI tools, and all he had to do was not tell the U.S. Copyright Office that he used AI! One day, though, he gets a letter from the Copyright Office. Somehow, they found out that he used AI to create his materials. While they are not suing or fining him, which they state they could do, they are canceling the copyright registrations for his entire library of published works.
Can the Copyright Office really do this? Do they truly believe that any AI involvement can remove all creators’ rights claimed by the tool user? The answer to both of these questions is “Yes.”
Two Conflicting Copyright Decisions
A similar situation actually occurred beginning in 2022. Kris Kashtanova, a renowned digital artist, released a comic book entitled Zarya of the Dawn. When she registered the work for copyright in the United States, she neglected to state that she had used Midjourney to create all of the images in the comic book, with some editing done by her after she selected the images. Whether this was a deliberate decision or an oversight due to a lack of precedent, the Copyright Office did not care. Once it learned of the AI involvement, it notified her that the Midjourney-generated images in Zarya of the Dawn would not be protected by copyright law in the United States, and it canceled and reissued her registration accordingly. This has been one of the most public occurrences related to AI products and requests for their protection under copyright.
On the other hand, there are instances in which authors who use generative AI can receive limited copyright protection. This was evidenced in another decision by the United States Copyright Office in the spring of 2024. While the Office initially declined to recognize the copyright of Ellen Rae, an author who used generative AI to rewrite a novel, she appealed for reconsideration. In response, the United States Copyright Office reversed its decision to a point. It still refused to give her copyright over the text itself, stating that ChatGPT had created the text and that a machine cannot hold copyright in any work. However, it did grant Ellen copyright over the arrangement, structure, and formatting of the text within the book. While it should be noted that Ellen Rae claimed that generative AI was a tool she used to compensate for mental hardships, the Copyright Office did not explicitly link that claim to the decision it eventually made. This could open up new possibilities for creativity.
There are two more intersections of copyright and generative AI that students, educators, and creators should acknowledge. The second intersection concerns the materials used to train the models:
Many creators claim that the institutions and individuals that create generative AI tools infringe upon their copyright protections when they use the creators’ works to train the tools.
Creators of the tools, and some users, counter that using copyrighted materials to train generative AI models is acceptable under the fair use doctrine, which will be explained in more detail below.
The third intersection of copyright law and generative AI is the contract between the programmers/creators of generative AI tools and those who use those tools. Do the programmers of AI tools have the right to claim copyright in the generated materials? Do the prompters alone have that right? Or do the programmers and the prompters share copyright of the works created by the tool? Some tool programmers act as though they do hold copyright over the works created by their tools. For example, the OpenAI Terms of Use imply that OpenAI has copyright over all works created by any of its generative AI tools: the terms completely hand over “their” rights to the tool user, as though OpenAI has the authority to do this. Essentially, OpenAI acts as though it is the party that needs to give permission. However, according to the United States Copyright Office, neither party in this contract has any copyright to give. To the extent these outputs are derivative works of copyrighted works, OpenAI has no legal right to grant any copyright transfer. The only two ways around this are to use public domain and open access materials exclusively or to successfully argue a fair use defense in court.
OpenAI used the freely available Common Crawl web archive and other openly accessible materials to train its initial models, but it also used copyrighted resources such as New York Times articles to refine its product. It is relying on a fair use argument for legal protection.
These three quandaries will be my focus in this chapter. Can creators copyright AI-generated works? Can programmers and developers train their models on copyrighted works (and if not, what alternatives exist)? Can programmers claim copyright on the works generated by the tools they have trained and fine-tuned?
Understanding the Basics
Before we go into the murky issues regarding copyright law, we have to understand the nature of copyright, including its purpose in the laws of the United States, what it can protect, and what it cannot protect. This includes entities that are not protectable and those items that have passed into the public domain.
Copyright Fundamentals: Core principles of copyright law
The roots of American copyright law are in the federal Constitution, which gives Congress the power to “promote the Progress of Science and useful Arts” by securing to authors, for limited times, the exclusive right to their writings. This idea was formally made into its own law in the Copyright Act of 1790, titled “An Act for the Encouragement of Learning,” which framed copyright as a way to ensure the spread of information to as many people as possible. In fact, the stated purpose of the 1790 act was the encouragement of learning, so that schools and students could benefit from written works and other intellectual property.
The Copyright Act of 1976, which was the first copyright law to explicitly codify the fair use doctrine, is also the law that frames all of our current legal discourse regarding copyright.
The concept of copyright is that the creator, custodian, or other owner of this right (or the collection of rights under the umbrella of copyright) holds title to the intellectual property of a particular work. Intellectual property is a creation that results from the work of the mind of one or more people. Works of intellectual property express ideas through literary, artistic, oral, and other media, including:
Literary works
Musical works
Dramatic works
Pictorial, graphic, and sculptural works
Motion pictures
Audiovisual works
Sound recordings
Architectural works
Compilations and derivative works
Ownership of this property means that the copyright owner has the exclusive right to control reproduction, adaptation, performance, display, and dissemination of the work. Librarians and archivists work to ensure the widest possible access to all types of works (copyrighted and non-copyrighted) while recognizing that access may be justifiably limited in certain instances.
While it may seem that copyright protects virtually everything in our world, there are some products that are not protected under copyright:
Ideas
Processes
Devices
Blank books, forms, charts, calendars, etc.
Laws and judicial opinions
Titles of works
Facts and data
Recipes
Works that have not been created by humans (including, for the time being, works generated by ChatGPT and similar tools)
Works of federal (and some state) government employees
Public domain materials
The last category is of special importance to those who train and fine-tune generative AI models. For copyrighted works, developers have to either obtain licenses or trust that a fair use argument will hold up in court (more on that later). However, items created by the federal government (and by some state governments) are in the public domain and are free to use in any model training. Furthermore, existing copyright law provides that, in general, works published in the United States before 1978 enter the public domain 95 years after publication; works created later are generally protected for the life of the author plus 70 years. Works whose terms have expired are also available to train models. Finally, creators can choose to dedicate their works to the public domain or to provide open licenses, such as Creative Commons licenses, that allow free reuse under certain conditions.
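To make the timing rule concrete, here is a minimal Python sketch, under heavy simplifying assumptions, of when a pre-1978 published U.S. work enters the public domain under the general 95-year rule. The function name and the example year are my own illustration, not part of any statute; real determinations depend on notice, renewal, and many other factors.

```python
def us_public_domain_year(publication_year: int) -> int | None:
    """Rough illustration of the general 95-year rule for older U.S. works.

    Simplifying assumptions: the work was published with proper notice and
    renewal, and it is not governed by the life-plus-70 rule that applies to
    most works created in 1978 or later. This is illustration, not legal advice.
    """
    if publication_year >= 1978:
        return None  # term usually depends on the author's life, not publication
    # The 95-year term runs through the end of its final calendar year,
    # so the work enters the public domain on January 1 of the next year.
    return publication_year + 95 + 1


# Example: works published in 1928 entered the U.S. public domain on January 1, 2024.
assert us_public_domain_year(1928) == 2024
```

The point of the sketch is simply that the pool of public domain training material grows every year on a predictable schedule.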
The Three Questions (not Tolstoy’s)
When anyone considers using generative AI tools ethically for commercial purposes, there are three main questions they must ask themselves. These questions are also advisable for those who are using generative AI products for non-commercial purposes:
What are the Rights and Responsibilities of the Copyright Owner?
What are the Rights and Responsibilities of the Copyright User?
Is Generative AI the Owner or the User? Both? Neither?
Balance between Consumer and Creator Rights
Although many free and open materials can be used to train models and fine-tune tools, many users and tool developers would still like to train tools on copyrighted materials. In addition to claiming that using copyrighted works is fair use, they also argue that there should be a balance between consumer, programmer, and creator rights. They claim that using these works to train models is not derivative (or infringing), because the text and ideas in the work are used to create an entirely new work. In other words, the use of these copyrighted works is “non-consumptive”: no monetary value is lost in using them. The models do not take the ideas, characters, plot points, or proprietary information from the original work. They take syntax, context, and sentence structure; the other aspects could only be reassembled by particularly enterprising users. The intent is not to reproduce the material upon which the tool was trained, but to use the data to inform new forms of creativity.
This claim brings up another central issue regarding AI and copyright. According to existing US Copyright Office guidelines, only human creators are allowed to register for copyright protection. Any substantive AI involvement (beyond minor editing tools such as Grammarly or Photoshop) disqualifies the AI-generated material from protection. To solve this issue, we must answer the question: Is AI a creator? Can humans who use AI tools to create products count themselves as creators, since they were the ones who prompted the tool to create?
The Core Issues
The three questions above and the examples discussed earlier illustrate that there are three main actors in all discussions about generative AI and copyright ownership:
the creators of original works
the programmers and developers who build and train generative AI tools, potentially using others’ original content in the process, and
the end users of generative AI tools who prompt for and consume the generated content, potentially after introducing other copyrighted works as part of their own prompts.
AI and Authorship
The Copyright Office decisions mentioned above illustrate the two ends of the political and ideological spectrum regarding the use of generative AI in “creating” generated works. Some view any AI involvement at all as proof that a work is not worthy of copyright protection. Others give the author credit for arranging the outputs, initiating the ideas that led to the generation of the outputs, and otherwise formulating the finished product. Still others are somewhere in the middle. The writing of each of the authors in this book should be enough to demonstrate where they stand on this issue.
While AI use as a tool is debated, it is essentially universally acknowledged that an AI tool is not a human. It is not sentient. Therefore, it cannot in and of itself qualify for copyright protection of its works. In fact, this was one of the main arguments in the Copyright Office’s Zarya decision. However, when the Office decided differently in the case of Ellen Rae, it implied that using AI as a tool was allowable practice to an extent. Evidently, it operated under the distinction that while Kashtanova simply used images without significantly changing them, Rae shifted and manipulated the outputs to create something clearly different from the original generations. Still, these decisions have not been codified into a law or guideline. The question remains: Does AI use by a human automatically disqualify their work from protection under traditional copyright, or even from more limited arrangements such as open licenses?
Whether AI-generated works can be granted copyright under any conditions is, to some extent, a separate question from how academia perceives AI and authorship. Simply because something is copyrightable does not mean that its use is considered best practice. Still, many of the arguments for and against copyright are similar to arguments for and against the use of generative AI in academic environments. The other members of Idaho OPAL and I will address these questions in our later chapter.
Programmers’ Rights
If an AI user can claim copyright on the arrangement and selection of outputs from generative AI tools, can the programmers or developers claim copyright on the generated output? According to the Terms of Use of OpenAI, the creators of ChatGPT, and of Ideogram AI, the creators of the image-generation tool Ideogram, the companies act as though they can claim copyright over the output. This is evident in the fact that they explicitly give that copyright to their users: if they did not think the copyright was theirs to begin with, they would not feel that they could give it away. Their Services (the apps and the mechanisms through which they produce generated output) are proprietary and cannot be disassembled or sold, but the Output is completely transferred to the user.
No matter how much these institutions want to act as though they own the copyright, no federal or state agency has ruled on the reality of these claims (or, really, inferences). Because the AI tools are meant to be part of the creative process rather than the creator of the final product, the courts or the Copyright Office could eventually eliminate all developers’ claims to copyright.
Authors’ Rights
The original authors of works that have been used to train AI models claim that their copyright has been infringed upon and that the institutions that gather copyrighted works must be punished. They usually support their arguments by stating that the AI-generated works are “derivative” of their own works. This essentially means that the resulting image, text, or video retains enough identifiable aspects of the first work to suggest a relation between the two works. It also means that the creator of the secondary work needs permission from, and owes compensation in some form to, the creator of the original work.
Court cases revolving around claims by the original works’ creators have had mixed results, with courts mostly deciding in favor of the generative AI companies. Two recent court cases against Stability AI and its tool, Stable Diffusion, illustrate this point. One of the lawsuits was brought by the media corporation Getty Images; the other was initiated by a small group of individual artists. In the first case, the court dismissed all arguments against Stability AI. In the second case, the court dismissed all but one of the arguments against Stable Diffusion: the only claim allowed to proceed was that Stability had violated the copyright of one artist of the group of three. If fair use arguments that model training is an allowable defense to alleged copyright infringement prevail, it could be that Stable Diffusion is not held in violation of copyright after all.
Ethical and Legal Considerations in Academia
Now that we have talked about the relationship between original creators, the developers of AI tools, and the consumers of AI-generated products, let us turn to the implications of these actors and relationships in academia. How does the fact that AI tools can use copyrighted works to develop generative output affect their perception and use in academia? What types of tools are perceived more favorably in academia than others? How does artificial intelligence use impact the perception of a student in academia? How do academic integrity and intellectual property policies impact the use of AI in academia?
Navigating the Evolving Landscape
Educators and students are both responsible for staying informed about the changing legalities and ethical considerations regarding generative artificial intelligence. Do not rely on others, particularly news sources, to tell you what you can or cannot do, or what you should or should not think, about these issues. Read Justia.com to discover court cases related to this topic, watch C-SPAN recordings of federal government proceedings, and read releases from the Copyright Office to understand and interpret its decisions.
Kashtanova’s and Rae’s copyright decisions by the United States Copyright Office are only the beginning events in this chain, and there are many ways that generative artificial intelligence could be seen as a new type of creator, a positive creative tool, or an interloper in the creation process. We should all act according to best practices while remaining cognizant of the legal restrictions of federal and state governments.
Intellectual Property Rights
Central to the discussion of copyright, especially as it relates to generative AI tools, is “intellectual property.” What exactly does “intellectual property” mean? This term refers to the original creative products of one or more individuals. These products are referred to as “works,” and they contain intellectual ideas, efforts, and concepts. The creators of these works have rights similar to those of owners of tangible property: they can control most of what is done with their intellectual property. This is why book authors sell licenses to the creators of derivative works, such as audiobooks, translations, film adaptations, sequels, and abridgments.
How does this affect generative artificial intelligence tools and their outputs? Opponents of generative AI argue that these tools violate intellectual property rights. In other words, they claim that in consuming copyrighted works for training purposes (or fine-tuning, in the case of works uploaded with specific prompts), generative AI tools always make unlicensed derivative works. They state that the creators of these tools, and those who use them, should pay for licenses for derivative works.
There are openly available generative AI tools, such as LLaMA for text and Ideogram for images, but openness of a tool is not the same thing as openness of its training data. Only tools trained exclusively on public domain or properly licensed materials are not in danger of violating copyright law through their training.
But what if a tool is not open and does use copyrighted works? OpenAI, along with the ARL and ALA, claims that the use of copyrighted works in training data does not violate copyright law because it constitutes fair use. The fair use doctrine allows an individual or group to use a copyrighted work without permission in certain circumstances; courts weigh four factors to determine whether a particular use qualifies:
the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
the nature of the original copyrighted work;
the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
the effect of the use upon the potential market for or value of the copyrighted work.
Supporters of using copyrighted works in training data claim that such use is fair use under factors 1 and 4, because the tool does not retain or transmit the main points of the copyrighted work unless explicitly asked. Even when asked to do so, it transmits them only in summary, as a commentary on the original work, and does not reuse the central expressive elements of the copyrighted work. Instead, the model learns about syntax and communication from these works and stores statistical patterns and metadata about the information in them. This type of use is what supporters call “non-consumptive.” Other supporters refer to a related concept called “non-expressive use,” an argument frequently used by creators of search engines, databases, and other systems and products that use context and metadata to provide resources to users.
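To make the “non-consumptive” and “non-expressive” ideas concrete, here is a minimal Python sketch of the difference between storing a work’s expression and storing only statistics derived from it. The passage and the statistics chosen are my own invented illustration; real models store learned numerical parameters rather than word counts, but the contrast is similar in spirit.

```python
from collections import Counter

# A stand-in for a copyrighted passage (invented text, used only for illustration).
passage = "The quick brown fox jumps over the lazy dog. The dog does not mind."

# "Expressive" use keeps the passage itself, word for word.
stored_expression = passage

# A "non-expressive" analysis keeps only statistics about the passage:
# word frequencies, vocabulary size, average word length, and so on.
words = passage.lower().replace(".", "").split()
stored_statistics = {
    "word_counts": Counter(words),                                   # how often each word appears
    "vocabulary_size": len(set(words)),                              # number of distinct words
    "average_word_length": sum(len(w) for w in words) / len(words),  # simple style metric
}

# The statistics describe the passage but cannot reproduce its exact wording.
print(stored_statistics["word_counts"].most_common(3))
print(round(stored_statistics["average_word_length"], 2))
```

Whether this framing fairly describes what large language models actually retain is, of course, exactly what the lawsuits discussed in this chapter dispute.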
While the “fair use” argument certainly seems compelling, it is important to note that a fair use argument is an argument, not an allowance. In other words, the validity of a fair use claim is determined by the court, not by the plaintiff or the defendant. Therefore, if you are considering using a generative AI tool, be sure that you can document and justify any fair use argument you wish to make.
Related to the “fair use” argument is OpenAI’s claim that one of the most public lawsuits against it, brought by The New York Times, relied on malicious and adversarial prompts designed to “force” the AI tool to reproduce copyrighted content verbatim. In OpenAI’s words:
“It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.”
If accurate, this is clear misuse of the tool as defined in OpenAI’s Terms of Use, yet the forced reproductions were presented as proof that ChatGPT violates copyright without guardrails.
A concept similar to “intellectual property” rights is that of “moral rights.” This idea originated in France, where it was framed during the revolutionary era as one of the natural rights that all individuals possess. As time passed, the idea that a person’s artwork is inextricably connected to their sense of self spread to other countries in Europe. In 1928, the Rome revision of the Berne Convention officially recognized moral rights as a legal factor in the general European community. These rights underlie the expectation that original creators be attributed when their works are used, and that works be used respectfully, so as not to besmirch the intent, honor, or integrity of the original creator.
Currently, moral rights are codified in the United States only as they relate to visual works. Textual and audio works are not associated with moral rights in American legal discourse. However, this could change as the nature of creation evolves through generative artificial intelligence use. Best practices in the United States recommend respecting the moral rights of all creators, except in the creation of satirical or parody works.
Practical Implications in Higher Education
How should we act now that we know this background information about ethical issues? Let’s look at some example scenarios.
Example 1: Dario, the Digital Media Student
Background:
Dario, a digital media student at a mid-sized college, utilizes AI tools for creating visual content as part of his coursework. He employs generative AI software to produce images for a project that aims to illustrate the progression of digital art over the decades.
Challenge:
Dario faces a copyright issue when one of the images generated by the AI closely resembles a well-known copyrighted photograph from the 1980s, leading to an ethical dilemma. Although Dario is reasonably sure that the photographer will never see his project, he still wants to do the right thing. The concern is over the AI's training data and whether it included copyrighted materials without proper licensing.
Best Practices and Argument for AI Use:
Dario had read up on best practices for AI use in digital media from resources like the American University's Center for Media & Social Impact and the Creative Commons website. He ensures that the AI tools he uses are from reputable providers who transparently disclose their data sourcing and training methods. Dario argues that his use of AI is crucial for educational purposes, enabling students to learn and experiment with new forms of media creation. He stresses that the AI-generated image, while reminiscent of past styles, is inherently a new creation, showcasing AI’s ability to learn from existing art to generate novel works. This use supports educational advancement and promotes innovation within the constraints of fair use, as outlined in the copyright guidelines provided by his institution's digital media department.
Example 2: Emina, the Computer Science Graduate Student
Background:
Emina, a graduate student in computer science, is working on her thesis, which involves the development of an AI model that summarizes academic papers. She uses existing research papers as training data for her model, and she has released the model to the public as part of what she sees as her professional responsibility to share advancements widely.
Challenge:
The challenge arises when a publisher claims copyright infringement, asserting that Emina’s AI model illegally uses copyrighted texts to train its algorithms.
Best Practices and Argument for AI Use:
Emina has thoroughly documented her process, adhering to best practices in AI and machine learning outlined in resources like the IEEE's "Ethically Aligned Design" and the Association for Computing Machinery's Code of Ethics. Emina argues that her use of AI serves a critical educational purpose by contributing to academic research and knowledge dissemination. She highlights that her model’s training on copyrighted texts falls under the fair use exemption for educational purposes, as it transforms the original works for a scholarly analysis without undermining the market for the original texts. Emina points out that the AI-generated summaries provide significant educational benefits, facilitating quicker access to research insights and fostering broader academic engagement.
Best Practices
What can the examples above show us about best practices regarding generative AI tool use in higher education institutions?
In both cases, the students make compelling arguments for the responsible use of AI in educational settings, emphasizing the importance of ethical considerations, transparency, and adherence to established best practices. These case studies highlight the nuanced balance between copyright law and the innovative use of AI technologies in higher education, advocating for policies that support both the protection of intellectual property and the advancement of educational tools.
If you search for best practices regarding generative AI use by students in higher education, you will see dozens of websites offering guidance. As stated above, formal “best practices” have not been established by any governing organization. In this case, the best thing that practitioners can do is to follow general guidelines regarding technology use or use of copyrighted works and apply those general recommendations to specific uses.
Here are some of the most common ideas in best practices lists, both unique recommendations regarding AI and specialized implementations of general recommendations by professional organizations:
Verify the information sources in generative AI outputs, including all citations and quotes.
Document which products, or portions of products, were created with AI.
Communicate with faculty members about your use of AI and follow course, department, and institution policies.
Edit the output heavily after you receive it from a generative AI tool, especially if you are using text outputs.
Use AI as a discussion tool and collaborator rather than a provider of a “finished” product for submission.
Consult with librarians, instructional designers, and other professionals in your institution for information about policies, recommendations, well-intentioned practices, and other suggested courses of action.
Conclusion: The Future of AI and Copyright in Academia
If you extrapolate from the past arguments and examples in this chapter, you can see that there are three general categories of hypothetical futures for generative AI use in the context of copyright:
The argument by the Association of Research Libraries (ARL) and the American Library Association (ALA) that “training generative AI models on copyrighted works is fair use” is accepted by the federal government. Regulated integration of copyrighted works becomes the norm for generative AI tools of all types.
The Copyright Office continues to hear implementation and output use cases on a case-by-case basis, considering the authors’ statements and arguments and the differences between the raw AI output and the finished product.
The argument that any use of copyrighted works in training is not fair use, and must be licensed or penalized, is supported by federal government institutions. This will force each AI company to choose one of four routes: rely on open access and public domain materials, become an underground industry reliant on plagiarism, develop a “salutary neglect”-like relationship with governments, or pay for licensing of all copyrighted materials used in training, which will drastically increase user costs.
In any of these future states, students and educators will have to proactively think of new ways to integrate generative AI collaboration into their workflows. In all three scenarios, generative AI tools will still exist. They will still be enhancing creativity and productivity. And you, students, will still be responsible for making the next decade’s decisions regarding technology and intellectual property. Choose wisely!