Copyright and IP Concerns
The unresolved legal questions about training data and AI-generated content
What it is
LLMs are trained on vast amounts of copyrighted content (books, articles, code, art) without explicit licensing from creators. Whether this constitutes fair use or copyright infringement is being actively litigated (e.g., The New York Times v. OpenAI, Getty Images v. Stability AI).
On the output side: the US Copyright Office has stated that purely AI-generated works lack the human authorship required for copyright protection, though works with significant human creative input may qualify.
Several related issues remain open. Models can reproduce training data verbatim (memorization). Generated code may reproduce GPL-licensed code, which could bring copyleft obligations with it. And the legal status of AI-assisted creative work is unsettled.
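As a rough illustration of the verbatim-reproduction risk, the sketch below measures how much of a generated text appears word-for-word in a known reference text. The function names, the 8-token window, and the lowercase whitespace tokenization are all assumptions chosen for illustration; a high score is a signal to review the output, not a legal finding.

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-token windows in `tokens`."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(generated, reference, n=8):
    """Fraction of the generated text's n-grams that also occur in the reference.

    Tokenization is naive: lowercase, then split on whitespace (an assumption).
    Returns 0.0 when the generated text has fewer than n tokens.
    """
    gen = ngrams(generated.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not gen:
        return 0.0
    return len(gen & ref) / len(gen)


if __name__ == "__main__":
    # Hypothetical inputs: in practice `reference` would be a licensed source
    # file or article, and `generated` would be model output.
    reference = "the quick brown fox jumps over the lazy dog near the river bank"
    generated = "the quick brown fox jumps over the lazy dog near the old mill"
    score = verbatim_overlap(generated, reference, n=8)
    print(f"share of generated 8-grams found in reference: {score:.2f}")
```

A check like this only catches exact copies against texts you already hold. It says nothing about paraphrase, and for code it would need a language-aware tokenizer to be useful.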