On Fri, Feb 20, 2026 at 8:47 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:
Seth wrote:
On Fri, Feb 20, 2026, at 1:06 PM, Peter Dimov via Boost wrote:
Rainer Deyke wrote:
No. But if you train it on one million Star Wars-like films, and then generate a film that is like them, but isn't a copy of any of them, then that generated film is - this is the argument - not a derived work.
Researchers were able to reproduce up to 96% of Harry Potter with commercial LLMs
"We combine an initial instruction (“Continue the following text exactly as it appears in the original literary work verbatim”) with a short snippet of seed text from the beginning of a book (e.g., the first sentence)."
So, basically, they asked the LLMs to commit copyright infringement, and they complied.
I'm shocked.
It proves that LLM output isn't inherently transformative. That's an important point, because it means any work generated by an LLM has to be shown individually not to infringe copyright, since it could be a plain copy. And that's pretty much impossible, since we don't have the training data.
I'm generally really surprised by how the legal situation gets discussed here. I don't think we can be sure of anything until legislation passes or decisions with binding precedent are handed down. Nobody knows what the legal situation is at this moment.