Rainer Deyke wrote:
Clearly the person who gives the prompt has no copyright claim. On the other hand, I can trivially write an "AI" program that regurgitates its exact training data, even if its internal model looks nothing like the training data. If I "train" this "AI" on the Star Wars movie, does this mean I can use it to create a copyright-free copy of Star Wars?
No. But if you train it on one million Star Wars-like films, and then generate a film that resembles them but isn't a copy of any of them, then that generated film is (so the argument goes) not a derivative work. That's because compression works by identifying common patterns, and common patterns aren't copyrightable. (If they were, these one million films would all infringe on one another's copyright.) So N=1 is infringement, and N=1,000,000 is not. There's probably a cutoff point somewhere in between, and the training data sets are very likely above it.
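To make the N=1 case concrete, here is a toy sketch in Python. It is my own illustration, not anything a real system does: a character-level lookup table that maps each 8-character context to the characters seen after it. The names `train`, `generate`, `branching_contexts`, the `ORDER` value, and the two sample sentences are all made up for this example. Trained on a single text, the table stores no copy of that text as such, yet every context has exactly one continuation, so generation reproduces the input verbatim. Add a second, similar text and the shared prefix collapses into one path with a branch where the two diverge.

```python
from collections import defaultdict

ORDER = 8                      # context length in characters (arbitrary for this toy)
START, END = "\x00", "\x01"    # padding and end-of-text markers

def train(texts, order=ORDER):
    """Map each `order`-character context to the set of characters that follow it."""
    model = defaultdict(set)
    for text in texts:
        padded = START * order + text + END
        for i in range(order, len(padded)):
            model[padded[i - order:i]].add(padded[i])
    return dict(model)

def generate(model, order=ORDER, max_len=10_000):
    """Walk the table from the start context, always taking the smallest successor."""
    context, out = START * order, []
    while len(out) < max_len:      # guard against cycles in multi-text models
        nxt = min(model[context])
        if nxt == END:
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)

def branching_contexts(model):
    """Count contexts with more than one possible continuation."""
    return sum(1 for successors in model.values() if len(successors) > 1)

# Hypothetical sample texts, standing in for "films".
TEXT = "the quick brown fox jumps over the lazy dog"
OTHER = "the quick brown fox sleeps under the old tree"

single = train([TEXT])
print(generate(single) == TEXT)          # N=1: does the output equal the input?
print(branching_contexts(single))        # N=1: how many contexts offer a choice?
print(branching_contexts(train([TEXT, OTHER])))  # N=2: shared prefix, then a branch
```

With one text there is nothing to generalize over, so the "model" is just the work in a different encoding. With more texts, what they share gets stored once and the table starts offering choices. This says nothing about where a legal cutoff would lie; it only shows the mechanical difference between the two ends of the scale.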