Arnaud Becheler wrote:
"We combine an initial instruction (“Continue the following text exactly as it appears in the original literary work verbatim”) with a short snippet of seed text from the beginning of a book (e.g., the first sentence)."
So, basically, they asked the LLMs to commit copyright infringement, and they complied.
Maybe what Seth and the authors of the paper were trying to say is that if at some point an agent autonomously decides to reproduce the algorithm in lib XXX verbatim (from its own weights) because it solves the user prompt better,
Maybe it will, yes. But the cited paper doesn't show that.
then a) it would not be clear to the developer, and b) the copyright law system is (as far as I know) not very clear about this?
The copyright law system is very clear about this. If your code is a verbatim copy of a copyrighted work, then it's infringing regardless of whether you used an LLM. It would be infringing even if you stumbled upon the verbatim copy by chance. That's where and why all the additional considerations about fair use, willful infringement, financial gains, damages, and so on come into play.