On Tue, Feb 17, 2026 at 11:13 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:
...
Disclaimer: Claude Opus 4.6 wrote this:

How Transformers Work (Brief)

A transformer is a neural network architecture that learns statistical
relationships between tokens (subword units). During training, it adjusts
billions of floating-point weights via gradient descent to minimize
prediction error on the next token. The result is a compressed, lossy
representation of patterns in the training data -- not a database of
documents.

Key points:

- Attention mechanism. Self-attention lets each token attend to every
  other token in the context window, computing weighted relevance scores.
  This is how the model captures long-range dependencies -- syntax,
  argument structure, style -- without storing literal sequences.

- Weights ≠ storage. A model with ~100B parameters trained on trillions
  of tokens cannot store those tokens verbatim. The information is
  destructively compressed. It's more analogous to how a human programmer
  who has read a lot of Asio code might unconsciously reproduce idioms
  and patterns than to a photocopier.

- Memorization does happen, but it's the exception. Research (Carlini et
  al., "Extracting Training Data from Large Language Models") has shown
  that LLMs can regurgitate verbatim snippets, particularly of data that
  appeared many times in training or is highly distinctive. Short, unique
  sequences (API keys, specific code blocks) are more susceptible. But
  for typical code, the output is a probabilistic reconstruction, not
  recall.

What This Means for the Copyright Question

Dimov's analysis is roughly correct. The real risks are:

1. Verbatim reproduction -- possible but unlikely for non-trivial code
   blocks. The longer the sequence, the less likely it's memorized
   exactly. Modern models also apply deduplication and guardrails to
   reduce this.

2. Structural copying -- a model might reproduce the architecture or
   design pattern of a copyrighted work without copying literal text.
   This is harder to adjudicate.
Copyright protects expression, not ideas, so reproducing an API design
or algorithmic approach is generally not infringement.

The BSL angle: Dimov's point that Asio (and Cobalt) are BSL-licensed is
pragmatically relevant. Even if a model did memorize and reproduce
fragments, the source material's permissive license weakens any
infringement claim substantially, since the copyright holder has already
granted broad usage rights.

Rivera Morell's concern about license incompatibility is the more subtle
issue: if a model blends patterns from BSL and GPL sources into a single
output, what license applies? This is genuinely unresolved law.

Bottom line: Transformers don't "remember" documents the way a database
does. They learn compressed statistical patterns. Verbatim reproduction
is possible but empirically rare for code, and the risk is further
mitigated when training data is permissively licensed. The harder open
question is about structural similarity and license mixing, which no
court has definitively addressed yet.

Thanks
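P.S. The "weights ≠ storage" argument can be made concrete with a quick
back-of-envelope calculation. All figures below are illustrative
assumptions (a ~100B-parameter model, a ~10T-token corpus), not
measurements of any particular model:

```python
# Back-of-envelope: model capacity vs. training-set size.
# Every number here is an assumed, illustrative figure.

params = 100e9            # ~100B parameters (assumed)
bytes_per_param = 2       # e.g. 16-bit weights
model_bytes = params * bytes_per_param          # 200 GB of weights

tokens = 10e12            # ~10T training tokens (assumed)
bytes_per_token = 4       # rough average bytes of text per subword token
corpus_bytes = tokens * bytes_per_token         # 40 TB of training text

print(f"model:  {model_bytes / 1e9:.0f} GB")    # model:  200 GB
print(f"corpus: {corpus_bytes / 1e12:.0f} TB")  # corpus: 40 TB
print(f"ratio:  {corpus_bytes / model_bytes:.0f}x")  # ratio:  200x

# Even if every weight were spent purely on storage, the model could
# hold only a small fraction of the corpus verbatim; the rest must be
# compressed away into statistical patterns.
```

Under these assumptions the corpus is roughly 200x larger than the
weights, which is why verbatim retention has to be the exception rather
than the rule.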
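P.P.S. The attention mechanism described above reduces to a few lines of
linear algebra. A minimal single-head sketch using numpy, with arbitrary
toy dimensions and random weights (no real model's parameters are
involved):

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) token embeddings. Each output row is a
    relevance-weighted mix of every token's value vector -- the model
    stores mixing weights, not literal sequences.
    """
    q, k, v = x @ wq, x @ wk, x @ wv            # project to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])     # pairwise relevance, scaled
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v                          # weighted sum of values

seq_len, d_model, d_head = 5, 8, 4              # toy sizes, chosen arbitrarily
x = rng.standard_normal((seq_len, d_model))
wq, wk, wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))

out = self_attention(x, wq, wk, wv)
print(out.shape)                                # (5, 4): one vector per token
```

In a real transformer this runs with many heads in parallel and learned
projections, but the core operation -- every token attending to every
other token via softmaxed dot products -- is exactly this.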