On Tue, Jun 23, 2026, at 8:03 AM, Rainer Deyke via Boost wrote:
Using checksums for authentication, or indeed any authentication system whatsoever, is quite problematic for serialization. Data sanitation is
Do you mean input sanitation? Input sanitation really isn't applicable here. There are no dangerous things to avoid in a generic way. You **could** have safety limits (like a maximum number of container elements, max registered types, max memory allocated, stuff like that). Bounds checking is already in place anyway, correct me if I'm wrong.
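To make the "maximum number of container elements" idea concrete, here is a minimal sketch. It assumes a made-up wire format (a 4-byte little-endian count followed by that many 4-byte values); `load_limits`, `read_u32`, `load_vector` and `encode_u32` are hypothetical names for illustration and are not part of Boost.Serialization. The point is only that the count is checked before anything is allocated.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical limits object; not a Boost.Serialization type.
struct load_limits {
    std::uint32_t max_elements = 1024;
};

// Reads one 4-byte little-endian unsigned integer, throwing on short input.
inline std::uint32_t read_u32(std::istream& in) {
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4))
        throw std::runtime_error("truncated input");
    return std::uint32_t(b[0]) | (std::uint32_t(b[1]) << 8) |
           (std::uint32_t(b[2]) << 16) | (std::uint32_t(b[3]) << 24);
}

// Reads a count followed by that many values. The count comes from
// untrusted input, so it is checked against the limit before reserve().
inline std::vector<std::uint32_t> load_vector(std::istream& in,
                                              load_limits const& lim) {
    std::uint32_t const n = read_u32(in);
    if (n > lim.max_elements)
        throw std::length_error("element count exceeds limit");
    std::vector<std::uint32_t> out;
    out.reserve(n);
    for (std::uint32_t i = 0; i < n; ++i)
        out.push_back(read_u32(in));
    return out;
}

// Helper for building test input in the same assumed format.
inline std::string encode_u32(std::uint32_t v) {
    std::string s(4, '\0');
    for (int i = 0; i < 4; ++i)
        s[i] = static_cast<char>((v >> (8 * i)) & 0xFF);
    return s;
}
```

A header claiming 0xFFFFFFFF elements is rejected with `std::length_error` instead of triggering a multi-gigabyte `reserve`.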
not an authentication problem, and no authentication may be possible.
Imagine this scenario:
- User A creates a file and uploads it onto the internet, without any knowledge of who is going to consume the file.
- User B downloads the file and loads it in program C, trusting the security of program C to remain secure in the presence of untrusted data.
- There are no shared secrets between A and B. There are no public keys. These users do not know or trust each other at all.
The idea here is: don't load the file if you cannot trust the source. The corollary is that if you are devising a format for this kind of untrusted exchange, you have to build in your own protection on top of any underlying archive format, whether Boost or not. Note that the same already goes for simple tar archives. Here too, consuming applications can use limits (e.g. path traversal limits, expanded size limits?)
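As a rough sketch of the kind of limits a consuming application could layer on top of a tar-like archive: a path check that refuses absolute paths and `..` components, and a running budget for expanded size. Both `is_safe_member_path` and `expansion_budget` are hypothetical names, and the path check assumes POSIX-style `/` separators only; a real extractor would also have to think about symlinks and platform-specific separators.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical check: rejects empty paths, absolute paths, and any ".."
// component. Assumes '/' is the only separator.
inline bool is_safe_member_path(std::string const& path) {
    if (path.empty() || path.front() == '/')
        return false;
    std::istringstream parts(path);
    std::string component;
    while (std::getline(parts, component, '/'))
        if (component == "..")
            return false;
    return true;
}

// Tracks how many bytes an extraction is still allowed to produce.
class expansion_budget {
public:
    explicit expansion_budget(std::uint64_t max_bytes)
        : remaining_(max_bytes) {}

    // Returns false, consuming nothing, if the entry would exceed the budget.
    bool consume(std::uint64_t entry_bytes) {
        if (entry_bytes > remaining_)
            return false;
        remaining_ -= entry_bytes;
        return true;
    }

private:
    std::uint64_t remaining_;
};
```

The consuming application would call both checks per archive entry and refuse the whole archive on the first failure.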
Somehow this scenario works for image files.
It's easier for a special/single purpose fixed format.
It works for HTML files.
I'll argue it clearly doesn't, unless you **only** parse to /dev/null
It even more or less works for Javascript files,
Yeah, no. It clearly works more "less" than "more" here. JSON might be your "more or less" scenario?
But it does not work at all for Boost.Serialization archives, and it cannot be made to work for Boost.Serialization archives.
I'm in favor of building in some basic restrictions along the way, which would certainly reduce the harm of corrupted/malicious archives. Best case, it avoids compromising the consuming code while rejecting the input. It *might* involve disabling support for, say, polymorphic types under "strict" deserialization settings(?), but other than that, nothing too seriously hampering (akin to "body length limits" on HTTP message parsing).
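Purely as an illustration of what a "strict" setting could look like (none of this exists in Boost.Serialization; `strict_policy`, `record_header` and `check_header` are invented names, and the allow-list of class ids is an extra assumption on top of the polymorphism switch):

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <stdexcept>

// Hypothetical policy object for a strict loading mode.
struct strict_policy {
    bool allow_polymorphic = false;             // off by default in strict mode
    std::set<std::uint16_t> allowed_class_ids;  // ids the application expects
};

// Hypothetical per-object header as it might be read from an archive.
struct record_header {
    std::uint16_t class_id;
    bool is_polymorphic;
};

// Rejects the record before any object of that type is constructed.
inline void check_header(record_header const& h, strict_policy const& p) {
    if (h.is_polymorphic && !p.allow_polymorphic)
        throw std::runtime_error("polymorphic types disabled in strict mode");
    if (p.allowed_class_ids.count(h.class_id) == 0)
        throw std::runtime_error("class id not in allow-list");
}
```

The check runs on the header alone, so a rejected record never reaches type-specific loading code.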
That's a pretty serious restriction for Boost.Serialization. The kind that should go in big bold red text on the main page of the Boost.Serialization documentation. It's not a bug per se, but it means that Boost.Serialization can only be used for a small subset of serialization scenarios, and the user could easily miss this restriction if they don't read the entire documentation of the library.
I'm torn here. The Boost community has tended to assume users know what they are doing. Libraries tend to err on the side of giving the user all the power to optimally tune things for their use-case (I don't think Flyweight has big red notices about DoS attacks, e.g.). Very similar concerns apply to multiple other libraries: Spirit parsers (and Boost Parser too, IIRC) *by* default have no limits on variable-length input constructs at all. It can be anywhere from easy to very involved to build a secure parser in them. Boost Interprocess managed segments have similar UB caveats when mixing versions/architectures or, worse, mapping pages from untrusted sources.
I'd flip this around. I didn't ever get an implied promise of consistency, integrity checks or error detection from ANY part of the Boost Serialization docs. Quite the contrary. There are many spots that have caveats around risking UB if used/versioned improperly.
Seth