On 6/23/26 14:27, Seth via Boost wrote:
On Tue, Jun 23, 2026, at 8:03 AM, Rainer Deyke via Boost wrote:
Using checksums for authentication, or indeed any authentication system whatsoever, is quite problematic for serialization. Data sanitation is
Do you mean input sanitation? Input sanitation really isn't applicable here. There are no dangerous things to avoid in a generic way. You **could** have safety limits (like a maximum number of container elements, max registered types, max memory allocated, stuff like that). Bounds checking is already in place anyway; correct me if I'm wrong.
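A minimal sketch of the kind of safety limit mentioned above, applied to a container element count. It assumes a hypothetical count-prefixed, little-endian binary layout and is plain C++, not Boost.Serialization's archive format or API. `Reader`, `read_vector` and `kMaxElements` are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical limit; not a Boost.Serialization setting.
constexpr std::uint32_t kMaxElements = 1u << 20;

// Minimal cursor over an untrusted byte buffer; every read is bounds-checked.
class Reader {
public:
    Reader(const unsigned char* data, std::size_t size) : data_(data), size_(size) {}

    std::uint32_t read_u32() {
        if (size_ - pos_ < 4) throw std::runtime_error("truncated input");
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)  // little-endian, independent of host byte order
            v |= static_cast<std::uint32_t>(data_[pos_ + i]) << (8 * i);
        pos_ += 4;
        return v;
    }

    std::size_t remaining() const { return size_ - pos_; }

private:
    const unsigned char* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};

// Reads a count-prefixed vector of uint32 values.
std::vector<std::uint32_t> read_vector(Reader& r) {
    std::uint32_t count = r.read_u32();
    if (count > kMaxElements) throw std::length_error("element count exceeds limit");
    // Reject counts the remaining input cannot possibly satisfy, before allocating.
    if (count > r.remaining() / 4) throw std::runtime_error("count larger than input");
    std::vector<std::uint32_t> out;
    out.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) out.push_back(r.read_u32());
    return out;
}
```

The second check matters as much as the first: a declared count is only trusted once the remaining input is large enough to actually contain that many elements.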
Input sanitation is always applicable when there is input involved. Defining a bunch of maximums isn't really useful. Application-level maximums can be checked after getting the containers from Boost.Serialization, along with all other application invariants. The worst that can happen, assuming no errors in Boost.Serialization, is that Boost.Serialization invokes a failsafe by throwing std::bad_alloc or calling std::terminate. Not a problem for desktop applications.
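A minimal sketch of such post-load checks, using a hypothetical `Playlist` type and illustrative limits. It relies on the premise stated above, that the deserializer itself has no errors and hands back well-formed objects; the checks only enforce the application's own invariants.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical application type, as it might come out of a deserializer.
struct Playlist {
    std::string name;
    std::vector<std::string> tracks;
};

// Application-level invariants, checked after loading and before use.
// The limits are illustrative, not taken from any library.
void validate(const Playlist& p) {
    constexpr std::size_t kMaxTracks = 500;
    constexpr std::size_t kMaxNameLength = 200;
    if (p.name.empty() || p.name.size() > kMaxNameLength)
        throw std::invalid_argument("bad playlist name");
    if (p.tracks.size() > kMaxTracks)
        throw std::invalid_argument("too many tracks");
    for (const auto& t : p.tracks)
        if (t.empty()) throw std::invalid_argument("empty track entry");
}
```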
The idea here is: don't load the file if you cannot trust the source. The corollary is that if you are devising a format for this kind of untrusted exchange, you have to build in your own protection on top of any underlying archive format, whether Boost or not.
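For illustration, one form such an outer layer could take: a hypothetical envelope (magic bytes, declared length, size cap) checked before the payload is handed to any archive. The layout, `kMagic`, `kMaxPayload` and `unwrap` are assumptions made for this sketch. It only rejects malformed framing; it says nothing about whether the payload itself is safe for the archive to load.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical envelope layout: 4-byte magic "MYAP", 4-byte little-endian
// payload length, then exactly that many payload bytes.
constexpr std::string_view kMagic = "MYAP";
constexpr std::uint32_t kMaxPayload = 16u * 1024 * 1024;  // hypothetical 16 MiB cap

// Decodes 4 bytes as a little-endian uint32; caller guarantees b.size() >= 4.
std::uint32_t load_le32(std::string_view b) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(static_cast<unsigned char>(b[i])) << (8 * i);
    return v;
}

// Returns the payload if the envelope is well-formed, std::nullopt otherwise.
// Only the payload would then be handed to the underlying archive.
std::optional<std::string_view> unwrap(std::string_view file) {
    if (file.size() < 8) return std::nullopt;
    if (file.substr(0, 4) != kMagic) return std::nullopt;
    std::uint32_t len = load_le32(file.substr(4, 4));
    if (len > kMaxPayload) return std::nullopt;
    if (file.size() - 8 != len) return std::nullopt;
    return file.substr(8);
}
```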
"Never trust user input" is one of the cardinal rules of software development. It may not apply if the "user" is a programmer passing obvious nonsense directly into a function, but it always applies if the input comes from a file or a network connection, no matter how "trusted" the computer on the other side of the network. Any data that enters a process by any means, even from a different process that's part of the same application, is untrusted. (I do make an exception for data that originates within the process.) There is no protection that a user can provide on top of Boost.Serialization that prevents Boost.Serialization from invoking undefined behavior. The user can't validate the data /before/ passing it to Boost.Serialization without reimplementing a better version of Boost.Serialization, and the user can't validate the data after it comes out of Boost.Serialization because at that point the process is already compromised.
I'm torn here. The Boost community has tended to assume users know what they are doing. They tend to err on the side of giving the user all the power to optimally tune things for their use-case (e.g., I don't think Flyweight has big red notices about DoS attacks).
Very similar concerns apply to multiple other libraries: Spirit parsers (and Boost Parser too, IIRC) *by default* have no limits on variable-length input constructs at all. Building a secure parser with them ranges from easy to very involved. Boost Interprocess managed segments have similar UB caveats when mixing versions/architectures or, worse, mapping pages from untrusted sources.
Being able to overwhelm a process by sending it into an infinite loop or by consuming a lot of memory isn't very interesting to me. Almost all programs are vulnerable to locking up and crashing. That's usually not a security issue. When it is, you're usually either in an embedded system where user input is very limited or on an internet server where DoS attack and defense ultimately come down to a war of attrition.
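A hand-written sketch of what explicit limits on variable-length constructs can look like in a recursive-descent parser. This is not Spirit or Boost.Parser code; the grammar (nested bracket lists such as `[[],[[]]]`), `ListParser`, `kMaxDepth` and `kMaxItems` are illustrative. Capping the depth also bounds recursion, so deeply nested input fails with an exception instead of exhausting the stack.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical limits; neither Spirit nor Boost.Parser imposes these by default.
constexpr std::size_t kMaxDepth = 64;
constexpr std::size_t kMaxItems = 10000;

// Recursive-descent parser for nested lists such as "[[],[[]]]".
// Returns the total number of lists parsed; throws on malformed or over-limit input.
class ListParser {
public:
    explicit ListParser(std::string_view in) : in_(in) {}

    std::size_t parse() {
        parse_list(0);
        if (pos_ != in_.size()) throw std::runtime_error("trailing input");
        return items_;
    }

private:
    void expect(char c) {
        if (pos_ >= in_.size() || in_[pos_] != c) throw std::runtime_error("syntax error");
        ++pos_;
    }
    bool peek(char c) const { return pos_ < in_.size() && in_[pos_] == c; }

    void parse_list(std::size_t depth) {
        if (depth >= kMaxDepth) throw std::length_error("nesting too deep");
        if (++items_ > kMaxItems) throw std::length_error("too many items");
        expect('[');
        if (!peek(']')) {
            parse_list(depth + 1);
            while (peek(',')) { ++pos_; parse_list(depth + 1); }
        }
        expect(']');
    }

    std::string_view in_;
    std::size_t pos_ = 0;
    std::size_t items_ = 0;
};
```

For example, `ListParser("[[],[[]]]").parse()` returns 4, while 100 nested brackets throw `std::length_error`.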
Boost.Interprocess does look very dangerous. Obviously so. I don't need a big red warning to know to stay away from it unless I am really sure I can tolerate the risk of using it. I don't think that applies to Boost.Serialization, because writing safe serialization code isn't that hard.
-- 
Rainer Deyke - rainerd@eldwood.com