Boost logo

Boost :

From: Andrzej Krzemienski (akrzemi1_at_[hidden])
Date: 2024-01-23 18:45:30


pon., 22 sty 2024 o 21:58 Ruben Perez via Boost <boost_at_[hidden]>
napisał(a):

> Hi all,
>
> This is my review of the proposed Boost.Charconv.
>
> DISCLAIMER: both me and the author are affiliated with the C++
> Alliance. I am writing this review as a Boost author who would like to
> use this library in the code he maintains.
>
> > - What is your evaluation of the potential usefulness of the library?
>
> I’m reviewing this library because I’d like to use it in Boost.MySQL.
> In short, I think any library that implements a text-based protocol
> can benefit from Boost.Charconv.
>
> I’m implementing functionality to safely generate client-side queries
> for Boost.MySQL. Given a user-supplied query format string and some
> runtime, untrusted values, I need to generate a string containing a
> SQL query. User-supplied values may be any MySQL built-in types,
> including integers and floats. While prepared statements already cover
> part of this, there are cases where they are not practical (e.g.
> queries with dynamic filters).
>
> This functionality is security-critical. Formatting a query
> incorrectly can easily lead to SQL injection vulnerabilities, which
> are extremely dangerous.
>
> Composing queries with ints and floats requires converting them to
> string in a predictable way. Until now, I’ve considered std::snprintf,
> boost::lexical_cast and std::to_chars for this task. Both
> std::snprintf and boost::lexical_cast are locale-dependent, which can
> lead to nasty results. When formatting the double 4.2, we need to
> output the string “4.2”. Under some locales, these functions output
> “4,2”, which is interpreted as a field list by MySQL (potential
> exploit!). std::to_chars behaves like I want, but isn’t implemented by
> most of the compilers Boost.MySQL supports (we require C++11).
>
> A similar problem occurs when parsing results. When not using prepared
> statements, MySQL sends values as strings, so ints and doubles must be
> parsed. From what I’ve observed, libraries like Boost.Json also face a
> similar problem, often resorting to implementing their own conversion
> functions.
>
> Outside the protocol world, the library can be useful for tasks like
> URL parameter parsing, as it’s common to embed numeric identifiers in
> URLs.
>
> To sum up, I think this library will be useful to other Boost
> libraries, and probably to others, too.
>
> > - What is your evaluation of the design?
>
> The library is a drop-in replacement for std::to_chars/from_chars, so
> there is not a lot to say. I think the standard design works well, and
> this library follows it.
>
> The only point where Boost.Charconv deviates from the standard is when
> parsing floating point values that are too big or too small to be
> represented in their type. Charconv modifies the output to +- 0 or
> +-HUGE_VAL, where the standard says it shouldn’t. I understand this
> helps because it provides more information to the caller, and it may
> actually be changed in the standard. The original Boost.Charconv
> included a macro to configure this behavior, which I think is
> problematic being a compiled library. From
> https://github.com/cppalliance/charconv/pull/120, I can see this has
> been changed. An extra function, from_chars_strict, is included. I
> think this is the right design.
>
> The biggest barrier I have for adoption is that it’s a compiled
> library. MySQL is header-only, which means that the official CMake
> modules in the release don’t contain a Boost::mysql target. Adding a
> hard dependency on Boost.Charconv is thus a breaking change for my
> users, who will need to update their CMLs.
>

Would it be too hard to provide a header-only way of consuming this library?
Libraries like Boost.Test allow that, but I do not know how that squares
with ODR.

Regards,
&rzej;

>
> I think that being a compiled library is the right thing, though. This
> is actually a limitation in our current tooling - there should be a
> Boost::mysql CMake target. From the conversations I’ve had with the b2
> maintainer in the cpplang slack, this will be changing with the new
> modular Boost structure, so it shouldn’t be a concern. I will defer
> adoption until these changes are in place, though.
>
> > - What is your evaluation of the implementation?
>
> I’ve only had a quick glance. It surprised me finding an instance of
> malloc/strtold, since these functions violate the “non-allocating,
> locale-independent” guarantees. I understand this is for an extreme
> edge case, and that the “locale-independent” guarantee is still
> provided through internal logic. I don’t know enough about these
> parsing algorithms to determine whether this is the right decision or
> not. How much effort would require rolling a hand-rolled strtold
> alternative for such edge cases? In any case, this can be done once
> the library is in Boost.
>
> The library seems to not be running any fuzzing at the minute, which I
> think is quite necessary given the library’s nature.
>
> > - What is your evaluation of the documentation?
>
> I find it really difficult to navigate. Fortunately, it closely mimics
> the standard, so I end up just looking at cppreference. But still, my
> experience hasn’t been good. These are the main pain points I see:
>
> - It's a single page. Even for a small library, it makes navigation
> very cumbersome.
> - It’s not linked. If you want to find the reference for a struct or
> function, your only option is to Ctrl+F.
> - The overview for each function is written like a reference. But then
> you have another reference at the bottom.
> - The sidebar TOC has too many levels.
>
> As much as I want to get rid of quickbook, my feeling is that
> quickbook-based docs currently generate more usable output than
> asciidoc-based ones. It may be a matter of tuning templates and
> developing additional tools, but we’re not there yet.
>
> Please find some additional minor comments at the end of the review.
>
> - Did you try to use the library? With what compiler? Did you have any
> problems?
>
> I’ve integrated the library into Boost.MySQL, in a feature branch
> (https://github.com/boostorg/mysql/tree/feature/boost-charconv). I’ve
> fully implemented query formatting and updated the parsing functions,
> all using the proposed library. I’ve had no surprises. All CIs pass
> all tests, which perform considerable coverage on the parsing side.
> This includes different versions of the major compilers and operating
> systems, with different configurations - see
> https://github.com/boostorg/mysql/blob/feature/boost-charconv/.drone.star
> and
> https://github.com/boostorg/mysql/blob/feature/boost-charconv/.github/workflows/build-code.yml
> for details.
>
> I’ve found a problem when linking to the library using the CMake
> targets, where a missing flag causes linker errors - see
> https://github.com/cppalliance/charconv/issues/130.
>
> > - How much effort did you put into your evaluation? A glance? A quick
> reading? In-depth study?
>
> I’ve dedicated around 15 hours of work to integrate the library in
> Boost.MySQL and clean up CIs.
>
> > - Are you knowledgeable about the problem domain?
>
> I’m just a user. I understand the domain problem and the different
> alternatives. I don’t have any knowledge about the algorithms’
> internal workings.
>
> > - Your decision.
>
> I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the
> condition that proper fuzzing is added to the test suite. I’ve worked
> with libfuzzer before and can assist the author if required.
>
> Thanks Matt for submitting the library, and Christopher for being
> review manager.
>
> Regards,
> Ruben.
>
> Minor comments on documentation:
>
> - The first example uses to_chars(... sizeof(buffer) - 1), which seems
> to imply a NULL-terminator somewhere. Since there is none, I think it
> should just be sizeof(buffer).
> - Docs don't specify the behavior about NULL-terminators. Either
> specify "adheres to standard" or clarify that no NULL-terminator is
> written.
> - IMO the front page should clearly state that this library is
> compliant with the charconv standard header, noting any exceptions.
> - from_chars overview is missing semicolons in the friends section.
> - Please use monospace in the discussion to refer to identifiers, like
> [first, last).
> - Basic usage: from_chars name is qualified but from_chars_result is not.
> - Code snippets wrap, which makes them difficult to read.
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk