Boost logo

Boost :

From: Christopher Kormanyos (e_float_at_[hidden])
Date: 2024-01-23 05:15:01


> This is my review of the proposed Boost.Charconv
Thank you Ruben for your detailed, clear,in-depth and prompt review of the proposedBoost.Charconv.

    On Monday, January 22, 2024 at 09:58:14 PM GMT+1, Ruben Perez via Boost <boost_at_[hidden]> wrote:
 
 Hi all,

This is my review of the proposed Boost.Charconv.

DISCLAIMER: both me and the author are affiliated with the C++
Alliance. I am writing this review as a Boost author who would like to
use this library in the code he maintains.

> - What is your evaluation of the potential usefulness of the library?

I’m reviewing this library because I’d like to use it in Boost.MySQL.
In short, I think any library that implements a text-based protocol
can benefit from Boost.Charconv.

I’m implementing functionality to safely generate client-side queries
for Boost.MySQL. Given a user-supplied query format string and some
runtime, untrusted values, I need to generate a string containing a
SQL query. User-supplied values may be any MySQL built-in types,
including integers and floats. While prepared statements already cover
part of this, there are cases where they are not practical (e.g.
queries with dynamic filters).

This functionality is security-critical. Formatting a query
incorrectly can easily lead to SQL injection vulnerabilities, which
are extremely dangerous.

Composing queries with ints and floats requires converting them to
string in a predictable way. Until now, I’ve considered std::snprintf,
boost::lexical_cast and std::to_chars for this task. Both
std::snprintf and boost::lexical_cast are locale-dependent, which can
lead to nasty results. When formatting the double 4.2, we need to
output the string “4.2”. Under some locales, these functions output
“4,2”, which is interpreted as a field list by MySQL (potential
exploit!). std::to_chars behaves like I want, but isn’t implemented by
most of the compilers Boost.MySQL supports (we require C++11).

A similar problem occurs when parsing results. When not using prepared
statements, MySQL sends values as strings, so ints and doubles must be
parsed. From what I’ve observed, libraries like Boost.Json also face a
similar problem, often resorting to implementing their own conversion
functions.

Outside the protocol world, the library can be useful for tasks like
URL parameter parsing, as it’s common to embed numeric identifiers in
URLs.

To sum up, I think this library will be useful to other Boost
libraries, and probably to others, too.

> - What is your evaluation of the design?

The library is a drop-in replacement for std::to_chars/from_chars, so
there is not a lot to say. I think the standard design works well, and
this library follows it.

The only point where Boost.Charconv deviates from the standard is when
parsing floating point values that are too big or too small to be
represented in their type. Charconv modifies the output to +- 0 or
+-HUGE_VAL, where the standard says it shouldn’t. I understand this
helps because it provides more information to the caller, and it may
actually be changed in the standard. The original Boost.Charconv
included a macro to configure this behavior, which I think is
problematic being a compiled library. From
https://github.com/cppalliance/charconv/pull/120, I can see this has
been changed. An extra function, from_chars_strict, is included. I
think this is the right design.

The biggest barrier I have for adoption is that it’s a compiled
library. MySQL is header-only, which means that the official CMake
modules in the release don’t contain a Boost::mysql target. Adding a
hard dependency on Boost.Charconv is thus a breaking change for my
users, who will need to update their CMLs.

I think that being a compiled library is the right thing, though. This
is actually a limitation in our current tooling - there should be a
Boost::mysql CMake target. From the conversations I’ve had with the b2
maintainer in the cpplang slack, this will be changing with the new
modular Boost structure, so it shouldn’t be a concern. I will defer
adoption until these changes are in place, though.

> - What is your evaluation of the implementation?

I’ve only had a quick glance. It surprised me finding an instance of
malloc/strtold, since these functions violate the “non-allocating,
locale-independent” guarantees. I understand this is for an extreme
edge case, and that the “locale-independent” guarantee is still
provided through internal logic. I don’t know enough about these
parsing algorithms to determine whether this is the right decision or
not. How much effort would require rolling a hand-rolled strtold
alternative for such edge cases? In any case, this can be done once
the library is in Boost.

The library seems to not be running any fuzzing at the minute, which I
think is quite necessary given the library’s nature.

> - What is your evaluation of the documentation?

I find it really difficult to navigate. Fortunately, it closely mimics
the standard, so I end up just looking at cppreference. But still, my
experience hasn’t been good. These are the main pain points I see:

- It's a single page. Even for a small library, it makes navigation
very cumbersome.
- It’s not linked. If you want to find the reference for a struct or
function, your only option is to Ctrl+F.
- The overview for each function is written like a reference. But then
you have another reference at the bottom.
- The sidebar TOC has too many levels.

As much as I want to get rid of quickbook, my feeling is that
quickbook-based docs currently generate more usable output than
asciidoc-based ones. It may be a matter of tuning templates and
developing additional tools, but we’re not there yet.

Please find some additional minor comments at the end of the review.

- Did you try to use the library? With what compiler? Did you have any problems?

I’ve integrated the library into Boost.MySQL, in a feature branch
(https://github.com/boostorg/mysql/tree/feature/boost-charconv). I’ve
fully implemented query formatting and updated the parsing functions,
all using the proposed library. I’ve had no surprises. All CIs pass
all tests, which perform considerable coverage on the parsing side.
This includes different versions of the major compilers and operating
systems, with different configurations - see
https://github.com/boostorg/mysql/blob/feature/boost-charconv/.drone.star
and https://github.com/boostorg/mysql/blob/feature/boost-charconv/.github/workflows/build-code.yml
for details.

I’ve found a problem when linking to the library using the CMake
targets, where a missing flag causes linker errors - see
https://github.com/cppalliance/charconv/issues/130.

> - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I’ve dedicated around 15 hours of work to integrate the library in
Boost.MySQL and clean up CIs.

> - Are you knowledgeable about the problem domain?

I’m just a user. I understand the domain problem and the different
alternatives. I don’t have any knowledge about the algorithms’
internal workings.

> - Your decision.

I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the
condition that proper fuzzing is added to the test suite. I’ve worked
with libfuzzer before and can assist the author if required.

Thanks Matt for submitting the library, and Christopher for being
review manager.

Regards,
Ruben.

Minor comments on documentation:

- The first example uses to_chars(... sizeof(buffer) - 1), which seems
to imply a NULL-terminator somewhere. Since there is none, I think it
should just be sizeof(buffer).
- Docs don't specify the behavior about NULL-terminators. Either
specify "adheres to standard" or clarify that no NULL-terminator is
written.
- IMO the front page should clearly state that this library is
compliant with the charconv standard header, noting any exceptions.
- from_chars overview is missing semicolons in the friends section.
- Please use monospace in the discussion to refer to identifiers, like
[first, last).
- Basic usage: from_chars name is qualified but from_chars_result is not.
- Code snippets wrap, which makes them difficult to read.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
  


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk