From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2021-10-13 01:48:44


On Tue, Oct 12, 2021 at 6:06 PM Vinnie Falco <vinnie.falco_at_[hidden]> wrote:
> What you are thinking of as a "valid URL parser input" is actually an
> Internationalized Resource Identifier, which supports the broader
> universal character set instead of just ASCII and is abbreviated by
> the even more obscure acronym "IRI." It is covered by rfc3987:
>
> <https://datatracker.ietf.org/doc/html/rfc3987>

We looked over this RFC, and I think it would be possible to support
IRIs simply by providing a new set of parsing functions, for example:

    void parse_iri ( string_view, error_code&, url& );
    void parse_irelative_ref ( string_view, error_code&, url& );
    void parse_absolute_iri ( string_view, error_code&, url& );
    void parse_iri_reference ( string_view, error_code&, url& );

It wouldn't be possible to parse into a url_view, since a view only
refers to the caller's character buffer and the UTF-8 encoded
characters have to be converted to percent-encoded escapes, which
produces a new string. But this could be made to work, and it fits
neatly into the current implementation. There would be an additional
function to take a url and convert it back into its IRI string, which
is mostly just decoding percent-escaped characters. The library would
reject invalid UTF-8.
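
To make the conversion concrete, here is a minimal standalone sketch of
the two directions described above. It is not Boost.URL code: the names
utf8_length, iri_to_uri and uri_to_iri are hypothetical, the sketch
works on plain strings rather than on a url, and it assumes that ASCII
bytes can be copied through unchanged (no URL grammar is checked, and
the IRI restrictions on which code points are allowed are ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: length of the well-formed UTF-8 sequence that
// starts at s[i], or 0 if the bytes there are not valid UTF-8.
// Overlong forms, surrogates and values above U+10FFFF are rejected.
inline std::size_t utf8_length(std::string_view s, std::size_t i)
{
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    unsigned char const b0 = byte(i);
    if (b0 < 0x80)
        return 1;
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF; // allowed range of the second byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        len = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3;
        if (b0 == 0xE0) lo = 0xA0; // no overlong encodings
        if (b0 == 0xED) hi = 0x9F; // no surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4;
        if (b0 == 0xF0) lo = 0x90; // no overlong encodings
        if (b0 == 0xF4) hi = 0x8F; // nothing above U+10FFFF
    } else {
        return 0;
    }
    if (s.size() - i < len)
        return 0;
    if (byte(i + 1) < lo || byte(i + 1) > hi)
        return 0;
    for (std::size_t k = 2; k < len; ++k)
        if ((byte(i + k) & 0xC0) != 0x80)
            return 0;
    return len;
}

// Hypothetical: percent-encode every non-ASCII UTF-8 sequence.
// Returns nullopt when the input contains invalid UTF-8.
inline std::optional<std::string> iri_to_uri(std::string_view iri)
{
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < iri.size();) {
        std::size_t const len = utf8_length(iri, i);
        if (len == 0)
            return std::nullopt;
        assert(len <= 4);
        for (std::size_t k = 0; k < len; ++k) {
            unsigned char const b = static_cast<unsigned char>(iri[i + k]);
            if (len == 1) {
                out += iri[i + k]; // ASCII is copied unchanged
            } else {
                out += '%';
                out += hex[b >> 4];
                out += hex[b & 0x0F];
            }
        }
        i += len;
    }
    return out;
}

inline int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical: decode runs of escapes that form a valid non-ASCII
// UTF-8 sequence. Escapes for ASCII bytes (such as %20 or %2F) and
// escapes that do not decode to valid UTF-8 are left as they are.
inline std::string uri_to_iri(std::string_view uri)
{
    std::string out;
    for (std::size_t i = 0; i < uri.size();) {
        std::string raw; // up to four decoded bytes starting at i
        for (std::size_t j = i;
             raw.size() < 4 && j + 2 < uri.size() && uri[j] == '%'; j += 3) {
            int const h = hex_value(uri[j + 1]);
            int const l = hex_value(uri[j + 2]);
            if (h < 0 || l < 0)
                break;
            raw.push_back(static_cast<char>(h * 16 + l));
        }
        std::size_t const len = raw.empty() ? 0 : utf8_length(raw, 0);
        if (len >= 2) {
            out.append(raw, 0, len);
            i += 3 * len;
        } else {
            out.push_back(uri[i]);
            ++i;
        }
    }
    return out;
}
```

The encoding direction always builds a new string, which is why the
result has to land in an owning container rather than in a view of the
input. The decoding direction leaves ASCII escapes alone so that a
reserved character such as an escaped "/" does not change meaning.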

Thanks

