
Boost :

From: Rainer Deyke (rdeyke_at_[hidden])
Date: 2022-08-21 20:01:32


On 21.08.22 20:36, Peter Dimov via Boost wrote:
> Rainer Deyke wrote:
>> (Arbitrary percent-encoded 8-bit values are legal in URLs, but not in IRIs.)
>
> Aren't they? I didn't find anything prohibiting them in the RFC, although I
> might well have missed it.
>
> The sections that specify the recommended way to convert between URI
> and IRI do say how these are handled - a percent-encoded sequence in a
> URI that doesn't correspond to a valid UTF-8 encoded code point is left
> alone in the IRI. (Valid UTF-8 percent encodings are percent-decoded.)

OK, it is possible to partially decode a URL containing a mix of
percent-encoded UTF-8 and arbitrary 8-bit values, producing something
that looks like a URL but contains Unicode. And this half-decoded URL is
an IRI, so on a technical level you are correct.
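A rough C++ sketch of that partial decoding, using a hypothetical helper `to_iri_partial` (not part of Boost.URL or any other library). A run of percent escapes is decoded only when it forms a complete, well-formed multi-byte UTF-8 sequence. Everything else is left exactly as written, including all ASCII-range escapes such as `%2F` and stray bytes such as `%FF`. This is deliberately simpler than the RFC 3987 URI-to-IRI procedure, which treats ASCII-range escapes differently and also keeps some characters that are inappropriate in IRIs encoded.

```cpp
#include <cassert>   // for the usage check below
#include <cstddef>
#include <string>
#include <string_view>

// Value of a hex digit, or -1 if c is not one.
int hex_val(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Length of the well-formed UTF-8 sequence at b[0..n) (RFC 3629 rules),
// or 0 if b does not start with one.
std::size_t utf8_seq_len(const unsigned char* b, std::size_t n) {
    if (n == 0) return 0;
    unsigned char c = b[0];
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the second byte
    std::size_t len;
    if (c < 0x80) return 1;
    else if (c >= 0xC2 && c <= 0xDF) len = 2;
    else if (c == 0xE0) { len = 3; lo = 0xA0; }               // no overlongs
    else if ((c >= 0xE1 && c <= 0xEC) || c == 0xEE || c == 0xEF) len = 3;
    else if (c == 0xED) { len = 3; hi = 0x9F; }               // no surrogates
    else if (c == 0xF0) { len = 4; lo = 0x90; }               // no overlongs
    else if (c >= 0xF1 && c <= 0xF3) len = 4;
    else if (c == 0xF4) { len = 4; hi = 0x8F; }               // <= U+10FFFF
    else return 0;
    if (n < len || b[1] < lo || b[1] > hi) return 0;
    for (std::size_t i = 2; i < len; ++i)
        if (b[i] < 0x80 || b[i] > 0xBF) return 0;
    return len;
}

// Hypothetical helper: decode only escapes that form valid multi-byte UTF-8.
std::string to_iri_partial(std::string_view s) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        // Collect up to four consecutive %XX triplets starting at i.
        unsigned char buf[4];
        std::size_t count = 0, j = i;
        while (count < 4 && j + 2 < s.size() && s[j] == '%') {
            int h = hex_val(s[j + 1]), l = hex_val(s[j + 2]);
            if (h < 0 || l < 0) break;
            buf[count++] = static_cast<unsigned char>(h * 16 + l);
            j += 3;
        }
        std::size_t len = count ? utf8_seq_len(buf, count) : 0;
        if (len >= 2) {
            // A complete non-ASCII UTF-8 sequence: emit the raw bytes.
            out.append(reinterpret_cast<const char*>(buf), len);
            i += 3 * len;
        } else {
            // Anything else is copied verbatim, one character at a time.
            out += s[i];
            ++i;
        }
    }
    return out;
}

// Usage:
//   assert(to_iri_partial("/caf%C3%A9/%FF%2F") == "/caf\xC3\xA9/%FF%2F");
```

The `%FF` in the example survives because it cannot start any UTF-8 sequence, which is exactly the "half-decoded" state described above: the result is legal as an IRI, but not fully percent-decoded.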

On the other hand, if calling the 'path' member function of a URL object
returns a fully percent-decoded path, then calling the UTF-8 equivalent
of that member function should return something that is both legal UTF-8
and fully percent-decoded. That is only possible if the path contains
no percent-encoded byte sequences that are not valid UTF-8.
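A sketch of that constraint as a hypothetical free function `decode_path_utf8` (not an existing Boost.URL member; the name and the choice of `std::optional` as the failure signal are assumptions). It fully percent-decodes its input and returns `std::nullopt` when the decoded bytes are not well-formed UTF-8, since no result could then be both fully decoded and legal UTF-8. It also rejects malformed escapes, which is an additional assumption of this sketch.

```cpp
#include <cassert>   // for the usage check below
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Value of a hex digit, or -1 if c is not one.
int hex_val(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Length of the well-formed UTF-8 sequence at b[0..n) (RFC 3629 rules),
// or 0 if b does not start with one.
std::size_t utf8_seq_len(const unsigned char* b, std::size_t n) {
    if (n == 0) return 0;
    unsigned char c = b[0];
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the second byte
    std::size_t len;
    if (c < 0x80) return 1;
    else if (c >= 0xC2 && c <= 0xDF) len = 2;
    else if (c == 0xE0) { len = 3; lo = 0xA0; }
    else if ((c >= 0xE1 && c <= 0xEC) || c == 0xEE || c == 0xEF) len = 3;
    else if (c == 0xED) { len = 3; hi = 0x9F; }
    else if (c == 0xF0) { len = 4; lo = 0x90; }
    else if (c >= 0xF1 && c <= 0xF3) len = 4;
    else if (c == 0xF4) { len = 4; hi = 0x8F; }
    else return 0;
    if (n < len || b[1] < lo || b[1] > hi) return 0;
    for (std::size_t i = 2; i < len; ++i)
        if (b[i] < 0x80 || b[i] > 0xBF) return 0;
    return len;
}

bool is_valid_utf8(std::string_view s) {
    const auto* p = reinterpret_cast<const unsigned char*>(s.data());
    for (std::size_t i = 0; i < s.size();) {
        std::size_t len = utf8_seq_len(p + i, s.size() - i);
        if (len == 0) return false;
        i += len;
    }
    return true;
}

// Hypothetical: fully decode, then accept the result only if it is UTF-8.
std::optional<std::string> decode_path_utf8(std::string_view s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '%') { out += s[i]; continue; }
        if (i + 2 >= s.size()) return std::nullopt;        // truncated escape
        int h = hex_val(s[i + 1]), l = hex_val(s[i + 2]);
        if (h < 0 || l < 0) return std::nullopt;           // malformed escape
        out += static_cast<char>(h * 16 + l);
        i += 2;
    }
    if (!is_valid_utf8(out)) return std::nullopt;
    return out;
}

// Usage:
//   assert(decode_path_utf8("/caf%C3%A9").value() == "/caf\xC3\xA9");
//   assert(!decode_path_utf8("/%FF").has_value());
```

A path containing `%FF` is a perfectly legal URL path, so a byte-oriented `path` accessor can still return it; only the UTF-8 variant has no correct fully decoded answer to give.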

-- 
Rainer Deyke (rainerd_at_[hidden])
