
Subject: Re: [boost] [string] proposal
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-27 07:45:03


On Thu, Jan 27, 2011 at 12:09 PM, Dean Michael Berris
<mikhailberis_at_[hidden]> wrote:
> On Thu, Jan 27, 2011 at 5:32 PM, Matus Chochlik <chochlik_at_[hidden]> wrote:
[snip/]
>>
>> Last time I checked, JPEG, MPEG, Base64, ASN1, etc., etc., were not
>> *text* encodings. And I believe that handling text is what the whole
>> discussion is ultimately about.
>>
>
> But why do you need to separate text encoding from encoding in
> general? Here's the logic:

In general? Nothing. I do not have (nor did I have in the past)
anything against a general, efficient, encoding-agnostic string
if it is called general_string. But std::string IMO is and always
has been primarily about handling text. I certainly do not know
anyone who would store an MPEG inside std::string.

>
> You have a sequence of bytes (in a string).
> You want to interpret that sequence of bytes in a given encoding (with a view).
>
> Why does the encoding have to apply only to text?

Encoding does not have to apply only to text, but my,
let's call it a vision, is that the "everyday" handling
of text would use a single encoding. There are people
who have invested a whole lotta love :) and time into
making that possible, and they are generally called the
Unicode Consortium. C++(1x) already adopts part of
their work via the u"" and U"" literal types, because
it has countless advantages. Why not take one more
step in that direction and use it for the 'string' type by
default?
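
To make the u"" and U"" point concrete, here is a minimal sketch, assuming
a C++11 compiler. The function names are made up for this example; only
the literal prefixes and std::u16string / std::u32string come from the
standard.

```cpp
#include <cassert>
#include <string>

// U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic Multilingual
// Plane, so it shows the difference between the two literal types.

// u"" yields UTF-16 code units (char16_t).
inline std::u16string clef_utf16() { return u"\U0001D11E"; }

// U"" yields UTF-32 code units (char32_t), one per code point.
inline std::u32string clef_utf32() { return U"\U0001D11E"; }

// Hypothetical helper that checks the encodings described above.
inline void check_clef()
{
    // UTF-16 needs a surrogate pair for this code point.
    assert(clef_utf16().size() == 2);
    assert(clef_utf16()[0] == 0xD834 && clef_utf16()[1] == 0xDD1E);
    // UTF-32 stores the code point directly.
    assert(clef_utf32().size() == 1);
    assert(clef_utf32()[0] == 0x1D11E);
}
```

The compiler already knows which Unicode encoding each literal uses, so
nobody has to name an encoding at the point of use.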

>
>> [snip/]
>>>
>>> But this already happens, it's called 7-bit clean byte encoding --
>>> barring any endianness issues, just stuff whatever you already have in
>>> a `char const *` into a socket. HTTP, FTP, and even memcached's
>>> protocol work fine without the need to interpret strings other than a
>>> sequence of bytes; my original opposition is having a string that by
>>> default looked at data in it as UTF-8 when really a string would just
>>> be a sequence of bytes not necessarily contiguous.
>>
>> Again, where you see a string primarily as a class for handling
>> raw data, that can be interpreted in hundreds of different ways
>> I see primarily string as a class for encoding human readable text.
>>
>
> So what's the difference between a string for encoding human readable
> text and a string that handles raw data?

Usability. Super-generic, solve-everything things are usually more
difficult to use. I repeat, for probably the 10th time, that I'm not against
such a string in general, but it is not std::string.

[snip/]

>> Because the byte sequence is interpreted into *text*.
>
> So?
>
>> Let me try one more time: Imagine that someone
>> proposed to you that he creates a ultra-fast-and-generic
>> type for handling floating point numbers and there would
>> be ~200 possible encodings for a float or double and
>> the usage of the type would be
>>
>> uber_float x = get_x();
>> uber_float y = get_y();
>> uber_float z = view<acme_float_encoding_123_331_4342_Z>(x) +
>> view<acme_float_encoding_123_331_4342_Z>(y);
>> uber_float w = third::party::math::log(view<acme_float_encoding_452323_X>(z));
>>
>> would you choose it to calculate your z = x + y
>> and w = log(z) in the 98% of the regular cases where
>> you don't need to handle numbers on the helluva-big
>> scale/range/precision? I would not.
>>
>
> So what's wrong with:
>
> view<some_encoding_0> x = get_x();
> view<some_encoding_1> y = get_y();
> view<some_encoding_3> z = x+y;
> float w = log(as<acme_float_encoding>(z));

Unnecessary verbosity.

Do you really want all the people that now do:

struct person
{
    std::string name;
    std::string middle_name;
    std::string family_name;
    // .. etc.
};

to do this instead?

struct person
{
    boost::view<some_encoding_tag> name;
    boost::view<some_encoding_tag> middle_name;
    boost::view<some_encoding_tag> family_name;
    // .. etc.
};
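
For comparison, a rough sketch of what I would rather write. The 'text'
class below is purely hypothetical (it is not an existing Boost or
standard type), and it assumes its input is already valid UTF-8. It only
shows that the encoding can be a property of the type instead of
something every user has to spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical text type: always UTF-8, so no encoding tag at the use site.
class text
{
public:
    text() {}
    text(const char* utf8) : bytes_(utf8) {}
    text(const std::string& utf8) : bytes_(utf8) {}

    // Backward-compatible access to the underlying byte sequence.
    const std::string& bytes() const { return bytes_; }

    // Number of code points: count every byte that is not a UTF-8
    // continuation byte (bit pattern 10xxxxxx).
    std::size_t length() const
    {
        std::size_t n = 0;
        for (std::size_t i = 0; i < bytes_.size(); ++i)
            if ((static_cast<unsigned char>(bytes_[i]) & 0xC0) != 0x80)
                ++n;
        return n;
    }

private:
    std::string bytes_;
};

struct person
{
    text name;
    text middle_name;
    text family_name;
};

// Hypothetical usage check: five code points stored in seven bytes.
inline void check_person()
{
    person p;
    p.name = "Mat\xC3\xBA\xC5\xA1";
    assert(p.name.length() == 5);
    assert(p.name.bytes().size() == 7);
}
```

The struct stays as short as the std::string version, and the raw bytes
are still reachable for code that wants them.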

>
> ?
>
> See, there's absolutely 0 reason why you *have* to deal with a raw
> sequence of bytes if what you really want is to deal with a view of
> these bytes from the outset.
>
> Again I ask, am I missing something here?

Please see the example above.

[snip/]
>
> Right, what I meant to say is that it hardly has any bearing when
> we're talking about engineering solutions. So your circumstances and
> mine may very well be different, but that doesn't change that we're
> trying to solve the same problem. :)
>

If, along with solving your problem (all the completely valid points
that you had about performance), we also solve my and
others' problem (the completely valid points about encoding),
and we think about acceptability and "adoptability",
provide a backward-compatible interface for people who
do not have the time to re-implement all their string-related
code at once, and try really hard to get it into the standard,
then I do not have a thing against it.

BR,

Matus

