Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2011-08-16 13:05:27

Next message: Guilherme Kunigami: "Re: [boost] How to use BCP"
Previous message: Vicente Botet: "Re: [boost] Boost.Conversion review"
In reply to: Stewart, Robert: "Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter"

On Sat, Aug 13, 2011 at 23:24, Robert Ramey <ramey_at_[hidden]> wrote:

> Dave Abrahams wrote:
>
> >> std::string represents a sequence of "char" objects that happens
> >> to be useful for text processing. It can represent a text in any
> >> encoding.
> >>
> >> The question is how we treat this sequence... And this is a
> >> matter of policy and requirements of the library.
> >
> > I think I agree with Artyom here. *Somebody* has to decide how that
> > datatype will be interpreted when we receive it. Unless we refuse
> > altogether to accept std::string in our interfaces (which sounds like
> > a bad idea to me), why not make the decision that it's UTF-8?
>
> hmmm - why can't we just leave it at "std::string represents a sequence of
> "char""
>

Because we are talking here about what 'a sequence of char' means, and you
*must* define it somehow.

> and define some derivative class which defines it as a
> "a refinement of std::string which supports UTF-8 functionality" ?
>

Even when wrapping it you must still define the conversions from 'sequences
of chars'. Here we come to the original problem.

On Mon, Aug 15, 2011 at 16:19, Stewart, Robert <Robert.Stewart_at_[hidden]> wrote:

> [...]
> As soon as the client did a cast, the client made the claim that
> non_utf_string met the requirements of the text class' constructor. The
> problem is that of the client misusing the class by an ill-advised cast.
> What's more, I think Soares indicated a debug-build validation that the
> argument indeed was UTF-8.
>
> I don't see a problem in that design, once the constructor is explicit.
>

I don't want to do any explicit casts. I want UTF-8 by default, at least as
an optional feature for me and others who think like me. I can afford the
risk of writing wrong code, which is really small if you know what you're
doing. And I'm saying this as a maintainer of a ~1 MLOC codebase which uses
this convention on *Windows*.

Regarding UTF-8 validation, it's not bullet-proof. Many strings in other
encodings happen to be well-formed UTF-8 and will pass the validation. 8-bit
encodings that don't coincide with ASCII are even more likely to result in
false positives.
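
To illustrate the validation point, here is a minimal sketch of a
well-formedness check. The name is_valid_utf8 is hypothetical and is not part
of Boost.Ustr or any other library discussed in this thread; it only tests
whether the bytes form well-formed UTF-8, which is all such a check can do.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from any library in this thread).
// Returns true if `s` is well-formed UTF-8: rejects stray or missing
// continuation bytes, overlong forms, surrogates, and values above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;              // continuation byte or invalid lead byte
        if (i + len > n) return false;  // sequence truncated at end of string
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp[len]) return false;               // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range
        i += len;
    }
    return true;
}
```

With this sketch, is_valid_utf8("caf\xE9") is false, so Latin-1 "café" is
caught. But is_valid_utf8("\xC3\xA9") is true even if the bytes were meant as
the two Latin-1 characters "Ã©" rather than UTF-8 "é", and any pure ASCII
string passes regardless of the encoding the caller had in mind.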

> > > Besides, it does not harm you in any way
> >
> > It does. I already use UTF-8 for all my strings, even on
> > windows, and I don't want the code-bloat of all these
> > conversions (even if they're no-ops).
>
> What code bloat do you get from NOPs? Sure, there is more compilation time
> for the compiler to parse the text code and then for the optimizer to
> streamline it into a NOP, but even that is very likely negligible.
>

I'm talking about source-code bloat. About the boilerplate code I have to
write even if I already use UTF-8 everywhere:

std::string str = some_utf_8_string;
boost::utf8_function(text(str)); // Yes, I like UTF-8
boost2::utf8_function(str); // but I like it more when it's the default.

-- 
Yakov

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk