

From: Glen Knowles (gknowles_at_[hidden])
Date: 2004-10-21 19:45:27


> From: Eric Niebler
> Erik Wien wrote:
> > "Miro Jurisic" <macdev_at_[hidden]> wrote in message news:macdev-
> >
> >>I am not sure I buy this. I think that if you want to have unchecked
> >>Unicode data, you should use a vector<char*_t>. Unicode strings have
> >>well-defined invariants with respect to canonicalization and
> >>well-formedness, and I think that a Unicode string abstraction
> >>should enforce those invariants.
> >>
> >>Having intermediate states that are invalid and a final state that is
> >>valid is not a feature, it's a bug. It's a silent failure that I want
> >>to know about.
> >
> >
> > Amen. ;)
> >
>
> No fair bringing religion into this. ;-) I'll repeat what I said before
> -- this would be an unfortunate design, and you'll hear about it from
> your users. If you force people to do their bit twiddling in
> vector<char*_t>, then you impose an extra allocation and a copy to get
> it into a unicode::string, and most people won't bother.

If it imposes a copy, I certainly won't use it. What I'm interested in are
functions that compare UTF-* encoded arrays and create sort keys from those
same arrays. In truth, all I need are Unicode-aware versions of strcoll,
strxfrm, and strlwr that don't require locking a mutex around the global
locale in my multithreaded code. But other things, such as substring and
regex matching, would also be welcome.
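
To make the "compare the arrays in place" idea concrete, here is a minimal
sketch, not a proposal for the library interface. The name utf8_compare is
hypothetical. It assumes both inputs are well-formed UTF-8, in which case
byte-wise order matches code point order. That is not linguistic collation,
but it needs no copy and touches no locale.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not an existing Boost or standard function).
// Compares two UTF-8 byte arrays in place: no allocation, no copy, no locale.
// Assumption: both inputs are well-formed UTF-8, so comparing bytes as
// unsigned values yields code point order (not language-specific collation).
// Returns -1, 0, or 1.
int utf8_compare(const char* a, std::size_t a_len,
                 const char* b, std::size_t b_len)
{
    const std::size_t n = std::min(a_len, b_len);
    // memcmp compares bytes as unsigned char, which is what UTF-8 needs.
    const int r = (n == 0) ? 0 : std::memcmp(a, b, n);
    if (r != 0)
        return r < 0 ? -1 : 1;
    if (a_len == b_len)
        return 0;
    // The shorter array is a prefix of the longer one, so it sorts first.
    return a_len < b_len ? -1 : 1;
}
```

A real Unicode-aware strcoll replacement would still need collation data on
top of this; the sketch only shows that the array-based signature itself
forces no copy.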

I'm sure it's already been discussed ad nauseam, but the fact that atof, the
ostringstream ctor/dtor, printf, tolower, etc. may all lock a mutex around
access to a global locale has forced me to strip them out of quite a bit of
code. In my tests this was fine on a uniprocessor, marginal on a dual, but on
a quad or better it was the single biggest bottleneck.

I almost always want to treat strings as char arrays. Most operations are
simply copying/moving or checking to see if the string is in a set or hash.
If I need to do a lot of locale-aware comparisons, I'm going to generate sort
keys first.
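
The sort-key approach can be sketched with the standard std::collate facet,
assuming the caller passes a std::locale object explicitly instead of
relying on the global one. The name sort_by_keys is hypothetical. Whether a
given standard library avoids internal locking here is
implementation-specific, and std::collate<char> is not a full Unicode
collation.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: sorts strings by precomputed collation keys.
// Each key is generated once with std::collate<char>::transform (the
// facet counterpart of strxfrm); after that, sorting is plain string
// comparison on the keys with no further locale access.
std::vector<std::string> sort_by_keys(const std::vector<std::string>& items,
                                      const std::locale& loc)
{
    const std::collate<char>& coll =
        std::use_facet<std::collate<char> >(loc);

    // Pair each sort key with its original string.
    std::vector<std::pair<std::string, std::string> > keyed;
    keyed.reserve(items.size());
    for (std::size_t i = 0; i < items.size(); ++i) {
        const std::string& s = items[i];
        keyed.push_back(std::make_pair(
            coll.transform(s.data(), s.data() + s.size()), s));
    }

    // Pairs order by key first, then by the original string on ties.
    std::sort(keyed.begin(), keyed.end());

    std::vector<std::string> out;
    out.reserve(keyed.size());
    for (std::size_t i = 0; i < keyed.size(); ++i)
        out.push_back(keyed[i].second);
    return out;
}
```

Calling sort_by_keys(names, std::locale::classic()) sorts in plain byte
order; passing a named locale changes only how the keys are built.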

Glen

