From: Erik Wien (wien_at_[hidden])
Date: 2004-10-20 14:26:42
"Rogier van Dalen" <rogiervd_at_[hidden]> wrote in message
>> I think the best solution is to store the string in the form it was
>> originally received (decomposed or not), and instead provide composition
>> functions or even iterator wrappers that compose on the fly. That would
>> allow composed strings to be used where needed (as in an XML library)
>> without imposing that requirement on all other users.
> I don't think I can agree on that. If you do a lot of input/output,
> this might yield a better performance, but even in reading XML, you
> probably need to compare strings a lot, and if they are not
> normalised, this will really take a lot of processing.
> Correct me if I'm wrong, but a simple comparison of two non-normalised
> Unicode strings would require looking up the characters in the Unicode
> Character Database, decomposing every single character, gathering base
> characters and combining marks, and ordering the marks, and only then
> comparing them. And this must be done for every character. I don't have
> any numbers, of course, but I have a feeling it is going to be really,
> really slow.
You are quite correct... It is slow. And that is why I am hesitant to make
decomposition something that happens every time you assign something to a
string. What this really boils down to is which usage pattern is the most
common: the library should be written to provide the best performance for
the operations most people perform.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk