From: Erik Wien (wien_at_[hidden])
Date: 2004-10-20 13:04:39
"Eric Niebler" <eric_at_[hidden]> wrote in message
> Such a one-size-fits-all unicode_string is guaranteed to be inefficient
> for some applications.
Yes, that's why I would like the encoding to be a template parameter, allowing
the programmer to choose the encoding best suited to his/her needs.
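As a rough sketch of what a templated encoding could look like (the names unicode_string, utf8_encoding, utf16_encoding and append are hypothetical, not an existing Boost interface, and no validation of the input code points is attempted):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding policies. Each one names its code unit type and
// knows how to append a single code point. Input is assumed to be a valid
// Unicode scalar value; nothing is checked here.
struct utf8_encoding {
    typedef unsigned char code_unit;
    static void append(std::vector<code_unit>& out, std::uint32_t cp) {
        if (cp < 0x80) {
            out.push_back(static_cast<code_unit>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<code_unit>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<code_unit>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<code_unit>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        }
    }
};

struct utf16_encoding {
    typedef std::uint16_t code_unit;
    static void append(std::vector<code_unit>& out, std::uint32_t cp) {
        if (cp < 0x10000) {
            out.push_back(static_cast<code_unit>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<code_unit>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<code_unit>(0xDC00 + (cp & 0x3FF)));
        }
    }
};

// The string stores code units in whatever encoding the user picked.
template <class Encoding>
class unicode_string {
public:
    typedef typename Encoding::code_unit code_unit;

    void push_back(std::uint32_t code_point) {
        Encoding::append(units_, code_point);
    }
    std::size_t size_in_code_units() const { return units_.size(); }
    const std::vector<code_unit>& code_units() const { return units_; }

private:
    std::vector<code_unit> units_;
};
```

With this shape, unicode_string<utf8_encoding> and unicode_string<utf16_encoding> are distinct types with a known in-memory layout, which is what the serialization case below needs.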
> If it is always stored in a decomposed form, an XML library probably
> wouldn't want to use it, because it requires a composed form. And making
> the encoding an implementation detail makes it inefficient to use in
> situations where binary compatibility matters (serialization, for
> example).
I think the best solution is to store the string in the form in which it was
originally received (decomposed or not), and instead provide composition
functions or even iterator wrappers that compose on the fly. That would
allow composed strings to be used where needed (as in an XML library),
without imposing that requirement on all other users.
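To illustrate the compose-on-the-fly idea, here is a minimal sketch. The names composing_reader and compose_pair are hypothetical, and the two-entry table is only a toy stand-in for the real Unicode composition data; combining classes, blocking and Hangul are ignored entirely.

```cpp
#include <cstdint>
#include <vector>

// Toy composition table: only two pairs are known. A real implementation
// would consult the Unicode character database instead.
inline bool compose_pair(std::uint32_t base, std::uint32_t mark,
                         std::uint32_t& out) {
    if (base == 0x0065 && mark == 0x0301) { out = 0x00E9; return true; } // e + acute
    if (base == 0x0061 && mark == 0x0308) { out = 0x00E4; return true; } // a + diaeresis
    return false;
}

// Wraps a range of code points and hands them out composed, leaving the
// underlying storage exactly as it was received.
template <class Iter>
class composing_reader {
public:
    composing_reader(Iter first, Iter last) : cur_(first), end_(last) {}

    bool done() const { return cur_ == end_; }

    // Precondition: !done(). Returns the next code point, merging any
    // following marks for which the toy table has a composition.
    std::uint32_t next() {
        std::uint32_t cp = *cur_;
        ++cur_;
        std::uint32_t composed = 0;
        while (cur_ != end_ && compose_pair(cp, *cur_, composed)) {
            cp = composed;
            ++cur_;
        }
        return cp;
    }

private:
    Iter cur_;
    Iter end_;
};
```

A caller that needs composed text reads through the wrapper; everyone else iterates over the stored code points directly and pays nothing.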
> Also, it is impossible to store an abstract unicode character in char32_t
> because there may be N zero-width combining characters associated with it.
Quite true. Storing abstract characters would require some variable-width
storage facility.
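One possible shape for such variable-width storage, as a sketch only: abstract_character, is_combining_mark and split_abstract_characters are hypothetical names, and the combining test covers just the Combining Diacritical Marks block (U+0300 to U+036F) rather than the real character properties.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One abstract character: a base code point followed by any number of
// combining marks, hence the variable-width container.
struct abstract_character {
    std::vector<std::uint32_t> code_points;
};

// Simplifying assumption: only U+0300..U+036F counts as combining.
inline bool is_combining_mark(std::uint32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Groups a code point sequence into abstract characters. A leading mark
// with no base before it starts its own character.
inline std::vector<abstract_character>
split_abstract_characters(const std::vector<std::uint32_t>& in) {
    std::vector<abstract_character> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (out.empty() || !is_combining_mark(in[i])) {
            out.push_back(abstract_character());
        }
        out.back().code_points.push_back(in[i]);
    }
    return out;
}
```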
> Perhaps having a one-size-fits-all unicode_string might be a nice default,
> as long as users who care about encoding and canonical form have other
> types (template + policies?) with knobs they can twiddle.
I would really like to provide enough knobs to keep everyone happy! ;)