From: Pavol Droba (droba_at_[hidden])
Date: 2008-08-28 09:53:18
Martin Lütken wrote:
> Martin Lutken wrote:
>> Anyone who knows how this could be made possible?
>> I suppose I need a locale facet like the std::ctype, but which works for
>> UTF-8, and not just for ASCII a-z,A-Z. I guess the information in a table
>> like this (http://www.unicode.org/Public/UNIDATA/CaseFolding.txt)
>> could be used.
> This might not work out of the box. The StringAlgo library is designed around
> sequences of characters. Since UTF-8 is a variable-width character encoding, the
> algorithms in the library would not work as expected.
> To make it work, you will need a container whose iterators
> iterate over meta-characters, not bytes.
>> If it's better/easier just to convert the string to UTF-32 before doing
>> case-insensitive compares and replaces, I could live with that.
> If you meant UTS-32 and you have a corresponding locale implementation, then
> this approach is a viable solution.
> Sorry, what is UTS-32? I tried to Google it: 351 results, none of them
> looking related to character encodings.
> I found this article on Wikipedia on UTF-32/UCS-4:
> Is that not what I need?
> I suspect that many people must have run into similar problems. Perhaps we should
> add a 32-bit string class to Boost. And until I get a better understanding, I will
> keep calling it UTF-32 :-)
Sorry, I mixed it up a little. I meant UCS-4, a.k.a. a fixed-width encoding. I was not
aware that UTF-32 is de facto the same.
Anyway, the statement about usability with StringAlgo still holds. It can work with
any fixed-size encoding, as long as you have the corresponding locales.
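As a rough illustration of the fixed-width approach, the sketch below decodes UTF-8 into UTF-32 and then compares the strings one code point at a time. It uses only standard C++ rather than StringAlgo so that it stays self-contained. The names utf8_to_utf32, fold_simple and iequals_utf8 are hypothetical, and fold_simple is only a tiny stand-in for a real locale facet or a table built from CaseFolding.txt: it handles ASCII and the Latin-1 uppercase letters, nothing more.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 into UTF-32 (one char32_t per code point).
// Minimal sketch: rejects bad lead bytes, truncated sequences and bad
// continuation bytes, but does not check for overlong forms or surrogates.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (b < 0x80)              { cp = b;        len = 1; }
        else if ((b >> 5) == 0x6)  { cp = b & 0x1F; len = 2; }
        else if ((b >> 4) == 0xE)  { cp = b & 0x0F; len = 3; }
        else if ((b >> 3) == 0x1E) { cp = b & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical, deliberately incomplete case fold (an assumption for this
// sketch): ASCII A-Z and Latin-1 U+00C0..U+00DE, skipping U+00D7 (the
// multiplication sign). A real solution needs a full folding table.
char32_t fold_simple(char32_t c) {
    if (c >= U'A' && c <= U'Z') return c + 0x20;
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7) return c + 0x20;
    return c;
}

// Case-insensitive equality on the fixed-width representation.
bool iequals_utf8(const std::string& a, const std::string& b) {
    const std::u32string ua = utf8_to_utf32(a);
    const std::u32string ub = utf8_to_utf32(b);
    return ua.size() == ub.size() &&
           std::equal(ua.begin(), ua.end(), ub.begin(),
                      [](char32_t x, char32_t y) {
                          return fold_simple(x) == fold_simple(y);
                      });
}
```

Once the data is in a std::u32string, every element is a whole character, which is the property the fixed-size statement above relies on. Passing such a string to a locale-aware algorithm would still need a locale that provides character facets for char32_t, which should not be assumed to exist by default.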
It could theoretically also work with variable-width characters, provided you
have a container/locale framework that allows operating on meta-characters.
I'm not sure how efficient it will be, though.
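To make the idea of iterating over meta-characters more concrete, here is a minimal sketch of a read-only iterator that walks a UTF-8 buffer one code point at a time. The class utf8_iterator and the helper count_code_points are hypothetical names, not part of Boost or the standard library. The sketch assumes the input is well-formed UTF-8 and does no validation. It is tagged as an input iterator because dereferencing returns a decoded value rather than a reference.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical iterator over the code points of a UTF-8 byte sequence.
// Assumption: the underlying bytes are well-formed UTF-8.
class utf8_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = char32_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char32_t*;
    using reference         = char32_t;

    utf8_iterator() = default;
    explicit utf8_iterator(const char* p) : p_(p) {}

    // Decode the code point that starts at the current byte.
    char32_t operator*() const {
        const unsigned char b = static_cast<unsigned char>(*p_);
        const std::size_t len = length(b);
        // Keep only the payload bits of the lead byte.
        char32_t cp = (len == 1) ? b : (b & (0xFF >> (len + 1)));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(p_[k]) & 0x3F);
        return cp;
    }

    // Advance by one code point, that is, by 1 to 4 bytes.
    utf8_iterator& operator++() {
        p_ += length(static_cast<unsigned char>(*p_));
        return *this;
    }
    utf8_iterator operator++(int) {
        utf8_iterator tmp = *this;
        ++*this;
        return tmp;
    }

    friend bool operator==(utf8_iterator a, utf8_iterator b) { return a.p_ == b.p_; }
    friend bool operator!=(utf8_iterator a, utf8_iterator b) { return a.p_ != b.p_; }

private:
    // Sequence length implied by the lead byte.
    static std::size_t length(unsigned char b) {
        if (b < 0x80) return 1;
        if ((b >> 5) == 0x6) return 2;
        if ((b >> 4) == 0xE) return 3;
        return 4;
    }

    const char* p_ = nullptr;
};

// Count characters rather than bytes by walking the iterator range.
std::size_t count_code_points(const std::string& s) {
    return static_cast<std::size_t>(
        std::distance(utf8_iterator(s.data()),
                      utf8_iterator(s.data() + s.size())));
}
```

Every dereference decodes again and every increment inspects the lead byte, and there is no constant-time random access, which is one reason efficiency is a fair concern for this approach.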