From: Andy Little (andy_at_[hidden])
Date: 2004-10-22 16:52:34


"Erik Wien" <wien_at_[hidden]> wrote in message
news:cl1tqh$qp$1_at_sea.gmane.org...
> Hi. I am in the process of planning a library for handling unicode strings
> in C++, and would like to probe the interest in the boost community for
> something like that. I read through the unicode discussion that was up back
> in April, and from what I could gather there was some amount of interest,
> but no one felt comfortable taking on the task as of yet.

[snip]

> I really feel the C++ language needs some form of standardized unicode
> support, and developing such a library within the boost community would be a
> very good way to ensure it fits everybody's needs the best possible way.
>
> If you have any, and I do mean ANY, thoughts on this, please do not hesitate
> to reply to this mail and let me know. I'm looking forward to your
> responses.

FWIW, here are my thoughts.

There is no equivalence between std::basic_string (aka std::string,
std::wstring) and a sequence of characters conforming to an encoding (aka
encoded-string).

However, an encoded-string can (potentially) be converted to a std::string, but
not the other way round, because a std::string does not carry adequate
information about its encoding.

For an encoding scheme to work, the encoding must be specified, and it must be
available at run time. The best way to do this for various encodings is to use
packets, with headers describing the contents, e.g. type of encoding, number of
characters, checksum etc. These packets (including sequences of packets) could
themselves be carried in std::strings, which could then be used to perform
operations where the encoding is not important. This should give the best
combination of performance, in both speed and size.
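
To make the packet idea a bit more concrete, here is a rough sketch of what I
mean. Everything in it is an assumption made up for illustration: the Encoding
tags, the Packet struct, the 10-byte header layout, the additive checksum and
the pack/unpack names do not come from any existing library. It only shows a
run-time encoding tag travelling with the bytes inside an ordinary std::string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical encoding tags, invented for this sketch.
enum class Encoding : std::uint8_t { Utf8 = 1, Utf16Le = 2, Latin1 = 3 };

struct Packet {
    Encoding encoding;         // run-time encoding tag
    std::uint32_t char_count;  // number of characters, not bytes
    std::string payload;       // raw encoded bytes
};

// Assumed header layout: [encoding:1][char_count:4][payload_size:4][checksum:1]
constexpr std::size_t kHeaderSize = 10;

// Simple additive checksum over the payload bytes (illustrative only).
std::uint8_t checksum(const std::string& bytes) {
    unsigned sum = 0;
    for (unsigned char c : bytes) sum = (sum + c) & 0xFFu;
    return static_cast<std::uint8_t>(sum);
}

// Append a 32-bit value in big-endian byte order.
void put_u32(std::string& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((v >> shift) & 0xFFu));
}

// Read a big-endian 32-bit value; the caller guarantees 4 bytes are available.
std::uint32_t get_u32(const std::string& in, std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | static_cast<unsigned char>(in[pos + i]);
    return v;
}

// Serialize header + payload into a plain std::string.
std::string pack(const Packet& p) {
    std::string out;
    out.push_back(static_cast<char>(p.encoding));
    put_u32(out, p.char_count);
    put_u32(out, static_cast<std::uint32_t>(p.payload.size()));
    out.push_back(static_cast<char>(checksum(p.payload)));
    out += p.payload;
    return out;
}

// Read one packet starting at pos. On success, fills out, advances pos past
// the packet and returns true. Returns false on truncation or checksum mismatch.
bool unpack(const std::string& in, std::size_t& pos, Packet& out) {
    if (pos > in.size() || in.size() - pos < kHeaderSize) return false;
    const std::uint32_t size = get_u32(in, pos + 5);
    if (in.size() - pos - kHeaderSize < size) return false;
    std::string payload = in.substr(pos + kHeaderSize, size);
    if (checksum(payload) != static_cast<unsigned char>(in[pos + 9])) return false;
    out.encoding = static_cast<Encoding>(static_cast<unsigned char>(in[pos]));
    out.char_count = get_u32(in, pos + 1);
    out.payload = payload;
    pos += kHeaderSize + size;
    return true;
}
```

A sequence of packets is then just pack(a) + pack(b), which can be copied,
stored or concatenated as an ordinary std::string without looking at the
encoding. Only code that calls unpack needs to care what is inside.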

regards
Andy Little

