From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2007-06-19 18:16:03


Andrey Semashev wrote:

> I'd like to note that Unicode consumes more memory than narrow
> encodings.

That depends heavily on the encoding used.
The most popular memory-saving Unicode encoding is UTF-8, which keeps
ASCII at one byte per character and needs two bytes for most of the
non-ASCII characters that ISO-8859-* stores in one (a few, such as the
euro sign, take three). That's not much of a problem in practice, though.
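
To put rough numbers on that, here is a minimal sketch that compares the
two sizes for text given as code points. The names utf8_length, utf8_size
and latin1_size are made up for illustration and are not part of any Boost
or standard library API; the input is assumed to contain only valid code
points that ISO-8859-1 can represent.

```cpp
#include <cstddef>
#include <string>

// Number of bytes UTF-8 needs for one Unicode code point.
// Assumption: cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
constexpr std::size_t utf8_length(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Total UTF-8 size, in bytes, of a string given as code points.
inline std::size_t utf8_size(const std::u32string& text) {
    std::size_t bytes = 0;
    for (char32_t cp : text) bytes += utf8_length(cp);
    return bytes;
}

// In ISO-8859-1 every representable character takes exactly one byte.
// Assumption: all code points in text are <= 0xFF.
inline std::size_t latin1_size(const std::u32string& text) {
    return text.size();
}
```

For U"d\u00E9j\u00E0 vu" (seven characters, two of them accented),
latin1_size gives 7 and utf8_size gives 9, so only the non-ASCII
characters pay the extra byte.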

Alternatives which use even less memory exist, but they have other
disadvantages.

> This may not be desirable in all cases, especially when the
> application is not intended to support multiple languages in its
> majority of strings (which, in fact, is a quite common case).

Algorithms to handle text boundaries, tailored grapheme clusters,
collations (some of which are context-sensitive) etc. are needed to
process even a single language correctly.
So you need Unicode anyway, and it is better to reuse the Unicode
machinery than to build on top of a legacy encoding.

