From: Andrey Semashev (andysem_at_[hidden])
Date: 2007-06-20 16:31:21


Mathias Gaunard wrote:
> Andrey Semashev wrote:
>
>> I'd like to note that Unicode consumes more memory than narrow
>> encodings.
>
> That's quite dependent on the encoding used.
> The most popular Unicode memory-saving encoding is UTF-8 though, which
> doubles the size needed for non-ASCII characters compared to ISO-8859-*
> for example. It's not that problematic though.

UTF-8 is a variable-length encoding, which complicates processing
considerably. I'd rather stick to UTF-16 if I had to use Unicode. And
that already takes twice as much memory as ASCII.
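
To put rough numbers on the sizes being compared, here is a minimal
sketch that counts the bytes a string needs in UTF-8 and in UTF-16. The
function names are made up for illustration, and the input is assumed to
consist of valid Unicode scalar values. Note that UTF-16 needs a
surrogate pair (4 bytes) for code points above U+FFFF, so it is not
strictly fixed-width either.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: bytes needed to encode one code point in UTF-8.
// Assumes cp is a valid Unicode scalar value.
constexpr std::size_t utf8_bytes(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Hypothetical helper: bytes needed to encode one code point in UTF-16.
// Code points above U+FFFF take a surrogate pair (two 16-bit units).
constexpr std::size_t utf16_bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// Total encoded size of a whole string in UTF-8.
std::size_t total_utf8_bytes(const std::u32string& text) {
    std::size_t n = 0;
    for (char32_t cp : text) n += utf8_bytes(cp);
    return n;
}

// Total encoded size of a whole string in UTF-16.
std::size_t total_utf16_bytes(const std::u32string& text) {
    std::size_t n = 0;
    for (char32_t cp : text) n += utf16_bytes(cp);
    return n;
}

// "café": 4 bytes in ISO-8859-1, 5 in UTF-8, 8 in UTF-16.
void check_size_examples() {
    const std::u32string word = U"caf\u00E9";
    assert(total_utf8_bytes(word) == 5);
    assert(total_utf16_bytes(word) == 8);
}
```

For pure ASCII text the UTF-8 size equals the narrow size, while the
UTF-16 size is exactly double.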

> Alternatives which use even less memory exist, but they have other
> disadvantages.
>
>
>> This may not be desirable in all cases, especially when the
>> application is not intended to support multiple languages in its
>> majority of strings (which, in fact, is a quite common case).
>
> Algorithms to handle text boundaries, tailored grapheme clusters,
> collations (some of which are context-sensitive) etc. are needed to
> correctly process any one language.
> So you need Unicode anyway, and better reuse the Unicode stuff than work
> on top of a legacy encoding.

I'm not saying that we don't need Unicode support. We do!
I'm only saying that in many cases plain ASCII does its job perfectly
well: logging, system messages, simple text formatting, texts in
restricted character sets, like numbers, phone numbers, identifiers of
all kinds, etc. There are cases where i18n is not needed at all - mostly
server-side apps with minimal UI. Being forced to use Unicode internally
in these cases means increased memory footprint and degraded performance
due to encoding translation overhead.
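
As a rough illustration of that overhead, the sketch below widens an
ASCII string to UTF-16 at an API boundary. The helper names are
hypothetical, and the input is assumed to be 7-bit ASCII. Even in this
trivial case every conversion costs an allocation and a per-character
copy, and the result takes more storage than the narrow original.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: widen a 7-bit ASCII string to UTF-16.
// Each ASCII character maps to the code unit with the same value.
std::u16string widen_ascii(const std::string& narrow) {
    std::u16string wide;
    wide.reserve(narrow.size());  // one allocation per conversion
    for (char c : narrow) {
        // Go through unsigned char to avoid sign extension.
        wide.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return wide;
}

// Storage taken by the UTF-16 code units, in bytes.
std::size_t storage_bytes(const std::u16string& s) {
    return s.size() * sizeof(char16_t);
}

// An identifier such as "id42" keeps its length in code units, but its
// storage grows by a factor of sizeof(char16_t), typically 2.
void check_widen_example() {
    const std::string id = "id42";
    const std::u16string wide = widen_ascii(id);
    assert(wide.size() == id.size());
    assert(storage_bytes(wide) == id.size() * sizeof(char16_t));
}
```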


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk