Subject: Re: [boost] Interest in Unicode library for Boost?
From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2018-09-23 15:37:18

On Sun, Sep 23, 2018 at 4:57 AM Andrey Semashev via Boost <
boost_at_[hidden]> wrote:
> On 9/23/18 7:45 AM, Zach Laine via Boost wrote:
> I think a Unicode library is very much needed in Boost.
> Out of curiosity, it looks like you implemented Unicode algorithms
> yourself. Why not use a specialized library, like ICU?

It's partly a question of the size of ICU, which is several megabytes,
whereas Boost.Text is only 1.2-2MB depending on your compiler.

I built HEAD of ICU just now, and here are the resulting .so's:

-rwxrwxr-x 1 tzlaine tzlaine 26M Sep 23 10:29 ./lib/
-rwxrwxr-x 1 tzlaine tzlaine 3.6M Sep 23 10:28 ./lib/
-rwxrwxr-x 1 tzlaine tzlaine 65K Sep 23 10:28 ./lib/
-rwxrwxr-x 1 tzlaine tzlaine 66K Sep 23 10:28 ./lib/
-rwxrwxr-x 1 tzlaine tzlaine 234K Sep 23 10:28 ./lib/
-rwxrwxr-x 1 tzlaine tzlaine 2.2M Sep 23 10:28 ./lib/
-rwxrwxr-x 1 tzlaine tzlaine 5.3K Sep 23 10:28 ./stubdata/
-rwxrwxr-x 1 tzlaine tzlaine 83K Sep 23 10:28

So, I don't know how many of those you need, but if you require data (and
you do!), 26MB is a lot. Note that I put collation data into headers, so
your runtime memory footprint might be much larger than 1.2-2MB, but the
minimum requirement is still only that small. Requiring the user to pay
more than this minimum is a classic violation of "Don't pay for what you
don't use."
Another thing is that ICU allocates memory all over the place, in some
cases needlessly.

ICU also has, IMO, a poor API (too complicated and confusing): there are way
too many types and functions, and the types that are emphasized are often
the wrong ones, like UTF-16 strings. The algorithms should be C++-style
algorithms if this is something we're going to standardize.

