Subject: Re: [boost] Interest in Unicode library for Boost?
From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2018-09-23 15:37:18


On Sun, Sep 23, 2018 at 4:57 AM Andrey Semashev via Boost <boost_at_[hidden]> wrote:
>
> On 9/23/18 7:45 AM, Zach Laine via Boost wrote:
>
> I think a Unicode library is very much needed in Boost.
>
> Out of curiosity, it looks like you implemented Unicode algorithms
> yourself. Why not use a specialized library, like ICU?

It's partly a question of size. ICU runs to tens of megabytes once you
include its data, whereas Boost.Text is only 1.2-2MB, depending on your
compiler.

I built HEAD of ICU just now, and here are the resulting .so's:

-rwxrwxr-x 1 tzlaine tzlaine 26M Sep 23 10:29 ./lib/libicudata.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine 3.6M Sep 23 10:28 ./lib/libicui18n.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine 65K Sep 23 10:28 ./lib/libicuio.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine 66K Sep 23 10:28 ./lib/libiculx.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine 234K Sep 23 10:28 ./lib/libicutu.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine 2.2M Sep 23 10:28 ./lib/libicuuc.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine 5.3K Sep 23 10:28 ./stubdata/libicudata.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine 83K Sep 23 10:28 ./tools/ctestfw/libicutest.so.62.1

So, I don't know how many of those you need, but if you require data (and
you do!), 26MB is a lot. Note that I put collation data into headers, so
your runtime memory footprint might be much larger than 1.2-2MB, but the
minimum requirement is still only that small. Requiring the user to pay
more than this minimum is a classic "Don't pay for what you don't use"
violation.

Another thing is that ICU allocates memory all over the place, in some
cases needlessly.

ICU also has IMO a poor (too complicated and confusing) API; there are way
too many types and functions, and the types that are emphasized are often
the wrong ones, like UTF-16 strings. The algorithms should be C++-style
algorithms if this is something we're going to standardize.

Zach

