Subject: Re: [boost] [string] Realistic API proposal
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-01-28 08:58:53


> > 3. It allows to use std::string meanwhile under the hood as storage
> > giving high efficiency when assigning boost::string to std::string
> > when the implementation is COW (almost all implementations with
> > exception of MSVC)
>
> COW implementations of std::string are not allowed anymore starting with
> C++0x.
>
>

Shame. I still have a little hope that N2668 will be reverted.

> > 4. It is full unicode aware
> > 5. It pushes "UTF-8" idea to standard C++
> > 6. You don't pay for what you do not need.
>
> What am I paying for? I don't see how I gain anything.
>

You don't pay for UTF-8 validation, which matters because 99% of the
uses of a string are encoding-agnostic.

> >
> > #ifdef C++0x
> > typedef char32_t const_code_point_type;
> > #else
> > typedef unsigned const_code_point_type;
> > #endif
>
> Just define boost::char32 once (depending on BOOST_NO_CHAR32_T) and use
> that instead of putting ifdefs everywhere.
> (that's what boost/cuchar.hpp does in my library)
>

Good point
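
Something along these lines, as a rough sketch. The `sketch` namespace and
the `char32` name are only illustrative. BOOST_NO_CHAR32_T would normally
come from <boost/config.hpp>, which I leave out here so the snippet stands
alone (the macro is then simply undefined and the native type is used):

```cpp
// Define the code point type once, instead of repeating #ifdefs everywhere.
// Assumption: BOOST_NO_CHAR32_T is defined by <boost/config.hpp> on compilers
// without char32_t; that header is deliberately not included in this sketch.
namespace sketch {                // hypothetical namespace, not part of Boost

#ifdef BOOST_NO_CHAR32_T
typedef unsigned int char32;      // fallback; assumes unsigned int has >= 32 bits
#else
typedef char32_t char32;          // native C++0x character type
#endif

// The typedef from the proposal, now free of preprocessor conditionals.
typedef char32 const_code_point_type;

} // namespace sketch
```

Every other header can then use sketch::char32 directly.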

>
> > // UTF validation
> >
> > bool is_valid_utf() const;
>
> See, that's what makes the whole thing pointless.

Actually, no. Consider:

   socket.read(my_string);
   if(!my_string.is_valid_utf())
      ....
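
To make it concrete, here is a minimal sketch of what such a check could do,
written as a free function over std::string. The name is_valid_utf8 is
hypothetical; the proposal only declares the member is_valid_utf(), and this
is not its actual implementation. It rejects truncated sequences, overlong
forms, surrogates, and anything above U+10FFFF:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if s is well-formed UTF-8.
inline bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;     // total bytes in this sequence
        unsigned long cp;    // code point decoded so far
        unsigned long min;   // smallest code point allowed for this length
        if (lead < 0x80)                { len = 1; cp = lead;        min = 0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min = 0x10000; }
        else return false;   // stray continuation byte or invalid lead byte

        if (n - i < len) return false;                // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;     // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return false;                       // overlong encoding
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

The caller runs this once, at the point where untrusted bytes enter the
program, and nowhere else.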

> Your type doesn't add any semantic value on top of std::string,
> it's just an agglomeration of free functions into a class. That's a terrible
> design.
> The only advantage that a specific type for unicode strings would bring is
> that it could enforce certain useful invariants.
>

You don't need to enforce things you don't care about in 99% of cases.

> Enforcing that the string is in a valid UTF encoding and is normalized
> in a specific normalization form can make most Unicode algorithms several
> orders of magnitude faster.

You do not always want to normalize text; that is the user's choice.
You may have optimized algorithms for already-normalized strings,
but that is not always the case.

Also, what kind of normalization: NFC? NFKC?

>
> All of this is trivial to implement quickly with my Unicode library.
>

No, it is not.

Your Unicode library is locale-agnostic, which makes it quite
useless in too many cases.

Almost every added function was locale-sensitive:

- search
- collation
- case handling

And so on. This is a major drawback of your library: it is not
capable of running locale-sensitive algorithms, which are the
vast majority of Unicode algorithms.
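
To illustrate what "locale-sensitive" means for collation, here is a small
sketch using only the standard std::collate facet. The helper name
compare_in_locale is made up for this example and is not part of either
library. The point is that the ordering is a function of the locale argument,
not of the strings alone:

```cpp
#include <locale>
#include <string>

// Hypothetical helper: compare a and b under the collation rules of loc.
// Returns a negative value, zero, or a positive value, like strcmp.
inline int compare_in_locale(const std::string& a,
                             const std::string& b,
                             const std::locale& loc)
{
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}
```

With std::locale::classic() this orders "B" before "a", because it compares
character codes. With a named locale, if one is installed on the system, the
same call may well order them the other way around.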

Artyom

      

