Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-01-20 09:46:55


> >
> > OK, if the long term plan is:
> >
> > 1) design and implement boost::string using UTF-8 doing all the things
> > like code-point iteration, character iteration, convenience stuff like
> > starts-with, ends-with, replace, trim, etc., etc. with as much
> > backward compatibility with std::string as possible without hindering
> > progress
> >
> > 2) try really hard to push it to the standard
> >
> > then I'm on board with that.
>
> Some of those could be problematic (I've run across references implying
> that 0x20 isn't the universal word-separation character, so trim would
> at least need some extra parameters), but for the most part, I'd agree
> with it.

Trimming is also locale dependent.

Unicode defines three kinds of text segment boundaries: grapheme cluster,
word and sentence:

   http://unicode.org/reports/tr29

Line break boundaries are defined separately:

   http://www.unicode.org/reports/tr14/

Most of them are also locale dependent, as they may require the use of
dictionaries.

So unless you want to carry locale information in the string,
I don't think it is a good idea to put these operations into the
string class itself.
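
A minimal sketch of that separation, assuming nothing beyond the standard
library: trim is written as a free function that receives the locale from
the caller, so the string does not have to carry it. The function name and
signature are hypothetical (not a Boost or standard API), and it classifies
single bytes only, so it will not recognise multi-byte UTF-8 separators.

```cpp
#include <locale>
#include <string>

// Hypothetical free function, for illustration only.
// The locale is an argument of the operation, not a property of the string.
// Simplification: std::isspace is applied byte by byte, so only single-byte
// whitespace known to the locale's ctype<char> facet is removed.
inline std::string trim(const std::string& s,
                        const std::locale& loc = std::locale::classic())
{
    std::string::size_type first = 0;
    std::string::size_type last = s.size();

    // Skip leading whitespace.
    while (first < last && std::isspace(s[first], loc))
        ++first;

    // Skip trailing whitespace.
    while (last > first && std::isspace(s[last - 1], loc))
        --last;

    return s.substr(first, last - first);
}
```

A caller that needs different rules passes a different locale, for example
trim(text, std::locale("")) for the user's environment locale, while the
string type itself stays locale agnostic.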

Artyom

      

