From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2008-01-09 15:29:33


It's nice to see this thread from September picked up again, as I was a
bit disappointed by how little response my proposal got at the time.
I may be plugging this code into something real quite soon, and will
try to drum up some interest again here if I do. With XML being
mentioned again, I think that character sets are something that needs
attention.

[Be warned that some readers will not see new messages on this old thread.]

Sebastian Redl wrote:
> David Rodríguez Ibeas wrote:
>> On Sep 27, 2007 5:31 PM, Joseph Gauterin <joseph.gauterin_at_[hidden]> wrote:

[putting back the context]
>>> If we had mutable strings consider how badly the following would perform:
>>> std::replace(utfString.begin(),utfString.end(),SingleByteChar,MultiByteChar);
>>> Although this looks O(n) at first glance, it's actually O(n^2), as the
>>> container has to expand itself for every replacement. I don't think a
>>> library should make writing worst case scenario type code that easy.

>> While this is a problem that I don't know if has a solution, an alternative
>> replace can be implemented in the library that performs in linear time by
>> constructing a new string copying values and replacing on the same iteration.
>> Could std::replace() be disabled somehow?? (SFINAE??)
>>
> It ought to be possible to overload it and, if the string is not part of
> std, have the overloaded version be picked up with ADL. Only if
> replace() isn't explicitly qualified, of course, which is a problem.
> But I think immutable strings are the way forward anyway.

For a UTF-8 string, my proposal offered

   a mutable random-access byte iterator
   a const bidirectional character iterator
   a mutable output character iterator

std::replace needs a mutable forward iterator, so you wouldn't be able
to apply it to either character iterator: the bidirectional one is
const, and the mutable one is output-only. So the library wouldn't
"let you write worst case code".
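
For comparison, the worst case from the quoted text looks roughly like
the sketch below when written by hand against a plain std::string that
holds UTF-8 bytes. This is only an illustration: replace_in_place is a
made-up helper, not part of my proposal or of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only.
// Replaces every occurrence of the ASCII byte `from` with the (possibly
// multi-byte) UTF-8 sequence `to`, editing the string in place.
// Each std::string::replace call moves the whole tail of the string,
// so k replacements in an n-byte string cost O(n * k) byte moves,
// which is O(n^2) when most characters are replaced.
void replace_in_place(std::string& s, char from, const std::string& to) {
    // Only an ASCII byte is safe to search for bytewise: it can never
    // appear inside a multi-byte UTF-8 sequence.
    assert(static_cast<unsigned char>(from) < 0x80);
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, 1, to);  // the tail after pos is shifted on every hit
        pos += to.size();       // continue after the bytes just written
    }
}
```

The result is correct; the cost comes from shifting the tail once per
replacement rather than once overall.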

There is, however, the replace_copy algorithm, which I think does
exactly what you need; it takes a pair of input iterators and an output
iterator, i.e. something like

utf8_string s1 = "......";
utf8_string s2;
std::replace_copy(s1.begin(),s1.end(),
                   utf8_string::character_output_iterator(s2),
                   L'x',L'y');
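
To make that concrete without my utf8_string class, here is a minimal
self-contained sketch over std::string. The names
utf8_const_char_iterator, utf8_output_iterator and replace_char_utf8
are hypothetical stand-ins invented for this example, the decoder
assumes well-formed UTF-8, and it uses char32_t where the snippet above
uses wide characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical read-only character iterator: yields one code point per
// step. Assumes well-formed UTF-8; no validation is performed.
class utf8_const_char_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    explicit utf8_const_char_iterator(std::string::const_iterator p) : p_(p) {}

    char32_t operator*() const {
        unsigned char b0 = static_cast<unsigned char>(*p_);
        if (b0 < 0x80) return b0;                      // single-byte (ASCII)
        int extra = b0 >= 0xF0 ? 3 : b0 >= 0xE0 ? 2 : 1;
        char32_t cp = b0 & (0x3F >> extra);            // payload of lead byte
        for (int i = 1; i <= extra; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(p_[i]) & 0x3F);
        return cp;
    }
    utf8_const_char_iterator& operator++() {
        unsigned char b0 = static_cast<unsigned char>(*p_);
        p_ += b0 < 0x80 ? 1 : b0 < 0xE0 ? 2 : b0 < 0xF0 ? 3 : 4;
        return *this;
    }
    utf8_const_char_iterator operator++(int) { auto old = *this; ++*this; return old; }
    friend bool operator==(const utf8_const_char_iterator& a,
                           const utf8_const_char_iterator& b) { return a.p_ == b.p_; }
    friend bool operator!=(const utf8_const_char_iterator& a,
                           const utf8_const_char_iterator& b) { return a.p_ != b.p_; }
private:
    std::string::const_iterator p_;
};

// Hypothetical output iterator: encodes each assigned code point as
// UTF-8 and appends the bytes to the target string.
class utf8_output_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit utf8_output_iterator(std::string& out) : out_(&out) {}

    utf8_output_iterator& operator=(char32_t cp) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            out_->push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out_->push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out_->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out_->push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out_->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out_->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out_->push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out_->push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out_->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out_->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        return *this;
    }
    utf8_output_iterator& operator*() { return *this; }
    utf8_output_iterator& operator++() { return *this; }
    utf8_output_iterator& operator++(int) { return *this; }
private:
    std::string* out_;
};

// One linear pass: decode, compare, encode into a fresh string.
std::string replace_char_utf8(const std::string& in, char32_t from, char32_t to) {
    std::string out;
    out.reserve(in.size());
    std::replace_copy(utf8_const_char_iterator(in.begin()),
                      utf8_const_char_iterator(in.end()),
                      utf8_output_iterator(out), from, to);
    return out;
}
```

Because the output is only ever appended to, replacing a single-byte
character with a multi-byte one (or the reverse) never shifts bytes
that have already been written.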

Concerning mutable vs. immutable strings: which is best in any
particular case clearly depends on the size of the string, the
operation being performed, and whether it has a variable-length
encoding. The programmer should be allowed to choose which to use.
(An interesting case is where the size or character set changes at
run-time, and a run-time choice of algorithm is appropriate.)

Regards,

Phil.

