From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2008-01-09 15:29:33
It's nice to see this thread from September picked up again, as I was a
bit disappointed by the volume of response at the time to my proposal.
I may be plugging this code into something real quite soon, and will
try to drum up some interest again here if I do. With XML being
mentioned again, I think that character sets are something that needs attention.
[Be warned that some readers will not see new messages on this old thread.]
Sebastian Redl wrote:
> David Rodríguez Ibeas wrote:
>> On Sep 27, 2007 5:31 PM, Joseph Gauterin <joseph.gauterin_at_[hidden]> wrote:
[putting back the context]
>>> If we had mutable strings consider how badly the following would perform:
>>> std::replace(utfString.begin(),utfString.end(),SingleByteChar,MultiByteChar);
>>> Although this looks O(n) at first glance, it's actually O(n^2), as the
>>> container has to expand itself for every replacement. I don't think a
>>> library should make writing worst case scenario type code that easy.
>> While I don't know whether this problem has a solution, an alternative
>> replace can be implemented in the library that performs in linear time by
>> constructing a new string, copying values and replacing in the same iteration.
>> Could std::replace() be disabled somehow?? (SFINAE??)
>>
> It ought to be possible to overload it and, if the string is not part of
> std, have the overloaded version be picked up with ADL. Only if
> replace() isn't explicitly qualified, of course, which is a problem.
> But I think immutable strings are the way forward anyway.
For a UTF-8 string, my proposal offered:
  - a mutable random-access byte iterator
  - a const bidirectional character iterator
  - a mutable output character iterator
std::replace needs a mutable forward iterator, so you wouldn't be able
to apply it to the character iterator. The library wouldn't "let you
write worst case code".
There is, however, the replace_copy algorithm, which I think does
exactly what you need; it takes a pair of input iterators and an output
iterator, i.e. something like
    utf8_string s1 = "......";
    utf8_string s2;
    std::replace_copy(s1.begin(), s1.end(),
                      utf8_string::character_output_iterator(s2),
                      L'x', L'y');
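For readers without the proposed utf8_string at hand, here is a minimal
sketch of the same idea using only the standard library. It is not the
proposal's code: decode_utf8, utf8_output_iterator and replace_code_point
are hypothetical names invented for this illustration, the input is
assumed to be well-formed UTF-8, and the characters are decoded into a
vector up front rather than through a lazy character iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into code points.
// Assumes well-formed input; no validation is attempted.
std::vector<char32_t> decode_utf8(const std::string& bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        // Sequence length is determined by the lead byte.
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        char32_t cp = lead;
        if (len > 1) cp = lead & (0xFFu >> (len + 1));  // keep payload bits only
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3Fu);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical output iterator: each assigned code point is appended
// to the target string as one to four UTF-8 bytes.
class utf8_output_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit utf8_output_iterator(std::string& target) : target_(&target) {}

    utf8_output_iterator& operator=(char32_t cp) {
        assert(cp <= 0x10FFFF);  // outside the Unicode range otherwise
        if (cp < 0x80) {
            target_->push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            target_->push_back(static_cast<char>(0xC0 | (cp >> 6)));
            target_->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            target_->push_back(static_cast<char>(0xE0 | (cp >> 12)));
            target_->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            target_->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            target_->push_back(static_cast<char>(0xF0 | (cp >> 18)));
            target_->push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            target_->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            target_->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        return *this;
    }

    // Dereference and increment are no-ops; the work happens on assignment.
    utf8_output_iterator& operator*() { return *this; }
    utf8_output_iterator& operator++() { return *this; }
    utf8_output_iterator operator++(int) { return *this; }

private:
    std::string* target_;
};

// One pass over the characters; the result is built by appending, so a
// replacement of a different encoded length never shifts existing bytes.
std::string replace_code_point(const std::string& utf8, char32_t from, char32_t to) {
    const std::vector<char32_t> cps = decode_utf8(utf8);
    std::string result;
    result.reserve(utf8.size());
    std::replace_copy(cps.begin(), cps.end(), utf8_output_iterator(result), from, to);
    return result;
}
```

With this sketch, replace_code_point("a-b", U'-', U'\u00E9') returns the
four bytes 'a', 0xC3, 0xA9, 'b': the one-byte '-' has become a two-byte
character without any in-place expansion of the source string.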
Concerning mutable vs. immutable strings: which is best in any
particular case clearly depends on the size of the string, the
operation being performed, and whether it has a variable-length
encoding. The programmer should be allowed to choose which to use.
(An interesting case is where the size or character set changes at
run-time, and a run-time choice of algorithm is appropriate.)
Regards,
Phil.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk