Boost :

From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2008-01-09 15:29:33

Next message: Eric Niebler: "[boost] subversion UTF-8 conversion errors"
Previous message: Boris Gubenko: "[boost] [random] failure of random_test on Red Hat Linux 2.6.9 ia64"
In reply to: Sebastian Redl: "Re: [boost] Strings tagged with their character set"
Next in thread: Sebastian Redl: "Re: [boost] Strings tagged with their character set"
Reply: Sebastian Redl: "Re: [boost] Strings tagged with their character set"

It's nice to see this thread from September picked up again, as I was a
bit disappointed by the volume of response at the time to my proposal.
I may be plugging this code into something real quite soon, and will
try to drum up some interest again here if I do. With XML being
mentioned again, I think that character sets are something that needs attention.

[Be warned that some readers will not see new messages on this old thread.]

Sebastian Redl wrote:
> David Rodríguez Ibeas wrote:
>> On Sep 27, 2007 5:31 PM, Joseph Gauterin <joseph.gauterin_at_[hidden]> wrote:

[putting back the context]
>>> If we had mutable strings consider how badly the following would perform:
>>> std::replace(utfString.begin(),utfString.end(),SingleByteChar,MultiByteChar);
>>> Although this looks O(n) at first glance, it's actually O(n^2), as the
>>> container has to expand itself for every replacement. I don't think a
>>> library should make writing worst case scenario type code that easy.

>> While I don't know whether this problem has a solution, an alternative
>> replace can be implemented in the library that runs in linear time by
>> constructing a new string, copying values and replacing in the same iteration.
>> Could std::replace() be disabled somehow? (SFINAE?)
>>
> It ought to be possible to overload it and, if the string is not part of
> std, have the overloaded version be picked up with ADL. Only if
> replace() isn't explicitly qualified, of course, which is a problem.
> But I think immutable strings are the way forward anyway.

For a UTF-8 string, my proposal offered

   a mutable random-access byte iterator
   a const bidirectional character iterator
   a mutable output character iterator

std::replace needs a mutable forward iterator, so you wouldn't be able
to apply it to the character iterator. The library wouldn't "let you
write worst case code".
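
To make the iterator requirement concrete, here is a minimal sketch using
std::string iterators as stand-ins, since the proposed utf8_string is not
shown in this message. The helper name can_replace_through is hypothetical.
It only checks whether "*it = value" is well-formed, which is the write that
std::replace performs through its iterators.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper (not part of any library): true when a value of type T
// can be assigned through a dereferenced It, i.e. "*it = value" compiles.
// std::replace needs this, so a const character iterator cannot be used.
template <class It, class T>
constexpr bool can_replace_through() {
    return std::is_assignable<decltype(*std::declval<It&>()), const T&>::value;
}

// A mutable iterator allows the write; a const iterator does not.
static_assert(can_replace_through<std::string::iterator, char>(),
              "mutable iterator accepts assignment");
static_assert(!can_replace_through<std::string::const_iterator, char>(),
              "const iterator rejects assignment");
```

With a const-only character iterator, a call to std::replace fails to compile
for the same reason the second static_assert holds.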

There is, however, the replace_copy algorithm, which I think does
exactly what you need; it takes a pair of input iterators and an output
iterator, i.e. something like

utf8_string s1 = "......";
utf8_string s2;
std::replace_copy(s1.begin(), s1.end(),
                  utf8_string::character_output_iterator(s2),
                  L'x', L'y');
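
For anyone who wants to try the idea without the proposed library, the
following is a rough, self-contained sketch. The names utf8_input_iterator,
utf8_output_iterator and replace_copy_utf8 are hypothetical stand-ins, not
the classes from my proposal, and the decoder assumes the input is already
valid UTF-8. The point is only that std::replace_copy makes a single pass and
appends to the destination, so a replacement of a different encoded length
never shifts existing bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical read-only character iterator: decodes one code point per step.
// Assumes well-formed UTF-8; no validation is attempted.
class utf8_input_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    explicit utf8_input_iterator(const char* p) : p_(p) {}

    char32_t operator*() const {
        const unsigned char b0 = static_cast<unsigned char>(p_[0]);
        if (b0 < 0x80) return b0;
        const std::size_t n = length(b0);
        char32_t c = b0 & (0xFF >> (n + 1));  // payload bits of the lead byte
        for (std::size_t i = 1; i < n; ++i)
            c = (c << 6) | (static_cast<unsigned char>(p_[i]) & 0x3F);
        return c;
    }
    utf8_input_iterator& operator++() {
        p_ += length(static_cast<unsigned char>(*p_));
        return *this;
    }
    utf8_input_iterator operator++(int) { auto t = *this; ++*this; return t; }
    friend bool operator==(utf8_input_iterator a, utf8_input_iterator b) { return a.p_ == b.p_; }
    friend bool operator!=(utf8_input_iterator a, utf8_input_iterator b) { return a.p_ != b.p_; }

private:
    // Encoded length in bytes, derived from the lead byte.
    static std::size_t length(unsigned char b0) {
        return b0 < 0x80 ? 1 : b0 < 0xE0 ? 2 : b0 < 0xF0 ? 3 : 4;
    }
    const char* p_;
};

// Hypothetical character output iterator: appends each code point as UTF-8.
class utf8_output_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit utf8_output_iterator(std::string& out) : out_(&out) {}

    utf8_output_iterator& operator=(char32_t c) {
        assert(c <= 0x10FFFF && (c < 0xD800 || c > 0xDFFF));
        if (c < 0x80) {
            out_->push_back(static_cast<char>(c));
        } else if (c < 0x800) {
            out_->push_back(static_cast<char>(0xC0 | (c >> 6)));
            out_->push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else if (c < 0x10000) {
            out_->push_back(static_cast<char>(0xE0 | (c >> 12)));
            out_->push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out_->push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else {
            out_->push_back(static_cast<char>(0xF0 | (c >> 18)));
            out_->push_back(static_cast<char>(0x80 | ((c >> 12) & 0x3F)));
            out_->push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out_->push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
        return *this;
    }
    utf8_output_iterator& operator*() { return *this; }
    utf8_output_iterator& operator++() { return *this; }
    utf8_output_iterator operator++(int) { return *this; }

private:
    std::string* out_;
};

// One linear pass: decode, compare code points, re-encode into a new string.
std::string replace_copy_utf8(const std::string& in, char32_t from, char32_t to) {
    std::string out;
    std::replace_copy(utf8_input_iterator(in.data()),
                      utf8_input_iterator(in.data() + in.size()),
                      utf8_output_iterator(out), from, to);
    return out;
}
```

Calling replace_copy_utf8("a.b", U'.', U'\u00E9') produces a four-byte string
from a three-byte one, with each output byte written exactly once.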

Concerning mutable vs. immutable strings: which is best in any
particular case clearly depends on the size of the string, the
operation being performed, and whether it has a variable-length
encoding. The programmer should be allowed to choose which to use.
(An interesting case is where the size or character set changes at
run-time, and a run-time choice of algorithm is appropriate.)

Regards,

Phil.
