Subject: Re: [boost] [UTF String] UTF String library 1.5 ready for perusal
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-02-10 16:19:49


Hi Chad,

Like Mathias I'm not very enthusiastic about the approach that you're
taking here - but there is plenty of space for different approaches, so
if you want to do it like this you are welcome to do so.

My own approach has been to:
- Store text in sequence-of-byte containers of whatever sort seems
appropriate, i.e. std::string, std::vector<char>, raw memory etc.
- Use iterator adaptors to access that data as UTF-8 when appropriate.
- Use standard algorithms like std::find(begin, end, what) rather than
std::string members.

This works for me, and I recommend it.
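
A minimal sketch of what this approach can look like, assuming well-formed UTF-8 input and doing no validation. The names decode_utf8, code_points and count_code_points are hypothetical and chosen for illustration; this is not the code referred to in this message.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode one code point starting at `it` and advance
// `it` past it. Works with any iterator over char-sized bytes (forward
// iterators, std::string iterators, raw pointers).
// Assumption: the input is well-formed UTF-8; nothing is validated here.
template <typename Iter>
char32_t decode_utf8(Iter& it, Iter end) {
    unsigned char b0 = static_cast<unsigned char>(*it++);
    if (b0 < 0x80) return b0;                      // single-byte (ASCII)
    int extra = (b0 >= 0xF0) ? 3 : (b0 >= 0xE0) ? 2 : 1;
    char32_t cp = b0 & (0x3F >> extra);            // payload bits of the lead byte
    for (; extra > 0 && it != end; --extra)        // stops early on truncated input
        cp = (cp << 6) | (static_cast<unsigned char>(*it++) & 0x3F);
    return cp;
}

// Decode a whole byte range into code points.
template <typename Iter>
std::vector<char32_t> code_points(Iter first, Iter last) {
    std::vector<char32_t> out;
    while (first != last) out.push_back(decode_utf8(first, last));
    return out;
}

// A standard algorithm applied directly to the bytes: every byte that is
// not a continuation byte (10xxxxxx) starts a new code point.
template <typename Iter>
std::ptrdiff_t count_code_points(Iter first, Iter last) {
    return std::count_if(first, last, [](char c) {
        return (static_cast<unsigned char>(c) & 0xC0) != 0x80;
    });
}
```

The same templates accept a std::string, a std::vector<char> or a raw const char* range, e.g. code_points(s.begin(), s.end()) or code_points(p, p + n). Byte-level searches for ASCII characters with std::find are also safe on such ranges, since ASCII byte values never occur inside a multi-byte UTF-8 sequence.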

So I have one comment on this exchange:

Chad Nelson wrote:

>> There is no need for any reasoning: look at the code of your code
>> point iterator. It uses a pointer and indexes, and is therefore not a
>> generic iterator adaptor.
>
> It wasn't meant to be generic. It was meant to be exactly what it is:
> an iterator specific to the UTF type where it's defined. For that
> purpose, it's designed exactly as it should be, IMHO.
>
>> Iterating through code points is fully generic and should work for
>> any forward iterator or bidirectional iterator, not just a pointer.
>
> I could make it fully generic, but it wouldn't be nearly as efficient
> that way. I chose to do the extra work to make it efficient.

I have to challenge your efficiency comment. I have UTF-8 encoding and
decoding that works with generic iterators, including pointers, and I
have no efficiency issues resulting from its genericity. In fact I
spent some time carefully optimising it and I believe that when used
with pointers it is as good as I could get by writing it in assembler.
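
To illustrate how encoding can be written generically without ruling out the pointer case, here is a small sketch. The function encode_utf8 is a hypothetical name, not the optimised code described above, and it assumes the argument is a valid Unicode scalar value. Because the output iterator type is a template parameter, instantiating it with char* gives the compiler plain pointer stores to work with.

```cpp
#include <iterator>
#include <string>

// Hypothetical helper: write the UTF-8 encoding of `cp` through any output
// iterator and return the advanced iterator.
// Assumption: cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
template <typename OutIter>
OutIter encode_utf8(char32_t cp, OutIter out) {
    if (cp < 0x80) {                               // 1 byte: 0xxxxxxx
        *out++ = static_cast<char>(cp);
    } else if (cp < 0x800) {                       // 2 bytes: 110xxxxx 10xxxxxx
        *out++ = static_cast<char>(0xC0 | (cp >> 6));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                     // 3 bytes: 1110xxxx + 2 continuation
        *out++ = static_cast<char>(0xE0 | (cp >> 12));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                       // 4 bytes: 11110xxx + 3 continuation
        *out++ = static_cast<char>(0xF0 | (cp >> 18));
        *out++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

The same function can append to a std::string via std::back_inserter(s) or write into raw memory via a char* that points at a buffer with at least four bytes of space.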

Regards, Phil.

