

From: Erik Wien (wien_at_[hidden])
Date: 2004-10-19 13:52:40


Hi. Thanks for the feedback!

"Miro Jurisic" <macdev_at_[hidden]> wrote in message
news:macdev-BACD3C.13585519102004_at_sea.gmane.org...
> I generally agree with this design approach, but I don't think that code
> point iterators alone are sufficient.

Neither do I, as a matter of fact, but this is as far as I have gotten so
far. :) There would probably be different types of iterators (or iterator
wrappers) available, enabling iteration over everything from code units to
code points and abstract characters.
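Here is a minimal sketch of one such wrapper. It assumes UTF-8 code units held in a std::string; the encoding choice and all names here (utf8_code_point_iterator, code_points) are hypothetical and not part of any proposed interface. It adapts a code unit iterator so that dereferencing yields whole code points, while base() still exposes the code unit position underneath.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical sketch: walks a UTF-8 std::string one code point at a time.
// Assumes well-formed UTF-8; overlong forms, surrogates and truncated
// sequences are NOT detected.
class utf8_code_point_iterator {
public:
    // Declared as an input iterator because operator* returns a value,
    // not a reference into storage.
    using iterator_category = std::input_iterator_tag;
    using value_type        = char32_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = char32_t;

    utf8_code_point_iterator() = default;
    explicit utf8_code_point_iterator(std::string::const_iterator pos) : pos_(pos) {}

    // Decode the code point whose lead code unit is at the current position.
    char32_t operator*() const {
        auto p = pos_;
        const unsigned char lead = static_cast<unsigned char>(*p);
        assert((lead & 0xC0) != 0x80 && "positioned on a continuation byte");
        const int extra = sequence_length(lead) - 1;
        // Keep only the payload bits of the lead byte (7, 5, 4 or 3 bits).
        char32_t cp = (extra == 0) ? lead : (lead & (0x3F >> extra));
        for (int i = 0; i < extra; ++i) {
            ++p;
            cp = (cp << 6) | (static_cast<unsigned char>(*p) & 0x3F);
        }
        return cp;
    }

    // Step over all code units of the current code point.
    utf8_code_point_iterator& operator++() {
        pos_ += sequence_length(static_cast<unsigned char>(*pos_));
        return *this;
    }
    utf8_code_point_iterator operator++(int) {
        utf8_code_point_iterator old = *this;
        ++*this;
        return old;
    }

    // The underlying code unit iterator stays reachable.
    std::string::const_iterator base() const { return pos_; }

    friend bool operator==(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return !(a == b);
    }

private:
    // Number of code units in a sequence, derived from its lead byte.
    static int sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;          // 0xxxxxxx
        if ((lead >> 5) == 0x06) return 2;  // 110xxxxx
        if ((lead >> 4) == 0x0E) return 3;  // 1110xxxx
        return 4;                           // 11110xxx
    }

    std::string::const_iterator pos_{};
};

// Collect the code points of a UTF-8 string through the adaptor.
std::vector<char32_t> code_points(const std::string& utf8) {
    return std::vector<char32_t>(utf8_code_point_iterator(utf8.begin()),
                                 utf8_code_point_iterator(utf8.end()));
}
```

For example, "A\xC3\xBC\xE2\x82\xAC" (A, ü, €) is six code units but decodes to three code points: 0x41, 0xFC, 0x20AC.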

> Iteration over encoded characters and abstract
> characters would be needed for some algorithms to function sensibly. For
> example, the simple task of:
>
> find(begin, end, "ü")
>
> needs to use abstract characters in order to be able to find precomposed
> and decomposed versions of ü.
>

True... and this is a point where the implementation would be far from
trivial. Comparing strings in Unicode is anything BUT trivial, and it's
imperative to find a good way to implement this functionality through the
standard algorithms.
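The following rough sketch illustrates the problem. It assumes the text has already been decoded to code points in a std::u32string. A plain code point comparison treats precomposed U+00FC and decomposed U+0075 U+0308 as different strings. Decomposing both sides first lets std::search do the matching. The two-entry table in toy_decompose is a hypothetical stand-in for real canonical decomposition, not an implementation of it.

```cpp
#include <algorithm>
#include <string>

// Hypothetical, deliberately tiny stand-in for canonical decomposition (NFD).
// A real implementation needs the Unicode Character Database's decomposition
// mappings and canonical reordering of combining marks; both are omitted.
std::u32string toy_decompose(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) {
        switch (c) {
        case U'\u00FC': out += U"u\u0308"; break;  // ü -> u + COMBINING DIAERESIS
        case U'\u00E9': out += U"e\u0301"; break;  // é -> e + COMBINING ACUTE ACCENT
        default:        out += c;          break;
        }
    }
    return out;
}

// Decompose both sides, then let a standard algorithm do the matching.
bool contains_equivalent(const std::u32string& haystack,
                         const std::u32string& needle) {
    const std::u32string h = toy_decompose(haystack);
    const std::u32string n = toy_decompose(needle);
    return std::search(h.begin(), h.end(), n.begin(), n.end()) != h.end();
}
```

With this, contains_equivalent(U"M\u00FCller", U"u\u0308") and contains_equivalent(U"Mu\u0308ller", U"\u00FC") both return true. However, searching for a bare U"u" also succeeds, because after decomposition it matches inside ü. A real implementation would also have to reject matches that end in the middle of a combining sequence. That is one reason such comparisons are hard to express with the standard algorithms alone.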

> Again, taking this example, let's say that do_some_operation performs
> canonicalization to some Unicode canonical form; you can't do this by
> iterating over code points.
>

Nope. A code unit iterator would be needed for things like that.

>> I am aware that this implementation will be less than ideal for
>> integration with the current C++ standard, but it's issues like that I
>> would like to get deeper into during the development.
>
> You should explain what problems with integration you foresee.

I think I was getting a little ahead of myself when I wrote that. :) The
implementation described here would not pose much of a problem; I was
thinking more of the problems that arise when you take things like collation
and locales into consideration. From what I understand, there is a real
problem in enabling proper Unicode support in standard classes like locale,
ctype and collate, as they make assumptions that do not necessarily apply to
a Unicode representation of text. Failing to provide good support in those
classes (at least locale and ctype) would also break iostream support, and
things start to snowball. I could very well be wrong on this (actually, I
hope I am! :) ), as I haven't had the time to read up on all the issues
concerning this. But again, this is one of many problems I hope running this
project will help reveal.
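Here is one concrete example of such an assumption, as a small sketch. std::ctype<char>::toupper maps one char to one char, so on UTF-8 text it can only work per code unit. It cannot map ß (U+00DF) to the two letters "SS" that Unicode's full case mapping gives, and it sees the bytes of a multi-byte sequence as unrelated characters. The helper name upper_per_code_unit is hypothetical. std::locale::classic() is used only because it is available everywhere; in it, the UTF-8 bytes of ü and ß simply pass through unchanged.

```cpp
#include <locale>
#include <string>

// Uppercase a UTF-8 string through std::ctype<char>. The facet's interface
// maps one char to one char, so it necessarily works per code unit.
std::string upper_per_code_unit(std::string s, const std::locale& loc) {
    const std::ctype<char>& ct = std::use_facet<std::ctype<char>>(loc);
    if (!s.empty())
        ct.toupper(&s[0], &s[0] + s.size());  // in-place range overload
    return s;
}
```

In the classic locale, "Stra\xC3\x9F" "e" becomes "STRA\xC3\x9F" "E" ("STRAßE"), not "STRASSE". No choice of char-based locale can fix that, because the result has to fit in the same number of chars.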


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk