Subject: Re: [boost] Summer of Code 2010
From: Robert Ramey (ramey_at_[hidden])
Date: 2010-03-07 16:59:36


Andrew Sutton wrote:
>> Concerning Unicode, I did the foolish choice this year of working
>> full-time at the same time as I finish my studies, which hasn't left
>> me as much free time as I would have liked it to.
>> I will be resuming work on it this summer, however, as I am quite
>> keen on getting it into Boost.

>> One concern I have is that it is quite close to Boost.Iterator,
>> Boost.Range, Boost.RangeEx and Boost.StringAlgo, and I would like my
>> changes to be added as improvement upon those libraries if possible.
>> But I guess proposing everything under Boost.Unicode makes it quite
>> easier to review and all.

Here is some information which may or may not be relevant here.

a) A number of years ago, I needed a codecvt facet to output UTF-8.
I found one written by our own Ron Garcia. I checked
it in as part of the serialization library. It's been hassle-free for all
this time. I believe that others have used it as well.

b) As I recall, the original code also included the above conversion
as an iterator - I don't see that code around any more.

c) Also as part of the library I needed a bunch of converters
(multi-byte <-> unicode, etc) which needed to be composed with
a bunch of other filters (base64, etc.). I made "dataflow iterators"
which use just the addition of templated constructors to the boost
iterators. All this has worked well, as a large part of the implementation
occurs at compile time.

d) In the meantime boost.iostreams got made which also included
some of this facility. This code isn't part of the codecvt facet
machinery - which is where I would think it should be.
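
For anyone who hasn't written one, here is a rough sketch of what a
UTF-8 output facet involves. This is NOT Ron Garcia's code and not what
is checked in to the serialization library. The names utf8_out_facet and
to_utf8 are made up for illustration, and the sketch assumes each wchar_t
holds one complete code point (true where wchar_t is 32 bits; a 16-bit
wchar_t with surrogate pairs is simply rejected here). Only output is
handled.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical facet: encodes one code point per wchar_t as UTF-8.
// Only the output direction (do_out) is implemented in this sketch.
class utf8_out_facet : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit utf8_out_facet(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            const unsigned long cp = static_cast<unsigned long>(*from_next);
            int n;  // number of UTF-8 bytes needed for this code point
            if (cp < 0x80) n = 1;
            else if (cp < 0x800) n = 2;
            else if (cp >= 0xD800 && cp < 0xE000) return error;  // surrogate
            else if (cp < 0x10000) n = 3;
            else if (cp < 0x110000) n = 4;
            else return error;  // outside the Unicode range
            if (to_end - to_next < n) return partial;  // output buffer full
            if (n == 1) {
                *to_next++ = static_cast<char>(cp);
            } else {
                static const unsigned char lead[] = {0, 0, 0xC0, 0xE0, 0xF0};
                *to_next++ = static_cast<char>(lead[n] | (cp >> (6 * (n - 1))));
                for (int i = n - 2; i >= 0; --i)
                    *to_next++ = static_cast<char>(0x80 | ((cp >> (6 * i)) & 0x3F));
            }
            ++from_next;
        }
        return ok;
    }
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 0; }  // variable width
    int do_max_length() const noexcept override { return 4; }
};

// Hypothetical helper that drives the facet directly, without a stream.
std::string to_utf8(const std::wstring& in) {
    if (in.empty()) return std::string();
    utf8_out_facet facet(1);  // refs = 1: lifetime managed by us, not a locale
    std::mbstate_t state = std::mbstate_t();
    std::string out(in.size() * 4, '\0');
    const wchar_t* from_next = 0;
    char* to_next = 0;
    const std::codecvt_base::result r =
        facet.out(state, in.data(), in.data() + in.size(), from_next,
                  &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok) throw std::runtime_error("conversion failed");
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

In real use such a facet would normally be installed in a std::locale
and imbued into a wide file stream, rather than called by hand as the
helper does.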

Soooooo - from my perspective, I would like to see:

a) templated constructors added to boost.iterators
b) a bunch of iterator adaptors - some of which are like "dataflow
iterators".
c) a class for constructing a codecvt facet out of any iterator adaptor.
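
To make the "templated constructors" idea in (a) and (b) a bit more
concrete, here is a stripped-down sketch. It does not use Boost.Iterator
and is not the serialization library's dataflow iterator code; the names
filter_step, to_upper, dash_to_underscore, pipeline and run_pipeline are
invented for the example. The point is only that a templated constructor
lets a whole stack of adaptors be built from the innermost iterator in
one step, with the composition fixed at compile time.

```cpp
#include <cctype>
#include <string>

// Hypothetical adaptor: applies Fn to every char read through Base.
template <class Base, class Fn>
class filter_step {
    Base base_;
public:
    // Templated constructor: forwards whatever it receives to the wrapped
    // iterator, so nested adaptors can all be built from one raw iterator.
    template <class T>
    explicit filter_step(T start) : base_(start) {}

    char operator*() const { return Fn()(*base_); }
    filter_step& operator++() { ++base_; return *this; }
    bool operator==(const filter_step& o) const { return base_ == o.base_; }
    bool operator!=(const filter_step& o) const { return !(*this == o); }
};

// Two trivial stand-in filters (real ones would be base64, UTF-8, etc.).
struct to_upper {
    char operator()(char c) const {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
};
struct dash_to_underscore {
    char operator()(char c) const { return c == '-' ? '_' : c; }
};

// The composition is spelled out as a type: upper-case first, then
// replace dashes.
typedef filter_step<filter_step<const char*, to_upper>, dash_to_underscore>
    pipeline;

std::string run_pipeline(const std::string& in) {
    const char* first = in.data();
    const char* last = first + in.size();
    std::string out;
    // Both ends of the range are built directly from raw pointers.
    for (pipeline it(first), end(last); it != end; ++it)
        out += *it;
    return out;
}
```

With these definitions, run_pipeline("utf-8") yields "UTF_8". Filters
that change the number of elements (base64, multi-byte conversions) need
more state than this one-to-one sketch carries.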

Previous efforts at a unicode library have seemed to bog down
in codepoints and a whole lot of other stuff I don't understand.

I don't know that I have a real point here. But for a long time
it has seemed to be a lost opportunity to unify a bunch of stuff
which seems to be re-invented all the time.

Robert Ramey

