From: Felipe Magno de Almeida (felipe.m.almeida_at_[hidden])
Date: 2007-09-26 17:28:03


On 9/26/07, Phil Endecott <spam_from_boost_dev_at_[hidden]> wrote:
> Joseph Gauterin wrote:
>

[snip]

> > I've noticed that there are frequent requests/proposals for some sort
> > of boost unicode/string encoding library. I've thought about the
> > problem and it seems too big for one person to handle in their spare
> > time
>
> Let me say "part time" rather than "spare time"...

Sorry to jump into the discussion, but I've been following it since the
start, and I'm interested in this project too. Though I'm a little
swamped with work right now, I work with exactly one of the use cases
raised in the thread: e-mail parsing (and everything around it).
There, tagging is exactly the right approach.
The hardest part is: how do we compare external strings with e-mail text?
So far, the safest approach I have found is to convert both to
Unicode and then compare them (if they don't share the same encoding).
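
To make the "convert to Unicode, then compare" idea concrete, here is a minimal sketch using only the standard library. The names `charset`, `to_utf32` and `equal_text` are hypothetical and invented for illustration; they are not part of any Boost proposal. It handles only Latin-1 and UTF-8, assumes well-formed input, and compares raw code points without Unicode normalization, so canonically equivalent but differently composed text would still compare unequal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical run-time charset tag (not a name from the thread).
enum class charset { latin1, utf8 };

// Decode a byte string in the given charset into UTF-32 code points.
// Assumes well-formed input; a real decoder must validate every byte.
std::u32string to_utf32(const std::string& bytes, charset cs) {
    std::u32string out;
    if (cs == charset::latin1) {
        // Latin-1 bytes map one-to-one onto U+0000..U+00FF.
        for (unsigned char b : bytes) out.push_back(b);
        return out;
    }
    for (std::size_t i = 0; i < bytes.size();) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        // Sequence length is determined by the lead byte.
        const std::size_t len =
            lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        assert(i + len <= bytes.size());  // reject a truncated sequence
        // Keep the payload bits of the lead byte, then append 6 bits
        // from each continuation byte.
        char32_t cp = static_cast<char32_t>(
            len == 1 ? lead : lead & (0xFF >> (len + 1)));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) |
                 (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Compare two strings that may be tagged with different charsets.
bool equal_text(const std::string& a, charset cs_a,
                const std::string& b, charset cs_b) {
    if (cs_a == cs_b) return a == b;  // same encoding: compare bytes
    return to_utf32(a, cs_a) == to_utf32(b, cs_b);
}
```

With this, `equal_text("caf\xE9", charset::latin1, "caf\xC3\xA9", charset::utf8)` treats the Latin-1 and UTF-8 spellings of the same word as equal, while the fast path avoids any conversion when both tags match.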

> > - perhaps a group of us should get together to discuss working on
> > one? I'd be happy to participate.
>
> I would definitely encourage breaking the work up into smaller chunks.
> IMHO "smaller is better" for Boost libraries; there have been a number
> of occasions when I've discovered that a feature I want is hidden as an
> internal component of a Boost library, and I've felt that it should
> have been a stand-alone public entity. So let's think about how this
> work can be split up:

This seems like a very good approach.

> - A charset_trait class. I have started on this. The missing piece is
> a way to look up traits of character sets that are known at run-time;
> input would be appreciated.
>
> - Compile-time and run-time tagged strings. The basics of this are
> straightforward and done.

That is not as easy if a "universal string" class is the goal. But we
can probably leave that out for now.
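
For readers unfamiliar with the idea, a compile-time tagged string could look roughly like the sketch below. Everything here (`tagged_string`, `latin1_tag`, `utf8_tag`, `to_utf8`) is a hypothetical name chosen for illustration, not Phil's actual interface. The point is only that the charset becomes part of the type, so mixing encodings is a compile error and conversions must be spelled out.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical charset tags; empty types used only to distinguish strings.
struct latin1_tag {};
struct utf8_tag {};

// A byte string whose encoding is part of its type.
template <typename Charset>
class tagged_string {
public:
    explicit tagged_string(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }

    // Only strings with the same tag can be compared.
    friend bool operator==(const tagged_string& a, const tagged_string& b) {
        return a.bytes_ == b.bytes_;
    }

private:
    std::string bytes_;
};

using latin1_string = tagged_string<latin1_tag>;
using utf8_string = tagged_string<utf8_tag>;

static_assert(!std::is_same<latin1_string, utf8_string>::value,
              "different charsets give different types");

// Explicit conversion: every Latin-1 byte is a code point below U+0100,
// so it needs at most two UTF-8 bytes.
utf8_string to_utf8(const latin1_string& in) {
    std::string out;
    for (unsigned char b : in.bytes()) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return utf8_string(out);
}
```

Comparing a `latin1_string` directly with a `utf8_string` does not compile; the caller has to write `to_utf8(a) == b`. A "universal string" would need the run-time counterpart of this, which is where it gets harder.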

> - Conversions. My approach at present is to use iconv via a functor
> that I wrote a while ago. I believe iconv is widely available;
> however, some implementations may support only a small set of character
> sets. Alternatives would be interesting.

I use ICU extensively, but I have never used iconv.

> - Variable width iterators, including the issue that you raised above.

Boost.Iterator makes this job quite easy.
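
As an illustration of what a variable-width iterator has to do, here is a hand-written sketch that walks a UTF-8 `std::string` one code point at a time. It is deliberately dependency-free; with Boost.Iterator, `iterator_facade` would generate the operators from a few core functions (`dereference`, `increment`, `equal`) instead. The names `utf8_iterator` and `code_points` are hypothetical, and the code assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Read-only iterator adaptor: each step consumes 1 to 4 bytes and
// yields one code point. Assumes well-formed UTF-8 input.
class utf8_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;  // values are computed, not stored

    utf8_iterator() = default;
    explicit utf8_iterator(std::string::const_iterator pos) : pos_(pos) {}

    // Decode the code point that starts at the current byte.
    char32_t operator*() const {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        const std::ptrdiff_t len = width(lead);
        char32_t cp = static_cast<char32_t>(
            len == 1 ? lead : lead & (0xFF >> (len + 1)));
        for (std::ptrdiff_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(pos_[k]) & 0x3F);
        return cp;
    }

    // Advance by the width of the current sequence, not by one byte.
    utf8_iterator& operator++() {
        pos_ += width(static_cast<unsigned char>(*pos_));
        return *this;
    }
    utf8_iterator operator++(int) {
        utf8_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_iterator& a, const utf8_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_iterator& a, const utf8_iterator& b) {
        return !(a == b);
    }

private:
    // Number of bytes in the sequence introduced by this lead byte.
    static std::ptrdiff_t width(unsigned char lead) {
        return lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    }

    std::string::const_iterator pos_;
};

// Convenience helper: collect all code points of a UTF-8 string.
std::u32string code_points(const std::string& s) {
    utf8_iterator first(s.begin());
    utf8_iterator last(s.end());
    return std::u32string(first, last);
}
```

Because the iterator exposes the usual typedefs, it works with standard algorithms and container constructors, which is how `code_points` builds its result.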

> - Interaction with locales, internationalisation, and system APIs.

I'm not an IOStream expert, but I'm very used to working with the Windows API.

> and no doubt more. Thinking about the interfaces between these areas
> and the user would be a good place to start.

There have also been some interesting discussions about Unicode in the
past, though they didn't seem to reach any conclusion.
They did, however, raise very important concerns w.r.t. internationalization.

> Regards,
>
> Phil.

Thanks Phil,

-- 
Felipe Magno de Almeida
