From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2004-10-20 08:08:33

Next message: Rogier van Dalen: "Re: [boost] Re: Any interest in adding unicode support to boost?"
Previous message: Vladimir Prus: "[boost] Re: Re: Re: Any interest in adding unicode support to boost?"
In reply to: Erik Wien: "[boost] Re: Any interest in adding unicode support to boost?"
Next in thread: Miro Jurisic: "[boost] Re: Any interest in adding unicode support to boost?"

On Tue, 19 Oct 2004 18:32:50 +0200, Erik Wien <wien_at_[hidden]> wrote:
> ----- Original Message -----
> From: "Rogier van Dalen" <rogiervd_at_[hidden]>
>
> > I've recently started on the first draft of a Unicode library.
> >
>
> Interesting. Is there a discussion going on about this library that I
> have missed, or haven't you posted anything about it yet? I'd hate to
> start something like this if an effort on the subject is already under
> way.

It's in the planning stage; I have a preliminary implementation of
some parts. Your message prompted me to make my ideas public.

> > I think a definition of unicode::code as uint32_t would be much
> > better. Problem is, codecvt is only implemented for wchar_t and char,
> > so it's not possible to make a Unicode codecvt without manually adding
> > (dummy) implementations of codecvt<unicode::code,char,mbstate_t> to
> > the std namespace. I guess this is the reason that Ron Garcia just
> > used wchar_t.
> >
> I don't really feel locking the code unit size to 32 bits is a good
> solution either, as strings would then become unnecessarily large.

As I tried to show, the type of the underlying buffer is a template
parameter. It could be std::string, or an SGI rope<wchar_t>, or
anything else. A char-based buffer would automatically make it a
UTF-8-encoded string, and so on. I agree with you (and with the
Unicode standard) that UTF-16 strings are probably best for most
practical applications. The interface, however, should IMHO always use
UTF-32 (I agree with the Unicode standard here too):
    codepoint_string<...> s = ....;
I think *s.begin() should return a UTF-32-encoded code point.
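
To make the idea concrete, here is a minimal sketch of what such an
interface could look like for a char-based (UTF-8) buffer. It is only
an illustration under stated assumptions, not the preliminary
implementation mentioned above: apart from the name codepoint_string,
every member and the helper to_utf32 are hypothetical, the buffer is
assumed to hold well-formed UTF-8, and C++11 char32_t stands in for a
32-bit unicode::code type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch, not the interface of any existing library.
// Buffer is any random-access container of char holding UTF-8.
template <class Buffer = std::string>
class codepoint_string {
public:
    class const_iterator {
    public:
        const_iterator(const Buffer* buf, std::size_t pos)
            : buf_(buf), pos_(pos) {}

        // Decode the UTF-8 sequence at the current position into one
        // UTF-32 code point. Assumes well-formed input: no validation
        // of continuation bytes, overlong forms, or surrogates.
        char32_t operator*() const {
            assert(pos_ < buf_->size());
            unsigned char lead = static_cast<unsigned char>((*buf_)[pos_]);
            std::size_t len = length(lead);
            if (len == 1) return lead;
            char32_t cp = lead & (0x7Fu >> len);  // payload bits of lead byte
            for (std::size_t i = 1; i < len; ++i) {
                unsigned char c =
                    static_cast<unsigned char>((*buf_)[pos_ + i]);
                cp = (cp << 6) | (c & 0x3Fu);     // 6 payload bits per byte
            }
            return cp;
        }

        // Advance by one code point, i.e. by 1 to 4 code units.
        const_iterator& operator++() {
            pos_ += length(static_cast<unsigned char>((*buf_)[pos_]));
            return *this;
        }

        bool operator==(const const_iterator& o) const {
            return buf_ == o.buf_ && pos_ == o.pos_;
        }
        bool operator!=(const const_iterator& o) const {
            return !(*this == o);
        }

    private:
        // Sequence length implied by a UTF-8 lead byte.
        static std::size_t length(unsigned char lead) {
            if (lead < 0x80) return 1;
            if ((lead >> 5) == 0x06) return 2;
            if ((lead >> 4) == 0x0E) return 3;
            return 4;
        }

        const Buffer* buf_;
        std::size_t pos_;
    };

    explicit codepoint_string(Buffer buf) : buf_(buf) {}

    const_iterator begin() const { return const_iterator(&buf_, 0); }
    const_iterator end() const { return const_iterator(&buf_, buf_.size()); }

private:
    Buffer buf_;  // storage stays in the buffer's own encoding
};

// Hypothetical helper: collect every code point as UTF-32.
template <class Buffer>
std::u32string to_utf32(const codepoint_string<Buffer>& s) {
    std::u32string out;
    for (auto it = s.begin(); it != s.end(); ++it) out.push_back(*it);
    return out;
}
```

The point of the sketch is that storage and interface are decoupled:
the string keeps its compact UTF-8 buffer, while dereferencing an
iterator always yields a full code point. A UTF-16 buffer would need
only a different decoding step behind the same interface.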

The codecvt class converts to UTF-32 because it didn't occur to me to
do anything else; and why would you?

Regards,
Rogier

