From: Robert Ramey (ramey_at_[hidden])
Date: 2004-10-19 01:18:27


"Miro Jurisic" <macdev_at_[hidden]> wrote in message
news:macdev-320EB0.01505419102004_at_sea.gmane.org...
> In article <cl268e$okv$1_at_[hidden]>, "Robert Ramey" <ramey_at_[hidden]> wrote:
>
> > I think you should spend a little more time investigating the following:
> >
> > a) The "vault" files section has code by A Barbati which addresses issues
> > related to Unicode.
> > b) Ron Garcia contributed codecvt facets for Unicode that have been
> > incorporated into Boost and are currently used by two Boost libraries
> > (serialization and program options).
> > c) ANSI library functions exist for converting strings and characters
> > to/from wstrings/wchar_t values in accordance with the currently selected
> > locale. Not all libraries implement these functions, however.
> >
> > So it's not clear to me what exactly needs to be done here - other than
> > fixing up some older standard libraries. I don't think that's what you had
> > in mind.
>
> There is a lot of Unicode work to be done in the standard C++ library and
> boost. C++ currently has no Unicode-aware string abstraction, and this is a
> big problem for anyone who has to deal with Unicode strings in C++ code.
> std::string is poorly suited for any Unicode-savvy work, for many reasons --
> mainly having to do with the fact that std::string and STL and boost
> algorithms using std::string::iterator don't know how to handle strings in
> accordance with the Unicode spec.
>

Hmmm - it would never occur to me to use std::string for characters wider
than 8 bits. I studied this issue in some detail and concluded that if one
uses Unicode or another 2 or 4 byte encoding, the simplest and most natural way
is to use std::wstring (a synonym for std::basic_string<wchar_t>). At this
point the only issues would be:

a) implementations which are not based on basic_string (I don't know if
there are any of these around)
b) input/output to other encodings such as UTF-8 or others - this is handled
by codecvt facets.
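
To make point b) concrete, here is a minimal sketch of converting between std::wstring and UTF-8 bytes through a codecvt facet. It is an illustration only: it assumes the standard std::codecvt_utf8 facet and std::wstring_convert (both added in C++11 and deprecated since C++17) as a stand-in, not the facet contributed to Boost, and the helper names to_utf8 and from_utf8 are hypothetical.

```cpp
#include <codecvt>  // std::codecvt_utf8 (C++11, deprecated since C++17)
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical helper: encode a wide string as UTF-8 bytes by running it
// through a codecvt facet.
inline std::string to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(wide);
}

// Hypothetical helper: decode UTF-8 bytes back into a wide string using the
// same kind of facet.
inline std::wstring from_utf8(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(bytes);
}
```

With these helpers, a wide string holding the single character U+00E9 would encode to the two bytes 0xC3 0xA9, and decoding those bytes would give the one-character wide string back. For stream input/output, the same kind of facet can be installed in a locale and imbued into a wide file stream, so the program keeps working in wchar_t while the file holds UTF-8.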

I believe that STL and boost algorithms that handle std::string can (or
should) be able to handle any std::basic_string<?>. That is my basis for
the view that Unicode shouldn't be a big issue. Of course if one wants to
handle Unicode as a std::string containing - say - a UTF-8 encoding of Unicode
characters, then that would be a separate issue. I don't think anyone
would want to do that.
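
As a small illustration of what I mean, here is a sketch of an algorithm written once against std::basic_string and usable with both std::string and std::wstring. The function name count_char is hypothetical and is not taken from any library.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical example: count occurrences of a character in any
// std::basic_string specialization. It works on code units of type CharT
// and knows nothing about the encoding stored in the string.
template <typename CharT>
std::size_t count_char(const std::basic_string<CharT>& s, CharT c) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), c));
}
```

Calling count_char(std::string("banana"), 'a') and count_char(std::wstring(L"banana"), L'a') both give 3. The separate issue shows up when a std::string holds UTF-8: the template then sees bytes, so the two-byte sequence 0xC3 0xA9 for U+00E9 has size() 2, while a wide string holding that one character has size() 1.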

I'm willing to be convinced I'm wrong about this - but I just don't see it
yet.

Robert Ramey

