From: Ferdinand Prantl (ferdinand.prantl_at_[hidden])
Date: 2003-09-15 12:09:25


> -----Original Message-----
> From: Adrian Michel [mailto:michel_at_[hidden]]
> Sent: Monday, September 15, 2003 02:18
> To: Boost mailing list
> Subject: RE: [boost] Unicode and filesystem
> >
> > wchar * != unicode.
> >
> > Just using wchar_t is by far not enough to expect unicode support. You
> > generally need to know what encoding a string is in. In particular,
> > some encodings such as utf-8 don't even have a fixed-width
> > representation.

Yes, the type alone tells you nothing about the encoding: char can hold
ASCII, an ANSI code page or UTF-8, and wchar_t can hold UCS-2 or UTF-16. That
is why the encoding is defined outside the C library (the environment on
un*xes, the registry or an extra API on Windows).

> >
> > You really need a unicode library to deal with these issues. A
> > unicode-enabled path manipulation library would have to depend on it.

Yep. That is the reason why the constructors of std::fstream are not
overloaded for both char and wchar_t file names. fopen() takes only
const char * file names, and the simplest thing for std::fstream is to keep
it that way.

> > Of course, one could provide a templates based interface, where you
> > plug in the unicode support in terms of string and trait type
> > parameters.

One must be careful, because a single C++ type (char or wchar_t) can carry
several character encodings, each of which must be handled differently. Put
the other way round, the supporting functions in C and the traits classes in
C++ must be named after the encoding and used explicitly (e.g. utf8_traits,
ucs2_traits, utf16_traits, ...), which can bring more complexity than
functionality. The implementation would still have to convert the name for
the underlying fopen() and company. The Unicode-supporting traits should also
fit well with the Unicode support in codecvt facets.
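
A minimal sketch of what such explicitly named traits could look like. The
names utf8_traits, ucs2_traits and utf16_traits come from the paragraph above;
their members and the count_chars() helper are hypothetical, and char16_t
(C++11) is assumed as the 16-bit unit so that the example does not depend on
the platform's wchar_t width. Input is assumed to be well-formed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical traits: the name carries the encoding, because the C++ type
// (char or char16_t) alone does not.
struct utf8_traits {
    typedef char char_type;
    // Number of code units in the character that starts at s[0].
    static std::size_t length(const char_type* s) {
        unsigned char c = static_cast<unsigned char>(s[0]);
        if (c < 0x80) return 1;          // 0xxxxxxx
        if ((c >> 5) == 0x6) return 2;   // 110xxxxx
        if ((c >> 4) == 0xE) return 3;   // 1110xxxx
        return 4;                        // 11110xxx (well-formed input assumed)
    }
};

struct ucs2_traits {
    typedef char16_t char_type;
    // Fixed width: every code unit is treated as one character.
    static std::size_t length(const char_type*) { return 1; }
};

struct utf16_traits {
    typedef char16_t char_type;
    static std::size_t length(const char_type* s) {
        // A high surrogate (0xD800..0xDBFF) starts a two-unit pair.
        return (s[0] >= 0xD800 && s[0] <= 0xDBFF) ? 2 : 1;
    }
};

// Counts characters by walking the string with the explicitly chosen traits.
template <class Traits>
std::size_t count_chars(
        const std::basic_string<typename Traits::char_type>& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); i += Traits::length(&s[i])) ++n;
    return n;
}
```

The same char16_t buffer gives different counts under ucs2_traits and
utf16_traits as soon as it contains a surrogate pair, which is why the traits
have to be chosen explicitly rather than deduced from the character type.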

> >
>
> I did not mention that I was referring to Unicode on Windows,
> which uses 16 bits characters.

Windows uses the unique character numbering defined by the Unicode
Consortium, but not the full range, only the UCS-2 part <0;65535>. Surrogate
pairs, which UTF-16 uses for greater numbers, are then not necessary, and
Windows can simply use fixed-size characters (2 bytes = wchar_t), which is
nicely supported by C++ compilers and the STL.
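
For reference, the surrogate-pair arithmetic mentioned above can be sketched
as follows. The function name encode_utf16() is hypothetical, char16_t and
char32_t (C++11) are assumed, and no validation of the input code point is
done.

```cpp
#include <string>

// Encodes one Unicode code point as UTF-16 code units.
// Code points up to 0xFFFF fit in a single unit (the UCS-2 range); larger
// ones are split into a high and a low surrogate.
// Surrogate values and code points above 0x10FFFF are not rejected here.
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        char32_t v = cp - 0x10000;  // 20 significant bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low
    }
    return out;
}
```

As long as every character is below 0x10000, the result is always one unit
long, which is the fixed-size case described above.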

Sometimes you can read that Unicode characters can be expressed either in
fixed-width character encodings (UCS-2 and UCS-4) or in multi-byte character
encodings (UTF-8, UTF-16 and UTF-32). The former are better for in-memory
operations, because the length equals the count of characters; the latter
save space when stored (T = transformation).

That is not correct, because nobody said that UCS-X is to be represented as
fixed X-byte characters; only the UTF-X forms have a defined representation.
But let us make our lives simpler and follow the real practice...

>
> There is support for data streams of Unicode characters. For example, in
> boost/filesystem/fstream.hpp:
>
> ...
> typedef basic_fstream<char> fstream;
> typedef basic_fstream<wchar_t> wfstream;
> ...

But this support only covers reading characters of the selected type from
the stream; it is not meant to solve the constructor parameters, even if it
would sometimes help. The constructor of fstream should have another
character template parameter, independent of the one for the content.

Stream content conversions are supported quite well through codecvt facets.
For the constructor of fstream you have to knock up some conversion routines
yourself, maybe using stringstream...
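
One possible shape of such a conversion routine, as a sketch only. It uses the
ctype<wchar_t> facet of a locale rather than a stringstream, and the helper
names narrow_name() and open_wide() are hypothetical. Characters that the
locale cannot narrow are replaced by '?', so the approach is lossy for names
outside the locale's narrow character set.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: narrows a wide file name with the ctype facet of the
// given locale; characters with no narrow equivalent become '?'.
std::string narrow_name(const std::wstring& wide,
                        const std::locale& loc = std::locale()) {
    const std::ctype<wchar_t>& ct =
        std::use_facet<std::ctype<wchar_t> >(loc);
    std::string out(wide.size(), '\0');
    if (!wide.empty())
        ct.narrow(wide.data(), wide.data() + wide.size(), '?', &out[0]);
    return out;
}

// Hypothetical helper: opens a stream from a wide file name by converting
// the name first. Returns true if the file could be opened for reading.
bool open_wide(std::ifstream& in, const std::wstring& name) {
    in.open(narrow_name(name).c_str());
    return in.is_open();
}
```

The conversion of the file name stays completely separate from any codecvt
facet imbued in the stream, which only affects the content.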

>
> The class path has the following constructors
> path();
> path( const std::string & src );
> path( const char * src );
> path( const std::string & src, path_format );
> path( const char * src, path_format );
>
> none of which takes 16 bit Unicode characters. Creating a
> template version of path with the character type as parameter
> may be a solution to this problem.

I understand you; you usually work with paths as wstrings, but then you
have to convert them to feed them into fstream(). You can convert the name to
a string, or check whether your STL supports initialization from a FILE *
(filebuf or fstream), for which you can use _wfopen().

I would also appreciate templates in Boost.Filesystem, or some other solution
with conversion helpers.

>
> I am currently using MS VC++ 6 and their implementation of
> std::fstream also lacks Unicode support for file names.

As I commented above.

> I am not sure about .net though.

Hmm, it is too early, but .NET 2.0 contains templates, so probably boost.net
is on the horizon... :-)
Um, you meant VC++ .NET; it looks the same for 7.0, 7.1 and Whidbey, FYI.

Ferda

>
> Adrian

