From: Patrick Bennett (patrick.bennett_at_[hidden])
Date: 2004-08-23 09:47:52


Stefan Seefeld wrote:

> Patrick Bennett wrote:
>
>> It should be char* (and std::string) UTF-8 strings throughout for all
>> platforms - passing as-is for platforms like Linux, and converting
>> to/from UCS-2 on Windows. I can't speak for other platforms as I'm
>> most familiar with Windows and Linux.
>
>
> Isn't it abusive to force UTF-8 into a std::string?

Abuse is a relative term here. ;)

> While it is technically possible, the semantics aren't quite the same:
> operator[](size_t i) wouldn't return the i'th character any more, at
> least not for characters outside the ASCII range.
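To make the operator[] point concrete: with UTF-8 in a std::string, size() counts bytes and operator[] returns a single byte, not a character. A minimal sketch, assuming the string holds valid UTF-8 (the helper name utf8_length is hypothetical and not part of Boost), counts code points by skipping continuation bytes:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a std::string assumed to hold
// valid UTF-8. Every byte except a continuation byte (bit pattern 10xxxxxx)
// starts a new code point, so counting non-continuation bytes counts
// code points.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {  // ASCII byte or lead byte
            ++count;
        }
    }
    return count;
}
```

For "h\xC3\xA9llo" ("héllo"), size() is 6 while utf8_length() is 5, and s[1] is the byte 0xC3 (the first half of "é"), not the character "é".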

Correct (kind of), but I'd far prefer using std::string to defining some
completely new type.
For users of boost::filesystem, I can't personally think of a case where
a user would need to iterate over a path or file name one character at a
time. Because of UTF-8's design, even a search for an ASCII character
such as '/' still works with find(), operator[], etc., since ASCII bytes
never appear inside a multi-byte sequence. UTF-8 maps onto std::string
extremely well.
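As an illustration of the '/' argument, here is a minimal sketch that splits a UTF-8 path into components using nothing but std::string::find and substr (split_path is a hypothetical name, not a boost::filesystem function). It is safe because every byte of a multi-byte UTF-8 sequence has its high bit set, so the byte 0x2F can only ever be a real '/':

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a '/'-separated path into its non-empty
// components. Works unchanged on UTF-8, because '/' (0x2F) never occurs
// inside a multi-byte sequence.
std::vector<std::string> split_path(const std::string& path) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (start <= path.size()) {
        std::string::size_type slash = path.find('/', start);
        if (slash == std::string::npos) {
            slash = path.size();  // last component runs to the end
        }
        if (slash > start) {      // skip empty components ("//", leading '/')
            parts.push_back(path.substr(start, slash - start));
        }
        start = slash + 1;
    }
    return parts;
}
```

Splitting "/home/j\xC3\xBCrgen/docs" yields "home", "j\xC3\xBCrgen" ("jürgen") and "docs", with the two-byte "ü" left intact.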
I think there is also a fair amount of precedent for using UTF-8
internally with std::string as the storage mechanism.
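Under this scheme the only Windows-specific step is converting UTF-8 to the wide (UTF-16) form at the OS boundary; real Windows code would normally call MultiByteToWideChar with CP_UTF8. As a portable illustration only, here is a minimal sketch of that conversion (utf8_to_utf16 is a hypothetical name). It assumes input is mostly well formed: truncated sequences, bad continuation bytes and values above U+10FFFF throw, but overlong forms and encoded surrogates are not rejected. Code points above U+FFFF become surrogate pairs, which plain UCS-2 cannot represent:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 and re-encodes it as UTF-16, the form
// Windows wide-character APIs expect. Validation is deliberately minimal.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cc & 0x3F);  // append 6 payload bits
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        i += extra + 1;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // fits in one unit
        } else {
            cp -= 0x10000;  // split into a high/low surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The reverse direction (WideCharToMultiByte on Windows) would be applied to names coming back from the OS, so the rest of the library only ever sees UTF-8.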
UTF-8 never produces a zero byte except for U+0000 itself (and
std::string copes with embedded NULs anyway), ASCII characters remain
ASCII characters, and you can always tell whether you're in the middle of
a multi-byte sequence.
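The last property follows from the fact that continuation bytes always have the bit pattern 10xxxxxx. A minimal sketch (both helper names are hypothetical) shows how to resynchronize from an arbitrary byte offset by stepping back to the start of the enclosing code point; it assumes valid UTF-8 and pos < s.size():

```cpp
#include <cstddef>
#include <string>

// A UTF-8 continuation byte always looks like 10xxxxxx, so a single byte
// can be classified without inspecting its neighbours.
bool is_continuation_byte(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

// Hypothetical helper: returns the offset of the first byte of the code
// point containing byte `pos`. Precondition: pos < s.size(). For valid
// UTF-8 this takes at most three steps back, since a code point is at
// most four bytes long.
std::size_t code_point_start(const std::string& s, std::size_t pos) {
    while (pos > 0 && is_continuation_byte(static_cast<unsigned char>(s[pos]))) {
        --pos;
    }
    return pos;
}
```

In "a€b", where "€" is the three bytes E2 82 AC at offsets 1 to 3, any offset from 1 to 3 maps back to 1.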

Since we're talking about boost::filesystem's inability to be used in
internationalized applications, and you don't think UTF-8 in std::string
is the way to do it, what do you recommend?

Cheers...
Patrick Bennett


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk