Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-08-12 05:00:36
On Fri, Aug 12, 2011 at 9:57 AM, Daniel James <dnljms_at_[hidden]> wrote:
> On 11 August 2011 12:57, Artyom Beilis <artyomtnk_at_[hidden]> wrote:
>>
>>> There's a lot of existing code which is not based on that assumption -
>>> we can't just wish it out of existence and boost should be compatible
>>> with it.
>>
>> Then cross platform, Unicode aware programming will always
>> (I'm sorry) suck with Boost :-)
>>
>>
>> That's it...
>
> Unless a different solution can be found.
I see the old flam .. er discussion on text handling is back :)
From the previous debate(s) I now accept that it would
be a bad idea just to force the encoding of std::string to be utf8.
So a (nearly) ideal text handling class should IMO look like this
(see usage below):
// text encoding tag types for conversion function dispatching
namespace /*or struct */ textenc
{
struct utf8 {};
struct utf16 {};
struct utf32 {};
struct winapi {};
struct posix {};
struct stdlib {};
struct sqlite {};
struct libpq {};
...
struct libxyz {};
#if WE_ARE_ON_WINDOWS
typedef winapi os;
#elif WE_ARE_ON_POSIX
typedef posix os;
#elif ...
#endif
struct gcc {};
struct msvc {};
struct icc {};
struct clang {};
#if COMPILING_WITH_GCC
typedef gcc compiler;
#elif COMPILING_WITH_MSVC
typedef msvc compiler;
#elif ...
#endif
};
class text
{
public:
// *** construction ***
// empty text
text(void);
// by default expect UTF8
text(const char* cstr)
{
assert(is_utf8(cstr));
store(cstr);
}
// by default expect UTF8
text(const std::string& str)
{
assert(is_utf8(str.begin(), str.end()));
store(str);
}
// otherwise use the tag type to
// do any necessary conversions
template <typename Char, typename EncodingTag>
text(const Char* cstr, EncodingTag encoding)
{
// use an overload to convert from the encoding
// basically if the tag is textenc::winapi then use
// the winapi-supplied functions and convert to utf8
// if it's posix look at the locale and convert with the posix function
// if the tag is textenc::msvc convert the msvc literal from
// whatever crazy encoding it uses to utf8, ...etc.
convert_and_store(cstr, encoding);
}
template <typename Char, typename EncodingTag>
text(const std::basic_string<Char>& str, EncodingTag encoding)
{
convert_and_store(str.begin(), str.end(), encoding);
}
// *** conversion ***
// by default output in utf8
const char* c_str(void) const;
// by default in utf8 (could be a friend fn instead of member)
std::string str(void) const;
// (could be a friend fn instead of member)
template <typename EncodingTag>
std::string str(EncodingTag encoding) const
{
return convert_from(encoding);
}
// wide char string output
template <typename EncodingTag>
std::wstring wstr(EncodingTag encoding) const
{
return wconvert_from(encoding);
}
// implement whatever functionality
// making sense for utf8-encoded-text
};
// usage
text t1 = "blahblah"; // must be utf8
// whatever encoding the compiler uses for wide literals
text t2(L"blablablabl", textenc::compiler());
text t3(some_posix_function(), textenc::posix());
text t4(SomeWinapiFunc(), textenc::winapi());
text t5(SomeWinapiFuncW(), textenc::winapi());
text t6(pq_some_func(), textenc::libpq());
text t7 = concat(t1, t2, t3, t4, t5, t6);
std::ostream& out = get_outs();
out << t7; // output in utf8
text t8;
std::istream& in = get_ins();
read_line(in, t8); // free functions: std::istream has no such members
text t9;
read(in, t9, 1024);
some_function_expecting_utf8(t9.c_str());
SomeWinapiFunction(t8.str(textenc::winapi()).c_str());
SomeWinapiFunctionW(concat(t9, text::newline(),
t8).wstr(textenc::winapi()).c_str());
some_posix_function(transform(concat(t4, t7,
t9)).str(textenc::posix()).c_str());
some_wrapped_os_function(str(t8, textenc::os()));
some_stdlib_function(str(head(substring_after(t9, t2), 10), textenc::stdlib()));
I.e. besides the fact that the string "uses utf8" (there is already
a whole heap of such strings), it must also handle all the conversions
between utf8 and whatever encodings the OS and the major libraries and
APIs expect and use, and do so conveniently (and effectively).
Otherwise the effort is, IMHO, wasted.
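The default constructors above assert is_utf8(...), but the post leaves that helper undefined. As a hedged illustration only, here is one way such a validator could look; the name and the two overloads are assumptions chosen to match the calls in the class, not an existing Boost API. It rejects truncated sequences, overlong forms, UTF-16 surrogates and code points above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper matching the is_utf8(first, last) call in the
// proposed text(const std::string&) constructor. Returns true only if
// [first, last) is well-formed UTF-8.
template <typename Iter>
bool is_utf8(Iter first, Iter last)
{
    // smallest code point allowed for each sequence length (rejects overlongs)
    static const unsigned long min_cp[] = {0x0, 0x80, 0x800, 0x10000};
    while (first != last)
    {
        unsigned char c = static_cast<unsigned char>(*first++);
        if (c < 0x80) continue;                  // plain ASCII byte
        int extra;                               // continuation bytes expected
        unsigned long cp;                        // code point being decoded
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return false;                       // stray continuation or invalid lead byte
        for (int i = 0; i < extra; ++i)
        {
            if (first == last) return false;     // truncated sequence
            unsigned char cc = static_cast<unsigned char>(*first++);
            if ((cc & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp[extra]) return false;           // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                // beyond Unicode range
    }
    return true;
}

// Overload for NUL-terminated strings, matching is_utf8(cstr).
inline bool is_utf8(const char* cstr)
{
    const char* end = cstr;
    while (*end) ++end;
    return is_utf8(cstr, end);
}
```

With this in place, assert(is_utf8("caf\xC3\xA9")) passes, while the overlong encoding "\xC0\xAF" and the lone surrogate "\xED\xA0\x80" are rejected.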
Boost libraries (at the very least those wrapping OS functionality)
should adopt this text class and do the conversions "just-in-time",
i.e. when making the OS API call.
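To make the "just-in-time" idea concrete, here is a minimal sketch, assuming UTF-16 as the wide target (as with the Windows W functions; char16_t is used instead of wchar_t because wchar_t is typically 32 bits on Linux and macOS). The class utf8_text, the two-tag textenc subset and fake_wide_api are hypothetical names for illustration; the overload set on empty tag types is the same dispatching technique the post's textenc namespace is meant for.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical tag types, mirroring the textenc namespace above.
namespace textenc
{
struct utf8 {};
struct utf16 {};
}

// Decode UTF-8 into UTF-16 code units. Assumes the input was already
// validated (e.g. by an is_utf8 check in the constructor).
inline std::u16string utf8_to_utf16(const std::string& s)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < s.size())
    {
        unsigned char c = static_cast<unsigned char>(s[i++]);
        char32_t cp;
        int extra;
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else                         { cp = c & 0x07; extra = 3; }
        for (int k = 0; k < extra && i < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
        if (cp < 0x10000)
            out.push_back(static_cast<char16_t>(cp));
        else
        {   // outside the BMP: encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Simplified stand-in for the proposed text class: storage is always
// UTF-8, and the tag argument picks the conversion at compile time.
class utf8_text
{
public:
    explicit utf8_text(std::string utf8) : data_(std::move(utf8)) {}
    std::string convert(textenc::utf8) const { return data_; }
    std::u16string convert(textenc::utf16) const { return utf8_to_utf16(data_); }
private:
    std::string data_;
};

// Hypothetical wide-character "OS call", used only to show the call site.
inline std::size_t fake_wide_api(const char16_t* s)
{
    std::size_t n = 0;
    while (s[n]) ++n;
    return n; // number of UTF-16 code units
}
```

A call site such as fake_wide_api(t.convert(textenc::utf16()).c_str()) mirrors the SomeWinapiFunctionW(...wstr(textenc::winapi()).c_str()) line in the usage above: the temporary UTF-16 buffer lives until the end of the full expression, so the conversion happens exactly at the API boundary and the stored text stays UTF-8.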
My 0.02Euro
Best,
Matus