Boost :


Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-08-12 05:00:36

Next message: Yakov Galka: "Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter"
Previous message: John Maddock: "Re: [boost] [proto] RValue reference support?"
In reply to: Daniel James: "Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter"
Next in thread: Yakov Galka: "Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter"
Reply: Yakov Galka: "Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter"

On Fri, Aug 12, 2011 at 9:57 AM, Daniel James <dnljms_at_[hidden]> wrote:
> On 11 August 2011 12:57, Artyom Beilis <artyomtnk_at_[hidden]> wrote:
>>
>>> There's a lot of existing code which is not based on that assumption -
>>> we can't just wish it out of existence and boost should be compatible
>>> with it.
>>
>> Then cross platform, Unicode aware programming will always
>> (I'm sorry) suck with Boost :-)
>>
>>
>> Thats it...
>
> Unless a different solution can be found.

I see the old flam... er, discussion on text handling is back :)

From the previous debate(s) I now accept that it would
be a bad idea simply to force the encoding of std::string to be UTF-8.
So a (nearly) ideal text handling class should, IMO, look like this
(see usage below):

#include <cassert>
#include <iostream>
#include <string>

// text encoding tag types for conversion function dispatching
namespace /*or struct */ textenc
{
  struct utf8 {};
  struct utf16 {};
  struct utf32 {};
  struct winapi {};
  struct posix {};
  struct stdlib {};
  struct sqlite {};
  struct libpq {};
  ...
  struct libxyz {};

#if WE_ARE_ON_WINDOWS
typedef winapi os;
#elif WE_ARE_ON_POSIX
typedef posix os;
#elif ...
#endif

  struct gcc {};
  struct msvc {};
  struct icc {};
  struct clang {};

#if COMPILING_WITH_GCC
typedef gcc compiler;
#elif COMPILING_WITH_MSVC
typedef msvc compiler;
#elif ...
#endif
};

class text
{
public:
  // *** construction ***
  // empty text (needed for t8 and t9 in the usage below)
  text(void);

  // by default expect UTF8
  text(const char* cstr)
  {
     assert(is_utf8(cstr));
     store(cstr);
  }

  // by default expect UTF8
  text(const std::string& str)
  {
     assert(is_utf8(str.begin(), str.end()));
     store(str);
  }

  // otherwise use the tag type to
  // do any necessary conversions
  template <typename Char, typename EncodingTag>
  text(const Char* cstr, EncodingTag encoding)
  {
     // use an overload to convert from the encoding
     // basically if the tag is textenc::winapi then use
     // the winapi-supplied functions and convert to utf8
     // if it's posix look at the locale and convert with the posix function
     // if the tag is textenc::msvc convert the msvc literal from
     // whatever crazy encoding it uses to utf8, ...etc.
     convert_and_store(cstr, encoding);
  }

  template <typename Char, typename EncodingTag>
  text(const std::basic_string<Char>& str, EncodingTag encoding)
  {
     convert_and_store(str.begin(), str.end(), encoding);
  }

  // *** conversion ***
  // by default output in utf8
  const char* c_str(void) const;

  // by default in utf8 (could be a friend fn instead of member)
  std::string str(void) const;

  // (could be a friend fn instead of member)
  template <typename EncodingTag>
  std::string str(EncodingTag encoding) const
  {
     return convert_from(encoding);
  }

  // wide char string output
  template <typename EncodingTag>
  std::wstring wstr(EncodingTag encoding) const
  {
     return wconvert_from(encoding);
  }

  // implement whatever functionality
  // makes sense for utf8-encoded text
};

// usage

text t1 = "blahblah"; // must be utf8

// whatever encoding the compiler uses for wide literals
text t2(L"blablablabl", textenc::compiler());

text t3(some_posix_function(), textenc::posix());

text t4(SomeWinapiFunc(), textenc::winapi());
text t5(SomeWinapiFuncW(), textenc::winapi());

text t6(pq_some_func(), textenc::libpq());

text t7 = concat(t1, t2, t3, t4, t5, t6);

std::ostream& out = get_outs();
out << t7; // output in utf8

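text t8;
std::istream& in = get_ins();
read_line(in, t8); // free function: std::istream has no read_line member

text t9;
read(in, t9, 1024); // likewise, std::istream::read cannot take a text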

some_function_expecting_utf8(t9.c_str());

SomeWinapiFunction(t8.str(textenc::winapi()).c_str());
SomeWinapiFunctionW(
  concat(t9, text::newline(), t8).wstr(textenc::winapi()).c_str());

some_posix_function(
  transform(concat(t4, t7, t9)).str(textenc::posix()).c_str());

some_wrapped_os_function(str(t8, textenc::os()));

some_stdlib_function(str(head(substring_after(t9, t2), 10), textenc::stdlib()));

I.e. besides the fact that the string "uses UTF-8" (there is already
a whole heap of such strings), it must also handle all the conversions
between UTF-8 and whatever the OS and the major libraries and
APIs expect and use, conveniently (and efficiently).
Otherwise the effort is, IMHO, wasted.
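To make the tag-dispatch idea behind the constructors concrete, here is a minimal sketch, assuming only two of the tags (textenc::utf16 and textenc::utf32) and a hand-written UTF-8 encoder. The names to_utf8, append_utf8 and is_surrogate are hypothetical stand-ins for the convert_and_store overloads mentioned above, not part of any Boost library, and the constructor is simplified to take a whole string rather than a const Char*. Overload resolution on the tag type picks the conversion at compile time; the object always stores UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Tag types, as in the proposal (only two of them sketched here).
namespace textenc {
struct utf16 {};
struct utf32 {};
}

inline bool is_surrogate(std::uint32_t cp) { return cp >= 0xD800 && cp <= 0xDFFF; }

// Append one Unicode scalar value to out, encoded as UTF-8.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF && !is_surrogate(cp));
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Overload picked for textenc::utf32: invalid code points become U+FFFD.
inline std::string to_utf8(const std::u32string& in, textenc::utf32) {
    std::string out;
    for (char32_t c : in) {
        std::uint32_t cp = c;
        if (cp > 0x10FFFF || is_surrogate(cp)) cp = 0xFFFD;
        append_utf8(out, cp);
    }
    return out;
}

// Overload picked for textenc::utf16: surrogate pairs are combined,
// unpaired surrogates become U+FFFD.
inline std::string to_utf8(const std::u16string& in, textenc::utf16) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (is_surrogate(cp)) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}

// The tag argument selects the to_utf8 overload at compile time;
// the stored representation is always UTF-8.
class text {
public:
    template <typename Str, typename EncodingTag>
    text(const Str& s, EncodingTag tag) : data_(to_utf8(s, tag)) {}
    const std::string& str() const { return data_; }
private:
    std::string data_;
};
```

Replacing ill-formed input with U+FFFD is just one choice; the real class could instead reject it, in the spirit of the assert(is_utf8(...)) in the default constructors.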

Boost libraries (at the very least those wrapping OS functionality)
should adopt this text class and do the conversions "just in time",
i.e. when making the OS API call.
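The "just in time" idea can be sketched as follows, assuming a native API that wants UTF-16 (as the Windows "W" functions do). native_count_units and wrapped_count_units are hypothetical placeholders for an OS call and the library wrapper around it, and is_utf8 here is only one possible implementation of the check used in the text constructors, not a Boost function. The wrapper's interface only ever sees text; the UTF-16 copy exists just for the duration of the call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decode UTF-8 into code points. Returns false on malformed input: bad lead or
// continuation bytes, truncated sequences, overlong forms, surrogates, > U+10FFFF.
inline bool decode_utf8(const std::string& s, std::vector<std::uint32_t>& out) {
    out.clear();
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if (b < 0x80)                    { len = 1; cp = b; }
        else if (b >= 0xC2 && b <= 0xDF) { len = 2; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { len = 3; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF4) { len = 4; cp = b & 0x07; }
        else return false;
        if (i + len > s.size()) return false;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        out.push_back(cp);
        i += len;
    }
    return true;
}

// One possible is_utf8: the input is valid iff it decodes cleanly.
inline bool is_utf8(const std::string& s) {
    std::vector<std::uint32_t> ignored;
    return decode_utf8(s, ignored);
}

// Re-encode as UTF-16; supplementary code points become surrogate pairs.
inline std::u16string utf8_to_utf16(const std::string& s) {
    std::vector<std::uint32_t> cps;
    if (!decode_utf8(s, cps)) throw std::invalid_argument("not valid UTF-8");
    std::u16string out;
    for (std::uint32_t cp : cps) {
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// Minimal stand-in for the proposed text class: it only ever stores UTF-8.
class text {
public:
    text(const char* cstr) : data_(cstr) { assert(is_utf8(data_)); }
    const std::string& str() const { return data_; }
private:
    std::string data_;
};

// Hypothetical native API that, like a Windows "W" function, takes UTF-16.
inline std::size_t native_count_units(const std::u16string& s) { return s.size(); }

// Hypothetical library wrapper: accepts text, converts only at the call boundary.
inline std::size_t wrapped_count_units(const text& t) {
    return native_count_units(utf8_to_utf16(t.str()));
}
```

Because the conversion happens inside the wrapper, user code never handles the OS encoding directly; the cost is one conversion per call, which is the trade-off the "just in time" approach accepts.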

My 0.02Euro

Best,

Matus


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk