From: James Porter (porterj_at_[hidden])
Date: 2007-10-16 20:35:48


I've been thinking about this off and on as well, though I have been a
little too busy to give it the write-up it deserves. That said, I think
your code is a pretty good start. While I agree that tagged strings
shouldn't automatically convert on assignment, I think recode() isn't
the most useful way to go about it.

In practice, I expect that most code conversion would occur during I/O,
so I'd prefer to see the conversion done by the stream itself. recode()
could still exist as a convenience function, though.
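
A minimal sketch of what stream-side conversion might look like, with
recode kept as a thin convenience wrapper on top of it. The names
latin1_text, utf8_writer and recode_to_utf8 are hypothetical and do not
come from the posted code; only the Latin-1 to UTF-8 direction is shown,
since each Latin-1 byte maps directly to a code point.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical tagged type: bytes known to be Latin-1 (ISO-8859-1).
struct latin1_text {
    std::string bytes;
};

// Hypothetical stream wrapper: the conversion happens at output time,
// so callers never have to recode explicitly.
class utf8_writer {
public:
    explicit utf8_writer(std::ostream& out) : out_(out) {}

    utf8_writer& operator<<(const latin1_text& s) {
        for (unsigned char c : s.bytes) {
            if (c < 0x80) {
                // ASCII range is identical in both encodings.
                out_.put(static_cast<char>(c));
            } else {
                // U+0080..U+00FF becomes a two-byte UTF-8 sequence.
                out_.put(static_cast<char>(0xC0 | (c >> 6)));
                out_.put(static_cast<char>(0x80 | (c & 0x3F)));
            }
        }
        return *this;
    }

private:
    std::ostream& out_;
};

// Convenience function built on the stream, for code that does want
// an explicit conversion.
inline std::string recode_to_utf8(const latin1_text& s) {
    std::ostringstream buffer;
    utf8_writer writer(buffer);
    writer << s;
    return buffer.str();
}

inline void stream_example() {
    // "caf" followed by Latin-1 e-acute (0xE9).
    latin1_text cafe{std::string("caf") + static_cast<char>(0xE9)};
    std::string expected = std::string("caf") + static_cast<char>(0xC3)
                                              + static_cast<char>(0xA9);
    assert(recode_to_utf8(cafe) == expected);
}
```

With this arrangement the explicit function is just a stream writing
into a string buffer, so the conversion logic lives in one place.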

On the subject of converting between different encodings of strings, I
noticed that you had some concerns about assignment between two
different encodings using the same underlying type (latin1_string s =
utf8_string("foo") for example). This could be resolved by using a
nominally different char_traits class when inheriting from basic_string.
However, this would cause problems with I/O streams, since they expect a
particular character type and char_traits. This goes back to my point
above: the I/O streams should be aware of string tagging (if not
directly responsible for it).
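
The char_traits idea can be sketched as follows. The trait names are
hypothetical, and the sketch uses aliases of basic_string rather than
inheritance to keep it short. It shows both halves of the argument: the
two string types no longer convert to each other, and the standard
stream inserter no longer accepts them, so output needs a workaround
such as the write_bytes helper (also an assumption).

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical tag traits: same behaviour as std::char_traits<char>,
// but distinct types, so the string types below are unrelated.
struct latin1_traits : std::char_traits<char> {};
struct utf8_traits : std::char_traits<char> {};

using latin1_string = std::basic_string<char, latin1_traits>;
using utf8_string = std::basic_string<char, utf8_traits>;

// latin1_string s = utf8_string("foo"); is now rejected at compile time.
static_assert(!std::is_convertible<utf8_string, latin1_string>::value,
              "differently tagged strings must not convert implicitly");
static_assert(!std::is_assignable<latin1_string&, utf8_string>::value,
              "differently tagged strings must not be assignable");

// The cost: std::cout << utf8_string("foo") does not compile either,
// because operator<< requires the stream and the string to share one
// traits type. Writing the raw bytes sidesteps that, at the price of
// the stream knowing nothing about the encoding.
template <typename Traits>
std::ostream& write_bytes(std::ostream& out,
                          const std::basic_string<char, Traits>& s) {
    return out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

inline void traits_example() {
    utf8_string u("foo");
    std::ostringstream out;
    write_bytes(out, u);
    assert(out.str() == "foo");
}
```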

I'll need to think about how to specify character sets so that they're
usable at compile time and run time, though my instinct would be to use
subclasses that can be stored in a map of some sort. The subclassing
would handle compile-time tagging, and the map would handle run-time
tagging:

   class utf8 : public charset_base { ... };
   charset_map["utf8"] = new utf8();

   ...

   tagged_string<utf8> foo;
   rt_tagged_string bar;
   bar.set_encoding("utf8");

This should combine the benefits of your first and third choices (type
tags and objects), though I haven't thought about this enough to be
confident that it's the right way to go. If I get the chance, I'll try
to come up with a proof of concept for my ideas, though I'm in the
middle of some other things right now.
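
For concreteness, one way the outline above could be filled in. This is
only a sketch under assumptions: the members of charset_base, the
function-style charset_map() registry, and the internals of
tagged_string and rt_tagged_string are all invented here for
illustration, and smart pointers replace the raw new.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical base class: the run-time description of a character set.
class charset_base {
public:
    virtual ~charset_base() = default;
    virtual const char* name() const = 0;
};

// Each subclass doubles as a compile-time tag.
class utf8 : public charset_base {
public:
    const char* name() const override { return "utf8"; }
};

class latin1 : public charset_base {
public:
    const char* name() const override { return "latin1"; }
};

using charset_registry =
    std::map<std::string, std::unique_ptr<charset_base>>;

// Run-time lookup table, populated on first use.
inline charset_registry& charset_map() {
    static charset_registry registry = [] {
        charset_registry r;
        r["utf8"] = std::make_unique<utf8>();
        r["latin1"] = std::make_unique<latin1>();
        return r;
    }();
    return registry;
}

// Compile-time tagging: the encoding is part of the type.
template <typename Charset>
class tagged_string {
public:
    tagged_string() = default;
    explicit tagged_string(std::string bytes) : bytes_(std::move(bytes)) {}

    const std::string& bytes() const { return bytes_; }
    const charset_base& encoding() const {
        static const Charset cs{};
        return cs;
    }

private:
    std::string bytes_;
};

// Run-time tagging: the encoding is looked up by name.
class rt_tagged_string {
public:
    void set_encoding(const std::string& name) {
        auto it = charset_map().find(name);
        if (it == charset_map().end()) {
            throw std::invalid_argument("unknown charset: " + name);
        }
        encoding_ = it->second.get();
    }

    // Null until set_encoding() has succeeded.
    const charset_base* encoding() const { return encoding_; }

private:
    std::string bytes_;
    const charset_base* encoding_ = nullptr;
};

inline void registry_example() {
    tagged_string<utf8> foo;
    rt_tagged_string bar;
    bar.set_encoding("utf8");
    // Both kinds of tag describe the same character set.
    assert(std::string(foo.encoding().name()) == bar.encoding()->name());
}
```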

- James

Phil Endecott wrote:
> Dear All,
>
> After a rather longer delay than I had planned, I have some
> proof-of-concept code for strings tagged with character sets. You
> might like to first look at the example usage, here:
>
> http://svn.chezphil.org/libpbe/trunk/examples/charsets.cc
>
> Note that this file is written using UTF8, but the web server seems to
> be declaring it to be latin1....
>
> The actual implementation is here:
>
> http://svn.chezphil.org/libpbe/trunk/include/charset.hh
>
> This is far from complete, but it does have some useful functionality;
> mainly I have been using it to work out what is possible.
>
> Your comments would be very much appreciated.
>
>
> Regards,
>
> Phil.