From: James Porter (porterj_at_[hidden])
Date: 2007-09-23 20:11:23


Instead of defining character types per character set, you could use a
specialized char_traits class. char_traits contains a state_type typedef,
which is used with codecvt from the I/O streams library. For char and
wchar_t the standard typedef is mbstate_t, which also appears in the
standard codecvt specializations. (codecvt performs code conversion
between character types; wfstream, for example, uses it to convert a
stream of chars on disk into wchar_ts in memory.)

If you give each encoding its own char_traits with a different
state_type, the resulting basic_string types are distinct from one
another and each carries information about its character encoding, all
without writing a whole lot of new code.

To be honest, I'm only just beginning to look into this myself, so I'm
afraid I don't have a whole lot of information to give you, but I do
think this would be the simplest way to handle this part of your project.

- James

Phil Endecott wrote:
[snip]
> If latin1string has a constructor from std::string (which is its own
> base type) that's fine, i.e. we can still write:
>
> latin1string s2 = s1.substr(1,5);
>
> but unfortunately we can also write
>
> latin2string s3 = s1.substr(1,5);
>
> which is not so good.
>
> So a different approach is to define a set of character-set-specific
> character types, and build string types from them:
>
> typedef char8_t latin1char;
> typedef char8_t latin2char;
[/snip]


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk