From: James Porter (porterj_at_[hidden])
Date: 2007-09-27 18:06:17


Perhaps I'm misunderstanding the purpose of the state_type typedef in
char_traits. It seems that it's used for two things: to specify the type
that will hold the actual shift state for encodings that require it, and to
specify a codecvt facet for the encoding in question (to read/write it
from/to a stream of bytes). The latter part is what I'm focusing on.
Appendix D of "The C++ Programming Language" says of codecvt: "The State
template argument is the type used to hold the shift state of the stream
being converted. State can also be used to identify different conversions by
specifying a specialization."

I probably should have been clearer that I was referring to the state type
and not the shift state itself. What I meant was that, if you defined a
shift state as class JISstate { ... };, you would need to specialize codecvt
to convert a Shift JIS encoding on disk to a *particular* encoding in memory
(say UTF-16). You'd need a different specialization of codecvt to convert to
UTF-8.
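
To make that concrete, here is a minimal sketch of one state type needing one specialization per in-memory encoding. It assumes a simplified stand-in template, toy_codecvt, rather than std::codecvt itself. The names toy_codecvt and jis_single_byte are hypothetical, and only the single-byte part of Shift JIS (ASCII plus half-width katakana) is handled, since the double-byte part needs a full JIS X 0208 table.

```cpp
#include <cassert>
#include <string>

// Hypothetical shift-state type; here it only acts as a tag naming Shift JIS.
struct JISstate {};

// Simplified stand-in for std::codecvt<InternT, ExternT, StateT> (assumption:
// this is NOT the standard facet). The primary template is left undefined, so
// every (internal type, state type) combination needs its own specialization.
template <typename InternT, typename ExternT, typename StateT>
struct toy_codecvt;

// Map one single-byte Shift JIS character to a Unicode code point.
// Bytes 0x5C and 0x7E are treated as plain ASCII here.
inline char32_t jis_single_byte(unsigned char b) {
    assert(b < 0x80 || (b >= 0xA1 && b <= 0xDF));
    return b < 0x80 ? b : 0xFF61 + (b - 0xA1);  // half-width katakana block
}

// Shift JIS on disk -> UTF-16 in memory.
template <>
struct toy_codecvt<char16_t, char, JISstate> {
    static std::u16string in(const std::string& bytes) {
        std::u16string out;
        for (unsigned char b : bytes)
            out.push_back(static_cast<char16_t>(jis_single_byte(b)));
        return out;
    }
};

// Shift JIS on disk -> UTF-8 in memory: same state type, separate specialization.
template <>
struct toy_codecvt<char, char, JISstate> {
    static std::string in(const std::string& bytes) {
        std::string out;
        for (unsigned char b : bytes) {
            char32_t cp = jis_single_byte(b);
            if (cp < 0x80) {
                out.push_back(static_cast<char>(cp));
            } else {  // U+FF61..U+FF9F always take three UTF-8 bytes
                out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
                out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
                out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
            }
        }
        return out;
    }
};
```

Both specializations share JISstate, but each one hard-codes its own in-memory encoding, which is the duplication I'd like to avoid.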

Hopefully this explains my position better, and I apologize if I caused
needless confusion. This may not even be the best way, but with a
converting_stream class, we could do the following:

- create a converting_ifstream with char_traits<Ch>::state_type of JISstate
- create a string with char_traits<Ch>::state_type of UTF8
- (automatically) build a codecvt facet with a state_type of
  conversion_pair<JISstate,UTF8>
- the conversion_pair would take bytes encoded as Shift JIS, convert them to
  a Unicode code point, and convert that to UTF-8 byte(s)
- read data from the converting_ifstream to the string
- the codecvt facet would then run the conversion from conversion_pair,
  resulting in a UTF-8 encoded string from Shift JIS data on disk
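
The conversion_pair step in the list above could be sketched roughly as follows. This covers only the conversion itself, not the converting_ifstream or the codecvt facet around it. The static decode/encode members and the convert function are my assumptions for illustration, and the Shift JIS side handles only single-byte characters (double-byte ones come out as U+FFFD because the JIS X 0208 table is omitted).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical state_type class for Shift JIS: decodes bytes to code points.
struct JISstate {
    // Decode the character starting at in[i] and advance i past it.
    static char32_t decode(const std::string& in, std::size_t& i) {
        assert(i < in.size());
        unsigned char b = static_cast<unsigned char>(in[i++]);
        if (b < 0x80) return b;                      // ASCII range
        if (b >= 0xA1 && b <= 0xDF)                  // half-width katakana
            return 0xFF61 + (b - 0xA1);
        bool lead = (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
        if (lead && i < in.size()) ++i;              // skip the trail byte
        return 0xFFFD;                               // table lookup omitted
    }
};

// Hypothetical state_type class for UTF-8: encodes code points to bytes.
struct UTF8 {
    static void encode(char32_t cp, std::string& out) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
};

// Chains From::decode and To::encode through a Unicode code point.
template <typename From, typename To>
struct conversion_pair {
    static std::string convert(const std::string& in) {
        std::string out;
        for (std::size_t i = 0; i < in.size(); )
            To::encode(From::decode(in, i), out);
        return out;
    }
};
```

With this, conversion_pair<JISstate,UTF8>::convert("A\xB1") yields the UTF-8 bytes for "A" followed by U+FF71. The point of the design is that each encoding only has to describe itself in terms of code points, and the pairing is generated.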

This could then be extended to UTF-16 simply by creating a state_type class
for it and specifying a conversion between Unicode code points and UTF-16.
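
As a rough sketch of what such a UTF-16 class might look like (the name UTF16 and its static encode/decode members are assumptions of mine, not an existing interface):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical state_type class for UTF-16: converts between Unicode code
// points and UTF-16 code units.
struct UTF16 {
    // Append the UTF-16 encoding of cp to out.
    static void encode(char32_t cp, std::u16string& out) {
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));           // one code unit
        } else {
            cp -= 0x10000;                                      // surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }

    // Decode the code point starting at in[i] and advance i past it.
    static char32_t decode(const std::u16string& in, std::size_t& i) {
        assert(i < in.size());
        char32_t hi = in[i++];
        if (hi >= 0xD800 && hi <= 0xDBFF && i < in.size() &&
            in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
            char32_t lo = in[i++];
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        }
        return hi;  // BMP code point (an unpaired surrogate is passed through)
    }
};
```

Nothing here refers to Shift JIS or UTF-8, so the same class could be paired with any other encoding that goes through code points.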

Like I said, this may not be the best way, but hopefully it at least
explains my idea better.

- James

On 9/27/07, Sebastian Redl <sebastian.redl_at_[hidden]> wrote:
>
> James Porter wrote:
> > On 9/27/07, Sebastian Redl <sebastian.redl_at_[hidden]> wrote:
> >
> >> That has nothing to do with what basic_string<wchar_t> is, though,
> >> because that state is to be used when converting the string to an
> >> external encoding.
> >>
> >
> >
> > Well, clearly that state needs to know what the internal encoding is in
> > the first place,
> No, why? What difference does it make to the shift state of Shift-JIS
> whether you convert to this encoding from UTF-8 or UTF-16?
>
> Sebastian Redl

