From: Ronald Garcia (garcia_at_[hidden])
Date: 2001-11-01 01:11:17


Hi Mark,

Apologies for the delay on this reply. I've been quite busy.

>>>>> "MA" == Mark D Anderson <mda_at_[hidden]> writes:

    MA> is there any relation between your work and vladimir prus, who
    MA> uploaded some codecvt code about a month ago?
    MA> http://groups.yahoo.com/group/boost/message/17772

I have taken a look at the above message and the code that it refers
to. I can't quite follow what the code is doing, but according to its
description it appears to provide two codecvt facets: one converts
from a UTF-8 external (file) representation to UCS-2 internally
(memory) and back, while the other converts from UCS-2 externally to
UTF-8 internally. I may be wrong, so the author may wish to correct me
here.

    MA> is there a reason not to introduce a fixed typedef
    MA> boost::ucs4_t, as a uint32_t? then there could be a version
    MA> of this that would work on any platform. as you know, on
    MA> win32 (and elsewhere?) wchar_t is 16bits, so you are currently
    MA> forcing platform-specific specialization.

I chose to implement the facet as a template to avoid committing to
particular types for UTF-8 elements and UCS-4 elements. It makes sense
that compilers with a large enough wchar_t should use
std::codecvt<wchar_t,char,std::mbstate_t>, wofstream, and wifstream
for file streaming, but you are correct that for Windows one would
have to provide specializations. I'm pretty new to this area of the
C++ library, so I'm trying to get a feel for what works best.
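
As a rough illustration of why the internal type matters, here is a
minimal sketch of a UTF-8 decoder parameterized on the internal element
type. It is not the facet discussed in this thread: the name decode_utf8
and its interface are hypothetical, it works on a whole std::string
rather than through codecvt, and it does not reject overlong forms or
surrogates. It does show that a 16-bit internal type cannot represent
every code point, which is the situation with a 16-bit wchar_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not the facet from this thread): decodes UTF-8
// bytes into code points stored as InternT.
template <typename InternT>
std::vector<InternT> decode_utf8(const std::string& bytes)
{
    std::vector<InternT> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (bytes.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        assert(i + extra < bytes.size());  // guaranteed by the check above

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // A 16-bit InternT cannot hold code points above U+FFFF.
        if (cp > static_cast<unsigned long>(std::numeric_limits<InternT>::max()))
            throw std::range_error("code point does not fit in InternT");

        out.push_back(static_cast<InternT>(cp));
        i += extra + 1;
    }
    return out;
}
```

Instantiating this with std::uint32_t accepts any code point, while
instantiating it with std::uint16_t throws std::range_error for input
outside the 16-bit range. A fixed 32-bit typedef along the lines Mark
suggests would simply pin InternT to the first case.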

    MA> even on systems where wchar_t is 32bits, there are no
    MA> guarantees that the implementation character set is unicode.
    MA> even if __STDC_ISO_10646__ is defined, i'm not sure if that
    MA> strictly guarantees that the values are comparable with cast
    MA> ints, because it (i think) is still implementation defined
    MA> what the signedness and endianness is of wchar_t storage, even
    MA> if the code value space is unicode.

I'm not sure what you are referring to here. Could you run that by me
again?

    MA> can you make a version of this which conforms to
    MA> codecvt_byname?

I'll have to look into this.

    MA> can you somehow integrate your iterator and codecvt, so that
    MA> one is implemented in terms of the other? i think there is
    MA> still a use for the iterator model, for when you want to keep
    MA> your buffer in the external encoding.

This should be possible. I evolved the iterator adaptor code into the
codecvt facet, and I believe that components of the code could be
extracted. I think that in most situations, converting on reads into
memory and on writes out from memory suffices (I am under the
impression that this was the rationale behind codecvt's design), but
there may still be a use for an iterator adaptor.
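
To make the iterator model concrete, here is a minimal sketch of an
adaptor that leaves the buffer in its external UTF-8 encoding and
decodes one code point per dereference. It is not the adaptor code
mentioned above: the class name utf8_decode_iterator is hypothetical,
and the sketch assumes well-formed UTF-8 and performs no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical adaptor: walks a UTF-8 buffer in place and yields one
// code point per step. Assumes the input is well-formed UTF-8.
class utf8_decode_iterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef std::uint32_t           value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef const std::uint32_t*    pointer;
    typedef std::uint32_t           reference;  // returned by value

    explicit utf8_decode_iterator(std::string::const_iterator pos) : pos_(pos) {}

    // Decode the code point that starts at the current position.
    std::uint32_t operator*() const {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        const int len = length(lead);
        std::uint32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
        for (int k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(*(pos_ + k)) & 0x3F);
        return cp;
    }

    // Advance past the whole multi-byte sequence.
    utf8_decode_iterator& operator++() {
        pos_ += length(static_cast<unsigned char>(*pos_));
        return *this;
    }
    utf8_decode_iterator operator++(int) {
        utf8_decode_iterator tmp(*this);
        ++*this;
        return tmp;
    }

    friend bool operator==(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) { return a.pos_ == b.pos_; }
    friend bool operator!=(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) { return !(a == b); }

private:
    // Sequence length in bytes, derived from the lead byte.
    static int length(unsigned char lead) {
        assert((lead & 0xC0) != 0x80 && "iterator must sit on a lead byte");
        if (lead < 0x80)           return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        return 4;
    }

    std::string::const_iterator pos_;
};
```

A pair of these constructed from a string's begin() and end() can be
handed to algorithms that accept input iterators, so the bytes are
never copied into a separate wide-character buffer.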

ron

