From: Jeremy Maitin-Shepard (jbms_at_[hidden])
Date: 2007-07-02 12:31:35


Sebastian Redl <sebastian.redl_at_[hidden]> writes:

> Jeremy Maitin-Shepard wrote:
>> Okay, that is a good point. "Data stream" would probably be best then.
>> I am quite averse to "binary stream", since it would really be a misuse,
>> albeit a common misuse, of "binary".
>>
> I'm using "unstructured stream" in the next iteration of the design
> document. Does that seem appropriate to you?

I suppose it depends on how text streams will differ from these
"unstructured" streams. It seems that it may be the case that a single
"unstructured" stream concept/interface will be defined, and text
streams will be instances/implementations of this concept, but in
addition provide other functionality. In that case, maybe just the name
"stream" would indeed be appropriate.

>> I see. You are suggesting, I suppose, that in addition to providing
>> formatting of individual values, the binary formatting layer also
>> provides stream filters for converting a stream of one type into a
>> stream of another type with a particular formatting. I like this idea.
>>
> Yes, exactly. This can be very useful for reading of data. However, it's
> not quite sufficient for runtime selection of a character encoding. For
> this, an interface that is not a stream of a single data type, but
> rather provides extraction of any data type at any time, is required.

Not necessarily, since the data formatting stream could still be created
as needed by the encoding conversion code.

Nonetheless, for direct use (e.g. for reading a complicated data format
like an image format with fields of various sizes), a single filter that
supports reading/writing any supported type in any supported format
would likely be useful. It is not clear whether it would also be useful
to provide additional filters for which certain information about the
format (e.g. endianness) is specified in the type of the filter, rather
than in the names of, or template arguments to, individual methods.
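
As a rough sketch of the two options, under the assumption that the
source is an in-memory byte buffer and only unsigned integers are read:
the first class takes the byte order as a template argument to each
call, and the second fixes it in the type of the filter. All names here
(byte_order, data_reader, fixed_order_reader) are hypothetical and are
not taken from the design document.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical names throughout; a sketch, not the proposed library interface.
enum class byte_order { little, big };

// Option 1: one reader; the byte order is a template argument to each call.
class data_reader {
public:
    explicit data_reader(std::vector<std::uint8_t> bytes)
        : bytes_(std::move(bytes)) {}

    template <typename T, byte_order Order>
    T read() {
        static_assert(std::is_unsigned<T>::value,
                      "this sketch only handles unsigned integers");
        if (bytes_.size() - pos_ < sizeof(T))
            throw std::out_of_range("data_reader: not enough bytes");
        T value = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            // Position of byte i within the value, counted from the low end.
            const std::size_t shift =
                (Order == byte_order::little) ? i : sizeof(T) - 1 - i;
            value = static_cast<T>(
                value | (static_cast<T>(bytes_[pos_ + i]) << (8 * shift)));
        }
        pos_ += sizeof(T);
        return value;
    }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_ = 0;
};

// Option 2: the byte order is part of the filter's type, so calls omit it.
template <byte_order Order>
class fixed_order_reader {
public:
    explicit fixed_order_reader(std::vector<std::uint8_t> bytes)
        : inner_(std::move(bytes)) {}

    template <typename T>
    T read() { return inner_.read<T, Order>(); }

private:
    data_reader inner_;
};
```

With the first form, a format that mixes byte orders can be read through
one object, e.g. r.read<std::uint16_t, byte_order::big>() followed by
r.read<std::uint32_t, byte_order::little>(). The second form is only a
thin wrapper over the first, which suggests it could be provided as a
convenience rather than as a separate mechanism.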

[snip: example of code for encoding conversion]

Perhaps a better default would be to assume the native byte order.
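
For illustration only, a default of that kind might look like the
following for a single UTF-16 code unit. The names utf16_order,
native_utf16_order and decode_utf16_unit are made up for this sketch and
do not appear in the snipped example; the byte order is detected at run
time so that the sketch does not depend on any particular compiler or
standard version.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical names; a sketch of "default to the native byte order".
enum class utf16_order { little, big };

// Detect the byte order of the machine by inspecting the first byte of a
// known 16-bit value.
inline utf16_order native_utf16_order() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1 ? utf16_order::little : utf16_order::big;
}

// Decode one UTF-16 code unit from two bytes. When the caller does not
// name a byte order, the native one is assumed.
inline std::uint16_t decode_utf16_unit(
        const unsigned char* p,
        utf16_order order = native_utf16_order()) {
    return order == utf16_order::little
        ? static_cast<std::uint16_t>(p[0] | (p[1] << 8))
        : static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
```

The effect is that data written by the same machine round-trips without
the caller naming a byte order, while data from elsewhere still needs an
explicit argument.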

It seems a bit unfortunate to have three different interfaces for encoding
conversion from a file. I think that it would likely be ideal to
unify these into a single interface somehow.

>>> Yes, a text stream is essentially a binary stream with an encoding
>>> instead of a data type. So the interface is the same in description, but
>>> the types involved are different. I think this is mostly a documentation
>>> issue.
>>>
>>
>> Instead of a data type? But presumably both the data type and the
>> encoding must be specified. Also, it seems like it may be useful to be
>> able to specify the encoding at run-time, rather than just
>> compile-time.
>>
> Instead of a data type. The data type of a text_stream<Encoding> is
> base_type<Encoding>::type. This is for internal use - the external use
> is different anyway.

Conceptually, it seemed that there might be advantages to considering
streams of characters encoded in a non-Unicode encoding as text streams
as well. Then there could be a somewhat uniform interface for all
encoding conversions, since they would always convert a text stream to a
text stream.

That said, the interface you describe also seems to have advantages.

>> Which encodings will be supported at
>> compile-time, then? Just UTF-8, UTF-16, and UTF-32?
>>

> Whichever the library supplies. I think these three plus ASCII and
> Latin-1 would make a reasonable minimum requirement.

Would ASCII really be useful since UTF-8 would be supported?

Why is Latin-1 included? What is the argument for supporting ISO-8859-1
but not the other ISO-8859 encodings? Furthermore, what is the argument
for not supporting any other arbitrary encoding?

I don't think it is reasonable to give Latin-1 special status.

-- 
Jeremy Maitin-Shepard
