Subject: Re: [boost] [string] proposal
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2011-01-26 12:37:32

On Thu, Jan 27, 2011 at 1:05 AM, Sebastian Redl
<sebastian.redl_at_[hidden]> wrote:
> On 26.01.2011, at 17:06, Dean Michael Berris wrote:
>> In one of the previous messages I laid out an algorithm template like so:
>>  template <class String>
>>  void foo(String s) {
>>    view<encoding> encoded(s);
>>    // deal with encoded from here on out
>>  }
> I don't see how such an algorithm implementation is technically feasible. If the type substituted for String doesn't intrinsically know what its encoding is, how would view<encoding> know how to present the data in the requested encoding? How would it know how to transcode?

Here's where it gets a little tricky.

If what you substitute for String is the hypothetical `boost::string`,
then the view will treat it as the raw data underlying the view and
interpret those bytes in the requested encoding.

If you're substituting a view<some_encoding> for String, then the
internal view<encoding> will hold a copy of the (immutable)
view<some_encoding> and do the transcoding on the fly whenever its
iterators are accessed.
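A minimal sketch of such an on-the-fly transcoding view, under stated
assumptions: `utf16_view` is a made-up name, the source view is assumed
to expose iterators that dereference to code points, and std::u32string
merely stands in for a view<some_encoding>. None of this is an actual
Boost interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical transcoding view: keeps a copy of a source view whose
// iterators yield code points and exposes UTF-16 code units lazily.
template <class SourceView>
class utf16_view {
public:
    class iterator {
    public:
        explicit iterator(typename SourceView::const_iterator it)
            : it_(it), low_(false) {}

        // Transcoding happens here, on dereference; nothing is stored.
        char16_t operator*() const {
            char32_t cp = *it_;
            assert(cp <= 0x10FFFF);  // source is assumed to be validated
            if (cp < 0x10000) return static_cast<char16_t>(cp);
            cp -= 0x10000;
            return static_cast<char16_t>(low_ ? 0xDC00 + (cp & 0x3FF)
                                              : 0xD800 + (cp >> 10));
        }

        // A code point above U+FFFF takes two steps (high, then low
        // surrogate) before the source iterator advances.
        iterator& operator++() {
            if (!low_ && *it_ >= 0x10000) {
                low_ = true;
            } else {
                low_ = false;
                ++it_;
            }
            return *this;
        }

        bool operator==(const iterator& o) const {
            return it_ == o.it_ && low_ == o.low_;
        }
        bool operator!=(const iterator& o) const { return !(*this == o); }

    private:
        typename SourceView::const_iterator it_;
        bool low_;  // true while positioned on a low surrogate
    };

    explicit utf16_view(SourceView src) : src_(src) {}

    iterator begin() const { return iterator(src_.begin()); }
    iterator end() const { return iterator(src_.end()); }

private:
    SourceView src_;  // copy of the (immutable) source view
};
```

With a source holding 'a' followed by U+1F600, iterating a
utf16_view<std::u32string> yields three code units: 'a' and then a
surrogate pair.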

Note that validation could be implemented as a separate algorithm,
external to the string and the view and specific to the encoding being
presented.
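As an illustration only, validation as a free-standing algorithm for
UTF-8 might look like the sketch below. The names `is_valid_utf8` and
`require_valid_utf8` are invented here; the checks follow the usual
UTF-8 well-formedness rules (no overlong forms, no surrogates, nothing
above U+10FFFF).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical free function: works on any byte iterator range, so it
// needs no cooperation from the string or the view.
template <class ByteIterator>
bool is_valid_utf8(ByteIterator first, ByteIterator last) {
    while (first != last) {
        unsigned char lead = static_cast<unsigned char>(*first++);
        if (lead < 0x80) continue;  // single-byte (ASCII) sequence

        std::size_t extra = 0;      // continuation bytes expected
        std::uint32_t cp = 0;       // decoded code point
        std::uint32_t min = 0;      // smallest value for this length
        if ((lead & 0xE0) == 0xC0) {
            extra = 1; cp = lead & 0x1F; min = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; cp = lead & 0x0F; min = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; cp = lead & 0x07; min = 0x10000;
        } else {
            return false;           // stray continuation or invalid lead
        }

        for (; extra > 0; --extra) {
            if (first == last) return false;  // truncated sequence
            unsigned char c = static_cast<unsigned char>(*first++);
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, surrogates, and out-of-range values.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
    }
    return true;
}

// One possible policy: check once, up front, before wrapping the bytes
// in an encoded view.
inline void require_valid_utf8(const std::string& bytes) {
    assert(is_valid_utf8(bytes.begin(), bytes.end()));
}
```

Because the function only needs a pair of byte iterators, it could be
run on the raw string once before any view is constructed, which keeps
the view's iterators free of error handling.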

> For that matter, why would foo's implementer care at all about the encoding? I cannot really think of any algorithms (save transcoding algorithms themselves) that would care about the actual encoding. What they typically want is the sequence of code points or more likely characters that the string represents. But if the string doesn't know what encoding its internal data is in, the algorithm cannot get the code points without someone telling it what the encoding is. By making the string oblivious of the data's actual encoding, you put the burden of that on the user of the string class, who now has to supply every single algorithm that wants to do something with the string beyond looking at its raw data with the actual encoding of the string.

Right. In the design I have in my head, it's really split into two
parts: the underlying immutable string type, and the view that wraps
these immutable strings and applies the "encoding" as part of the
view's implementation.

So if a user wanted to specify that a given chunk of data in memory is
supposed to be viewed as UTF-8, he would do something like this:

  boost::string s = "The quick brown fox with unicode characters";
  boost::strings::view<boost::strings::utf8_encoding> encoded(s);

So the interface of boost::string and of view<...> will be the same
(mostly exposing iterators), and you'd be able to deal with either one
as if they were practically the same thing. The exception is that a
view's iterators dereference to a type that depends on the specific
encoding in which you want to view the data. If you wanted raw access
to the bytes, you'd deal with the iterators of the "raw string"
directly.
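To make the split concrete, here is a rough sketch under stated
assumptions. `immutable_string` is a stand-in for the hypothetical
boost::string, and `utf8_encoding` and `view` are illustrative names
rather than an existing library. The view assumes well-formed UTF-8,
with validation left to a separate algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Stand-in for the raw string: an immutable, cheaply copyable
// collection of bytes that knows nothing about encodings.
class immutable_string {
public:
    immutable_string(const char* s)
        : data_(std::make_shared<const std::string>(s)) {}
    const char* begin() const { return data_->data(); }
    const char* end() const { return data_->data() + data_->size(); }
private:
    std::shared_ptr<const std::string> data_;
};

struct utf8_encoding {};  // hypothetical encoding tag

template <class Encoding> class view;

// Same iterator-based interface as the raw string, but dereferencing
// yields a code point (char32_t) instead of a byte.
template <>
class view<utf8_encoding> {
public:
    class iterator {
    public:
        iterator(const char* p, const char* end) : p_(p), end_(end) {}

        char32_t operator*() const {
            unsigned char lead = static_cast<unsigned char>(*p_);
            std::size_t n = sequence_length(lead);
            assert(n <= static_cast<std::size_t>(end_ - p_));
            // Keep the payload bits of the lead byte, then append six
            // bits from each continuation byte.
            char32_t cp = (n == 1) ? lead : (lead & (0xFF >> (n + 1)));
            for (std::size_t i = 1; i < n; ++i)
                cp = (cp << 6) | (static_cast<unsigned char>(p_[i]) & 0x3F);
            return cp;
        }

        iterator& operator++() {
            std::size_t n = sequence_length(static_cast<unsigned char>(*p_));
            std::size_t left = static_cast<std::size_t>(end_ - p_);
            p_ += (n < left ? n : left);  // never step past the end
            return *this;
        }

        bool operator==(const iterator& o) const { return p_ == o.p_; }
        bool operator!=(const iterator& o) const { return p_ != o.p_; }

    private:
        static std::size_t sequence_length(unsigned char lead) {
            if (lead < 0x80) return 1;
            if ((lead >> 5) == 0x6) return 2;
            if ((lead >> 4) == 0xE) return 3;
            return 4;
        }
        const char* p_;
        const char* end_;
    };

    explicit view(immutable_string s) : s_(s) {}
    iterator begin() const { return iterator(s_.begin(), s_.end()); }
    iterator end() const { return iterator(s_.end(), s_.end()); }

private:
    immutable_string s_;  // shares the underlying bytes, no deep copy
};
```

Iterating the immutable_string directly walks bytes, while iterating
view<utf8_encoding> over the same object walks code points, so generic
code written against begin()/end() can accept either.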

> Unless I completely misunderstand what you want, of course.

I can't say for sure, but I think you missed the part where the view
offers an encoded view while the string is basically just an
immutable collection of bytes. :)


Dean Michael Berris
