Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-10 12:51:53


Soares Chen Ruo Fei wrote:
> Hi Phil,
>
> On Aug 9, 2011, Phil Endecott wrote:
>> I think there are probably as many ways to implement a "better" string as
>> there are potential users, and previous long discussions here have
>> considered those possibilities at great length.  In summary your proposal is
>> for a string that is:
>>
>> - Immutable.
>> - Reference counted.
>> - Iterated by default over unicode code points.
>
> I think you misunderstood my point.

No, I believe I understand what you are doing.

> Boost.Ustr does not attempt to
> redesign another string class to begin with. Instead it wraps an existing
> string class that is provided through the template parameter and relies
> on that string class for actual container operations.

No, because:

> The immutability
> of the string adapter is actually achieved by holding a smart pointer
> to the const version of the raw string.

If you were just wrapping an existing string class, you wouldn't do
that; you'd just wrap the existing string class. By adding this extra
bit, you're making a string that is immutable, copy-on-write and
reference counted, whether or not the underlying string is.
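
To make that concrete, here is a minimal sketch of the pattern being
described. It is not Boost.Ustr's actual code: the class name
immutable_adapter and its members are hypothetical, and std::shared_ptr
stands in for whatever smart pointer the library really uses. The point
is only that copying the adapter shares one const buffer, even when the
wrapped type (std::string here) would normally copy its contents.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical adapter for illustration only; not the Boost.Ustr class.
template <typename StringT>
class immutable_adapter {
public:
    explicit immutable_adapter(StringT s)
        : data_(std::make_shared<const StringT>(std::move(s))) {}

    // Read-only view of the wrapped string; there is no mutable access.
    const StringT& base() const { return *data_; }

    // True when both adapters point at the same underlying buffer.
    bool shares_buffer_with(const immutable_adapter& other) const {
        return data_ == other.data_;
    }

    long use_count() const { return data_.use_count(); }

private:
    std::shared_ptr<const StringT> data_;  // reference-counted, const payload
};

// Copying the adapter copies only the smart pointer, not the characters.
inline bool copies_share_storage() {
    immutable_adapter<std::string> a(std::string("hello"));
    immutable_adapter<std::string> b = a;
    assert(b.base() == "hello");
    return a.shares_buffer_with(b) && a.use_count() == 2;
}
```

So the adapter has value semantics of its own (shared, read-only)
layered on top of whatever StringT does, which is why it reads as a new
string type rather than a thin wrapper.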

>> - Provides access to the code units via operator* and operator->, i.e.
>>     s.begin()   // Returns a code point iterator.
>>     s->begin()  // Returns a code unit iterator.
>>
>> I won't comment about the merits or otherwise of those points, apart from
>> the last, where I'll note that it is not to my taste.  It looks like it's
>> "over clever".  Imagine that I wrote some code using your library, and then
>> a colleague who was not familiar with it had to look at it later.  Would
>> they have any idea about the difference between those two cases?  No, not
>> unless I added a comment every time I used it.  Please let's have an obvious
>> syntax like:
>>
>>     s.begin()        // Code points.
>>     s.impl.begin()   // Code units.
>>  or s.units_begin()  // Code units.
>
> The actual intention of operator->() is not to provide access to the
> code unit iterator; instead it lets programmers reach raw string
> functionality that unicode_string_adapter is not able to provide.

Whatever. The point is that you have this operator* and operator->
overload whose purpose is non-obvious to someone looking at code that
uses it. What is your rationale for doing that, rather than providing
e.g. an impl() or base() or similar accessor? Can you give examples of
any precedents for this usage? What names or syntax do other
wrapper/adaptor/facade implementations use?
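
For comparison, a rough sketch of the named-accessor style being asked
for. Everything here is hypothetical (the class code_point_string and
the members base(), units_begin(), units_end() and code_point_count()
are made up for illustration); the only real precedent leaned on is
that std::reverse_iterator exposes its wrapped iterator through a
member called base().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper for illustration; assumes the payload is valid UTF-8.
class code_point_string {
public:
    explicit code_point_string(std::string utf8) : impl_(std::move(utf8)) {}

    // Named accessor for the wrapped string, in the style of
    // std::reverse_iterator::base().
    const std::string& base() const { return impl_; }

    // Code unit iteration is spelled out in the member names.
    std::string::const_iterator units_begin() const { return impl_.begin(); }
    std::string::const_iterator units_end() const { return impl_.end(); }

    // Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
    std::size_t code_point_count() const {
        std::size_t n = 0;
        for (unsigned char c : impl_) {
            if ((c & 0xC0) != 0x80) ++n;
        }
        return n;
    }

private:
    std::string impl_;
};

// "cafe" with an acute accent: 5 code units, 4 code points.
inline bool accessor_example() {
    code_point_string s("caf\xC3\xA9");
    assert(s.base().size() == 5);
    return s.code_point_count() == 4 &&
           (s.units_end() - s.units_begin()) == 5;
}
```

A reader who has never seen the library can tell from s.base() or
s.units_begin() which level they are working at, with no comment
needed.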

>> Your library does have [raw UTF encoding and decoding functions],
>> but it is hidden in an implementation detail.  Please can you
>> consider bringing out your core UTF encoding and decoding functions to the
>> public interface?
>
> My encoder/decoder functions are actually quite similar to Mathias'
> implementation. (in fact I referred to his design before implementing
> my own) However these function interfaces are specifically designed to
> fit the internal usage of Boost.Ustr, albeit I made them generic
> enough. The reason I did not directly use/copy Mathias' implementation
> is because the interfaces are slightly different and I wanted to avoid
> obscure bugs, and because the algorithm is simple enough to
> re-implement, and also because I wanted to take this chance to learn
> the encoding algorithms (and I did learn something). :) But I'd agree
> that it shouldn't be hard to refactor the encoders and merge with
> Mathias' implementation when the time comes.
>
> Currently I do not have plans to make iterator adapters on top of these
> encoding/decoding functions, and I think it is also a bit redundant as
> Mathias has already gone through the mess of generating these
> functions using macros and template metaprogramming. ;)

Well I don't really care who does it, but I think we should have these
UTF encoding and decoding functions somewhere in Boost that is not an
implementation detail of some other library.
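
As an illustration of what such a public, free-standing function might
look like, here is a minimal sketch of a UTF-8 decoder that works on
any iterator pair. The name decode_utf8 and its signature are
assumptions made for this example; this is neither Boost.Ustr's nor
Mathias' interface, and it favours clarity over speed.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical free function: decodes one code point from [it, end) and
// advances it past the bytes consumed. Throws on malformed input.
template <typename Iter>
char32_t decode_utf8(Iter& it, Iter end) {
    if (it == end) throw std::out_of_range("decode_utf8: empty input");

    const unsigned char lead = static_cast<unsigned char>(*it++);
    if (lead < 0x80) return lead;  // single-byte (ASCII) sequence

    int extra = 0;        // number of continuation bytes expected
    char32_t cp = 0;      // code point being assembled
    char32_t min_cp = 0;  // smallest value allowed for this length
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
    else throw std::invalid_argument("decode_utf8: invalid lead byte");

    for (int i = 0; i < extra; ++i) {
        if (it == end) throw std::invalid_argument("decode_utf8: truncated sequence");
        const unsigned char c = static_cast<unsigned char>(*it);
        if ((c & 0xC0) != 0x80)
            throw std::invalid_argument("decode_utf8: bad continuation byte");
        ++it;
        cp = (cp << 6) | (c & 0x3F);
    }

    // Reject overlong forms, surrogates and values beyond U+10FFFF.
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("decode_utf8: invalid code point");
    return cp;
}
```

Because it depends only on an iterator pair, a function like this can
be used, tested and benchmarked without pulling in any string class.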

>> I would also like to see some benchmarks for the core UTF conversion
>> functions.  If you post some benchmarks that decouple the UTF conversion
>> from the rest of the string class, I will compare the performance with my
>> own code.
>
> At this time I am focusing on design issues rather than optimizations,
> so I didn't think much about benchmarks. I'd guess that the
> encoding/decoding speed is probably inferior to other encoder/decoder
> functions. You can see in my implementation that I did not use
> obscure hacks that shorten the code while remaining mathematically
> equivalent. Instead I focused on readability first so that even amateurs
> can read the code and easily learn how the encoding/decoding process
> works. So if you are writing a performance-critical application that
> encodes/decodes huge amounts of Unicode text, I'd say that Boost.Ustr is
> probably not for you (yet).

OK, it's not for me, that's a shame. Maybe if you're lucky someone who
DOES want this functionality will now post a reply to your request for comments...

Regards, Phil.

