Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Soares Chen Ruo Fei (crf_at_[hidden])
Date: 2011-08-10 19:29:04


Phil Endecott wrote:
> No, because:
>
>> The immutability
>> of the string adapter is actually achieved by holding a smart pointer
>> to the const version of the raw string.
>
> If you were just wrapping an existing string class, you wouldn't do that;
> you'd just wrap the existing string class.  By adding this extra bit, you're
> making a string that is immutable, copy-on-write and reference counted -
> whether or not the underlying string is or not.

I won't argue with you about the definition of a new string class; as
long as you understand the design goal, that's fine.

Personally, I'd say that the unicode_string_adapter class is more like a
glorified smart pointer: you can think of the class as having the
following signature, with identical functionality:

template <typename StringT>
class unicode_string_adapter : public std::shared_ptr<const StringT> { /* ... */ };

I'm just using composition instead of inheritance because that keeps
the code better organized.
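
To make the composition variant concrete, here is a rough sketch. It is
not the actual Boost.Ustr source: the namespace `sketch`, the class name
`string_adapter` and the helper `sharing_demo()` are hypothetical names
used only for illustration. It shows a wrapper that holds a
std::shared_ptr<const StringT> as a member and forwards operator*() and
operator->() to it, so that copies share one immutable raw string.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for unicode_string_adapter (illustration only).
template <typename StringT>
class string_adapter {
public:
    explicit string_adapter(StringT raw)
        : ptr_(std::make_shared<const StringT>(std::move(raw))) {}

    // Smart-pointer style access to the immutable raw string.
    const StringT& operator*() const { return *ptr_; }
    const StringT* operator->() const { return ptr_.get(); }

private:
    // Composition: the shared pointer is a member, not a base class.
    std::shared_ptr<const StringT> ptr_;
};

// Copying the adapter copies the pointer, not the characters.
inline void sharing_demo() {
    string_adapter<std::string> a(std::string("hello"));
    string_adapter<std::string> b = a;
    assert(&*a == &*b);      // both refer to the same raw string object
    assert(a->size() == 5);  // std::string members are reached through ->
}

}  // namespace sketch
```

Because the pointee is const, nothing reachable through the wrapper can
modify the shared string, which is where the immutability comes from.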

Also, the class does not do exactly the same copy-on-write that
std::string implementations used to do. It *always* copies when the
edit() method is called, regardless of whether the reference count is
one or many. So there is no nasty overhead of making sure that there is
only one reference during mutation.
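
A minimal sketch of the "always copy on edit" idea, under stated
assumptions: the class `immutable_string` and the helper `edit_demo()`
are hypothetical, and this edit() simply returns a fresh mutable StringT,
which may differ from what the real edit() returns. The point is only
that there is no check of the reference count before copying.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical immutable wrapper (illustration only, not Boost.Ustr code).
template <typename StringT>
class immutable_string {
public:
    explicit immutable_string(StringT raw)
        : ptr_(std::make_shared<const StringT>(std::move(raw))) {}

    const StringT& operator*() const { return *ptr_; }

    // Always hands back a new mutable copy. Unlike classic copy-on-write,
    // the reference count is never inspected.
    StringT edit() const { return StringT(*ptr_); }

private:
    std::shared_ptr<const StringT> ptr_;
};

inline void edit_demo() {
    immutable_string<std::string> original(std::string("abc"));
    immutable_string<std::string> shared = original;  // shares the buffer

    std::string scratch = original.edit();  // private mutable copy
    scratch += "def";
    immutable_string<std::string> updated(scratch);

    assert(*original == "abc");  // existing holders are unaffected
    assert(*shared == "abc");
    assert(*updated == "abcdef");
}

}  // namespace sketch
```

The trade-off is an unconditional copy per edit in exchange for never
having to synchronize on or test the reference count.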

> Whatever.  The point is that you have this operator* and operator-> overload
> whose purpose is non-obvious to someone looking at code that uses it.  What
> is your rationale for doing that, rather than providing e.g. an impl() or
> base() or similar accessor?  Can you give examples of any precedents for
> this usage?  What names or syntax do other wrapper/adaptor/facade
> implementations use?

I'd say the purpose of operator *() is pretty obvious: to retain
backward compatibility with the original raw string class. One of the
biggest obstacles to creating a new string class is that it breaks
compatibility with legacy library APIs that accept std::string as a
function parameter. My goal is to make it as easy as possible for
users of Boost.Ustr to get back their original raw string at any time,
so that migration is less painful.

Ultimately, a developer should be able to use
`unicode_string_adapter<std::string>` with only his existing knowledge
of std::string. The developer does not need to learn Boost.Ustr at all
if he does not care about the encoding and content of the string. All
he has to do in his code to migrate to Boost.Ustr is to replace every
str.string_method() with str->string_method(), and every
existing_function(str) with existing_function(*str). As a result, the
syntax makes it extremely easy to migrate with minimal changes.
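
As a small before/after illustration of that migration, here is a sketch
that uses a plain std::shared_ptr<const std::string> as a stand-in for
`unicode_string_adapter<std::string>`, following the smart-pointer
analogy above. The names `legacy_length`, `adapted_string`, `before` and
`after` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical legacy API that only accepts std::string.
inline std::size_t legacy_length(const std::string& s) { return s.size(); }

// Stand-in for unicode_string_adapter<std::string> (assumption for this sketch).
typedef std::shared_ptr<const std::string> adapted_string;

// Before migration: the code works on std::string directly.
inline std::size_t before(const std::string& str) {
    return str.size() + legacy_length(str);
}

// After migration: '.' becomes '->' and the argument becomes '*str'.
inline std::size_t after(const adapted_string& str) {
    return str->size() + legacy_length(*str);
}

inline void migration_demo() {
    std::string raw("hello");
    adapted_string wrapped = std::make_shared<const std::string>(raw);
    assert(before(raw) == after(wrapped));  // same behaviour either way
}
```

The body of each function changes only at the two points named in the
text, which is the whole argument for keeping operator *() and
operator ->().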

There is already a member function that does the actual work, namely
str.to_string(). So it would just be a matter of deleting three lines
of code to remove operator *() anyway. But if you look at
unicode_string_adapter itself as a smart pointer to the raw string,
then operator *() makes more sense.

> Well I don't really care who does it, but I think we should have these UTF
> encoding and decoding functions somewhere in Boost that is not an
> implementation detail of some other library.

I agree with you that Boost needs a complete Unicode toolset. But
since that is outside my project's scope, I'll leave it to others to
answer this question.

> OK, it's not for me, that's a shame.  Maybe if you're lucky someone who DOES
> want this functionality will now post a reply to your request for
> comments...

Yup.. Basically, any raw string processing algorithm that cannot work
well enough with the standard std::string implementation should not be
expected to work with Boost.Ustr either. Actually, I don't think there
is any general-purpose string class that can do the job you want.

