Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Soares Chen Ruo Fei (crf_at_[hidden])
Date: 2011-08-12 14:48:52


On Thu, Aug 11, 2011 Yakov Galka wrote:
> As per the design of the proposed library:
> It mixes two orthogonal concepts, namely encoding and storage. The two shall
> be separate.
> I don't like reference counted strings. Passing strings by reference is not
> that hard. Moreover, lots of atomic memory-bus locks in a multiprocessor
> system degrade performance.

If you don't like smart pointers to strings, I have just created an
alternative string traits class that you can look at in the detail folder.
alt_string_traits defines all the smart pointer types as the original
string type, effectively making unicode_string_adapter hold the string
object directly as its member.
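
To illustrate the idea of swapping the storage through a traits class, here
is a minimal sketch. It is not the actual Boost.Ustr code: the names
shared_string_traits, value_string_traits, adapter_sketch, store() and get()
are all made up for this example, and only the general technique (the traits
decide whether the adapter holds a shared pointer or the string itself) is
taken from the description above.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical traits: the immutable string is shared through a shared_ptr.
template <typename StringT>
struct shared_string_traits {
    typedef std::shared_ptr<const StringT> const_storage;
    static const_storage store(StringT s) {
        // shared_ptr<StringT> converts implicitly to shared_ptr<const StringT>.
        return std::make_shared<StringT>(std::move(s));
    }
    static const StringT& get(const const_storage& p) { return *p; }
};

// Hypothetical alternative: the "pointer" type is the string type itself,
// so the adapter ends up holding the string directly as a member.
template <typename StringT>
struct value_string_traits {
    typedef StringT const_storage;
    static const_storage store(StringT s) { return s; }
    static const StringT& get(const const_storage& s) { return s; }
};

// Hypothetical adapter: the storage strategy is chosen entirely by Traits.
template <typename StringT, typename Traits = shared_string_traits<StringT> >
class adapter_sketch {
public:
    explicit adapter_sketch(StringT s) : storage_(Traits::store(std::move(s))) {}
    const StringT& str() const { return Traits::get(storage_); }
private:
    typename Traits::const_storage storage_;
};
```

With this layout, adapter_sketch<std::string> is reference counted, while
adapter_sketch<std::string, value_string_traits<std::string> > copies the
string on every copy of the adapter and needs no atomic operations.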

I hope that you'd agree with me that the pattern:

class unicode_string {
  public:
    // decorated methods here
    ...
  private:
    std::string _str;
};

is effectively the same as:

class unicode_string : public std::string {
  public:
    // decorated methods here
    ...
};

and that a unicode class like that is an extension of the original string.

> The 'unicode' support (codepoint iteration, etc) is purely algorithmic and
> thus shall be independent of the way the data is stored. I wold like to see
> something like `codepoints(any_char_iterator_range)` returning a range of
> codepoints.

Due to popular demand I've added the static method
unicode_string_adapter::make_codepoint_iterator() to satisfy your
request. It takes three arguments: current, begin, and end, so that it
can traverse in both directions without going out of bounds.
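
To show why all three positions are needed, here is a rough sketch of a
bidirectional code point cursor over UTF-8 bytes. This is not the
implementation behind make_codepoint_iterator(); the class utf8_cursor and
its members are hypothetical names for this example, and the sketch assumes
well-formed UTF-8 input and does no validation.

```cpp
#include <string>

// Hypothetical code point cursor over a UTF-8 byte range.
// Assumes well-formed UTF-8; invalid input is not detected.
class utf8_cursor {
public:
    typedef std::string::const_iterator iter;

    utf8_cursor(iter current, iter begin, iter end)
        : cur_(current), begin_(begin), end_(end) {}

    bool at_begin() const { return cur_ == begin_; }
    bool at_end() const { return cur_ == end_; }

    // Decode the code point that starts at the current position.
    // Precondition: !at_end().
    char32_t get() const {
        unsigned char lead = static_cast<unsigned char>(*cur_);
        if (lead < 0x80) return lead;                 // single-byte (ASCII)
        int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
        char32_t cp = lead & (0x3F >> extra);         // payload bits of lead byte
        iter it = cur_;
        for (int i = 0; i < extra && ++it != end_; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(*it) & 0x3F);
        return cp;
    }

    // Move forward one code point; 'end' stops us from running off the range.
    void next() {
        if (cur_ == end_) return;
        ++cur_;
        while (cur_ != end_ && is_continuation(*cur_)) ++cur_;
    }

    // Move backward one code point; 'begin' stops us from underrunning.
    void prev() {
        if (cur_ == begin_) return;
        --cur_;
        while (cur_ != begin_ && is_continuation(*cur_)) --cur_;
    }

private:
    static bool is_continuation(char c) {
        return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
    }
    iter cur_, begin_, end_;
};
```

Without begin, prev() could not know where to stop skipping continuation
bytes, and without end, next() and get() could read past the buffer.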

I'm sorry, but I don't understand why this request is pressed so hard
when the same functionality already exists in Boost.Unicode and
other Unicode libraries. But anyway, here it is.

On Thu, Aug 11, 2011 Phil Endecott wrote:
> Soares Chen Ruo Fei wrote:
>>
>> you can assume the class to have the
>> following signature with identical functionality:
>>
>> <typename StringT>
>> class unicode_string_adapter : public std::shared_ptr<const StringT>;
>
> What is your rationale for that?  What precedent is there for a
> wrapper/adapter/facade that behaves like a pointer to the wrapped object?
>  I'm not aware of any precedents; this is a new pattern to me.  Why have you
> chosen to do this, rather than providing an accessor member like impl()?

Let's just say it's purely a matter of syntactic taste that I think can
ease the transition without too much confusion. But since the method
consists of just a few lines of code, why don't we have a
democratic vote on whether to keep operator*()? I don't know, maybe
let's say that if six or more people here, out of a maximum
of ten, vote no, then I'll delete those few lines and make
everyone happy. :)

Actually, I can't find any precedent for this pattern either, so
I'm not sure whether it is new. The basis of the pattern is
simple: use shared_ptr to store a const version of the object so that it
can be shared quickly; clone the object for modification and store
the cloned mutable object in something like unique_ptr to disable
copying and sharing; and, if possible, find a way to disable access
to the const methods of the mutable object, to prevent users from reading
and writing at the same time. I won't say this pattern is flawless, but
the fact that nobody has used it before doesn't make it bad by default.
It will take some practical use to see whether the pattern
leads to better programming constructs and fewer bugs,
especially for amateur developers who are new to C++.
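
The pattern described above can be sketched roughly as follows. This is only
an illustration with std::string, not the unicode_string_adapter code; the
class names shared_text and mutable_text and the members edit(), append()
and freeze() are hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>

class mutable_text;

// Immutable handle: all copies share one const string, so copying is cheap.
class shared_text {
public:
    explicit shared_text(std::string s)
        : p_(std::make_shared<std::string>(std::move(s))) {}
    const std::string& operator*() const { return *p_; }
    mutable_text edit() const;  // clone the string for modification
private:
    std::shared_ptr<const std::string> p_;
};

// Mutable handle: owns a private clone through unique_ptr, which makes the
// class move-only. It deliberately exposes no read accessors.
class mutable_text {
public:
    explicit mutable_text(const std::string& s) : p_(new std::string(s)) {}
    void append(const std::string& tail) { p_->append(tail); }
    // Callable only on an rvalue: freezing gives up the mutable object and
    // yields a shareable const one.
    shared_text freeze() && { return shared_text(std::move(*p_)); }
private:
    std::unique_ptr<std::string> p_;
};

inline mutable_text shared_text::edit() const { return mutable_text(*p_); }
```

A typical round trip would be "mutable_text m = a.edit(); m.append("def");
shared_text b = std::move(m).freeze();", after which the original a is
unchanged and b can again be copied freely.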

On Fri, Aug 12, 2011 at 8:19 PM, Daniel James <dnljms_at_[hidden]> wrote:
> I'm not a Windows expert, but I needed to do this for quickbook, I
> wasn't able to find a complete solution, but what I've got sort of
> works. Maybe someone else knows better. I've a horrible feeling that
> someone is going to point out a much simpler solution that makes what
> I do look silly and pointless.
>
> [...]
>
> Annoyingly _O_U16TEXT was largely undocumented until recently, I don't
> know if it was available before Visual Studio 2008. The last time I
> checked, it wasn't available in mingw. Here's the MSDN page for
> _setmode:
>
> http://msdn.microsoft.com/en-us/library/tw4k6df8%28v=VS.100%29.aspx
>
> The '_isatty(_fileno(stdout))' checks that you are writing to the
> console. You don't want to write UTF-16 when output is piped into a
> program that expects 8 bit characters.
>
> A better solution might be to use the UTF-8 code page for output, but
> that didn't seem to work, at least not on XP.
>
> Finally, remember to make sure your console is using a font that can
> display the characters you're outputting.

Thanks a lot for pointing out these obscure techniques! Now that I have
a clue, I can find out more about them through Google.
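
For anyone else following along, my reading of the approach is roughly the
sketch below. It is not the quickbook code (which was elided above); the
function name enable_utf16_console_output is made up, and the non-Windows
branch is just a placeholder so that the snippet compiles everywhere.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>  // _O_U16TEXT
#include <io.h>     // _setmode, _isatty
#endif

// Hypothetical helper: switch stdout to UTF-16 text mode, but only when it
// is attached to a console. Returns true if the mode was changed.
inline bool enable_utf16_console_output() {
#if defined(_WIN32) && defined(_O_U16TEXT)
    // Only do this for a real console: piped output should stay 8-bit.
    if (_isatty(_fileno(stdout))) {
        return _setmode(_fileno(stdout), _O_U16TEXT) != -1;
    }
#endif
    return false;  // not Windows, not a console, or the flag is unavailable
}
```

As far as I understand, once the mode has been switched the program should
write through the wide-character functions (wprintf, std::wcout) rather than
the narrow ones.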

