
Subject: Re: [boost] GSoC Proposal Preparation For Encoding Awared String
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-03-18 12:01:58


> From: Soares Chen <crf_at_[hidden]>
>
> Hi all,
>
> [snip]
>
> I think there are several options that I can choose for my project:
> 1. To use Chad Nelson's code as base, try to incorporate other ideas
> proposed in the mailing list, integrate with Boost.Locale, and make it
> Boost quality to submit for review. If this option is chosen, I wish
> that Chad Nelson can be my mentor.
> 2. To start a new code base, gather and compile ideas suggested in
> mailing list, final design decisions made by me and my mentor but not
> the community (to keep the project going on fast), make it Boost
> quality and submit for review.
> 3. To start the boost::string project, where another better string is
> reinvented and fix all the weaknesses of std::string.
> 4. Adopt different proposal, and improve on existing project such as
> Boost.Unicode [2] or Boost.Locale [3] such that it really solves the
> encoding awareness problem.
> 5. Any other suggestion?

Hello,

I would like you to consider several points:

It would be very hard to reach consensus on how to solve the
problem.

Probably the best (and most wishful) solution is to assume
that all strings are UTF-8, but that is not the reality.

The problem is actually not the string but rather the way you code.

Even if you create a perfect UTF-8 string and then call

    fopen(your_perfect_string.c_str(),"r")

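under Windows, it would not work for non-ASCII file names, because the
narrow-character C API there does not interpret the string as UTF-8
<sigh... damn Windows>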
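
To make this concrete, here is a minimal sketch of the usual workaround:
convert the UTF-8 name to UTF-16 and call the wide API on Windows, and
pass the bytes through unchanged elsewhere. The helper names widen and
fopen_utf8 are hypothetical (they are not part of Boost or the C library),
and the non-Windows branch assumes the platform takes UTF-8 file names.

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <vector>

// Hypothetical helper: UTF-8 -> UTF-16 using the Win32 conversion function.
// Returns an empty string if the input is empty or cannot be converted.
static std::wstring widen(const std::string& utf8)
{
    if (utf8.empty())
        return std::wstring();
    const int in_len = static_cast<int>(utf8.size());
    const int out_len =
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, NULL, 0);
    if (out_len <= 0)
        return std::wstring();
    std::vector<wchar_t> buf(out_len);
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, &buf[0], out_len);
    return std::wstring(buf.begin(), buf.end());
}
#endif

// Hypothetical wrapper: open a file whose name is UTF-8 on every platform.
std::FILE* fopen_utf8(const std::string& path, const std::string& mode)
{
#ifdef _WIN32
    const std::wstring wpath = widen(path);
    const std::wstring wmode = widen(mode);
    if (wpath.empty() || wmode.empty())
        return NULL;
    // The wide API takes UTF-16, so non-ASCII names survive.
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    // Assumption: the char * API accepts the UTF-8 bytes as they are.
    return std::fopen(path.c_str(), mode.c_str());
#endif
}
```

The calling code keeps a plain std::string throughout; only the last step
before the system call is platform-specific.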

As you can see from the many discussions, there are many
conflicting requirements about what a string should look
like and what it should provide.

If you want to bring better Unicode awareness to Boost, you
don't need a cool new utf-XYZ string; you need a policy.

I think boost::filesystem v3 is a big step forward: it allows you
to use UTF-8 strings on Windows, which I think is a really good
beginning.

This is my opinion.

Boost.Locale and several of my other projects (CppCMS, CppDB) live happily
with std::string.

The thing is that in the vast majority of cases you don't need an
encoding-aware string, because so many of the operations you usually do
on strings are encoding-agnostic. But that is another story.
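
As a small illustration of an encoding-agnostic operation, here is a sketch
that splits a std::string on an ASCII delimiter byte by byte. It assumes
the text is UTF-8: every byte of a multi-byte UTF-8 sequence has its high
bit set, so an ASCII delimiter can never match in the middle of a
character. The function name split is just for this example.

```cpp
#include <string>
#include <vector>

// Splits text on an ASCII delimiter. The function never decodes anything;
// it only compares bytes, which is safe for UTF-8 input.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = text.find(delim, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));  // last (or only) field
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;  // continue after the delimiter
    }
}
```

Concatenation, copying, comparison for equality and searching for an
ASCII token behave the same way, which is why a plain std::string goes
a long way.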

Bottom line: if you want to improve the Unicode awareness of Boost,
I think you need to adopt a Boost.Filesystem v3-like policy
across the whole Boost code base.

1. Use the wide API as the native one everywhere in Boost under Windows
2. Use the char * API as the native one everywhere in Boost under
   non-Windows platforms
3. Use std::codecvt to convert between the two (after many tricks... )
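
The conversion step in point 3 could look roughly like the sketch below.
It is only an illustration, not what Boost.Filesystem does internally: it
assumes a C++11 standard library and uses std::wstring_convert with the
std::codecvt_utf8_utf16 facet (both deprecated since C++17) as a
convenient wrapper around a codecvt facet. The function names
utf8_to_utf16 and utf16_to_utf8 are hypothetical.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// UTF-8 -> UTF-16. from_bytes throws std::range_error on invalid input.
std::u16string utf8_to_utf16(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

// UTF-16 -> UTF-8, the inverse of the function above.
std::string utf16_to_utf8(const std::u16string& utf16)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);
}
```

On Windows, where wchar_t is 16 bits wide, the same facet instantiated
with wchar_t would yield a std::wstring suitable for the wide API; on
other platforms the UTF-8 std::string would go to the char * API
unchanged, assuming the platform expects UTF-8.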

The Unicode string/encoding-aware string is the last thing to do,
not the first.

Why?

1. Because you will never reach consensus about what the "right thing"
   to do is (wide, narrow, UTF-8, UTF-16, etc.).

   Projects that are directed by a single authority or management,
   like Qt, GTK(mm), Java, C#, Python and others, can decide what the
   right thing is.

   This will never happen in Boost, as it is too pluralistic, even in
   cases where that does not make sense, simply because of the way
   libraries are developed, reviewed and accepted: through public reviews,
   which ultimately encourage diversity.

2. Because you are unlikely to be able to force users to actually
   use your string, as Boost is more about collaboration than enforcement
   of a specific style.

3. Even the heavy discussions on the list have not reached any conclusion.
   So what would happen at the final review of your library?

My $0.02

Artyom

      

