From: Pavol Droba (droba_at_[hidden])
Date: 2004-04-06 08:56:12


On Tue, Apr 06, 2004 at 03:28:11PM +0400, Vladimir Prus wrote:
> Hi Pavol,
>
> > I have read your proposal. Maybe I'm missing something very serious,
> > but I would prefer a scheme similar to the one used by the STL.
> >
> > That is, there would be variants accepting the char and wchar_t data
> > types, and any Unicode problems would be addressed by char_traits and
> > locale.
>
> Variants of what? The command line parser and config file parser will have
> two variants of the interface.

Not really variants. I mean templated, and specialized for the common char* and
wchar_t* cases.
 
> The storage component need not have two variants. What advantage will it
> give? Finally, and that's the most important point, I believe that options
> description component need only provide two variants for the 'value'
> function. As the document says, if you have two variants of the
> options_description class, then the ASCII vs. Unicode decision is global
> for the entire application, which is not so good.

This argument is quite questionable. IMHO you either stick with narrow or with
wide characters throughout the whole application. Otherwise you are forced to
make conversions at every boundary. I don't really see the point of the
mixed-type approach.

> > I understand that STL support for Unicode is not the best,
> > but there are facilities that can provide the required functionality if
> > properly extended/configured.
>
> Let's break the question in two parts.
>
> 1. Should 'unicode support' mean that there are two versions of each
> interface, one using string and the other using wstring? I think this kind
> of Unicode support is not good. It means that each library which accepts or
> returns strings must ultimately have a double interface and either be
> entirely in headers, or instantiate two variants in sources -- which
> doubles the size of the library.

Actually, for a general-purpose library like this, I don't think that the
compile-time overhead implied by templatizing the code is worse than having
to do conversions all over the place at runtime. The library should work with
basic_string if possible.

If my application is Unicode, and all the input I have is Unicode, it is really
annoying to convert everything back and forth when interfacing with a library
like program_options.
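
To illustrate what "work with basic_string" could mean in practice, here is a
minimal sketch of a parsing helper templated on the character type. The names
split_option and example_usage are hypothetical and are not part of
program_options; the point is only that the same code serves std::string and
std::wstring callers without any conversion at the call site.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper (not a program_options API): splits "name=value"
// into its two halves for any character type. If there is no '=',
// the whole token is the name and the value is empty.
template <class CharT>
std::pair<std::basic_string<CharT>, std::basic_string<CharT> >
split_option(const std::basic_string<CharT>& token)
{
    typedef std::basic_string<CharT> string_type;
    const typename string_type::size_type pos = token.find(CharT('='));
    if (pos == string_type::npos)
        return std::make_pair(token, string_type());
    return std::make_pair(token.substr(0, pos), token.substr(pos + 1));
}

// Usage: the narrow and the wide caller use the same template and
// get back strings of their own character type.
inline void example_usage()
{
    std::pair<std::string, std::string> n =
        split_option(std::string("level=3"));
    assert(n.first == "level" && n.second == "3");

    std::pair<std::wstring, std::wstring> w =
        split_option(std::wstring(L"level=3"));
    assert(w.first == L"level" && w.second == L"3");
}
```

The cost is the one Vladimir points out: the template has to live in a header
or be explicitly instantiated for both character types.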

> 2. Should the program_options library use UTF-8 or wstring? As I've said,
> neither is a clear leader, but UTF-8 seems better.

Ferda Prant gave quite a good explanation in another mail about the Unicode
support in the STL. I'm asking only for seamless integration with the standard
facilities.
 
> > I think that there is no big reason to try to reinvent the wheel and
> > provide an all-encompassing solution in a library like program_options.
> > It should be enough if it is Unicode-enabled, so that it can be used in
> > any specific scenario, provided that all the necessary facilities are in
> > place.
>
> It's *far* from an all-encompassing solution. In fact, the changes in
> program_options will include:
>
> 1. Adding ASCII -> UTF-8 conversion in parsers
> 2. Adding UTF-8 -> ASCII conversion in value parsers
> 3. Adding Unicode parsers with UCS-4 -> UTF-8 conversion
> 4. Adding Unicode value parsers and UTF-8 -> UCS-4 conversion
>
> That's all, and given that there are at least two UTF-8 codecs announced on
> the mailing list, not a lot of work. And this will add Unicode support
> without changing the interface a bit.

Your proposal does not handle the problem. It merely works around it. Instead
of working with character encodings, it does conversions all over the place.
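
For concreteness, the UCS-4 -> UTF-8 step from the quoted list amounts to
something like the sketch below. This is an illustration only, not code from
either of the codecs announced on the list; the names append_utf8 and to_utf8
are hypothetical, and code points are assumed to arrive as char32_t values.

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point.
// Returns false (and appends nothing) for surrogates and for values
// above U+10FFFF, which are not valid Unicode scalar values.
inline bool append_utf8(char32_t cp, std::string& out)
{
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return false;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx + 2 trailing
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + 3 trailing
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Hypothetical wrapper: converts a whole UCS-4 string, silently
// skipping any code point that append_utf8 rejects.
inline std::string to_utf8(const std::u32string& in)
{
    std::string out;
    for (std::u32string::size_type i = 0; i < in.size(); ++i)
        append_utf8(in[i], out);
    return out;
}
```

Every call into the library from a wide-character application would pass
through a step like this, and every value read back would pass through the
inverse.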

Regards,

Pavol

