From: Vladimir Prus (ghost_at_[hidden])
Date: 2004-04-06 09:29:54


Pavol Droba wrote:

>> Variants of what? The command line parser and config file parser will
>> have two variants of the interface.
>
> Not really variants. I mean templated, and specialized for the common char*
> and wchar_t*.
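
To make "templated, and specialized for char and wchar_t" concrete, here is a minimal sketch. The function split_long_option is hypothetical and is not part of program_options. It is written once as a template over the character type. It is then explicitly instantiated for the two common character types, which lets the definition live in a compiled source file instead of a header. Those two instantiations are also where the doubled object code discussed later in the thread would come from.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: splits "--name=value" into {name, value}.
// Written once, generic over the character type.
template <class charT>
std::pair<std::basic_string<charT>, std::basic_string<charT>>
split_long_option(const std::basic_string<charT>& arg)
{
    using string_type = std::basic_string<charT>;

    // Strip a leading "--" if present.
    string_type body = arg;
    if (body.size() >= 2 && body[0] == charT('-') && body[1] == charT('-'))
        body.erase(0, 2);

    // Everything before the first '=' is the name, the rest is the value.
    typename string_type::size_type eq = body.find(charT('='));
    if (eq == string_type::npos)
        return {body, string_type()};
    return {body.substr(0, eq), body.substr(eq + 1)};
}

// Explicit instantiations for the two common character types. Each one
// produces its own copy of the object code.
template std::pair<std::string, std::string>
split_long_option<char>(const std::string&);
template std::pair<std::wstring, std::wstring>
split_long_option<wchar_t>(const std::wstring&);
```

Callers simply pass a std::string or a std::wstring, and the character type is deduced.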

Ok.

>> The storage component need not have two variants. What advantage would it
>> give? Finally, and that's the most important point, I believe that the
>> options description component need only provide two variants of the
>> 'value' function. As the document says, if you have two variants of the
>> options_description class, then the ASCII vs. Unicode decision is global
>> for the entire application, which is not so good.
>
> This argument is quite questionable. IMHO you either stick with narrow or
> with wide characters in the whole application. Otherwise you are forced to
> make conversions at the boundaries. I don't really see a point in the
> mixed-type approach.

Ok, let me rephrase. Suppose you're writing a boost::http_proxy library and
want to make it customizable via program_options. So you need to provide a
function 'get_options_descriptions'. What will the function return? If
there's only one options_description class, there's no question. If there
are two versions, which one do you return? Whatever you decide, the main
application might need to do conversions just because it needs Unicode
when you chose narrow, or does not need it when you chose wide.
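
A much-simplified sketch of the single-class alternative follows. It assumes the description type is not templated on the character type and that raw option text is carried as UTF-8 in a std::string. Under that design, a component like the hypothetical http_proxy library can return one type no matter how the main application handles characters. The names option_table and get_proxy_options are made up for illustration and are not program_options API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical, non-templated stand-in for an options description.
// Raw option text is always passed as UTF-8 in a std::string; each
// handler decides for itself whether to keep it narrow or widen it.
class option_table {
public:
    using handler = std::function<void(const std::string& utf8_text)>;

    void add(const std::string& name, handler h) { handlers_[name] = std::move(h); }

    // Runs the handler registered for 'name'; returns false if unknown.
    bool apply(const std::string& name, const std::string& utf8_text) const {
        auto it = handlers_.find(name);
        if (it == handlers_.end())
            return false;
        it->second(utf8_text);
        return true;
    }

private:
    std::map<std::string, handler> handlers_;
};

// A third-party component can hand out its options as one type,
// regardless of whether the main application is "narrow" or "wide".
option_table get_proxy_options(std::string& host) {
    option_table t;
    t.add("proxy-host", [&host](const std::string& s) { host = s; });
    return t;
}
```

The trade-off is that wide-character options pay for a conversion inside their handler. That cost falls per option rather than on the whole application.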

And why should an existing operator>>, which works only with istream, have
to be fixed to support wistream just because some other option needs
Unicode support?
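
The following sketch shows how an istream-only extractor can stay untouched. It assumes option text reaches the value parser as UTF-8. In that case, text that is pure ASCII can be fed to the narrow operator>> through a std::istringstream, and no wistream overload is ever needed. The type port_range and the function parse_with_extractor are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// An existing user type whose extractor only knows about std::istream.
struct port_range {
    int low = 0;
    int high = 0;
};

// Reads "low-high", e.g. "8000-8080"; sets failbit on a wrong separator.
std::istream& operator>>(std::istream& is, port_range& r) {
    char dash = 0;
    if (is >> r.low >> dash >> r.high && dash != '-')
        is.setstate(std::ios::failbit);
    return is;
}

// Hypothetical value parser: the option text arrives as UTF-8. Pure
// ASCII is handed to the narrow operator>> unchanged; anything else is
// rejected, because a narrow-only type cannot represent it anyway.
template <class T>
T parse_with_extractor(const std::string& utf8_text) {
    for (unsigned char c : utf8_text)
        if (c >= 0x80)
            throw std::invalid_argument("non-ASCII text for a narrow-only type");
    std::istringstream in(utf8_text);
    T value{};
    if (!(in >> value))
        throw std::invalid_argument("cannot parse: " + utf8_text);
    return value;
}
```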

>> > I understand that STL support for Unicode is not the best, but there
>> > are facilities that can provide the required functionality if
>> > properly extended/configured.
>>
>> Let's break the question into two parts.
>>
>> 1. Should 'Unicode support' mean that there are two versions of each
>> interface, one using string and the other using wstring? I think this
>> kind of Unicode support is not good. It means that each library which
>> accepts or returns strings must ultimately have a double interface and be
>> either entirely in headers, or instantiate both variants in its sources
>> -- which doubles the size of the library.
>
> Actually, for a general-purpose library like this, I don't think that the
> compile-time overhead implied by templatizing the code is worse than
> having to do conversions all over the place at runtime. The library
> should work with basic_string if possible.

I generally tend to ignore speed issues, since with linear-time algorithms
and contemporary processors it's not likely to be important. OTOH, code
size *is* important. I've just compiled one of the library examples with
static linking and full optimization. It takes 152K.

Probably it's partly gcc's fault, or maybe it can be reduced, but for now
that's how it is. An empty program takes several K. Now, if I tell anyone
"here's a good library for parsing the command line, but it will add 152K
to the application size", that someone will say "thanks, I'll parse the
command line by hand".

However, if the library is shared and is available on every Linux
installation, then code size is not an issue.

> If my application is Unicode, and all input I have is Unicode, it is
> really annoying to convert everything to and fro when interfacing with a
> library like program_options.

You don't have to convert anything. Parsers will accept wstring and for
values where you need unicode you'll use wstring as well.

>> It's *far* from an all-encompassing solution. In fact, the changes in
>> program_options will include:
>>
>> 1. Adding ASCII -> UTF-8 conversion in parsers
>> 2. Adding UTF-8 -> ASCII conversion in value parsers
>> 3. Adding Unicode parsers with UCS-4 -> UTF-8 conversion
>> 4. Adding Unicode value parsers and UTF-8 -> UCS-4 conversion
>>
>> That's all, and given that there are at least two UTF-8 codecs announced
>> on the mailing list, it's not a lot of work. And this will add Unicode
>> support without changing the interface a bit.
>
> Your proposal does not handle the problem. It merely works around it.
> Instead of working with character encodings, it does conversions all over
> the place.
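
Items 3 and 4 of the list above need a UCS-4 <-> UTF-8 codec. The following is a minimal hand-written sketch of such a codec, using std::u32string to hold UCS-4. It is not one of the codecs announced on the list, and the function names are hypothetical. It follows the standard UTF-8 bit layout and rejects overlong forms, surrogates, and values above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// UCS-4 -> UTF-8 (item 3). Rejects surrogates and values above U+10FFFF.
std::string ucs4_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// UTF-8 -> UCS-4 (item 4). Throws on bad lead bytes, truncated or bad
// continuation bytes, overlong forms, surrogates and out-of-range values.
std::u32string utf8_to_ucs4(const std::string& in) {
    // Smallest code point that may be encoded with 'extra' trailing bytes.
    static const char32_t min_value[] = {0, 0x80, 0x800, 0x10000};
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t extra;
        char32_t cp;
        if (lead < 0x80)                      { cp = lead;        extra = 0; }
        else if (lead >= 0xC2 && lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead >= 0xE0 && lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else if (lead >= 0xF0 && lead < 0xF5) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i - 1 < extra)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_value[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range UTF-8 sequence");
        out += cp;
        i += extra + 1;
    }
    return out;
}
```

With a codec like this, a wide front end converts to UTF-8 once on the way in, and a wide value converts back once on the way out. The rest of the library never sees anything but std::string.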

Some of the conversions are unavoidable. E.g. if you have a Unicode-enabled
library, you'd still need to accept ASCII input (because you can't expect
that all input sources are Unicode -- the argv passed to main on Linux is
never Unicode).
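
Here is a sketch of item 1 (narrow input -> UTF-8 in the parsers) applied to argv. It rests on a loudly stated assumption: the narrow input is treated as ISO-8859-1 (Latin-1), so every byte maps directly to the code point of the same value. A real implementation would consult the locale instead. Both function names are hypothetical. Plain 7-bit ASCII passes through unchanged, because ASCII is a subset of UTF-8.

```cpp
#include <cassert>
#include <string>
#include <vector>

// ASSUMPTION: the narrow input is ISO-8859-1, so each byte is the code
// point of the same value. A real front end would use the locale.
std::string latin1_to_utf8(const char* s) {
    std::string out;
    for (; *s; ++s) {
        unsigned char c = static_cast<unsigned char>(*s);
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // 2-byte UTF-8 form
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical front end: converts argv once, then a UTF-8-only parser
// core would take over.
std::vector<std::string> argv_to_utf8(int argc, const char* const argv[]) {
    std::vector<std::string> args;
    for (int i = 1; i < argc; ++i)                        // skip the program name
        args.push_back(latin1_to_utf8(argv[i]));
    return args;
}
```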

If you want to support the legacy operator>>, you'd need a conversion to
ASCII.

- Volodya


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk