Boost Users :

From: Vladimir Prus (ghost_at_[hidden])
Date: 2004-07-06 01:33:42


Eddie Diener wrote:

>> Now, I have a number of library packages. Say no application uses
>> wide char interface at the moment, so only narrow libraries are
>> installed. Now, a single application decides to use the wide interface,
>> so its package now depends on wide libraries. As the result, I have
>> to install, in addition to some narrow libraries, their wide
>> equivalents.
>
> You only have to install the appropriate wide equivalents. There is
> nothing to say that a wide character application uses all wide character
> libraries.

But a few wide applications can span a lot of libraries.

>> After enough applications decide to use Unicode, most
>> libraries will have to be installed in two flavours.
>
> How is this worse than having a single version which has both wide and
> narrow character equivalents ? You are not saving anything in this latter
> way, and you are definitely worse than if the libraries were separate and
> you only used one or the other versions in your applications.

I never talked about a "single version which has both wide and
narrow character equivalents". What I'm after is a single version which
supports both wide and narrow interfaces/operations but internally
is just one body of code.

>> I don't think it is very likely that new character types will appear.
>
> I do. I would be very surprised if C++ does not adopt new character types
> in the years to come. Do you really think that if the programming world
> settles on other standard character representations that C++ will
> adamantly ignore it ?

Isn't there one representation already?

> Even now a number of programmers would like to see
> C++ support one of the Unicode standards natively, most likely UTF-32.

If you recall the Unicode discussions we had on the main list (hmm.... we
probably should have this discussion there as well), there were two
problems:
1. wchar_t is 16 bits on some platforms
2. even if it's 32 bits, wchar_t represents only a code point, and a complete
character with all its accents and other marks might take several
code points.

The second problem is actually the most serious one, and I'm really not sure
that the right solution would be yet another character type.
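
To illustrate the second problem, here is a minimal sketch that stores the same visible character, "é", in two ways. It assumes C++11's char32_t and std::u32string, which postdate this thread, and the function names are made up for the example.

```cpp
#include <string>

// "é" as a single precomposed code point, U+00E9.
inline std::u32string precomposed() { return U"\u00E9"; }

// The same visible character as 'e' (U+0065) followed by
// COMBINING ACUTE ACCENT (U+0301): two code points, one character.
inline std::u32string decomposed() { return U"e\u0301"; }
```

Even with a 32-bit character type, size() counts code points, so the two strings have different lengths and compare unequal unless they are normalized first.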

>> I'm actually worried that when using templates in a straightforward
>> way, all libraries will have to come in two variants or be twice
>> as large, which is bad because of:
>
> No. There is nothing saying that a library must support more than one
> character type. But if it does, isolating each character type in its own
> header files

I don't understand this. For a templated implementation, you surely can't have
the wide and narrow versions in different headers.

> and libraries is the right design.

>> - code size reasons,
>> - configuration reasons (just one more configuration variant to worry
>> about)
>> - interoperability/convenience? (what if I use unicode paths and want
>> to pass narrow string to one of the operators?)
>
> None of your reasons holds much weight. Code size wouldn't be affected
> since each implementation is in its own library.

Only if you don't buy my argument about system-wide code size.

> There is nothing to
> configure since character types are part of the C++ standard.

And? You still need to build two library variants, test them separately, and
make two packages. The current Boost build process creates a huge number of
library variants (debug/release, MT/ST, stldebug ...). Is there a need to
double that number for libraries which might need Unicode?

> If you need
> to pass a unicode path to a narrow string operator, you the programmer are
> either doing something wrong or, if there is a valid conversion, you can
> make it yourself ( like wcstombs ).

If I have basic_path<char> and want to convert it into basic_path<wchar_t>,
do I really have to use mbstowcs? So, I need to iterate over all elements
of a path, calling that function, and creating a new path? Sorry, there
should be a simpler way. And that simpler way is a converting constructor.
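
A minimal sketch of what such a converting constructor could look like. The basic_path below is a hypothetical stand-in, not Boost.Filesystem's actual class, and the per-character widening it uses is an assumption that is only correct for ASCII; a real library would go through a proper character-set conversion.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the basic_path template discussed in this thread.
template <class CharT>
class basic_path {
public:
    typedef std::basic_string<CharT> string_type;

    basic_path() {}

    // Converting constructor: builds this path from a path that uses a
    // different character type, one element at a time.
    template <class OtherCharT>
    basic_path(const basic_path<OtherCharT>& other) {
        for (std::size_t i = 0; i < other.size(); ++i)
            elements_.push_back(convert(other.element(i)));
    }

    // Appends one path element.
    basic_path& operator/=(const string_type& element) {
        elements_.push_back(element);
        return *this;
    }

    std::size_t size() const { return elements_.size(); }
    const string_type& element(std::size_t i) const { return elements_[i]; }

private:
    // Assumption: plain per-character conversion, valid for ASCII only.
    template <class OtherCharT>
    static string_type convert(const std::basic_string<OtherCharT>& s) {
        return string_type(s.begin(), s.end());
    }

    std::vector<string_type> elements_;
};
```

With this in place, basic_path<wchar_t> w(n); converts a narrow path n in one step, and the element loop lives inside the library instead of in every caller.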

>> With a bit of additional design, it's possible to make library use one
>> representation internally, and have either non-templated interface,
>> or a tiny templated facade. E.g:
>>
>> boost::path p;
>> p = p / L"foo" / "bar";
>>
>> does not seem like such a bad thing to me.
>
> It is possible to do that if you can convert all character types into your
> internal representation. Even here I am paying for conversions back and
> forth I may not need.

If you want to append a narrow path element to a Unicode string, you *need* to
convert. Besides, I'm not at all sure this conversion is a performance
bottleneck, given that boost::path needs to use OS services. A single 'stat'
call that fs::exists makes might render the cost of conversion insignificant.
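
The single-representation design quoted above could be sketched roughly as follows. The path class here is hypothetical, not the real boost::filesystem::path; the choice of std::wstring as the internal representation, the '/' separator, the native() accessor and the ASCII-only widening are all assumptions made for the example.

```cpp
#include <string>

// Hypothetical non-templated path with one internal (wide) representation.
class path {
public:
    path() {}

    // Wide elements are stored as they are.
    path& operator/=(const std::wstring& element) { return append(element); }

    // Narrow elements are widened once, on the way in.
    // Assumption: per-character widening, valid for ASCII only.
    path& operator/=(const std::string& element) {
        return append(std::wstring(element.begin(), element.end()));
    }

    const std::wstring& native() const { return text_; }

private:
    path& append(const std::wstring& element) {
        if (!text_.empty()) text_ += L'/';
        text_ += element;
        return *this;
    }

    std::wstring text_;
};

// Both flavours of operator/ forward to the same internal code.
inline path operator/(path lhs, const std::wstring& rhs) { lhs /= rhs; return lhs; }
inline path operator/(path lhs, const std::string& rhs) { lhs /= rhs; return lhs; }
```

With these overloads, p = p / L"foo" / "bar"; compiles as written, and the only conversion is the one applied to the narrow element.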

> I therefore would prefer separate templated
> libraries. Why make headaches for oneself ?

The templated library is a much bigger headache than it seems. Unless you're
willing to put the template code in headers (which is bad for big libraries, and
is really bad for boost::fs, which has to include system headers), you need to:

- declare templates in public headers
- define templates in private headers/sources
- explicitly instantiate the templates for char and wchar_t.

Not so nice.
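
The three steps listed above can be sketched in a single listing, with comments marking what would go into which file. The class, its members and the file names are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <string>

// --- public header (say, path.hpp): declarations only ---
template <class CharT>
class basic_path {
public:
    explicit basic_path(const std::basic_string<CharT>& s);
    std::size_t length() const;

private:
    std::basic_string<CharT> text_;
};

// --- private source (say, path.cpp): definitions; system headers would
// --- be included here and stay out of the public header ---
template <class CharT>
basic_path<CharT>::basic_path(const std::basic_string<CharT>& s) : text_(s) {}

template <class CharT>
std::size_t basic_path<CharT>::length() const { return text_.size(); }

// Explicit instantiations: the only two variants the library would ship.
template class basic_path<char>;
template class basic_path<wchar_t>;
```

Client code that sees only the header can use basic_path<char> and basic_path<wchar_t>, but any other character type fails at link time because no definition was instantiated for it.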

> I am always in favor of
> designs which are clear and understandable over all other considerations.

And what's so hard to understand about a boost::path which has both narrow and
wide methods?

- Volodya


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net