
From: Beman Dawes (bdawes_at_[hidden])
Date: 2005-05-18 16:26:13


At 02:03 AM 5/11/2005, Vladimir Prus wrote:
>On Monday 02 May 2005 16:59, Beman Dawes wrote:
>
>> >I recall we had a long discussion concerning basic_path vs. a single path
>> >type. I don't think the results of that discussion are present in
>> >i18n.html; essentially, there's no rationale for going with basic_path.
>>
>> OK, I'll add rationale. Here is a first draft:
>>
>> During preliminary internationalization discussion on the Boost
>> developers' list, a design was considered for a single path class which
>> could hold either narrow or wide character based paths. That design was
>> rejected because:
>>
>> * There were technical issues with conversions when a narrow path was
>> appended to a wide path, and vice versa. The concern was that double
>> conversions could cause incorrect results, that conversions best left to
>> the operating system would be performed, and that the technical complexity
>> was too great in relation to perceived benefits. User-defined types would
>> only make the problem worse.
>
>I think this statement is not proved. Essentially, you are saying that
>there's an operating system that performs some char->wchar and wchar->char
>conversions in path operations, but does not provide any API to do the same
>conversion on plain char* and wchar_t* pointers. I find this somewhat hard
>to believe.

Windows, for one. Although that is really beside the point. The worry is
the need for conversions when a path changes from wide to narrow, or vice
versa.

>> * The design was, for many applications, an over-generalization with
>> runtime memory and speed costs which would have to be paid for even when
>> not needed.
>
>
>I disagree. Consider that your current design does not allow mixing
>different path types at all. So, we should evaluate the performance of the
>single-path design only for the case where char/wchar_t are never fixed
>[mixed?] -- that is, all paths are created either from char, or from wchar_t.
>
>Then, the memory overhead is a single bool flag, telling whether a path was
>created from char or wchar_t.

The memory overhead I was worried about wasn't user space for the bool, but
the need to link in both narrow and wide versions of functions,
particularly on low-memory embedded systems.

>No operating system will need to do any conversion, so the runtime
>overhead is just checking that flag. I find this overhead very small
>compared to the size of memory allocated for a path and the amount of work
>done by path methods. Not to mention that a single OS call is likely to be
>1000 times more expensive than this single comparison.

That comparison isn't a worry for me either.
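A rough sketch of the single-path-type alternative Vladimir describes, in which each path carries a flag recording whether it was created from char or wchar_t, may make the trade-off concrete. The class name any_path and its members are invented for illustration and are not part of Boost.Filesystem. For simplicity the sketch stores both a std::string and a std::wstring member, so only the flag itself corresponds to the "single bool flag" overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical single path type (the name any_path is invented for this
// sketch; it is not part of Boost.Filesystem). Each object remembers
// whether it was created from char or wchar_t.
class any_path {
public:
    explicit any_path(std::string s) : is_wide_(false), narrow_(std::move(s)) {}
    explicit any_path(std::wstring s) : is_wide_(true), wide_(std::move(s)) {}

    bool is_wide() const { return is_wide_; }

    // Every operation branches on the flag, so both the narrow and the wide
    // code paths are compiled and linked into the program.
    std::size_t length() const {
        return is_wide_ ? wide_.size() : narrow_.size();
    }

    // Appends a component of the same width and returns true. Mixing widths
    // would require a narrow<->wide conversion, which this sketch refuses by
    // returning false and leaving *this unchanged.
    bool append(const any_path& other) {
        if (other.is_wide_ != is_wide_) return false;
        // The right-hand side is built first, so self-append is safe.
        if (is_wide_) {
            wide_ += L'/' + other.wide_;
        } else {
            narrow_ += '/' + other.narrow_;
        }
        return true;
    }

    // Accessors are only valid for the width the path was created with.
    const std::string& narrow() const { assert(!is_wide_); return narrow_; }
    const std::wstring& wide() const { assert(is_wide_); return wide_; }

private:
    bool is_wide_;        // the "single bool flag"
    std::string narrow_;  // used when !is_wide_
    std::wstring wide_;   // used when is_wide_ (both kept for simplicity)
};
```

The per-call branch on the flag is the cheap part; the cost Beman raises is that both the narrow and the wide code paths (and, in a real library, both families of operating system calls) end up linked into every program.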

>
>> * There was concern that the design would be confusing to users, given
>> that the standard library already uses single-value-type strings, rather
>> than strings which morph value types as needed.
>
>I don't think we should stick to the std::string design, given that most
>environments with good Unicode support (Qt, Java, .NET) use a single string
>type.

A lot of people say they don't like the std::string design, but it is the
standard for C++. Perhaps someday another string design will become
popular, but that isn't even on the horizon AFAIKS.

>> >Also I note that there's no conversion from basic_path<char> to
>> >basic_path<wchar_t> or vice versa, as far as I can tell. To recall my
>> >argument for conversion: say I have a library which exposes paths in the
>> >interface; should I use path or wpath in it? If I use path, then due to
>> >the missing conversion, the library is unusable with other code that uses
>> >wpath. So I need to use wpath.

Yes. It is the same situation as with std::string vs std::wstring. If you
think your app may sometimes have to deal correctly with wide strings (or
paths) you should use std::wstring (and wpath).
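The analogy with std::string and std::wstring can be illustrated with a deliberately simplified basic_path template, instantiated once for narrow and once for wide strings. This is a hypothetical sketch: the actual Boost.Filesystem declarations are more involved (for example, they pair each path type with a traits class such as wpath_traits), and only a minimal operator/= is shown.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical basic_path: the path class is a template on its
// string type, much as std::basic_string is a template on its character type.
template <class String>
class basic_path {
public:
    typedef String string_type;
    typedef typename String::value_type value_type;

    basic_path() {}
    basic_path(const string_type& s) : str_(s) {}

    // Appends a component, inserting a '/' separator of this path's own
    // character type when the path is not empty.
    basic_path& operator/=(const basic_path& rhs) {
        const string_type tail = rhs.str_;  // copy first: safe for p /= p
        if (!str_.empty()) str_ += value_type('/');
        str_ += tail;
        return *this;
    }

    const string_type& string() const { return str_; }

private:
    string_type str_;
};

// Two unrelated types, mirroring std::string and std::wstring.
typedef basic_path<std::string>  path;
typedef basic_path<std::wstring> wpath;

inline void example_usage() {
    path p(std::string("usr"));
    p /= path(std::string("lib"));
    assert(p.string() == "usr/lib");

    wpath w(std::wstring(L"usr"));
    w /= wpath(std::wstring(L"lib"));
    assert(w.string() == L"usr/lib");

    // p /= w;  // does not compile: no conversion between path and wpath
}
```

Because path and wpath are distinct instantiations with no conversion between them, a library interface has to commit to one of them, which is the library-interface problem Vladimir raises.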

>> >And so basically, all libraries need to use wpath
>> >everywhere. So, why do you need path at all?
>>
>> Applications which need wide-character internationalization will use
>> wpath or other wide-character basic_path types. Applications which don't
>> need wide-character internationalization will use path. Both are needed -
>> they serve different user needs.
>
>I think you're missing my point. Yes, the decision for an application can
>probably be made. But if I'm writing a library, I don't know if it will be
>used by an application that needs wide paths, or by an application that
>does not need wide paths.
>
>I have to decide which path type to use in the interface (I'm talking about
>the binary interface specifically). But if there's no path<->wpath
>conversion, then whatever type I choose, some applications will have
>trouble using the library, because they would not be able to convert
>between path types at the library boundary.
>
>Even if I provide both types in the interface, if there's no standard
>path<->wpath conversion, I'll have to either:
>
>- write such a conversion myself, or
>- duplicate all the code of the library, once for path and once for wpath.

Partially in answer to this very valid concern, I've exposed the wpath_traits
conversion interface. I'm not sure that is a complete solution, but at
least you wouldn't have to write the conversion code yourself.
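Without a standard path<->wpath conversion, a library that exposes wide paths leaves narrow-path callers to convert at the boundary themselves. A minimal sketch of such a conversion, assuming the narrow string is encoded in the current C locale's multibyte encoding, can be built on std::mbsrtowcs from <cwchar>. The helper name widen is hypothetical; this is not the wpath_traits interface mentioned above, whose details are not shown in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the wpath_traits interface): converts a narrow,
// multibyte-encoded path string to a wide string using the encoding of the
// current C locale, via the standard C library function mbsrtowcs.
std::wstring widen(const std::string& narrow) {
    std::mbstate_t state = std::mbstate_t();
    const char* src = narrow.c_str();

    // With a null destination, mbsrtowcs only counts the wide characters
    // needed (excluding the terminating null) and stores nothing.
    std::size_t n = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("path is not valid in the current locale");

    // Second pass: convert for real into a buffer of exactly n characters.
    std::wstring wide(n, L'\0');
    state = std::mbstate_t();
    src = narrow.c_str();
    std::mbsrtowcs(&wide[0], &src, n, &state);
    return wide;
}

inline void example_usage() {
    // A narrow-path application calling into a wide-path library converts
    // at the boundary. Basic characters like these map to the matching
    // wide characters on typical implementations.
    const std::string app_path = "data/config.txt";
    const std::wstring lib_path = widen(app_path);
    assert(lib_path == L"data/config.txt");
}
```

Whether the C library's locale encoding matches the conversion the operating system itself applies to narrow paths is platform-dependent, which is part of why conversions were a concern earlier in the thread.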

Please note that I'm not saying a single-path-type design is dumb or
anything like that. It is just that it would be too big a leap without a
lot of experimentation, trial use, etc. It would be a lot better to start
with a single-string-type design. That's all just too big a project for me,
and too much of a research project. I'm very happy with the new version of
Boost.Filesystem. I think it smooths many of the rough spots of the current
1.33 version. It attacks most of the problems users have had head on. If
someone else wants to do a new library that is even better, great! But
that's a new library, not the current one.

Thanks for the comments,

--Beman

