From: Pierre-Andre Galmes (galmes_at_[hidden])
Date: 2004-12-04 12:14:27


Beman Dawes wrote:
> At 07:40 PM 10/3/2004, David Abrahams wrote:
> >Beman Dawes <bdawes_at_[hidden]> writes:
> >
> >> At 10:20 AM 10/3/2004, David Abrahams wrote:
> >>
> >> >I've said it before, but I always found the checking to be much more
> >> >of a hindrance than a help.
> >>
> >> So presumably you would be in favor of changing the default to
> >> "no_check"?
> >
> >I think so.
>
> Unless strong objections arise, I'll make the change to the main trunk
> after the 1.32 branch for release. That will give us plenty of time to
> work out any kinks before release to the general public via 1.33.
>
> --Beman

Greetings,

I am new to Boost.Filesystem, and my involvement is related to one of my
courses. As a project, we have to analyse a request for change, and we
can then submit our "contribution" by posting it on the list. The first
part of my work was to analyse a request. The second part is this post.
The analysis can be found at the following link
(http://perso.efrei.fr/~galmes/boost/name_check.html). It explains the
problem in detail, using examples.

The request is the following : the automatic name checking
functionality in the Boost library is turned on by default. That means
boost::filesystem will check for "portable" paths. The question is
then, should the "name check" functionality be turned on by default ?
Which of portable_name, native and no_check would be the best
choice ? This mail is divided into three parts. The first one presents
the conclusions drawn from the analysis. The second part is the
argumentation behind those results.

The last part is composed of some suggestions I have after spending some
hours going through the library. I apologize for the length of this post
and hope that it will help to improve the boost::filesystem library.

I - The results

Here is what I found out from that analysis (which should not be big
news :-):

- by default the "no_check" option seems to be the best.
- "native" would be the best once it supports checks for multiple file
systems. It may then be a good idea if this kind of check is going to
work in Boost 1.33 !

Suggestions to improve the "usability" of the library :

- the name "native" is not very explicit and tends to confuse users.
- How the "native" option works could be explained in an
   "easy-to-understand" way in the documentation.

II - Argumentation

The automatic name checking functionality can behave in many different
ways, and from those, only three could be used as default values :

     * portable_name : check if the path is "portable" (default in
       Boost 1.32).

     * native : check if the name is valid for the OS being used to
       execute the program.

     * no_check : does not perform any check.
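
To make the difference between the three options concrete, here is a minimal sketch of what each one accepts. The rules below are simplified stand-ins that I assume for illustration only (a restricted character set for "portable", anything but '/' and NUL for a POSIX-like "native"); they are not the Boost.Filesystem implementation, and the names NameCheck, is_portable_name, is_native_name and accepts are hypothetical.

```cpp
#include <string>

// Hypothetical policy selector mirroring the three candidate defaults.
enum class NameCheck { portable_name, native, no_check };

// Assumed "portable" rule: only letters, digits, '.', '_' and '-'.
bool is_portable_name(const std::string& name) {
    if (name.empty()) return false;
    for (unsigned char c : name) {
        bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                  (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
        if (!ok) return false;
    }
    return true;
}

// Assumed "native" rule for a POSIX-like OS: no '/' and no NUL character.
bool is_native_name(const std::string& name) {
    return !name.empty() && name.find('/') == std::string::npos &&
           name.find('\0') == std::string::npos;
}

// Apply the selected policy to a single file or directory name.
bool accepts(NameCheck policy, const std::string& name) {
    switch (policy) {
        case NameCheck::portable_name: return is_portable_name(name);
        case NameCheck::native:        return is_native_name(name);
        case NameCheck::no_check:      return true;
    }
    return true;
}
```

Under these assumed rules, a name such as "my file?.txt" is rejected by portable_name but accepted by native and by no_check, which is the kind of surprise the default setting can cause.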

We will then try to show which of those is the best. For this, I try to
make an "objective" analysis by quantifying the choices. This is one
way to approach the problem but might not be the best one : it takes a
lot of explanation and may just lose most of the readers ;).

Here is a summary of the pros and cons for each choice from the previous
posts.

a - portable_name (default in Boost 1.32)

     * Pro 1 : This is the check that requires the strictest names to
     ensure portability. Programs using it should not have any path
     portability problems on the most common operating systems.

     * Con 1 : enabling name checking by default prevents users from
     writing programs with no name constraints. Those non-portable
     programs might constitute the majority of programs written using
     boost::filesystem. (http://tinyurl.com/5mb5r)

     * Con 2 : checking implies a performance hit (see
     http://tinyurl.com/6esmw)

b - no_check

     * Pro 1 : Does not put any constraints on users with no need for
     portability, who should represent the majority of users. (See
     http://tinyurl.com/5mb5r)

     * Pro 2 : The name of the option "no_check" is explicit (compared
     to "native" or "portable_name").

c - native :

     * Pro 1 : Less restrictive than "portable_name" but still performs
     some checks for the native operating system.

     * Con 1 : the name "native" is confusing. It is not explicit enough
     that the native check validates names against the operating system,
     not against the file system (http://tinyurl.com/6esmw).

     * Con 2 : Gives an illusion of "security" / "portability" on the
     same operating system. This is false, as the check is done against
     the operating system, not against the file system. Thus, on the
     same operating system, portability depends on the file system used.
     An example of this problem was given in a post by Beman (See
     http://tinyurl.com/635yn).

d - Different matters and their importance

Here is a table summarizing which of those arguments are important for
users who want to use the library to write portable programs and which
are important for "common users" with no need for portability. This is
my point of view, and I would be interested in knowing yours. The terms
used are explained below.

!                    ! portable users ! common users !
!----------------------------------------------------!
! explicit           ! +              ! ++           !
! portability check  ! +++            ! -            !
! OS check           ! -              ! +            !
! fs check           ! -              ! ++           !
! performance hit    ! ++             ! ++           !
! security illusion  ! ++             ! +            !
! ease of use        ! +++            ! +++          !

- explicit name :

The name of the option (native, no_check...) is self-explanatory about
the way the check is done.

common users (++) : this is important so that they are able to use the
library without having to read the documentation in detail.

portable users (+) : They will have to read the documentation in more
detail anyway in order to know how to produce portable paths. Hence, an
explicit name is a less important criterion for choosing a default value.

- portability check :

Checks that paths will be valid on the most popular platforms (POSIX,
Windows). This is the check done by "portable_name".

common users (-) : for common users, any check related to portability
is of no importance, as they won't port their programs.

portable users (+++) : this is one of the most important criteria.

- OS check :

Checks that for the current OS, paths are valid.

common users (+) : if paths are not accepted, the program will not work.
I only put (+) because often users know which characters are valid for
their OS.

portable users (-) : When writing portable programs, you do not really
care that a path is valid for your OS : it should be valid for every OS.
This is covered by the portability check.

- fs check :

Checks the validity of the path for the file system the path tries to
access. I suppose that this check should be done at runtime when trying
to work with a particular file or directory.

common users (++) : if paths are not accepted, the program will not
work. I put (++) because fewer users know which characters are valid for
the different file systems they manipulate.

portable users (-) : When writing portable programs, you do not really
care that a path is valid for your fs : it should be valid for every fs.
This is covered by the portability check.

- ease of use :

Is the library easy to use given the user's aims ? Does the user have
to write many lines of code in order to achieve what he wants ?

all users (+++) : This is the most important feature. A library does
not provide a nice interface if the user has to reconfigure it all the
time so that the checks succeed.

- performance hit :

Does using a check imply a performance hit ?

all users (++) : This is also an important feature. When coding a
program, users always want it to run fast. This is an important
criterion but less important than ease of use, especially if the
performance hit is mild.

- security illusion :

Does the program give the illusion that it will work correctly on
different platforms ?

common user (+) : This is not a big deal, as portability is not the
first concern when writing such a program.

portable user (++) : If a program written to be portable only gives an
illusion of such behavior, this would be a real problem.

e - Different criteria and their availability

Here is a table showing, for each of the three options, its behavior
for the different criteria listed above. The characters '+', '=' and
'-' represent the "points" given to a criterion if it is available for
an option. The ease-of-use criterion is listed separately for the two
kinds of users.

table of "appearance" :

!                    ! portable_name ! native        ! no_check      !
!--------------------------------------------------------------------!
! explicit name      ! +             ! -             ! ++            !
! portability check  ! ++            ! +             ! -             !
! OS check           ! ++            ! ++            ! -             !
! fs check           ! -             ! -             ! -             !
! no performance hit ! -             ! -             ! ++            !
! no security ill.   ! -             ! -             ! ++            !

common users :
! ease of use        ! -             ! ++            ! ++            !

portable users :
! ease of use        ! ++            ! -             ! -             !

f - The best option

We can now try to satisfy the most users by comparing the criteria in a
"mathematical" way. As a first thought, we could suppose that the
majority of the users would use boost::filesystem for non-portable
programs. Let us say 80% will write programs without any need for
portability.

From this we can then deduce which option will fit the best. For each
option, and for each kind of user, we calculate a score in the
following way :

option
  = explicit importance * appearance
  + portability_check importance * appearance
  + OS check importance * appearance
  + fs check importance * appearance
  + ease of use importance * appearance
  + no performance hit importance * appearance
  + no security ill. importance * appearance

We use the table and choose the following :

+ = 1 point
- = 0 points

Then, we find :

common users :

!                    ! portable_name ! native        ! no_check      !
!--------------------------------------------------------------------!
! explicit           !   2 * 1       !   2 * 0       !   2 * 2       !
! portability check  ! + 0 * 2       ! + 0 * 1       ! + 0 * 0       !
! OS check           ! + 1 * 2       ! + 1 * 2       ! + 1 * 0       !
! fs check           ! + 2 * 0       ! + 2 * 0       ! + 2 * 0       !
! performance hit    ! + 2 * 0       ! + 2 * 0       ! + 2 * 2       !
! security illusion  ! + 1 * 0       ! + 1 * 0       ! + 1 * 2       !
! ease of use        ! + 3 * 0       ! + 3 * 2       ! + 3 * 2       !
!--------------------------------------------------------------------!
! sum                ! 4             ! 8             ! 16            !

portable users :

!                    ! portable_name ! native        ! no_check      !
!--------------------------------------------------------------------!
! explicit           !   1 * 1       !   1 * 0       !   1 * 2       !
! portability check  ! + 3 * 2       ! + 3 * 1       ! + 3 * 0       !
! OS check           ! + 0 * 2       ! + 0 * 2       ! + 0 * 0       !
! fs check           ! + 0 * 0       ! + 0 * 0       ! + 0 * 0       !
! performance hit    ! + 2 * 0       ! + 2 * 0       ! + 2 * 2       !
! security illusion  ! + 1 * 0       ! + 1 * 0       ! + 1 * 2       !
! ease of use        ! + 3 * 2       ! + 3 * 0       ! + 3 * 0       !
!--------------------------------------------------------------------!
! sum                ! 13            ! 3             ! 8             !

We can now use the assumption that 80% of the users are common users. We
can then calculate which option suits the whole user base best :

portable_name = 0.8 * 4 + 0.2 * 13
              = 3.2 + 2.6
              = 5.8

native = 0.8 * 8 + 0.2 * 3
       = 6.4 + 0.6
       = 7

no_check = 0.8 * 16 + 0.2 * 8
         = 12.8 + 1.6
         = 14.4

We can then deduce that the no_check option best suits most of the
common needs !
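
The 80/20 blend can also be written as a small C++ sketch, which makes it easy to redo the comparison with other numbers. The sums are the ones from the two tables above; the names OptionScore, blended, best_option and candidates are hypothetical helpers of mine, and the 0.8 share is only the guess made in this post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Per-option sums as they appear in the two score tables.
struct OptionScore {
    std::string name;
    int common;    // sum for users with no need for portability
    int portable;  // sum for users writing portable programs
};

// Blend the two sums; common_share is the assumed fraction of
// common users (0.8 in this post).
double blended(const OptionScore& option, double common_share) {
    return common_share * option.common +
           (1.0 - common_share) * option.portable;
}

// Name of the option with the highest blended score.
std::string best_option(const std::vector<OptionScore>& options,
                        double common_share) {
    assert(!options.empty());
    const OptionScore* best = &options.front();
    for (const OptionScore& option : options) {
        if (blended(option, common_share) > blended(*best, common_share)) {
            best = &option;
        }
    }
    return best->name;
}

// The three candidate defaults with the sums from the tables.
std::vector<OptionScore> candidates() {
    return {{"portable_name", 4, 13}, {"native", 8, 3}, {"no_check", 16, 8}};
}
```

Calling best_option(candidates(), 0.8) picks no_check, in line with the hand calculation.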

g - The native problem

In the 1.32 version of Boost, the native option has a problem : the
checks ensure that the path will be accepted by the operating system
used to compile, not by the file system the application will access.

This is confusing, as by native, I would expect checks to work on the
platform for which you compiled the program. I personally do not think
about file systems. In my opinion, this is what confuses the user and
gives a security illusion.

I read that Beman was working on changing the behavior of the native
option, so that it would also check against the file system. If we
redo the calculation for that case, we find that the native option
would then fit the best.

!                    ! portable_name ! native        ! no_check      !
!--------------------------------------------------------------------!
! explicit name      ! +             ! ++            ! ++            !
! portability check  ! ++            ! +             ! -             !
! OS check           ! ++            ! ++            ! -             !
! fs check           ! -             ! ++            ! -             !
! no performance hit ! -             ! -             ! ++            !
! no security ill.   ! -             ! +             ! ++            !

common users :
! ease of use        ! -             ! ++            ! ++            !

portable users :
! ease of use        ! ++            ! -             ! -             !

common users :

!                    ! native    !
!--------------------------------!
! explicit           !   2 * 2   !
! portability check  ! + 0 * 1   !
! OS check           ! + 1 * 2   !
! fs check           ! + 2 * 2   !
! performance hit    ! + 2 * 0   !
! security illusion  ! + 1 * 1   !
! ease of use        ! + 3 * 2   !
!--------------------------------!
! sum                ! 17        !

portable users :

!                    ! native    !
!--------------------------------!
! explicit           !   1 * 2   !
! portability check  ! + 3 * 1   !
! OS check           ! + 0 * 2   !
! fs check           ! + 0 * 2   !
! performance hit    ! + 2 * 0   !
! security illusion  ! + 1 * 1   !
! ease of use        ! + 3 * 2   !
!--------------------------------!
! sum                ! 12        !

native = 0.8 * 17 + 0.2 * 12
       = 13.6 + 2.4
       = 16

If the problem described, with multiple file systems mounted, is solved,
native would be the best choice. For now, in my opinion, it would lead
to many mistakes due to misunderstanding how native works.

h - limitations

This approach is not really objective, as I was the one choosing the
weights of the different criteria, and those are just arbitrary choices.
This is especially true for the explicit name and security illusion
criteria.
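
One way to see how much the conclusion depends on the assumed 80% share is to compute the share of common users at which two options score the same. The following is a minimal sketch of that check, using only the sums from the 1.32 tables; break_even_share is a hypothetical helper name.

```cpp
#include <cassert>

// Share s of common users at which options A and B get the same
// blended score. Each blended score is a straight line in s:
//   portable + s * (common - portable)
// so the crossing point solves
//   pA + s * (cA - pA) = pB + s * (cB - pB).
double break_even_share(double common_a, double portable_a,
                        double common_b, double portable_b) {
    double slope_difference =
        (common_a - portable_a) - (common_b - portable_b);
    assert(slope_difference != 0.0);  // parallel lines never cross
    return (portable_b - portable_a) / slope_difference;
}
```

For no_check (16, 8) against portable_name (4, 13) this gives 5/17, roughly 0.29. With these sums, no_check therefore stays ahead of portable_name as long as more than about 29% of the users are common users, so the exact 80% figure is not critical to the result.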

On the other hand, it is an approach that has the advantage of
"measuring" and giving a solution.

III - Some suggestions about native

The native option is quite confusing. I had to read through the mailing
list before being able to understand how it works. The documentation
does not explain how it works in a really explicit way. And I don't
think I am the only one to be confused, as Walter was also (See
http://tinyurl.com/6esmw).

Ideas to solve that :

1 -> Give a more explicit name (OS_native ?).

2 -> Change the documentation so that this point is explained a bit
more. Why not add an example ? The example given by Beman Dawes was
quite explicit about how that works !

I hope I didn't bore too many of you ! If you are reading this sentence,
that is close to a miracle ! :-)

Thank you for having taken the time to read through this post !

Cheers,
Pierre-Andre Galmes

