Subject: Re: [boost] Boost.Process 0.5: Another update/potential candidate for an official library
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2012-11-16 09:03:45


On Fri, Nov 16, 2012 at 1:28 PM, Alex Perry <Alex.Perry_at_[hidden]> wrote:

> [...]
> I was trying to point out (obviously unclearly) possible problems with the
> single command line version of execute
>

OK, so you were talking about the single string versus argument array? Then
this is orthogonal to the encoding issue.

My input on command lines vs argv
=================================

Problems in the current implementation
--------------------------------------

Try using the current implementation of set_args to invoke cmd.exe so that
it does the same as the following command line (i.e. sets up the MSVC
environment and then prints the values of the inc* variables):

cmd.exe /C ""%VS80COMNTOOLS%vsvars32.bat">NUL && set inc"

Additionally, the current set_args implementation *is not* the right
inverse of CommandLineToArgvW! The following:

std::vector<std::string> vArgs;
vArgs.push_back("a.exe");
vArgs.push_back("a b\\c"); // one backslash, escaped in the C++ literal
execute( set_args( vArgs ) );

will result in the following command line to CreateProcess (verbatim, no
escapes):

a.exe "a b\\c"

The argument will then appear in main's argv[] with *two* backslashes
instead of one!
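
For illustration, here is a minimal sketch of quoting that survives a round
trip through a CommandLineToArgvW-style parser. The name quote_windows_arg
is hypothetical and not part of Boost.Process. The sketch assumes the
commonly documented MSVC rules (a backslash is special only in front of a
double quote) and does nothing for programs such as cmd.exe that parse the
command line differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost.Process: quote one argument so that
// a parser following the CommandLineToArgvW / MSVC CRT conventions recovers
// it unchanged.
std::string quote_windows_arg(const std::string& arg)
{
    // Non-empty arguments without whitespace or quotes need no quoting.
    if (!arg.empty() && arg.find_first_of(" \t\n\v\"") == std::string::npos)
        return arg;

    std::string out = "\"";
    for (std::size_t i = 0; i < arg.size(); ) {
        // Count a run of consecutive backslashes.
        std::size_t backslashes = 0;
        while (i < arg.size() && arg[i] == '\\') { ++i; ++backslashes; }

        if (i == arg.size()) {
            // The run precedes the closing quote: double it so that the
            // closing quote is still seen as a delimiter.
            out.append(backslashes * 2, '\\');
        } else if (arg[i] == '"') {
            // The run precedes an embedded quote: double it, then escape
            // the quote itself with one more backslash.
            out.append(backslashes * 2 + 1, '\\');
            out += '"';
            ++i;
        } else {
            // Backslashes not followed by a quote are literal: copy as-is.
            out.append(backslashes, '\\');
            out += arg[i];
            ++i;
        }
    }
    out += '"';
    return out;
}
```

With this rule the argument a b\c is emitted as "a b\c", with the single
backslash left untouched, and only a backslash run in front of a quote is
doubled.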

Facts
-----

Unfortunately, POSIX argv[] and the Windows command line cannot be mapped
onto each other bijectively. The problem is that on Windows each program
splits the command line into arguments in its own way. For C++ programs
this is usually done inside the CRT with the CommandLineToArgvW function
(or an equivalent), which happens to be buggy (ask me for details if
interested). Other programs (like cmd.exe) parse the whole command line in
a totally different way.
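
To make the CRT-style splitting concrete, here is a simplified sketch of
it. split_command_line is a hypothetical name. The code assumes the
documented MSVC rules and leaves out the special treatment of the program
name and the handling of doubled quotes inside a quoted region, so it is
not a drop-in replacement for CommandLineToArgvW.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical splitter following the documented MSVC rules:
//   2n backslashes + quote   -> n backslashes, the quote toggles quoting
//   2n+1 backslashes + quote -> n backslashes and a literal quote
//   backslashes not followed by a quote are taken literally
std::vector<std::string> split_command_line(const std::string& cmd)
{
    std::vector<std::string> args;
    std::string current;
    bool in_quotes = false;
    bool have_arg = false;  // true once the current argument has started

    for (std::size_t i = 0; i < cmd.size(); ) {
        const char c = cmd[i];
        if (c == '\\') {
            // Count the run of backslashes, then look at what follows it.
            std::size_t n = 0;
            while (i < cmd.size() && cmd[i] == '\\') { ++i; ++n; }
            if (i < cmd.size() && cmd[i] == '"') {
                current.append(n / 2, '\\');
                if (n % 2 == 1) current += '"';  // escaped, literal quote
                else in_quotes = !in_quotes;      // quote is a delimiter
                ++i;
            } else {
                current.append(n, '\\');          // plain backslashes
            }
            have_arg = true;
        } else if (c == '"') {
            in_quotes = !in_quotes;
            have_arg = true;  // "" still yields an (empty) argument
            ++i;
        } else if ((c == ' ' || c == '\t') && !in_quotes) {
            // Unquoted whitespace ends the current argument, if any.
            if (have_arg) {
                args.push_back(current);
                current.clear();
                have_arg = false;
            }
            ++i;
        } else {
            current += c;
            have_arg = true;
            ++i;
        }
    }
    if (have_arg) args.push_back(current);
    return args;
}
```

Under these rules the command line a.exe "a b\\c" splits into a.exe and an
argument that still contains both backslashes, because the run is not
followed by a quote.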

No sensible set_args can be surjective, i.e. able to produce every possible
Windows command line. This may give the impression that providing
set_cmd_line is inevitable. However, such a function would be useful mostly
on Windows, and we do not have to support everything that each platform
provides.

My opinion
----------

I prefer concise, minimal and uniform interfaces. This implies:
* Use only set_args, no set_cmd_line.
    Rationale: consistent with POSIX and with the standard argv[] passed to
main. Removes the need for run_exe, or for parsing the set_cmd_line string
to retrieve the exe name from it.
* Leave the behavior in case of embedded quotation marks unspecified. Do
not escape quotation marks within the argument.
    Rationale: no problem on POSIX, where no parsing is done anyway. On
Windows this will enlarge the image (in the set-theoretic sense) of
set_args. In particular, it will be possible to invoke both examples from
above.
    Cons: It is the user's responsibility to escape double quotation marks
on Windows. But hey, she is the only one who could know how to do it
properly.

[...]
>
> Given
>
> boost::filesystem::path foo_path = ....
> boost::filesystem::path somefile_path = ...
>
> which have been determined by some mechanism (hopefully in a x-platform
> manner without #ifdefs)
>

This is the assumption that fails. filesystem::path works great if you
write for POSIX only, and it works great if you write for Windows only
using wide strings, but it fails when you start writing portable code. Try
initializing those paths from, for example, fields in an SQLite database
or a text file. You can solve many issues by imbuing a UTF-8 locale into
filesystem, but this won't solve all of them (.c_str()), and you cannot
always change the global state to accomplish this.

Continuing with your example:

> std::vector<boost::filesystem::path::string_type> vArgs;
>
> vArgs.push_back( foo_path.native() );
> vArgs.push_back( "-f" );
>

Error on Windows: cannot convert const char[3] to std::wstring...

> vArgs.push_back( somefile_path.native() );
>
> execute( args( vArgs ) );
>
> seemed sufficient - trying to force this into a utf8string (I may have
> misunderstood what you were arguing for) just seems awkward to me.
>

Assuming we imbued a UTF-8 codecvt into boost filesystem (or adopted a
policy that this is the default), then:

std::vector<std::string> vArgs;
vArgs.push_back(foo_path.string());
vArgs.push_back("-f");
vArgs.push_back(somefile_path.string());
execute( args( vArgs ) );

just works.

Personally, I do not like using filesystem::path, for various reasons; I
just use std::string for paths, so there will be no calls to .string() in
my code.

-- 
Yakov
