Boost logo

Boost :

Subject: Re: [boost] [process] Formal Review
From: Klemens Morgenstern (klemens.morgenstern_at_[hidden])
Date: 2016-11-05 21:10:18


Am 06.11.2016 um 00:44 schrieb Nat Goodspeed:
> I would like to surface a number of issues, large and small.
>
> You are clearly proud to support a number of different syntactic ways
> to express the same semantic operation. To me this is a minor
> negative. While in theory it sounds nice to be able to write Process
> consumer code any way I want, in a production environment I spend more
> time reading and maintaining other people's code than I do writing
> brand-new code. Since each person contributing code to our (large)
> code base will select his/her own preferred Process style, I will be
> repeatedly confused as I encounter each new usage.
First of all: thanks for taking the time to write the review.
I just saw the critizism of "std_out > p" coming, while others really
liked that. In practice you'd have "std_out > p", "std_out = p" and
"std_out(p)". I think that is manageable.
> I admire the support for synchronous pipe I/O, while remaining
> skeptical of its practical utility. Synchronous I/O with a child
> process presents many legitimate opportunities for deadlock: traps for
> the unwary. I would be content with a combination of:
I think that is an overkill for many problems, I'd rather provide all
the possibilities. Btw.: the main deadlock issue I had will be fixed,
I'm working on that right now.
But I'm generally not very fond of forcing people to use one solution
when there are several.
> * async I/O on pipes (yes, using Asio)
> * system() for truly simple calls
Not sure why this would need to be for simple calls, I really like my
coroutine solution (that's what I'm proud of here ;)).
> * something analogous to Python's subprocess.Popen.communicate(): pass
> an optional string for the child's stdin, retrieve a string (or, say,
> a stringstream) for each of its stdout and stderr.
That'd be limiting the interface, we discussed that in length. There
cannot be any solution using stringstream, but boost::asio::streambuf is
available. To make that even easier the library does provide a syntax
for that:

boost::asio::streambuf in, out, err;
child c("foo", std_out > out, std_err > err, std_in < in);

I might be adding a new feature to make it easier to implement your own
process object.

Then again: it's not meant as a high-level implementation, it's meant to
make it easy for you implement something like the python module. I.e.
you can do that without worrying about the OS-specific stuff.
> The example under I/O pipes the stdout from 'nm' to the stdin of
> 'c++filt'. But the example code seems completely unaware that c++filt
> could be delayed for any of a number of reasons. It assumes that as
> soon as nm terminates, c++filt must have produced all the output it
> ever will. I worry about the Process implementation being confused
> about such issues.
I fear you are right, I'd need to be checking for an empty line at the end.
>
> I'm dubious about the native_environment / environment dichotomy. As
> others have questioned, why isn't 'environment' a typedef for a
> map<string, string> (or unordered_map)?
For native_environment:

auto nat_env = this_process::environment();
nat_env["FOO"] = "BAR";
assert(getenv("FOO") == "BAR");

For environment: it stores the data already in the needed format for the
OS. Thereby I don't need to transform anything when launching a process.
That just seemed like the most logical approach to me, and once I have
the native_environment it just makes sense. Also it is easily convertible:

environment env = this_process::environment();

> I understand that you desire to avoid copying the native process
> environment into such a map until necessary, but to me that suggests
> something like an environment_view (analogous to string_view) that can
> perform read-only operations on either back-end implementation.
Well native_environnent modifies the environment of the curent process,
while environment allows you to build
> Operations involving splitting and joining on ':' or ';' should be
> defined purely in terms of strings and ranges of strings. They should
> not be conflated with environment-map support.
Seems convenient to me - what's wrong with that?

auto p = nat_env["PATH"].to_vector();
assert(p == {"bar", "foo"});

and that also goes the other way around:

nat_env["PATH"] = {"bar/foo", "foo/bar"};
> The documentation so consistently uses literal ';' as the PATH
> separator that I worry the code won't correctly process standard Posix
> PATH strings.
I'll look into that.
> At this moment in history, an example showing integration of Process
> with Boost.Fiber seems as important as examples involving coroutines.
To be honest, I haven't looked that deep into boost.fiber, not sure how
that would look. I'm open for any suggestions; async_pipe implements an
asio-I/O object, so you probably can already use that with the interface
boost.fiber provides to boost.asio.
> Why is the native_handle_t typedef in the boost::this_process namespace?
Because it's used as a return-type there.
> While in full context it makes sense to speak of "assigning" an
> individual process to a process group, the method name assign() has
> conventional connotations. Use add() instead.
Good point, assign's the term used in the OS documentations, but that of
course has a different connotation in C++.
>
> There's a Note that says: "If a default-constructed group is used
> before being used in a process launch, the behaviour is undefined." I
> assume you mean "destroyed before being used," but this is a concern.
> If a program has already instantiated a process group, but for any
> reason decides not to (or fails to) launch any more child processes,
> does that put the entire parent process at risk? What remedial action
> can it take? Move the group object to the heap somewhere and let it
> leak? Spawn a bogus child process for the purpose of defusing the
> ticking process group instance?
Nope, it just may return garbage; yeah undefined behaviour is a term
that smells like it would blow up you program, that's a misnomer, sry.
On posix it will just yield nonsense (has(pid)) or do nothing and give
an error (terminate()), while that works on windows. I think
platform-dependent would be the correct word and I'll describe the details.
> If you're going to reify process group at all, you should wrap more
> logic around it to give it well-defined cross-platform behavior. And
> it should definitely be legal to create and destroy one without
> associating any child process with it.
Destroying is no affected by that.
> Given support elsewhere for splitting/joining PATH strings, the
> string_type path parameter to search_path() feels oddly low-level.
> Maybe accept a range of strings?
Would make sense, though a range of fs::path would be even better.
>> At a minimum, kindly state:
>> - Whether you believe the library should be accepted into Boost
>> * Conditions for acceptance
> YES, IF the present Boost.Process preserves the ability of its
> predecessor to extend it without modifying it. I'm sorry, thorough
> reading of the documentation plus some browsing through the code
> leaves me doubtful.
It is possible, that's one of the main reasons I insist on that interface.
>
> Examples of such extensions:
>
> * With Boost.Process 0.5 I can use Boost.TypeErasure to pass an
> "initializer" which is a runtime collection of other initializers.
> Such a feature in Process 0.6 would support people's requests to be
> able to assemble an arbitrary configuration with conditional elements
> and use it repeatedly.
That is not possible, because the functions in the handlers need to be
templated - which is a major change in that regard from 0.5. This one is
necessary to allow the initializer to access the sequence; this again is
required for the simplified asio-stuff.
Regarding the construction at runtime: I've got an idea for that (though
also not runtime) which might help out there and be extensible. Or there
could be an initializer_variant, though that would also not be
type-erasure. I'll let you know if I got a prototype of either one up.
> * I can create an initializer (actually one for each of Posix and
> Windows, presenting the same API, so portable code can use either
> transparently) to implicitly forcibly terminate the child process
> being launched when the parent process eventually terminates in
> whatever way.
You can do that, by just inherting the handler_base which has on_error,
on_success and on_setup or the os-specific extension (handler_base_ext)
which on posix adds on_exec_setup, on_exec_error, on_fork_error.
>
> Please understand that I am not asking for the above features to be
> absorbed into the Process library: I am asking for a Process library
> extensible enough that I can implement such things with consumer code,
> without having to modify the library. Perhaps Process 0.6 already is!
> It's just hard for me tell.
Sure, that's important to me, especially since that would be a good way
to eventually add more features if they are required and to know that
they're already used. I fear that calls for another section in the
documentation, though that would be more implemenation notes than a
tutorial.
>
> Another feature should, in my opinion, be incorporated into the library:
>
> * On Windows, if you desire to pass any file handles at all to the new
> child process, it is completely whimsical what *other* open file
> handles you may inadvertently pass -- unless you play games with
> STARTUPINFOEXA and PROC_THREAD_ATTRIBUTE_LIST to enumerate exactly the
> set of handles you intend to pass.
> If the library doesn't natively support that, I *must* be able to
> pass a custom initializer with deep access to the relevant control
> blocks to set it up. This is a showstopper, as in "you can't remove
> that file because, unbeknownst to you, some other process has it
> open."
You can do that through a custom initializer.
>> - Your knowledge of the problem domain
> I have hand-written process-management code in C++ several times
> before -- each with multiple implementations to support multiple
> platforms. I have tested the previous candidate Boost.Process 0.5 with
> a number of programs exercising a number of features.
>
>> You are strongly encouraged to also provide additional information:
>> - What is your evaluation of the library's:
>> * Design
> Design notes are at the top of this mail.
>
>> * Implementation
> I have only glanced over parts of the implementation. It seems
> somewhat more obscure than the corresponding pieces of Process 0.5,
> which is why I couldn't quickly satisfy myself as to the library's
> extensibility.
That's mainly because of the dual-syntax, to allow `child c("foo",
"/bar")`, which means that there are two ways to obtain the actual
initializers. That makes is not as straight-forward as 0.5.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk