Boost logo

Boost :

Subject: [boost] [outcome] user's experience
From: Andrzej Krzemienski (akrzemi1_at_[hidden])
Date: 2017-05-26 23:02:32


Hi Everyone,
In order to test Boost.Outcome, I applied it to the problem I had to solve
at work a while ago. For people who do not know what Boost.Outcome is for,
this could be an introduction. For me it can be an opportunity to obtain
feedback from Niall and Vicente, If I am using `expected` correctly, and
they can get some feedback from me.

Here is the task. I get an std::string as input: it is a data structure
encoded in ASCII text:

"1-10|11-20|21-99;1-10|10-50|51-99"

Semicolons separate "distributions". Within distribution, vertical bars
separate "ranges". Within ranges, dashes separarate integer numbers. There
are some additional constrains. like ranges within distribution cannot
overlap, no gaps between ranges are allowed, and a couple more.

I need to validate if the input matches the required format, and build a
corresponding data structure. But the vaidation cannot be just binary
valid-invalied answer: I want to give user a description of what exactly is
wrong with her input, as detailed as possible.

So, I have function parse, that internally calls parse_distribution, which
in turn calls parse_range. Each of them can return different parsing error
that needs to be propagated up.

My original implementation used "out" function parameters for signalling
errors. Code was unnecessarily long and it was easy to forget to check
results from functions, but I didn't want to use exceptions to indicate
validation failures, because there is nothin exceptional about it. You
expect input validation to fail -- quite often.

Now, this is a more philosophic question: when to use (or not) exceptions.
One citerion I sometimes use is, "do I want my debugger to automatically
break if this function reports faiure?"

So I didn't use exceptions for input validation, but the ugliness of the
code forced me to go back to rewrite it with exceptions. The code was
shorter, and "saer" in the sense that I could not forget ot pass the
failure up the stack.

Now with Boost.Outcome I have a third option. It is almost as concise as
exceptions, but does not use exceptions.

Here is my implementation:

https://github.com/akrzemi1/__sandbox__/blob/master/outcome_practical_example.md

Note a couple of things. I am creating a number of temporay strings under
way, they all can throw bad_alloc. I am fine with that. I am not affraid of
throwing an exception, it is only the validation failures that I want to
report with `expected`. For other failures I want exceptions.

This is input validation. It does not have to be super fast.

In the public API of this file, I will not be exposing the information
about parsing failues. `process` will consume the error information and
just output something. So, the choice of `E` in `expected<T, E>` is
internal to a file, and I can choose something specific to my use case. I
provid my own type, which encodes different erroneous situations in an
enum, and two ints for additional details.

```
struct BadInput
{
  enum {
    no_distrib = 1,
    range_count_not_3,
    distrib_starts_with_no_1,
    distrib_ends_with_no_99,
    gap_between_ranges,
    overlap_between_ranges,
    inverted_range,
    no_2_numbers_in_range,
    not_a_number,
  } type;
  int val1;
  int val2;
};
```
It is quite similar to Outcome's error_code_extended.

Now, I also wanted to have an std::string member inside, but `expected`
forbids it: It says as per LWG requirements `E` in `expected<T, E>` must
not kave a throwing copy constructor. Is that right? In my use case it
wouldn't harm anyone: if copying of `expected` throws, entire function
`process` is exited, with a bad_alloc, which is fine. No `expected` with
partial state will be ever observed. Do we need this restriction?

Also, at some point I will be returning an `expected<vector<T>, E>`, which
makes me uneasy. Returning vectors wrapped in something: will I get runtime
penalty for it? So, I naturally arrived at a pattern that Niall described
in the other thread:

```
out::expected<vector<T>, E> parse (...)
{
  out::expected<vector<T>, E> ans {out::value};
  vector<T> & vec = ans.value(); // will if statement be ellided?}

  // fill vec

  return ans;
}
```

(BTW, this constant out::value is not documented in the docs, unlike
out::value_t.)

It has been mentioned a number of times that an optimizing compiler can
remove a redundant if-statement. I know that the if-statement inside
`value()` is redundant, but in this particular case, will compiler know it
to? Nobody has called `has_value()` before.

Also, the following piece of code illustrates how concise the
implementation of a single function can be:

```
// I use a shortet alias for macro BOOST_OUTCOME_TRYX.
// It is not portable, though. It only works on Clang and GCC.
#define TRY(ex) BOOST_OUTCOME_TRYX(ex)

out::expected<Range, BadInput> parse_range (const std::string& input)
{
  tokenizer numbers_s {input, sep_3};
  auto it = numbers_s.begin(), itEnd = numbers_s.end();
  int count = std::distance(it, itEnd);

  if (count != 2)
    return out::make_unexpected(BadInput{BadInput::no_2_numbers_in_range,
count});

  Range r {TRY(parse_int(*it++)), TRY(parse_int(*it++))};

  if (r.lo > r.hi)
    return out::make_unexpected(BadInput{BadInput::inverted_range, r.lo,
r.hi});

  return r;
}
```

It is almost as short as if I used exceptions, except for the TRY macro.
Note in particular this line:

```
Range r {TRY(parse_int(*it++)), TRY(parse_int(*it++))};
```

`parse_int()` returns `out::expected<int, BadInput>`.Now,
`TRY(parse_int(...))` is an expression and means, "if parse_int() failed
immedaitely return function `parse_range` with the same BadInput,
otherwise use its `value()` to initialize `r`.

What do you think. Am I using the library correctly?

Regards,
&rzej;


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk