Boost logo

Boost :

Subject: [boost] [optional] Specializing optional to save space
From: David Stone (david_at_[hidden])
Date: 2015-09-27 16:03:18


Andrzej posted another thread about creating a new compact_optional class.
The goal of this class is to make it easy to create a new optional type
with a special sentinel value. This allows easy customization per instance
of an optional value.

This post is about the opposite approach: making it easier to specialize
optional to provide special behavior for all instances of a given optional
type.

When I bring this up, people often compare it to the vector<bool>
specialization, but this is a completely different issue. The problem of
vector<bool> is that it does not actually store a bool anywhere, so it
cannot return a reference to one. This would still store a T somewhere.

For the rest of the discussion, I will be assuming a typical standard
library implementation on a 64-bit system. Whenever I talk about specific
sizes of objects, mentally insert "on most common systems". The principles
apply generally.

Consider the case of optional<std::string>. Using the naive implementation,
this object is 8 bytes larger than std::string, due to alignment
requirements. However, there are certain representations of std::string
that are not actually possible. For instance, for common implementations of
std::string you cannot have std::string::size() ==
std::numeric_limits<std::string::size_type>::max() *. We can take advantage
of this if our std::string implementation stores the capacity as an integer
value, rather than a pointer to the end of the storage.

Therefore, we could have implementations that look something like this:

class string {
    friend optional<string>;
};

template<>
class optional<std::string> {
    optional(none_t) {
        // Use friendship to just set capacity to sentinel value
    }
    explicit operator bool() const {
        return m_data.capacity() ==
std::numeric_limits<std::string::size_type>::max();
    }
};

optional<std::string> still stores std::string in it, so operator*() works
just fine. The initialized check is still just comparing a field against a
compile-time constant. This makes this purely a space optimization that
cannot be detected by the user, unless they use
sizeof(optional<std::string>).

I believe all of this is possible today with the current specification of
boost::optional (and would definitely be allowed for the proposed
std::experimental::optional). However, this requires the creator of each
specialization to implement the full optional interface. What's more, it is
especially tricky to specialize optional for a class template when you only
want to specialize on certain specializations of the class template.

My use case for this is my bounded::integer library. This library lets you
specify the bounds of your integer as template parameters. If your
bounded::integer has a narrower range than its underlying integer, you can
take advantage of this to make a more space-efficient optional. If,
however, the bounded::integer min and max are equal to that of the
underlying type, there is no extra space and we want the default optional
implementation. With the current specialization approach, it is the
responsibility of the specializer to reimplement all of optional.

A better approach to easily support this space optimization is as follows:

optional contains an instance of optional_storage<value_type>.
optional_storage has a much narrower interface:

* optional_storage() leaves it uninitialized
* optional_storage(in_place_t, T && ...) constructs the value directly with
the variadic arguments
* emplace(T && ...) constructs an object when there may be one in existence
already
* reset() destroys the contained object and sets the storage to
uninitialized
* is_initialized() says whether the contained object is initialized. This
could be replaced with explicit operator bool(); I have no preference.
* value() returns a reference to the value

We need both the constructor and the emplace function to allow literal
optional types. This is because for the general case, emplace calls
placement new, which is not allowed in a constexpr function.

This solves the problem of specializers having to implement the entire
interface of optional just to change how the uninitialized state is stored.
However, it still does not solve the problem I mentioned earlier about
allowing easy use of the default storage policy for some specializations of
your own class template. To solve that, we need one more class:
default_optional_storage.

default_optional_storage<T> is a literal type if T is a literal type (in
practice, std::is_trivially_destructible<T>::value is all you really need
to test). It implements the same interface as optional_storage<T>. The
default implementation of optional_storage<T> is

template<typename T>
struct optional_storage : default_optional_storage<T> {
private:
    using base = default_optional_storage<T>;
public:
    using base::base;
};

This means that the optional library would have at least three classes in
its public interface:

* optional
* optional_storage
* default_optional_storage

To see a working example of this (including a specialization for
bounded::integer), see

https://bitbucket.org/davidstone/bounded_integer/src/2defec41add2079ba023c2c6d118ed8a274423c8/bounded_integer/detail/optional/optional.hpp

https://bitbucket.org/davidstone/bounded_integer/src/2defec41add2079ba023c2c6d118ed8a274423c8/bounded_integer/detail/optional/specialization.hpp

* The reason is that std::string::size_type is a typedef for std::size_t,
which is equivalent in range to std::uint64_t. However,
std::iterator_traits<std::string::iterator>::difference_type is
std::ptrdiff_t, which is equivalent in range to std::int64_t. This means
that for the largest std::string, std::distance(begin(), end()) cannot be
more than std::numeric_limits<std::ptrdiff_t>::max(), which is less than
std::numeric_limits<std::size_t>::max().


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk