Boost logo

Boost Users :

Subject: Re: [Boost-users] intrusive data structures with custom pointer class
From: Matei David (matei_at_[hidden])
Date: 2014-02-10 18:49:24


On Fri, 07 Feb 2014 16:00:31 -0500
boost-users-request_at_[hidden] wrote:

> Message: 6
> Date: Fri, 07 Feb 2014 22:00:23 +0100
> From: Ion Gazta?aga <igaztanaga_at_[hidden]>
> To: boost-users_at_[hidden]
> Subject: Re: [Boost-users] intrusive data structures with custom
> pointer class
> Message-ID: <52F54967.9020704_at_[hidden]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> El 06/02/2014 23:52, Matei David escribi?:
> > Hi,
> >
> >
> > I would like to use some boost::intrusive data structures (lists and
> > trees) in a project I'm working on, but I ran into trouble trying to
> > customize the existing classes to my needs. I decided to post my
> > problem here, in case somebody knowledgeable of boost::intrusive
> > *implementation details* can assist me. Thank you in advance.
>
> Hi Meti,
>
> You have a lot of questions, I hope I can answer most of them.
>
> > Background:
> >
> > In the project I'm working on, I need to manage a large number of
> > objects in memory. The data is dynamic, so I need to be able to
> > alloc&dealloc objects, in addition to keeping the objects in various
> > lists&trees (for which I'm trying to use boost::intrusive). The RAM
> > usage is large (think, >100GB), but the number of objects is smaller
> > than 2^32. So one thing I'd like to do is to replace 8-byte raw
> > pointers with 4-byte offsets into a deque. Doing this would solve
> > the alloc/dealloc part. However, I now need to customize
> > boost::intrusive containers to work with these 4-byte offsets
> > instead of raw pointers.
>
> I guess the deque is a global object, right. What do you mean with
> "offset" into a deque? It's the offset passed to "deque::operator[]"?

Hi and thanks for the reply.

Contrary to some of the things you said, I found a way to (partly?)
implement my specific use case with boost::intrusive. However, I have
some concerns:

- I only tested a small part of the boost::intrusive functionality
  with my customizations, so my success might be superficial. Could you
  have a look and make an informed guess as to how stable this is?

- I am concerned that further boost updates might inadvertently break
  my customizations, and I would hate to list =1.55 as a specific
  dependency (as opposed to >=1.55). I understand this can never be
  fully avoided, but I'm hoping you could decide to support my use
  case, either officially by including it in the code base and the docs,
  or at least non-officially by being aware of it.

I found and fixed this along the way:
https://svn.boost.org/trac/boost/ticket/9650

Let me restate my specific use-case: I have several types A, B,..., and
I want each object a1 of type A to be kept in a tree (say) along with
other A objects (not all A objects go in the same tree). Since there
are so many A objects, I care about their memory usage. Specifically, I
want to replace 8-byte pointers (A*) by 4-byte handles (which I call
Ptr<A>). An A-handle is really an uint32_t offset into the array of
type A objects. A B-handle is also an offset, but into another array,
holding B objects. Thus:

- The transformation handle-to-object_address is O(1).
- The transformation object_address-to-handle is prohibitive.
- I can only use A-handles for A objects stored in a unique A-object
  array. If I construct an A object elsewhere, I can refer to it with a
  raw pointer, but not with an A-handle.
- Objects of type B are stored in a different array than objects of
  type A. This means, in particular, that handles are not freely
  re-interpretable as real/smart pointers are with casts. So,
  dereferencing the A handle 42 gives me the A object number 42 from
  the A object array; but dereferencing the same handle reinterpreted
  as a B handle gives me the object number 42 from the B object array.
  NOTE: These pointer casts are my primary source of concern. I will
  come back to this below.

My goal is to make boost::implicit structures work with these handles
instead of raw/smart pointers. Intuition tells me there isn't a serious
reason why they shouldn't, and that this type of scenario is general
enough to be of interest to others, and therefore to boost maintainers.

My (partial) success is here:
https://github.com/mateidavid/sga/raw/matei-dev/src/Dev/factory.hpp
https://github.com/mateidavid/sga/raw/matei-dev/src/Dev/test-i_list-orig.cpp
https://github.com/mateidavid/sga/raw/matei-dev/src/Dev/test-i_tree-orig.cpp
The code is a work in progress. You'll need a few more included files.
I'm compiling with, e.g.:

$ [g++|clang++] [-DNONSTATIC_VALUE_TRAITS]
-I <path_to_1_55> -g -O0 -std=c++0x -pedantic -Wall -Wextra -o
test-i_list-orig test-i_list-orig.cpp common.cpp globals.cpp

For lists with nonstatic traits, you need the bug fix mentioned earlier.

> > - Given a type T, I rolled out pointer-replacement class Ptr<T>
> > which holds a 4-byte int, and upon operator*, it produces a Ref<T>
> > object.
> > - A reference-replacement Ref<T> class similarly holds a 4-byte int,
> > it implements operator& which produces the associated Ptr<T>
> > object, and it also provides an operator T& () conversion which
> > dereferences the object from the deque.
> > - With these, I tried a custom traits struct looking like this:
> > value_type = node := T
> > node_ptr = pointer := Ptr<T>
> > reference := Ref<T>
>
> Boost.Intrusive models (or tries to model) a pointer type that can be
> modeled with std::pointer_traits (more precisely
> boost::intrusive::pointer_traits). In that model the reference type
> must be a type that implements the dot operator. Sadly in C++ there
> is no way to implement a class that overloads this operator. That's
> why your pseudo-reference type will not work.

I thought about this. There are 2 different perspectives.

From the point of view of the boost::intrusive data structures (e.g.,
list), I think these structures should never ever dereference any
object they store. So, e.g. inside list.hpp, if we have somewhere an
object val of type "reference" (such as in insert()), the code should
never use "o.<field> = 42". Instead, whatever fields the data structure
needs to access (like r/w access to parent pointers), they should
access them through the traits class methods (e.g.
traits::set_field(o,42)). However, this already seems to be the case
today.

From the POV of the traits classes, indeed, these need to dereference
values. First of all, there is a problem with the function signatures.
In
http://www.boost.org/doc/libs/1_55_0/doc/html/intrusive/value_traits.html
the signature of to_node_ptr should be changed from:
static node_ptr to_node_ptr (value_type &value)
to:
static node_ptr to_node_ptr (pointer_traits<pointer>::reference value)
This just involves changing the docs; you can change the signature
(as I did in my customization) and it compiles just fine.

Also, even with the right signature, the function should never do:
{ return value.get_node_ptr(); }
but instead:
{ return (&value)->get_node_ptr(); }
because both reference::operator& and pointer::operator-> are
overloadable.

However, the concern about traits classes is not as serious as the one
about the data structures because users are given the option to write
their own traits classes anyhow.

So overall, I think the problem with the dot operator isn't serious
enough to make this use case unattainable.

> > A. The traits classes seem to be there to support 2 existing
> > customization types, namely the situation where "node" is a base
> > class of "value" (implementing "base hooks") and the situation
> > where "node" is a member of "value" (implementing "member hooks").
> > My situation represents a 3rd customization type.
>
> You can use function_hook, the hook does not need to be a member or
> base class.

The hack I used seems to work for now. In the future, I think it would
be nice to have the option to explicitly specify the header node (so
the container would just use a pointer) in the interest of clearly
separating any type of node/value allocation from the containers. But I
wouldn't say that's a priority.

> > B. Even though the traits classes allow one to specify pointer
> > types, it's possible (?) that under the hood the classes were
> > designed to support only raw pointers and smart pointers, not the
> > indirect offset-like pointers I need.
>
> See previous answer about supported reference types.
>
> > Given A&B, it seems to me that in my customization I would need 2
> > different hacks. In the interest of code stability, I started
> > wondering whether it is feasible to adapt existing boost::intrusive
> > data to my needs (my first instinct) as opposed to simply rolling
> > out my own rbtree.
>
> I don't know the impact of supporting smart reference types. It might
> be impossible, I haven't tried it.
>
> > 0. Obviously: Have you seen a similar situation being dealt with
> > before? Maybe there's an existing solution out there. My reluctance
> > to rolling out my own intrusive rbtrees is due to the fact I'd have
> > to test them, as opposed to using a known implementation.
> >
> > 1. Are boost::intrusive data structures supposed to support custom
> > pointer classes (neither real pointers nor smart pointers)? For
> > instance, in 1.46 I found this code in
> > boost/intrusive/detail/tree_algorithms.hpp
>
> Yes, boost::interprocess::offset_ptr (but a raw reference can be
> converted to a smart pointer easily).
>
> > static node_ptr uncast(const_node_ptr ptr)
> > {
> > return node_ptr(const_cast<node*>(
> > ::boost::intrusive::detail::boost_intrusive_get_pointer(ptr)));
> > }
> > However, intrusive lists worked just fine in 1.46, which suggests
> > this was simply a bug.
>
> This was fixed in recent Boost releases using pointer_traits to
> implement this function. You can customize pointer_traits to
> implement your own version.
>
> > 2. Can you point me to an example of defining and using custom
> > pointer class with boost::intrusive? To enable compilation (of
> > lists only) in 1.46, I had to customize std::iterator_traits (that
> > part I understood), but then also boost::pointer_to_other and
> > boost::intrusive::detail::smart_ptr_type. Even though the code
> > compiled, I don't know what those are doing exactly, so I'm weary of
> > them.
>
> See the example in libs/intrusive/example/doc_offset_ptr.cpp and
> boost/interprocess/offset_ptr.hpp. See the most recent boost version
> for fixed bugs. A dumb smart pointer is used in boost intrusive
> tests, and it's found in libs/intrusive/test/smart_ptr.hpp, but it's
> a bit oudated. Anyway I think pointer requirements for
> Boost.Intrusive might no be correctly specified and it might require
> some fixes. Recent versions uses pointer_traits, and I recommend you
> this approach.
>
> > 3. In 1.55, I have a few more pages of errors, I think mainly due to
> > these pointer customization issues I don't fully understand.
> > Something seems to have changed from 1.46 to 1.55. Is there a
> > reference about what changed? Or is there a portable way of rolling
> > out such a custom pointer class?
>
> pointer_traits was specified and all code was ported to use this
> customization point. The reference type is assumed to be a raw
> reference (as there is no way to overload operator dot to implement a
> working smart reference).
>
> The only why to use Boost.Intrusive would be to implement a smart
> pointer (of any pointee type) that can be fastly obtained from raw
> reference. If this is your case, Boost.Intrusive might already have
> all that you need. If you need a "smart reference", then
> Boost.Intrusive is not prepared for this and I don't know how many
> effort would require adding support for this (even if this approach
> is feasible).
>
> The problem with "node" objects embedded in the container could be
> solved with an option that allows external allocation of these nodes.
>
> Best,
>
> Ion

Empirically, since I got the containers (at least partially) working
with non-standard references, they must be already prepared to handle
those. Theoretically, as I described above, the containers themselves
should never have to dereference specific fields anyhow, and therefore
should be immune to reference type changes.

The more serious concern, ***and the reason I wrote all this***, is with
pointer casts. Unlike raw or smart pointers, these handles are
type-specific. So casting them to something else and then dereferencing
that object should be done with care. Take bstree, for instance. Its
base class (bstbase3) inherits from node & Value_Traits. The way you
obtain access to the Value_Traits struct (used with stateful traits) is
through a pointer which you later want to dereference:

typedef typename pointer_traits<node_ptr>::template
rebind_pointer<const real_value_traits>::type
const_real_value_traits_ptr;

const_real_value_traits_ptr real_value_traits_ptr() const
{ return
  pointer_traits<const_real_value_traits_ptr>::pointer_to(
    this->get_real_value_traits());
}

With handles, the only way I see for this to work is to eventually
transform handles into real pointers. So, Ptr<T>::rebind<U>::type
should be U* except for U in {T, const T} and possibly {void, const
void}. I initially tried to wrap voids as well, but I get errors. (Try
adding -DWRAP_VOID when compiling test-i_list-orig.cpp)

The problem are these casts:

- In iiterator_members:
  const_void_pointer :=
  pointer_traits<NodePtr>::rebind_pointer<const void>
- In iiterator:
  void_pointer :=
  pointer_traits<node_ptr>::rebind_pointer<void>
- In list_iterator:
  const_real_value_traits_ptr :=
  pointer_traits<void_pointer>::rebind_pointer<const real_value_traits>
- iiterator_members ctor takes as 2nd argument a const_void_pointer
- list_iterator ctor calls iiterator_members ctor, passing as 2nd arg a
  const_real_value_traits_ptr

When I wrap voids:
const_void_pointer := Ptr<const void>
void_pointer := Ptr<const void>
const_real_value_traits_ptr := value_traits*

which means the container code is trying to cast:
Ptr<T> -> value_traits* -> Ptr<const void> -> value_traits* -> deref,
and the 2nd cast fails.

However, things seemed to work ok if I don't wrap voids. So the main
concern with my use case is that: ***intrusive containers should not
cast node_ptr objects into other types of pointers (not even void*),
and then attempt to cast them back***. They can cast other pointers
(such as value_traits*) to void* and back, but not node_ptr-s. Please
have this use case in mind when further developing boost::intrusive.
The files I linked above can serve as a basic test case.

I can help writing up this use case in slightly more detail if you
think that would be useful for others.

Thanks,
Matei


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net