Re: [Boost-bugs] [Boost C++ Libraries] #10740: Multi-level containers do not cooperate with address tracking

From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2014-11-27 21:28:08


#10740: Multi-level containers do not cooperate with address tracking
-------------------------------------+-------------------------------------
  Reporter: Simon Etter <ettersi@…>  | Owner: ramey
      Type: Bugs                     | Status: closed
 Milestone: To Be Determined         | Component: serialization
   Version: Boost 1.56.0             | Severity: Problem
Resolution: invalid                  | Keywords: Address tracking, STL containers
-------------------------------------+-------------------------------------

Comment (by Simon Etter <ettersi@…>):

 I don't understand your posted code snippet. The second line lets `pd`
 point to some element in `l`. On the third line from the bottom, you read a
 new vector into `l`, which calls `clear()` on `l` (see
 collections_load_imp.hpp : 140) and therefore invalidates `pd`. On the
 last line, you nevertheless dereference `pd`. This is undefined behaviour.

 I think we are still talking past each other. Let me describe the
 situation once more for the one-level case. Assume `oa` is any output
 archive, and `l` and `pd` are defined and initialized as follows:
 {{{
 std::vector<dummy> l(1);
 dummy* pd = &l.back();
 }}}
 I call `d` the object of type `dummy` which is located at the address
 `&l.back()`. I emphasize that the type and address of `d` are the only
 relevant properties here. We first serialize `l`:
 {{{
 oa << l;
 }}}
 The implementation of serialization for `std::vector<>` calls serialize on
 every element of `l`, thus also on `d`. Next, we serialize `pd`.
 {{{
 oa << pd;
 }}}
 According to
 [[http://www.boost.org/doc/libs/1_57_0/libs/serialization/doc/serialization.html#pointeroperators]],
 the serialization code checks whether an object of type `dummy` at address
 `&d` has already been serialized. Since `d` was already serialized as an
 element of `l`, this is indeed the case. Thus, we only store some special
 tag for `pd`, no actual object information.
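
 The bookkeeping on the output side can be sketched with a toy archive.
 This is not Boost's implementation: `toy_oarchive`, its members and the
 token strings are hypothetical names chosen for illustration, and the type
 half of the (type, address) key is dropped because only `dummy` objects
 occur.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct dummy { int value = 0; };

// Toy stand-in for an output archive (hypothetical, not Boost's classes).
struct toy_oarchive {
    std::map<const void*, std::size_t> ids;  // address -> id of the saved object
    std::vector<std::string> tokens;         // what would end up in the file

    // Save an object by value and remember the address it was saved from.
    void save_object(const dummy& d) {
        const std::size_t id = ids.size();
        ids.emplace(&d, id);
        tokens.push_back("object#" + std::to_string(id));
    }

    // Saving a vector saves every element in turn.
    void save_vector(const std::vector<dummy>& v) {
        for (const dummy& d : v) save_object(d);
    }

    // Saving a pointer emits only a tag if the pointee is already tracked.
    void save_pointer(const dummy* p) {
        const auto it = ids.find(p);
        if (it != ids.end()) {
            tokens.push_back("ref#" + std::to_string(it->second));
        } else {
            save_object(*p);
        }
    }
};

// Mirrors `oa << l; oa << pd;` from the text.
std::vector<std::string> save_one_level() {
    std::vector<dummy> l(1);
    dummy* pd = &l.back();
    toy_oarchive oa;
    oa.save_vector(l);
    oa.save_pointer(pd);
    return oa.tokens;
}
```

 In this sketch `save_one_level()` yields the tokens `object#0` and `ref#0`:
 one object, followed by a tag that refers back to it.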

 Next, we create some input archive `ia` which reads the same file as `oa`
 wrote to. We first deserialize the `l` from above, which we now call `l_`:
 {{{
 std::vector<dummy> l_;
 ia >> l_;
 }}}
 This creates a new object of type `dummy` at some arbitrary address which
 we can get through `&l_.back()`. We define `d_` to be this object
 identified by the combination of type and address. When deserializing the
 `pd` from above into `pd_`,
 {{{
 dummy* pd_;
 ia >> pd_;
 }}}
 we encounter the special tag that we wrote to the archive instead of the
 proper object. By the same section of the documentation as mentioned
 above, this should allow the serialization library to recognize that it
 does not need to create a new object but rather let `pd_` refer to `d_`.
 We check this through the following assert:
 {{{
 assert(pd_ == &l_.back());
 }}}
 As mentioned, for a single level of `std::vector`, this indeed works.
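
 Why the one-level case works can be sketched with a toy input archive. The
 sketch assumes that each element is loaded into a temporary, copied into
 the container, and then has its tracked address updated, which is the load
 path described further below. `toy_iarchive`, `reset_address`, `addr` and
 the integer object ids are hypothetical names, not Boost's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct dummy { int value = 0; };

// Addresses are stored as integers so stale values can be compared safely.
std::uintptr_t addr(const void* p) { return reinterpret_cast<std::uintptr_t>(p); }

// Toy stand-in for an input archive (hypothetical, not Boost's classes).
struct toy_iarchive {
    std::vector<std::uintptr_t> tracked;  // object id -> current address

    // "Load" an object: the payload is omitted, only the address is recorded.
    void load_object(dummy& d) { tracked.push_back(addr(&d)); }

    // Tell the archive that a tracked object now lives somewhere else.
    void reset_address(const void* new_addr, const void* old_addr) {
        for (std::uintptr_t& a : tracked)
            if (a == addr(old_addr)) a = addr(new_addr);
    }

    void load_vector(std::vector<dummy>& v, std::size_t count) {
        v.clear();
        v.reserve(count);                  // no reallocation while appending
        for (std::size_t i = 0; i < count; ++i) {
            dummy t;                       // element is loaded into a temporary
            load_object(t);
            v.push_back(t);                // then copied into the container
            reset_address(&v.back(), &t);  // and its tracked address is moved
        }
    }

    // A reference tag resolves to the address currently tracked for that id.
    dummy* load_pointer(std::size_t id) {
        return reinterpret_cast<dummy*>(tracked.at(id));
    }
};

// Mirrors `ia >> l_; ia >> pd_;` followed by the assert from the text.
bool one_level_pointer_matches() {
    toy_iarchive ia;
    std::vector<dummy> l_;
    ia.load_vector(l_, 1);
    dummy* pd_ = ia.load_pointer(0);
    return pd_ == &l_.back();
}
```

 In this sketch `one_level_pointer_matches()` returns true, because the
 address update happens at the same level at which the object was tracked.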

 The key point is that `oa << l;` and `oa << pd` try to serialize an object
 of the same type at the same address. According to the documentation on
 serializing pointers, the library detects this situation and stores only
 one object. But exactly the same situation occurs for multilevel
 containers! From the user's perspective, there is thus no reason to expect
 this situation not to work.

 ----

 In the remainder, I'll try to explain why in fact it does not work with
 the current implementation. The problem is on lines 61 to 67 in
 collections_load_imp.hpp, and the line numbers below refer to this
 section. Assume we serialized by running the following code
 {{{
 std::vector<std::vector<dummy>> l(1);
 oa << l;
 }}}
 and are now about to deserialize this object. The lines
 {{{
 std::vector<std::vector<dummy>> l_;
 ia >> l_;
 }}}
 cause the following to happen. We read that `l` contained a single
 element. We therefore create a temporary (line 62), which we call `tll_`,
 and deserialize this single element into it (line 64). Now `tll_` is a
 `std::vector<dummy>` of length 1. Next, we call `l_.push_back(tll_)` (line
 65). Since we would like future pointers to this "logical" object to point
 to `&l_.back()` and not `&tll_`, we report the new address to the archive by
 calling `reset_object_address()`. At this point, the error happens: We
 would also have to tell the archive that we want future pointers to
 reference `&l_.back().back()` and not `&tll_.back()`. We cannot do this,
 however, because at this point in the code we don't know that `tll_`
 (which corresponds to the variable `s` in the actual code) is in fact a
 vector. In conclusion, I thus know where the bug is (and I am pretty sure
 it is in fact a bug), but I don't know how to solve it.
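
 The failure can be reproduced in a toy model that uses only the standard
 library. It is a sketch under stated assumptions, not Boost's code:
 `toy_iarchive`, `reset_address`, `addr` and `nested_result` are
 hypothetical names, every stored vector is assumed to hold exactly one
 element, and `reset_address` stands in for the address-update call
 described above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct dummy { int value = 0; };

// Addresses are stored as integers so stale values can be compared safely.
std::uintptr_t addr(const void* p) { return reinterpret_cast<std::uintptr_t>(p); }

// Toy stand-in for an input archive (hypothetical, not Boost's classes).
struct toy_iarchive {
    std::vector<std::uintptr_t> tracked;  // object id -> current address

    void reset_address(const void* new_addr, const void* old_addr) {
        for (std::uintptr_t& a : tracked)
            if (a == addr(old_addr)) a = addr(new_addr);
    }

    // Leaf object: only the address is recorded, the payload is omitted.
    void load(dummy& d) { tracked.push_back(addr(&d)); }

    // Generic collection loader. It knows nothing about T, so it can only
    // fix up the address of the element itself, not of anything inside it.
    template <class T>
    void load(std::vector<T>& v) {
        tracked.push_back(addr(&v));
        v.clear();
        v.reserve(1);                  // assumption: exactly one element stored
        T t;                           // the temporary, `tll_` in the text
        load(t);                       // may track objects living inside t
        v.push_back(t);                // copy: inner storage is duplicated
        reset_address(&v.back(), &t);  // updates t itself, not its contents
    }
};

struct nested_result {
    bool inner_vector_ok;  // tracked address equals &l_.back()
    bool element_ok;       // tracked address equals &l_.back().back()
};

nested_result load_two_levels() {
    toy_iarchive ia;
    std::vector<std::vector<dummy>> l_;
    ia.load(l_);
    // ids in load order: 0 = l_, 1 = inner vector, 2 = the dummy element
    return { ia.tracked.at(1) == addr(&l_.back()),
             ia.tracked.at(2) == addr(&l_.back().back()) };
}
```

 In this sketch `load_two_levels()` reports that the inner vector is tracked
 at `&l_.back()`, while the `dummy` is still tracked at its old location
 inside the destroyed temporary. A pointer resolved against that entry
 would not refer to `l_.back().back()`.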

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/10740#comment:5>
