Boost logo

Boost :

From: Maarten Hilferink (mhilferink_at_[hidden])
Date: 2003-05-22 05:51:03


Dear Boosters,

(questions regarding contributing to boost at the end of this mail)

I have made a:

template<typename T> struct sequence_array<T>;

which has an interface that is almost compatible with

std::vector<std::vector<T> >;

but faster.

Rationale:

I often worked with vectors of sequences, two examples:

1. typedef std::pair<double, double> dpoint;
    typedef std::vector<dpoint> dpolygon; // point sequence
    typedef std::vector<dpolygon> dpolygon_array; // point sequence array

2. typedef std::basic_string<char> String; // char sequence
    typedef std::vector<String> StringArray; // char sequence array

When working with vector-based sequence arrays (as above),
I found that the implementations are not efficient
due to the many allocations/deallocations of the individual sequences.
Especially the destruction of such vector based sequence arrays
often takes unneccesary and annoying amount of time.

Since the lenght of the individual sequences is often higly diverse,
an implementation based on two dimensional
arrays with fixed row lengths was not an option.

Therefore I have implemented:

    template<typename T> struct sequence_array<T>;

that uses an internal
    std::vector<T> m_Data;
for sequencial storage of sequences of T
and an additional
    std::vector<std::pair<T*, T*> > m_Seqs;
for identifying allocated sequences.

sequence_array offers typedef iterator, const_iterator, reference, and
const_reference
with interfaces nearly compatibe with
std::vector<T>*, const std::vector<T>*, vector<T>&, resp. const vector<T>&.
(the sequences however don't have a reserve() member function,
only a resize() resulting in reallocation in m_Data if they grow
AND [are not the last sequence in m_Data
or insufficent reserved space after m_Data.end() is available] )

he storage m_Data grows with the familiar doubling strategy
as sequences are inserted or re-allocated due to calls to their resize() mf.
m_Seqs is adjusted when m_Data has grown by re-allocation.

Destruction of a sequence_array only requires two calls to delete[].
(assuming that T::~T() doesn't do additional things).

sequence_array minimizes the pollution of the heap with varying sized
arrays.

Users benefit most when the total size and number of the sequences is known
in advance,
and provided to a sequence_array by the member-functions reserve() and
data_reserve().

For the discussion, I have made my pre-boost facilities library available
at:
http://www.objectvision.nl/dms/downloads/rtc-src-2003-05-21.zip
It contains the file GeoSequence.h, which contains the
template <typename Field> struct SequenceArray;

At this point, my implementation is not ready for inclusion in boost for the
following reasons:
- it is dependent on other headers in my pre-boost facilities library
- it is not yet in boost style
- lack of required peer review in order to get sufficient quality and
general purpose usability.
(however, it does work fine after having solved some problems in our
product).
- open design issues

I specifically would like to hear from you on the following design issues:
- I am considering to change the definition of the above mentioned m_Seqs
to:
    std::vector<std::pair<size_t, size_t> > m_Seqs;
where m_Seqs[i].first and second are offsets of the start and end of
sequence i relative to m_Data.begin().
This avoids adjustment when m_Data grows, for a small const access-time
penalty.
- At this point, the destruction of abandoned elements in m_Data
 (= elements not being part anymore of a reallocated sequence),
 is deferrred to the destruction or re-allocation of m_Data.
- the last sequence in m_Data now gets special treatment when it is resized
and sufficient reserved
space is available after m_Data.end(). This special treatment saves an order
of complexity
when the last sequence often grows, but adds a constant cost to the
re-allocation administration.
- generalization to sequece_array arrays (3 levels deep) or higher
dimensions.
(this would require more intelligent template definitions than I have now)

I am willing to spend some time to make sequence_array boost ready,
but only if it is considered useful.

I therefore have the following questions:

1. Arye you interested in such contribution?
2. Is a similar construct already in boost or is it being considered?
3. Do you know of other solutions for efficient management of
sequence_arrays?
4. Is my assumtion true that the above described behaviour could not have
been created
by using a smart allocator as a second template argument to std::vector?
(I assumed this since allocators cannot have a vector instance specific
state)
5. any design suggestions?
6. could somebody with more experience with contributing to boost mentor me
with making a sequence_array contribution ready?

Regards,
Maarten Hilferink
mhilferink_at_[hidden]


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk