Boost logo

Boost :

From: dheld_at_[hidden]
Date: 2001-06-03 02:18:40


--- In boost_at_y..., Darin Adler <darin_at_b...> wrote:
>
> I have used a similar data structure in the past. A sort of "set
> with indexing".
>
> A specific place I used this was when flattening a set of objects.
>
[snip]
>
> Rolling the vector and map into a single data object could provide
> the opportunity for additional optimization.
>
> Perhaps this indexed map is for similar situations.

Yes, that would be a fine use of the map.

> On Saturday, June 2, 2001, at 04:12 PM, David Abrahams wrote:
>
> > ...but map<index, value> already has those properties. Doesn't
> > it????

However, let me motivate my need for this structure. In Win32, you
can create ListViews (which are simply a nice, multi-column listbox
with some additional features). Normally, the ListView itself will
hold the data that it displays. However, A) you have no guarantee
that it does so efficiently, and B) adding many items takes a long
time (I suspect because the data is stored in a vector-type data
structure). So Bill added a mode that allows you to create "virtual
ListViews". These beasts allow you to store the displayed data
yourself, and the ListView just asks for it when it needs it.

Now, I like to display data from a database using this control. This
data has a key and data fields. Sounds perfect for a map, right?
Well, that would be fine and dandy, except that the virtual ListView
asks for data based on it's rank-order, and not its key-order. The
ListView doesn't know anything about your keys, so it just says: "I
want to display item 2048 through 2056" (actually, it asks one item
at a time, but this is essentially what happens). How do you get to
item 2048? The naive way is to get an iterator, and increment it
2048 times. But just getting the successor in a red-black tree
(which seemes to be a very popular implementation for std::map) can
be an O(log n) operation, and rather commonly (though you can get O
(1) quite often as well). What this means is that getting the ith
item out of a map is an O(n log n) operation! That's rather
expensive, especially considering that it can be requested tens or
hundreds of times per minute with just "normal" use of the ListView.
For a ListView with thousands of items, that performance is most
certainly unacceptable.

An alternative is to use an associative vector. There you get O(1)
indexing, which is very nice. However, if want O(log n) associative
(key based) lookups, you need to maintain the vector in sorted order,
which makes inserts and deletes O(n) (again, quite expensive for
large data). You could leave the data unsorted, and use a partial
binary search, but this soon gets complicated, and ultimately
requires a sort (which will most likely be O(n log n)...we're getting
expensive again).

The O(1) indexing isn't really needed for adequate performance with
the virtual ListView (my tests show that with a data set of 1,000
items, you can perform 1,000,000 O(log n) indexed map lookups--rank
based, not key based--in less than 2 seconds...that's plenty fast).
However, I was in the habit of creating data structures that used
std::vector combined with some search to perform associative
lookups. With the indexed map, though, I can do something like the
following:

typedef std::imap<std::string, MyData> MyMap;

And I can look things up associatively, assuming the key of my DB
table were a string, or by rank index, to populate my virtual
ListView. However, my keys are usually integers, so I more typically
write:

typedef std::imap<int, MyData> MyMap;
MyMap m;

However, don't be confused. While my database key may be an int, it
is not necessarily 0-based, and it is not necessarily contiguous.
The key values might start with 10,000, for instance, and jump by
fives. But the ListView doesn't know this. When I scroll the
ListView, it might ask me for item 384, which isn't even a valid
key. Here I need the "inorder rank index", and not the "associative
index". So while m[384] might *look* like the right expression, it
is, in fact, incorrect. What I really want is m.at(384) (but I still
want the associative lookups for other things!).

An associative vector allows me to do this in good time, but the
complexity charateristics of the other common operations (i.e. insert
and delete) is undesirable. A std::map allows me to do this as well,
but the complexity cost of iteration is exorbitant. For this
application (and many other similar ones, I suspect), the indexed map
is a respectable tradeoff between insert, delete, associative lookup
and indexed lookup (all O(log n)).

Dave

P.S. If a few more people think this container is useful, I'll
upload what I have so far to the vault, and invite suggestions for
how best to excise it from STLport.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk