Boost logo

Boost :

From: Joaquin M Lopez Munoz (joaquin_at_[hidden])
Date: 2006-04-20 18:30:09


>

Hello Gennadiy, I'm returning to this discussion. Excuse
the big delay.

By the end of your post, you wrote:

> I am not really crazy about the term Bookmark. IMO Order/Index is much
> better reflect the nature of the concepts we are trying to model. But above
> is step in right direction IMO.

Well, let me use Index/Bookmark if only during this
discussion so as to not introduce further confusion,
since I think you agree with me it is concepts we're
most interested now, rather than words.
I'm not in love with "bookmark" either, it is just a
hack word I came by with some resonance to the
concept it is being used to refer to here.

Now, returning to the original flow of your post:

Gennadiy Rozental <gennadiy.rozental <at> thomson.com> writes:
> That's the primary reason I would need an Index. Aa Order may have have
> different complexity guarantees. But in some cases I need partucular search
> to be performed much faster then it is guaranted by order search complexity.
> Essentially I am looking for an ability to do something like this:
>
> address_book_t ab;
> typedef address_book_t::order<2>::type full_name_order;
>
> // this operation is compile time
> full_name_order fno = ab.get_order<FullName>();
>
> // this operation has constant complexity, since Last name Index provide
> direct access
> std::pair<full_name_order::iterator,full_name_order::iterator>
> p = fno.get_index<LastName>::equal_range( "Smith" );
>
> ...
>
> In some cases Index could have non-constant complexity, but still could be
> used to significantly speedup orderred collection search. Lets say I have
> Input messages log ordered sequencially (meaning that the order is just
> natuaral order messages where comming in). And I have int64 number of theses
> comming in every day. Now for this order I want to define Time index to
> speedup lookup for messages in partucilar time interval. This operation most
> probably wont be constant (due to continues nature of time) but still could
> be implemented much faster then global serach through messages collection.
> In addition for the same collection I may want to define Symbol (some
> attribute in a messages) order. And for that order I need first letter
> index.
> My primary point in above examples is that both Order and Index are first
> class citizents and current desing doesn't account for the latter.
[...]
> > > IMO we need another level of indirection that would support Index
> > > within an
> > > Order concept. I do not see how existing design could handle this.

Well, if we concentrate now on the kind of data structures
we'd need in order to achieve what you request, I come up with the
following conclusions:

1. If the index to be sped up is of the ordered kind, then
the only way to have faster lookups for clusters of elements
(which, according to my terminology, are groups of elements
equivalent under some sorting criterion compatible to that
used by the index, see http://tinyurl.com/zm6h3), we'd probably
have to resort to hashing (which improves rb-tree O(log n)
lookup) and have some inner structures like

  hash_map<key_type,oredered_index_type::iterator>

storing pointers to the first element of each cluster. This
has to be properly synchronized with the ordered index on
insertion/deletion.

2. If the index to be sped up is of the sequenced kind, we've
got a problem: you require that the bookmark refer
to clusters by using iterators from the index itself, as
implied in your example:

  std::pair<full_name_order::iterator,full_name_order::iterator>
   p = fno.get_index<LastName>::equal_range( "Smith" );

and explicitly stated by you in a previous post:

  "All indexes for the same order present the same
  iterator which is a view iterator."

This is AFAICS impossible if the index is sequenced,
since the elements can be freely rearranged and thus
a cluster need not be adjacent (from the point of view
of the sequenced index iterator). If instead of
returning sequenced index iterators we allow the bookmark
to use its own iterators, which could then properly
traverse clusters in the desired order, we have nothing
more nor less than an additional B.MI index!! You
can have it ordered or hashed according to your taste.
So, my conclusion in this case is that all bookmarking
needs for sequenced indices are fully covered by
additional indices. This argument applies unchanged to
random access indices as well.

3. If we want to bookmark a hashed index, I really don't
know how to have faster lookup of clusters than
the hashed index itself already provides. As hashed
indices do not show any definite traversal order,
clustering can only be achieved by an additional hashed
index approppiately defined. My hunch is that your request
is primarily targeted at ordered indices, anyway.

So, I conclude that bookmarking ("indexing" in
your terminology) is either already covered by B.MI
or irrelevant except in case #1, that of ordered
indices ("orders" in your terminology), which can be sped
up by using an internal hash table of pointers to the
clusters's first elements.

I'd love to know whether you agree with this analysis or,
on the contrary, I'm missing your point. As for
bookmarking ordered indices, I find the idea of
some value for massive amounts of data, though I'm
by no means certain that something like that should go
into B.MI. But certainly I can write some prototype
of the idea and let you play with it, in case you're
interested.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk