|
Boost : |
From: Fernando Cacciola (fcacciola_at_[hidden])
Date: 2003-07-01 21:09:16
>Joaquín Mª López Muñoz" <joaquin_at_[hidden]> escribió en el mensaje news:3F01C3F6.520AA2B4_at_tid.es...
[...]
>>C++ already has the concept of an Associative Container, and this
>> concept includes that of a 'Key'. In std terms, a 'key' can be
>> integrated with the value type (ala 'set') or not (ala 'map');
>> so this particularity shouldn't indicate the need for another term.
>
>multiindex_set follows the à la set way. The whole element is the
>key. For instance:
>
>struct element{int a;int b;int c;};
>
>struct comp_element{
> bool operator()(const element& e1,const element& e2)const
> {
> return e1.a+e1.b+e2.c<e2.a+e2.b+e2.c;
> }
>};
>
>typedef multiindex_set<element,comp_element> mx_set;
>
>Here you can see that the comparison predicate reads from all the
>members of element, so basically multiindex_set makes no assumption
>(it cannot) about which parts of the element are keys or not. This is
>useful for constructing "composite" indices, following database terminology.
>less_by is used for comparison predicates depending upon one single
>member, but this is not the general case.
>
I understand the current design but I'm not sure if it defines the concepts
correctly, which is IMO the first important thing to get right.
So let's see:
>From a C++ user perspective, your class is an Associative Container.
An associative container associates contained elements with a key.
The 'key' _identifies_ the element by means of some 'relation' (or association).
Since the key is the element identifier, keys are used to look up elements.
In a std 'set', the relation which associates keys and elements is the
'identity'. It means that to identify (look up) an element of type 'E'
with value 'e', the user needs to use the value 'e' itself, of the same type,
as a key.
In a std 'map', the relation is the explicit bijection expressed by a
'pair<Key,Data>. It means that to identify an element of type 'Data'
whith value 'd', the user needs to use a value 'k' of type 'Key' as the key.
Hence, the 'key' of an associative container is the (typed) value which is
used to identify elements in the container.
In your class, the elements look like keys simply because you choose
to model the predicates as black-boxes which extract keys in some
indeterminate way.
That is, an _entire_ element value does not identify a contained element.
It is a 'part' of the element value which acts as a key, but not the element
(not the whole of it).
It is not really necessary to melt the concepts of key and value using
a black-box predicate over an entire element. You can use separate
KeyExtractor and KeyCompare.
This would better reflect the fact that your container stores values
(elements) along with mutiple keys, which by default just happen
to be different parts of the stored values.
>So, every index has the same implicit key (the whole element), so calling
>the indices "keys" seems to me confusing. An index is just a sorting
>criterium. I'm not sure I've explained myself clearly enough (if I haven't
>please tell me so).
>
As you said, part of the naming problem arises from the fact that
the elements are arranged (are are shown arranged) within subsequences
according to theirs keys. You call each subsequence an 'index', so in a way,
your DS is certainly a 'multiindex'.
But there already a word for a group of elements sharing something in
common such as a key: A 'cluster'.
I would say that your class is a multikey associative container
which arranges elements in clusters according to different keys.
Thus, I'd call each 'index' a 'cluster', such that each cluster has
the invariant that all its elements share the same key (or group of keys)
>Now the question of how to name the indices
>remain open, though maybe this discussion has made you lean towards
>my preferences. This is by no means a critical aspect, if someone comes
>with a more appropriate name I'll happily adopt it.
>
Yes , the name is not so much important, but the underlying concepts are;
and the other boosters, if ever get interested (hint hint), will focus on these.
>From the discussion above, I hope it is clear now that
>multiindex_set regards the entire element as the key.
>
Actually, the key is _derived_ from the entire element,
but it is not the entire element itself. That is, the _entire_
element is not used as identifier, just parts of it.
I would stress this fact separating the key extraction
from the key comparison.
>A map variant can be built on top of it, but it is not clear to me
> whether such a construct is useful enough.
>
I have used a multikey associative container to build a hierarchical
priority queue were the priority sequence is detached from
the element. But as you say, with your design (or with a separate
Extractor/Compare), a map can be built on top of this already,
so we might not need an explicit map.
>Hashing of user-specified indices is a very good option to have.
> I'm currently exploring it, but given the little amount of time I
> can devote to this I'd prefer to leave this out for the moment.
Great.
A hashing associative container might be better expressed in
terms of key equality (since ordering is not needed), so this
would need a different interface, but I agree that having
the ordered vesion first is a priority.
[...]
>> An alternative would be something like this:
>>
>> class multiindex {
>> ...
>> template<class Modifier>
>> Modifier update ( const_iterator it, Modifier Mod )
>> {
>> // iterator is private, so is access()
>> iterator mit = access(it);
>> Mod(*it);
>> update(*it);
>> }
>> ...
>> } ;
>>
>> Modifier is a user-suplied function object
>> which signature of the form:
>>
>> void operator() ( element& e ) ;
>
>Ummm... I'm afraid this cannot be done. What happens if the updated
>element does not fit into the multiindex_set due to collision with some
>other element? For instance:
>
>multiindex_set<int> ms;
>ms.insert(0);
>ms.insert(1);
>ms.update(ms.begin(),increment_by_one);
>
>The last call will call increment_by_one() on the first element with
>value 0, changing it to 1. When reordering of the element is tried,
>a collision with the second element will ensue. What's worse, one
>cannot revert to the original situation, because the element has been
>already rewritten.
>
I see. You are correct.
The original update is transactional, which protects the integrity.
It is pitty that we can't think of a way to avoid the double-duplication,
but at least it works now as it is.
>At least, I hope current update() is no longer a showstopper for you.
>
It isn't anymore :-)
>> One last concern: keys are related hierarchycally?
>>
>Only for implementation purposes: derivation from one key
>(or index) to another is protected. From the user's point of
>view, keys/indices are structurally unrelated types obtained via
>get() from the source multiindex_set object.
>
I see.
Depending on the application, this independece might
not be a good property.
How do I 'cluster' (that is, get all of) elements which
share some primary key and some secondary key, and so on?
BTW: I figure that there is some clustering and ordering invariant.
That is, elements sharing some key-related property are placed
together (clustered), and clusters are ordered in some way.
But I can't tell how is this exactly and which operations maintain
these invariants.
For example: can I expect a specific arrangement with a lineal
traversal (as with set/map), or only via 'get()'?
> I'd greatly appreciate your interest in examining the library, and
>realize the difficulty in doing it without any documentation (which
>should follow promptly if finally multiindex_set catches boosters'
>eyes).
You're welcome. I greatly appreciate your doing it!
Best,
Fernando Cacciola
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk