Boost logo

Boost :

From: Jeremy Maitin-Shepard (jbms_at_[hidden])
Date: 2003-07-10 18:21:10


Joaquín Mª López Muñoz wrote:
> The semantics of std::map (or any other STL associative container) are
> not overwriting on insertion: instead, if an equivalent element is found, no
> insertion occurs. This is what my library does, too. Maybe you're confusing
> here map::operator[] with map::insert.

Yes, I was confused. Your semantics are correct.

>>There are various advantages to separating key extraction from key
>>comparison:
>>
>>1) It allows the use of more generic key extraction and key comparison
>>functors. Specifically, for an ordered index, the key comparison
>>function in many cases would be std::less<KeyType>, and can default to
>>that. For "unordered" (hash-table-like) indices, it is necessary, at
>>least internally, to separate key comparison and key extraction because
>>the hash function must also operate on the "key" value; otherwise, the
>>hash function must also be very specialized, when in many cases a
>>default library-provided hash function can be used (and thus not specified).
>
>
> A small utility class named less_by is used for precisely this purpose. You can
> see it in action in the example accompanying the library. As for hashing, an
> analogous device can be written (when hashes get included).
>
>
>>
>>2) It is necessary in order to support certain convenient interfaces;
>>specifically, I think it is very convenient to allow the syntax:
>>table[key], find(key), and equal_range(key). In order for the front-end
>>to deteremine which index corresponds to the key type, it must "know"
>>about keys.
>
>
> find(key) and related search memfuns are already supported in the way
> you suggest! Look for "special lookup operations" in the documentation.
> Matter of fact, the operations provided are even more powerful in that
> they allow for alternative arguments and comparison predicates.

It does not appear, however, that they support automatically detecting
the index based on the type of the key. This functionality seems
useful, although in general it is only a syntactic convenience..
Specifically, if a table has two indices, one with a key type of int,
the other with key type of std::string, with this functionality, the
following syntax could be used: (note that the operator [] I show here
could not have the semantics of std::map; rather it would need to throw
an exception or have undefined behavior if an existing element is not found)

table["somekey"].whatever;

The above statement would use the std::string index, because the type of
the argument (char const *) is convertible to std::string.

table[3].whatever

The above statement would use the int index, because the type of the
argument is int.

A similar syntax could be used for find, equal_range, count, erase.
Additionally, something like the following could be supported, where
table_type is the type of the table:

table_type::index_with_key<std::string>::iterator it =
table.find("something");

Note that you provide a version of equal_range that is not provided by
std::map or set, namely, a two-argument version, with the semantics of
returning the range between the two arguments. It is not possible to
implement this for a hash table index. Under a policy-based interface,
this could be supported by the "view" of an index that is a binary
search tree, while not being supported by the table class itself.

Similarly, finding using a weaker ordering predicate is only supported
by binary-search-tree-based indices, and thus it seems this should be
supported on a per-index basis only.

Even if it is decided that automatically detecting the index based on
the key type is not useful, I think it is advantageous to decouple key
extraction from key comparison.

>>Couldn't a `composite index' be supported by a key which is a tuple of
>>some number of references?
>
>
> I guess you're taking inspiration here from your own library's design. This
> implementation of composite indices simply doesn't fit here. You can consider
> things are "the other way around" with respect to indices and dependent keys:
> every index is composite. Those that happen to depend on a single member
> of the element can be conveniently built by resorting to less_by. I'm
> not sure I'm being clear enough. If not, please tell me so.
>
> On a side note, taking composite indices as indices over a tuple of
> members does not cover the full spectrum of what an index can do. Imagine an
> index whose associated comparison predicate depend on the result of an
> external function, say
>
> int f(const element& e);
>
> f() won't accept a tuple of members of element, but the actual element. To make
> it fit one would be forced to construct a new element on the fly out of the
> provided parts.

Although you have not specified the semantics of f(), it seems that f()
is effectively a key extractor, and the key type would be int. In order
to use this with your current system of integrating key extractor with
key comparison, an additional functor would be needed that effectively
composed this function f() with some comparison predicate. In order to
allow the user to use an int value (rather than an element value) as a
key, this composition functor would also have to provide a number of
overloads some of which bypass this key extraction functor for one of
the arguments. For a hash-table-based index, it becomes even more
complicated, because a second "composition" functor must be provided
that deals with hash codes.

In constrast, if key extractors are supported by the table class itself,
in most cases only the function f must be specified, and default key
comparison and hash functions (if it is a hash-table-based index) can be
used. Decoupling the key extractor from key comparison (and hash
functions) simply seems more logical, because I think in the far
majority of cases they are decoupled.

It is hard for me to imagine a case where there is no underlying key
type _but_ the comparison predicate of the index supports comparison
between the element type and some other type.

> There's a difference. With update(), if a collision happens the changed
> element is kept with its original value. With modify(), it is erased.
> BTW I like the names "update" and "modify" in that they suggest the
> way the different memfuns work: Somehow, "modify" seems more
> intrusive, which precisely modify() is. What do you think about it?

Okay, this seems reasonable.

---
Jeremy Maitin-Shepard

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk