|
Boost : |
From: Matthew Hurd (matthurd_at_[hidden])
Date: 2001-01-11 16:11:33
I think it is not just a matter of "one word" for a count of items. As you
suggest, one part is the scaling of the composability of such things, such
as containment within other containers. One million slist's cost an extra
one million words.
What I am really thinking is that I would like to see better composition
with respect to aggregates. Why is the sum of the number of items so
special? What about a maximum and minimum? Sorted containers could have a
composition returning a median?
It is not just about aggregates, but also algorithms that are efficient with
respect to incremental analytics. Many things have a finite difference
representation. I regularly use incremental analytics for many math
procedures.
A second issue for me is that in my work I often compose containers. For
example, a container with a hash and a tree for fast instant retrieval and
order traversal without a sort. It would be nice if there were standard
adapters to compose such beasts. Maybe there is already a simple solution
for which I am just ignorant...
Sorry to broaden the debate, this is probably a bit off topic to slist and
hash tables but it is a little relevant to the scalability of the use of
many different containers.
Just a couple of senseless cents,
matt.
(a different, lesser known one)
> -----Original Message-----
> From: Matthew Austern [mailto:austern_at_[hidden]]
> Sent: Friday, 12 January 2001 7:40
> To: boost_at_[hidden]
> Subject: [boost] Hash tables and singly linked lists
>
>
> As many of you know, I wrote the hash tables and singly
> linked lists that are part of the SGI STL.
>
> Now that I've moved to AT&T, one of the projects we're
> working on is a new implementation of the standard library,
> based in part on the SGI version. Partly this is just a
> matter of cleanup; partly it's a matter of revisiting some
> questionable design decisions. For example, we now
> cleanly segregating extensions into different headers and
> a different namespace. We also making sure that the
> extensions are usable with any standard library implementation,
> that they don't depend on any special internal features of
> one particular standard library.
>
> I expect that we will be able to release this complete library
> fairly soon. It's not clear to me what, if anything, it will
> have to do with BOOST. Certainly it's outside the scope of
> what BOOST has been doing.
>
> On the other hand, it should (by design) be straightforward to
> split off hash tables and singly linked lists from the rest
> of the library; if there's interest, we could submit them to
> BOOST separately.
>
> There's been a little discussion about design decisions for
> hash tables, but none for singly linked lists. I'd like to
> start that discussion now.
>
> A singly linked list class has fewer features than a doubly
> linked list class. The whole point of slist is that you're
> giving up features in exchange for space savings. To some
> extent, most slist design tradeoffs revolve around a single
> question: are willing to incur any space overhead (perhaps
> on a per-list basis, if not per-node) to get any of those
> features back?
>
> First question: is size() O(1), or O(N)? The only way for
> it to be O(1) is if you keep an element count in the list.
> I'm opposed to keeping such a count, because users who care
> about it can keep a count themselves; I don't want to spend
> that extra word of storage.
>
> A more interesting question: what's the complexity of
> L.insert(i, x)? (Semantics: insert x before i.)
>
> The problem is that singly linked lists work more naturally
> with insertion after an element than insertion before. The
> links just go one way; you can always find the next element,
> but finding the previous element requires going back to the
> beginning and stepping until you're back in the right place.
>
> One option is to provide an insert_after member function.
> Then you could either let insert be O(N) (becuase of the
> time required to find the right position), or else not have
> insert at all. That's what I chose. I did provide insert,
> so that we could keep the Sequence interface, but slist
> users should in general use insert_after instead. If you
> need to step backward a lot, then singly linked lists are
> not for you.
>
> (A minor detail: since singly linked lists deal with "after"
> so much, it makes sense to provide a nondereferencable
> before-the-beginning iterator as well as a nondereferenceable
> past-the-end iterator. I provide a before_begin() member
> function.)
>
> There's another defensible option, though. Internally we
> still have something like insert-after semantics, but
> externally it looks like ordinary insert-before. The way
> this works is that, internally, an slist iterator that
> refers to an element will point to the *previous* list node.
> You'll be changing iterator dereference from something like
> *(p->val) to something like *(p->next->val).
>
> The advantage of this is obvious: users get reasonable
> insertion performance, without having to switch from
> insert() to insert_after(). One disadvantage is equally
> obvious: iterator dereference becomes slower. (Two pointer
> dereferences instead of one.)
>
> There's a less obvious disadvantage of having iterators that
> point to the previous node: dealing with L.end() becomes
> harder. The problem is that end() must find a pointer to
> the last node in the list. (As opposed to just wrapping a
> null pointer.) Clearly you don't want and O(N) end() that
> steps through the list, so you're left with just one option:
> having the list maintain a pointer to the first node, and also
> a pointer to the last node. That's what I meant by a tradeoff
> between features and space: in exchange for insert-after
> semantics, you're paying one word of overhead.
>
> Does one word per list matter? Sometimes yes, in my opinion.
> For example, I'm using a vector<slist> for my hashtables. I'd
> hate to spend an extra word for every bucket.
>
> So while having that end pointer, is, in my opinion, a reasonable
> design, it's not the design that I prefer. Again, if there's
> a singly-linked list that doesn't have that end pointer, it's
> easy for users to keep track of the last node themselves if they
> find that they need it. I did not want to have any overhead
> beyond what you'd get if you wrote your own singly linked list in
> C.
>
> This is something that lisp people, who do a lot of work with singly
> linked lists, are used to: there's a data structure called tconc,
> which consists of a list plus a pointer to the last cons cell.
>
> (Actually, while writing this, I realized that, since I'm claiming
> that it's easy to build a slist-plus-end-pointer on top of an
> slist, I should provide a little tconc wrapper class that keeps
> track of the last element, and that performs that iterator remapping
> that I've talked about. Users who want zero overhead can use raw
> slist, users who want convenient insert-before can use the wrapper.
> I should probably give the wrapper a better name that tconc, though.
> Suggestions?)
>
> --Matt
>
>
>
>
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk