|
Boost : |
From: Matthew Austern (austern_at_[hidden])
Date: 2001-01-11 15:40:07
As many of you know, I wrote the hash tables and singly
linked lists that are part of the SGI STL.
Now that I've moved to AT&T, one of the projects we're
working on is a new implementation of the standard library,
based in part on the SGI version. Partly this is just a
matter of cleanup; partly it's a matter of revisiting some
questionable design decisions. For example, we now
cleanly segregating extensions into different headers and
a different namespace. We also making sure that the
extensions are usable with any standard library implementation,
that they don't depend on any special internal features of
one particular standard library.
I expect that we will be able to release this complete library
fairly soon. It's not clear to me what, if anything, it will
have to do with BOOST. Certainly it's outside the scope of
what BOOST has been doing.
On the other hand, it should (by design) be straightforward to
split off hash tables and singly linked lists from the rest
of the library; if there's interest, we could submit them to
BOOST separately.
There's been a little discussion about design decisions for
hash tables, but none for singly linked lists. I'd like to
start that discussion now.
A singly linked list class has fewer features than a doubly
linked list class. The whole point of slist is that you're
giving up features in exchange for space savings. To some
extent, most slist design tradeoffs revolve around a single
question: are willing to incur any space overhead (perhaps
on a per-list basis, if not per-node) to get any of those
features back?
First question: is size() O(1), or O(N)? The only way for
it to be O(1) is if you keep an element count in the list.
I'm opposed to keeping such a count, because users who care
about it can keep a count themselves; I don't want to spend
that extra word of storage.
A more interesting question: what's the complexity of
L.insert(i, x)? (Semantics: insert x before i.)
The problem is that singly linked lists work more naturally
with insertion after an element than insertion before. The
links just go one way; you can always find the next element,
but finding the previous element requires going back to the
beginning and stepping until you're back in the right place.
One option is to provide an insert_after member function.
Then you could either let insert be O(N) (becuase of the
time required to find the right position), or else not have
insert at all. That's what I chose. I did provide insert,
so that we could keep the Sequence interface, but slist
users should in general use insert_after instead. If you
need to step backward a lot, then singly linked lists are
not for you.
(A minor detail: since singly linked lists deal with "after"
so much, it makes sense to provide a nondereferencable
before-the-beginning iterator as well as a nondereferenceable
past-the-end iterator. I provide a before_begin() member
function.)
There's another defensible option, though. Internally we
still have something like insert-after semantics, but
externally it looks like ordinary insert-before. The way
this works is that, internally, an slist iterator that
refers to an element will point to the *previous* list node.
You'll be changing iterator dereference from something like
*(p->val) to something like *(p->next->val).
The advantage of this is obvious: users get reasonable
insertion performance, without having to switch from
insert() to insert_after(). One disadvantage is equally
obvious: iterator dereference becomes slower. (Two pointer
dereferences instead of one.)
There's a less obvious disadvantage of having iterators that
point to the previous node: dealing with L.end() becomes
harder. The problem is that end() must find a pointer to
the last node in the list. (As opposed to just wrapping a
null pointer.) Clearly you don't want and O(N) end() that
steps through the list, so you're left with just one option:
having the list maintain a pointer to the first node, and also
a pointer to the last node. That's what I meant by a tradeoff
between features and space: in exchange for insert-after
semantics, you're paying one word of overhead.
Does one word per list matter? Sometimes yes, in my opinion.
For example, I'm using a vector<slist> for my hashtables. I'd
hate to spend an extra word for every bucket.
So while having that end pointer, is, in my opinion, a reasonable
design, it's not the design that I prefer. Again, if there's
a singly-linked list that doesn't have that end pointer, it's
easy for users to keep track of the last node themselves if they
find that they need it. I did not want to have any overhead
beyond what you'd get if you wrote your own singly linked list in
C.
This is something that lisp people, who do a lot of work with singly
linked lists, are used to: there's a data structure called tconc,
which consists of a list plus a pointer to the last cons cell.
(Actually, while writing this, I realized that, since I'm claiming
that it's easy to build a slist-plus-end-pointer on top of an
slist, I should provide a little tconc wrapper class that keeps
track of the last element, and that performs that iterator remapping
that I've talked about. Users who want zero overhead can use raw
slist, users who want convenient insert-before can use the wrapper.
I should probably give the wrapper a better name that tconc, though.
Suggestions?)
--Matt
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk