Boost logo

Boost :

From: Ion Gaztañaga (igaztanaga_at_[hidden])
Date: 2006-12-09 19:33:12


me22 wrote:
> On 12/9/06, Mathias Gaunard <mathias.gaunard_at_[hidden]> wrote:
>> I think you should try to model the STL.
>> Indeed, containers such as std::list provide a constant time size
>> function. ilist should do the same.
>>
> I'm fairly certain that this does not need to be the case.
>
> There was a thread a while back about going though the sources to look
> for uses of list::size exactly because it could be O(n) to ensure that
> there was always an O(1) empty-using function as an alternative to
> size.

I think that Howard Hinnant has a very good paper about this, titled "On
std::list::size":

http://home.twcny.rr.com/hinnant/cpp_extensions/On_list_size.html

He suggests adding a new splice overload taking the number of elements
between two iterators to obtain a O(1) size with lists with O(1) size.
I've added that overload both to the intrusive singly and doubly linked
lists.

Historically, SGI STL had a O(N) size std::list. In Interprocess, since
I took SGI implementation as my base implementation I had O(N) list. The
last version has O(1) size. I think that most users expect list::size()
to be O(1) because they think that otherwise, the operation wouldn't be
there, just like std::list has no operator[](). All other containers
have O(1) size, so is reasonable to expect O(1) size in list.

However, with intrusive containers I think that both possibilities
should be provided. In most cases, if the user uses an intrusive list to
store his objects, he might expect the same behavior than std::list and
Windows users (surely using Dinkumware STL) will expect O(1) size. On
the other hand, if people use list::splice as a basic operation, they
want to obtain O(1) splice. There are more examples of more efficient
operations with O(N) size: for an intrusive list "erase(iterator,
iterator)" O(1) if we have O(N) size.

There is another reason to have a O(N) size() intrusive list: space
overhead. If a boost user wants to implement a TR1 unordered_map, a
typical implementation is to have bucket array and each bucket is a
singly (maybe doubly) linked list of values. The library author does not
need islist::size() for nothing, but an empty circular singly linked
list can be implemented with just a pointer pointing to itself. Since
the hashmap's load factor is normally 1.0, this means that for every
inserted object, there is a bucket. Adding the size member duplicates
the size of the bucket, and this space overhead is noticeable. If the
programmer has an option to have a O(N) size, he reduces the size of a
bucket to just one pointer. If we have a "unordered_set<int>" with a
load factor = 1.0, and a space-friendly allocator (a node allocator),
there is a ~25% size overhead if we use a O(1) size intrusive singly
linked list: every node has 2 words (1 word for every stored int, 1 word
for the pointer used to implement the linked list) and every the bucket
has two words (one for the pointer to implement the linked list and
another word to store the size of the linked list).

Original Intrusive library containers had O(N) size. Since I think that
most users expect to have O(1) size, I've added that possibility. The
default is to have O(N) size, but that, of course, can be changed, if
people prefer O(1) size.

Thanks for the comments,

Ion


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk