Boost logo

Boost Users :

From: bert hubert (bert.hubert_at_[hidden])
Date: 2006-03-13 14:32:05


On Mon, Mar 13, 2006 at 05:22:35PM +0100, Joaqu?n M? L?pez Mu?oz wrote:

> I'd check the following for anomalies:

Joaquin, I'm busy tonight with other things so I'm only able to give you
some approximate answers, for which I apologise. I hope to be able to give
more definitive answers Wednesday.

> 1. Could you (just for tests), replace index #0, which now is hashed,
> with an ordered_unique index and rerun? A quadratic behavior could
> be the result of a very bad hashing scheme: this is extremely unlikely
> for your current code, but who knows.

The original container had ordered_unique as its first index and showed a
similar (large) increase in pruning time when pruning more cache entries. I
did not look for a power law or anything.

> 2. Is the size of CacheEntry::d_records roughly the same between
> prunes? Could it be that the average number of records in CacheEntry
> elements is disproportionately larger on those prunes that take
> longer?

The cache remains almost identical in size, around 50 million entries. The
cache represents DNS records expiring, each run expires all DNS entries that
have a 'time to die' ttd that is larger than the current time.

This holds, I think, a vital clue to the problem at hand. Records are
inserted with ever increasing ttds, generally. There is a number of common
time-to-live numbers for DNS records, ie, 5, 60, 300, 3600, 7200, 86400
seconds.

This means that we have a number of 'striding' ttd's being inserted: now +
5, now + 60, now +300 .. now +86400. Might this trigger a badly balanced
tree?

On a related note, I wonder if chopping off the lower half of any sort of
ordered tree might trigger worst case behaviour. Note that
CacheEntry::getTTD() which forms one index might be slow to call.

> 3. What percentage of elements of d_cache are visited on each
> prune? I.e. what is the typical value of looked/d_cache.size()? The
> nonvisited elements shoudn't have any impact on phase /*2*/,
> but they have a (minimal) impact on those operations of /*1*/
> performing as O(n log n), such as replace(). Anyway, no amount
> on nonvisited elements can have such a huge impact on figures.

Typically 500 elements are visited of a 50 million element structure, so
1e-5. Things get crazy once > 1000 elements are visited.

Thanks Joaquin, I hope to provide more measurements soon.

Regards,

bert

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net