Subject: Re: [boost] [convert] Performance
From: Thijs van den Berg (thijs_at_[hidden])
Date: 2014-06-12 05:04:44
Sitmo Consultancy B.V.
Financial Modelling & Data Science
+ 31 6 24110061
P.O. Box 1059, 2600BB, Delft, The Netherlands
On 12 Jun 2014, at 10:41, Joel de Guzman <djowel_at_[hidden]> wrote:
> On 6/12/14, 2:45 PM, Thijs (M.A.) van den Berg wrote:
>> On Jun 12, 2014, at 2:30 AM, Joel de Guzman <djowel_at_[hidden]> wrote:
>>> I do not think a random distribution of number of digits is a
>>> good representation of what's happening in the real world. In
> the real world, especially with human-generated numbers(*), shorter
>>> strings are of course more common.
>> A well-known real-world property is Benford's law, often used in fraud detection to check whether numbers are fake or "natural".
>> If you draw random numbers uniformly on a logarithmic scale, you get that scale-invariant property. I think that leads to a random number of digits?
> That one is for the first digit only and not for the number of digits.
> Is it just a conjecture that single-digit numbers, for example, occur more
> frequently than, say, 1,000,000-digit numbers? If that conjecture does not hold,
> then we should probably be using bignums all over! It's also a known
> *fact* that varint encoding gives better performance than
> uniform encoding when transferring data over networks!
> I'm not sure if there's a study of the probability of the occurrence of
> N digits, is there? Anyway, here's one:
> Perhaps the math guys should set me straight and I would not be surprised
> if the answer is 42 again! :-)
You're right. There are indeed many distributions that give rise to Benford's law. Maybe someone should write a script that scrapes all the numbers in the Boost source files.
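Something along these lines could serve as a first pass. It is only a sketch: the helper names are made up, and it looks only at decimal integer literals in files ending in .hpp, .cpp, .h or .ipp. Zero is skipped (it has no leading significant digit), as are hex/octal literals and literals with suffixes such as 10u. Numbers inside comments and strings are counted too, and floating-point literals are either split at the decimal point or skipped. It reports the number of digits alongside the first digit because the two can disagree. If log10 of the numbers were uniform over whole decades, first digits would follow Benford exactly, yet every length would be equally likely, so Benford alone does not settle the question.

```python
import math
import re
from collections import Counter
from pathlib import Path

# Assumption: a crude pattern for nonzero decimal integer literals. Tokens glued
# to letters (0x1F, 10u, int32_t, 1e5) are not matched because \b needs a boundary.
INT_LITERAL = re.compile(r"\b[1-9][0-9]*\b")
SOURCE_SUFFIXES = {".hpp", ".cpp", ".h", ".ipp"}


def benford_probability(d: int) -> float:
    """Benford's law: P(first significant digit = d) = log10(1 + 1/d), d in 1..9."""
    return math.log10(1 + 1 / d)


def scrape_integer_literals(root: str) -> list[str]:
    """Collect decimal integer tokens from C++ source files under root."""
    literals = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            literals.extend(INT_LITERAL.findall(text))
    return literals


def digit_stats(literals: list[str]) -> tuple[Counter, Counter]:
    """Return (counts of leading digit, counts of number of digits)."""
    first_digits = Counter(int(s[0]) for s in literals)
    lengths = Counter(len(s) for s in literals)
    return first_digits, lengths


def print_report(root: str) -> None:
    """Print observed first-digit shares next to Benford, then the share of each length."""
    literals = scrape_integer_literals(root)
    if not literals:
        print("no integer literals found under", root)
        return
    first_digits, lengths = digit_stats(literals)
    n = len(literals)
    print("digit  observed  benford")
    for d in range(1, 10):
        print(f"{d:5}  {first_digits[d] / n:8.3f}  {benford_probability(d):7.3f}")
    print("digits  share")
    for k in sorted(lengths):
        print(f"{k:6}  {lengths[k] / n:5.3f}")
```

Calling print_report("path/to/boost") on a checkout (the path is a placeholder) prints the first-digit table and the share of literals by length. The length table is probably the more relevant one for the convert benchmarks.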