Subject: Re: [boost] [convert] Performance
From: Joel de Guzman (djowel_at_[hidden])
Date: 2014-06-12 05:01:46

On 6/12/14, 4:41 PM, Joel de Guzman wrote:
> On 6/12/14, 2:45 PM, Thijs (M.A.) van den Berg wrote:
>> On Jun 12, 2014, at 2:30 AM, Joel de Guzman <djowel_at_[hidden]> wrote:
>>> I do not think a random distribution of number of digits is a
>>> good representation of what's happening in the real world. In
>>> the real world, especially with human generated numbers(*), shorter
>>> strings are of course more common.
>> A well-known real-world property is Benford's law, often used in fraud detection to
>> check if numbers are fake or "natural".
>> If you draw random numbers uniformly from the logarithmic scale then you'll get that
>> scale invariant property. I think that leads to a random number of digits?
>> http://en.m.wikipedia.org/wiki/Benford's_law#Mathematical_statement
> That one is for the first digit only and not for the number of digits.
> Is it just a conjecture that single-digit numbers, for example, occur more
> frequently than, say, 1,000,000-digit numbers? If that conjecture does not hold,
> then we should probably be using big nums all over! It's also a known
> *fact* that varint encoding gives the best performance compared to
> uniform encoding when transferring data over networks!
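For anyone who has not met varint encoding: below is a minimal sketch of the
LEB128-style scheme (the base-128 varint used by, e.g., Protocol Buffers).
The function names are my own and purely illustrative; this is not code from
Boost.Convert or any particular library. It only shows why small values cost
fewer bytes than large ones.

```cpp
#include <cstdint>
#include <vector>

// Illustrative LEB128-style varint: 7 payload bits per byte, with the high
// bit set on every byte except the last. Names are hypothetical.
std::vector<std::uint8_t> encode_varint(std::uint64_t value)
{
    std::vector<std::uint8_t> out;
    while (value >= 0x80)
    {
        // Emit the low 7 bits and flag that more bytes follow.
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value)); // final byte, high bit clear
    return out;
}

// Inverse of encode_varint; assumes the input is a well-formed encoding.
std::uint64_t decode_varint(std::vector<std::uint8_t> const& bytes)
{
    std::uint64_t value = 0;
    int shift = 0;
    for (std::uint8_t b : bytes)
    {
        value |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) break; // last byte reached
        shift += 7;
    }
    return value;
}
```

With this scheme 1 takes one byte and 2432345676 takes five, so the encoding
only pays off if small values really do dominate.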
> I'm not sure if there's a study of the probability of the occurrence of
> N digits, is there? Anyway, here's one:
> Perhaps the math guys should set me straight and I would not be surprised
Just for fun: google any single- or double-digit number (e.g. "1") and
then google a many-digit number (e.g. "2432345676"). For "1", I got
15,550,000,000 hits. For "2432345676", I got 9 hits.
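
Thijs' log-uniform suggestion is also easy to try out. If log10(x) is drawn
uniformly from [0, N), every digit count from 1 to N should come up about
equally often, which is a different shape from the "short strings dominate"
picture above. A rough sketch follows; the function names, the seed parameter
and the 18-digit limit are my own assumptions, not anything from Boost.Convert.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Number of decimal digits in n (0 counts as one digit).
int digit_count(std::uint64_t n)
{
    int digits = 1;
    while (n >= 10) { n /= 10; ++digits; }
    return digits;
}

// Draw `samples` integers whose log10 is uniform on [0, max_digits) and
// return a histogram indexed by digit count (index 0 is unused).
// Assumes 1 <= max_digits <= 18 so every value fits in 64 bits.
std::vector<std::size_t> log_uniform_digit_histogram(
    int max_digits, std::size_t samples, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> exponent(0.0, max_digits);
    std::vector<std::size_t> hist(static_cast<std::size_t>(max_digits) + 1, 0);
    for (std::size_t i = 0; i < samples; ++i)
    {
        // 10^u with u uniform gives a log-uniform value in [1, 10^max_digits).
        auto n = static_cast<std::uint64_t>(std::pow(10.0, exponent(gen)));
        int d = digit_count(n);
        if (d > max_digits) d = max_digits; // guard against rounding at the top edge
        ++hist[static_cast<std::size_t>(d)];
    }
    return hist;
}
```

Calling log_uniform_digit_histogram(9, 90000, 42) should put roughly 10000
samples in each of the nine buckets, give or take sampling noise.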

Regards,

