but the documentation claims they offer the same results as the non-fast ones. Is this a documentation issue, or do you think the documentation is correct? I would say that a value X in a non-fast type that can be divided and produce a subnormal result (greater than 0) is quite different from the same division rounding to 0 in the fast type.
I can add a blurb to the effect of "within its domain". Subnormals aren't particularly useful, so I would not argue that it's markedly different mathematical support. There are plenty of compilers, optimizers, hardware platforms, etc. that will flush your binary floating point subnormals to 0.
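For readers unfamiliar with the distinction, here is a minimal sketch using binary `double` (not the decimal types under review). It assumes an IEEE 754 platform in its default mode, with no flush-to-zero and no fast-math flags; the helper names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: halve the smallest *normal* double.
// With gradual underflow the result is a subnormal greater than 0;
// with flush-to-zero enabled it would be 0 instead.
inline double half_of_smallest_normal()
{
    // volatile keeps the division from being folded at compile time
    volatile double smallest_normal = std::numeric_limits<double>::min();
    assert(smallest_normal > 0.0); // sanity check on the starting value
    return smallest_normal / 2.0;
}

// Hypothetical helper: true when v is classified as subnormal.
inline bool is_subnormal(double v)
{
    return std::fpclassify(v) == FP_SUBNORMAL;
}
```

Under the stated assumptions, `half_of_smallest_normal()` is nonzero and `is_subnormal` reports true for it; on a flush-to-zero configuration the same division yields 0, which is the behavioral difference being discussed.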
Another question, about the digit count: could this be implemented without the checks on estimated_digits, i.e. by making the array larger or using some similar trick to avoid checking estimated_digits < 10 and estimated_digits > 1? I do not claim I benchmarked this, but I know people usually try to replace branches in code like this with "clever" array access.
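For illustration, a table-based digit count of the kind being asked about might look like the sketch below. This is not the library's implementation: `count_digits`, `bit_width_u64`, and the table layout are hypothetical names and choices. The estimate uses the common approximation `bits * 1233 >> 12` for `floor(bits * log10(2))`, and the table stores 0 at index 0 so that an input of 0 still counts as one digit without a separate range check.

```cpp
#include <cassert>
#include <cstdint>

// Portable bit width (number of bits needed to represent x; 0 for x == 0).
// In practice std::bit_width from <bit> (C++20) or a count-leading-zeros
// intrinsic would likely replace this loop.
inline int bit_width_u64(std::uint64_t x) noexcept
{
    int bits = 0;
    while (x != 0) { ++bits; x >>= 1; }
    return bits;
}

// Hypothetical sketch: decimal digit count via one table lookup.
inline int count_digits(std::uint64_t x) noexcept
{
    // Index k holds 10^k for k >= 1; index 0 holds 0 so that x == 0
    // compares as >= table[0] and is reported as one digit.
    static constexpr std::uint64_t zero_or_powers_of_10[20] = {
        0ULL,
        10ULL,
        100ULL,
        1000ULL,
        10000ULL,
        100000ULL,
        1000000ULL,
        10000000ULL,
        100000000ULL,
        1000000000ULL,
        10000000000ULL,
        100000000000ULL,
        1000000000000ULL,
        10000000000000ULL,
        100000000000000ULL,
        1000000000000000ULL,
        10000000000000000ULL,
        100000000000000000ULL,
        1000000000000000000ULL,
        10000000000000000000ULL
    };

    // Approximates floor(bits * log10(2)); the true digit count is
    // either this estimate or the estimate plus one.
    const int estimate = (bit_width_u64(x) * 1233) >> 12;
    assert(estimate >= 0 && estimate < 20);

    // One comparison against the table settles which of the two it is.
    return estimate + static_cast<int>(x >= zero_or_powers_of_10[estimate]);
}
```

Whether this beats a version with explicit range checks is an empirical question; the sketch only shows the shape of the lookup, not a performance claim.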
I have benchmarked a number of different methods inside the library. In a blog post on counting digits, Lemire found that more instructions do not always mean a worse runtime [1]. Yes, I have tried his methods.

Matt

[1] https://lemire.me/blog/2025/01/07/counting-the-digits-of-64-bit-integers/