From: Jeff Hammond (jeff.science_at_[hidden])
Date: 2019-10-15 14:55:35
BFloat16 conversion to Float32 is not 100% trivial, because of NaN and
rounding modes. I think
a good job at documenting this (the linked code is Apache 2.0 licensed, in
case you worry about such issues). However, if one ignores these
complexities, conversion is as simple as bitshifting.
The public Intel BF16 spec is
which describes the details of the Intel definition. There are some
implementation specific details in https://reviews.llvm.org/D60550. I
can't comment on other hardware implementations.
The important distinction between __fp16 and _Float16 is that the former is a
storage format type, not an arithmetic type, whereas the latter is an
arithmetic type. The former is more easily implementable, e.g. Intel CPUs
since Ivy Bridge use the F16C instructions to do fast conversion between the
16-bit storage format and float32, but all arithmetic is done with float32
hardware.
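As a rough illustration of the storage-only model, here is a portable software sketch: the 16-bit value is only ever held as bits, and it is widened to float32 before any arithmetic happens. The names half_storage, half_bits_to_float and add_as_float are hypothetical. On hardware with F16C the widening step would be done by a conversion instruction (e.g. via the _mm_cvtph_ps intrinsic) rather than by code like this.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical storage-only wrapper: holds IEEE binary16 bits, no arithmetic.
struct half_storage {
    std::uint16_t bits;
};

// Decode 1 sign, 5 exponent, 10 mantissa bits into a float32 value.
inline float half_bits_to_float(std::uint16_t h) {
    const int sign = (h >> 15) & 1;
    const int exp  = (h >> 10) & 0x1F;
    const int man  = h & 0x3FF;
    float v;
    if (exp == 0) {
        // Zero or subnormal: man * 2^-24
        v = std::ldexp(static_cast<float>(man), -24);
    } else if (exp == 31) {
        // All-ones exponent: Inf if mantissa is zero, NaN otherwise.
        v = man ? std::numeric_limits<float>::quiet_NaN()
                : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + man) * 2^(exp - 25) == (1 + man/1024) * 2^(exp - 15)
        v = std::ldexp(static_cast<float>(man | 0x400), exp - 25);
    }
    return sign ? -v : v;
}

// "Arithmetic" on the storage type: widen both operands, add in float32.
inline float add_as_float(half_storage a, half_storage b) {
    return half_bits_to_float(a.bits) + half_bits_to_float(b.bits);
}
```

For example, the bit pattern 0x7BFF (exponent 30, mantissa 1023) decodes to 2047 * 2^5 = 65504, the largest finite value in the 1-5-10 format.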
Folks who are interested in this topic may enjoy reading
https://arxiv.org/abs/1904.06376. The methods described therein are not
necessarily applicable to Boost Multiprecision, but may be relevant if
uBLAS gets involved.
Jeff, who works for Intel
On Tue, Oct 15, 2019 at 7:04 AM Phil Endecott via Boost <
> Matt Hurd wrote:
> > IEEE 16bit (fp16) and bfloat16 are both around, but bfloat16 seems to be
> > the new leader in modern implementations thanks to ML use. I haven't
> > experienced both used together but I wouldn't rule it out given bfloat16
> > may be accelerator specific. Google and Intel have support for bfloat16 in
> > some hardware. bfloat16 makes it easy to move to fp32 as they have the
> > same exponent size.
> > Refs: https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
> According to section 4.1.2 of this ARM document:
> implementations support both the IEEE format (1 sign, 5 exponent and 10
> mantissa) and an alternative format which is similar except that it doesn't
> support Inf and NaN, and gains slightly more range. Apparently the
> format is supported in ARMv8.6-A, but I don't believe that is deployed
> The other place where I've used 16-bit floats is in OpenGL textures,
> which use the 1-5-10 format.
> I was a bit surprised by the 1-5-10 choice; the maximum value that can
> be represented is only 65504, i.e. less than the maximum value for an
> unsigned int of the same size.
> bfloat16 can be trivially implemented (as a storage-only type) simply
> by truncating a 32-bit float; perhaps support for that would be useful
> Regards, Phil.
-- Jeff Hammond jeff.science_at_[hidden] http://jeffhammond.github.io/