
From: Andrzej Krzemienski (akrzemi1_at_[hidden])
Date: 2020-09-25 22:52:50


Fri, 25 Sep 2020 at 16:48 Vinnie Falco <vinnie.falco_at_[hidden]> wrote:

> On Fri, Sep 25, 2020 at 7:07 AM Andrzej Krzemienski via Boost
> <boost_at_[hidden]> wrote:
> > Are JSON numbers only good for storing int-based identifiers?
>
> The JSON specification is silent on the limits and precision of the
> range of numbers. All that we know is that it is a "light-weight data
> interchange format." However, we can gather quite a bit of anecdotal
> evidence simply by looking at the various languages which have
> built-in support for JSON.
>
> From RFC7159 (https://tools.ietf.org/html/rfc7159)
>
> This specification allows implementations to set limits on the range
> and precision of numbers accepted. Since software that implements
> IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
> generally available and widely used, good interoperability can be
> achieved by implementations that expect no more precision or range
> than these provide, in the sense that implementations will
> approximate JSON numbers within the expected precision. A JSON
> number such as 1E400 or 3.141592653589793238462643383279 may indicate
> potential interoperability problems, since it suggests that the
> software that created it expects receiving software to have greater
> capabilities for numeric magnitude and precision than is widely
> available.
>
> Note the phrase "widely available."
>
> From <https://stackoverflow.com/questions/13502398/json-integers-limit-on-size>
>
> As a practical matter, Javascript integers are limited to about 2^53
> (there are no integers; just IEEE floats).
>
> From <https://developers.google.com/discovery/v1/type-format>
>
> ...a 64-bit integer cannot be represented in JSON (since JavaScript
> and JSON support integers up to 2^53).
>
> From <https://github.com/josdejong/lossless-json>
>
> When to use? Only in some special cases. For example when you
> have to create some sort of data processing middleware which has
> to process arbitrary JSON without risk of screwing up. JSON objects
> containing big numbers are rare in the wild.
>
> From <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number>
>
> The JavaScript Number type is a double-precision 64-bit binary format
> IEEE 754 value, like double in Java or C#....When parsing data that has
> been serialized to JSON, integer values falling outside of this range
> can be expected to become corrupted when JSON parser coerces them to
> Number type. A possible workaround is to use String instead.
>
> From <https://docs.python.org/3/library/json.html#implementation-limitations>
>
> When serializing to JSON, beware any such limitations in applications
> that may consume your JSON. In particular, it is common for JSON
> numbers to be deserialized into IEEE 754 double precision numbers
> and thus subject to that representation’s range and precision
> limitations.
>
> I am actually now starting to wonder if even 64-bit integer support
> was a good idea, as it can produce numbers which most implementations
> cannot read with perfect fidelity.
>
> It is true that there are some JSON implementations which support
> arbitrary-precision numbers, but these are rare and all come with the
> caveat that their output will likely be incorrectly parsed or rejected
> by the majority of implementations. This is quite an undesirable
> feature for an "interoperable, data-exchange format" or a vocabulary
> type. Support for arbitrary precision numbers would not come without
> cost. The library would be bigger, in a way that the linker can't
> strip (because of switch statements on the variant's kind). Everyone
> would pay for this feature (e.g. embedded) but only a handful of users
> would use it.
>
> There is overwhelming evidence that the following statement is false:
>
> "json::value *needs* to support arbitrary numbers. It's incomplete
> without it."
>

I accidentally replied privately to Vinnie. I am now pasting my reply here:

Thanks. This is really useful background. It explains why the JSON format
> conflates integer and floating-point numbers: originally there were
> only floating-point numbers, and the number 1 is just a different
> representation of a floating-point value. But if we adopt this view,
> bearing in mind that JavaScript JSON libraries may not be able to parse
> big uint64_t values, Boost.JSON might indeed have made the wrong
> trade-off by adding support for the full range of uint64_t. The costs
> are: (1) some values generated by Boost.JSON cannot be parsed losslessly
> by JavaScript JSON libraries, and (2) the interface becomes more
> complicated (number_cast). And one could say that big uint64_t values
> constitute the 1% of use cases that are not worth the costs.
> On the other hand, there is one quite natural use case for the full range
> of uint64_t: hash values. They are naturally stored as size_t, and the
> biggest values are as likely to appear as the smallest. Libraries
> like rapidjson handle this case, so if they are able to serialize such a
> value, Boost.JSON should be able to parse it. It looks like the following
> two goals are not compatible:
> 1. Losslessly parse every value produced by rapidjson.
> 2. Generate only values that JavaScript JSON libraries can parse losslessly.
>
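The tension between the two goals can be checked without any JSON library. The following is a minimal sketch, assuming the receiving parser stores every number as an IEEE 754 binary64 double (as JavaScript does) and that the platform's double is binary64. The function name survives_double is made up for illustration and is not part of Boost.JSON or rapidjson.

```cpp
#include <cstdint>

// Hypothetical helper: true when `v` survives a
// uint64_t -> double -> uint64_t round trip, i.e. when a parser that
// keeps every number as a binary64 double would still see exactly
// the same integer.
bool survives_double(std::uint64_t v)
{
    const double d = static_cast<double>(v);

    // Values close to the top of the range round up to 2^64, which no
    // longer fits in uint64_t. Converting such a double back would be
    // undefined behaviour, so reject it before the cast.
    if (d >= 18446744073709551616.0)   // 2^64, exactly representable
        return false;

    return static_cast<std::uint64_t>(d) == v;
}
```

With this check, 9007199254740992 (2^53) passes while 9007199254740993 (2^53 + 1) does not, and neither does the largest uint64_t value. A 64-bit hash is spread over the whole range, so most hash values would fail it.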
> So, I guess the choice made in Boost.JSON is the right one. You will
> potentially produce values not parsable by some JSON libraries, and if goal
> 2 is important for some use cases, the user has to make sure that she
> only stores doubles as numbers.
>
> By the way, when I learned about these issues with numbers/doubles, it
> occurred to me that Boost.JSON must have a flaw somewhere in its handling
> of numbers, given that it stores three different types and provides an
> equality operator. So I tried to break it. And I couldn't. The mechanism
> for storing int, uint and double is very well designed and thought
> through: you always prefer ints to doubles when parsing, you always add a
> decimal point or exponent when serializing floats, you correctly compare
> ints with uints, and you always compare ints and floats as unequal. This
> is really consistent. I think it deserves a mention in the documentation.
>
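The four rules listed above can be modelled in a few lines. This is only a sketch of the behaviour as described in this message, not Boost.JSON's implementation: the number type and the functions from_int, from_uint, from_real, parse, serialize and equal are hypothetical names invented here. It assumes the input is already a valid JSON number token, that the "C" locale is active, and that long long is 64 bits wide.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a JSON number; not Boost.JSON's real type.
struct number
{
    enum kind_t { int64, uint64, real } kind;
    std::int64_t  i;   // meaningful when kind == int64
    std::uint64_t u;   // meaningful when kind == uint64
    double        d;   // meaningful when kind == real
};

number from_int(std::int64_t v)   { number n = { number::int64, v, 0, 0.0 }; return n; }
number from_uint(std::uint64_t v) { number n = { number::uint64, 0, v, 0.0 }; return n; }
number from_real(double v)        { number n = { number::real, 0, 0, v }; return n; }

// Rule 1: a token without fraction or exponent becomes an integer
// whenever it fits; everything else becomes a double.
number parse(const std::string& tok)
{
    if (tok.find_first_of(".eE") == std::string::npos)
    {
        errno = 0;
        if (tok[0] == '-')
        {
            const long long v = std::strtoll(tok.c_str(), 0, 10);
            if (errno != ERANGE) return from_int(v);
        }
        else
        {
            const unsigned long long v = std::strtoull(tok.c_str(), 0, 10);
            if (errno != ERANGE)
                return v <= 9223372036854775807ULL
                    ? from_int(static_cast<std::int64_t>(v)) : from_uint(v);
        }
    }
    return from_real(std::strtod(tok.c_str(), 0));
}

// Rule 2: a double is always written with a '.' or an exponent, so
// reading it back yields a double again and never an integer.
std::string serialize(const number& n)
{
    char buf[40];
    if (n.kind == number::int64)
        std::snprintf(buf, sizeof buf, "%lld", static_cast<long long>(n.i));
    else if (n.kind == number::uint64)
        std::snprintf(buf, sizeof buf, "%llu", static_cast<unsigned long long>(n.u));
    else
        std::snprintf(buf, sizeof buf, "%.17g", n.d);
    std::string s(buf);
    if (n.kind == number::real && s.find_first_of(".eE") == std::string::npos)
        s += ".0";   // finite values only; JSON has no inf or nan
    return s;
}

// Rules 3 and 4: int and uint compare by mathematical value;
// a double is only ever equal to another double.
bool equal(const number& a, const number& b)
{
    if (a.kind == number::real || b.kind == number::real)
        return a.kind == b.kind && a.d == b.d;
    if (a.kind == b.kind)
        return a.kind == number::int64 ? a.i == b.i : a.u == b.u;
    const number& s = a.kind == number::int64 ? a : b;   // the signed one
    const number& t = a.kind == number::int64 ? b : a;   // the unsigned one
    return s.i >= 0 && static_cast<std::uint64_t>(s.i) == t.u;
}
```

In this model, parse("1") and parse("1.0") produce different kinds and therefore compare unequal, while from_int(1) and from_uint(1) compare equal. Because serialize never writes a double without a '.' or an exponent, a round trip through text keeps the kind of every finite value.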

Regards,
&rzej;


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk