Subject: Re: [boost] [proposed][histogram]
From: Bjorn Reese (breese_at_[hidden])
Date: 2017-04-12 10:34:09
On 04/12/2017 11:37 AM, Hans Dembinski via Boost wrote:
> The library implements a histogram class (a highly configurable policy-based template) for C++ and Python in C++11 code. Histograms are a standard tool to explore Big Data. They allow one to visualise and analyse distributions of random variables. A histogram provides a lossy compression of input data. GBytes of input can be put in a compact form which requires only a small fraction of the original memory. This makes histograms convenient for interactive data analysis and further processing.
Given that the compression is lossy, I am wondering how it compares with
a distribution estimator like:
https://arxiv.org/abs/1507.05073v2
A common use-case when collecting numerical data is to determine the
quantiles. Boost.Accumulators contains an estimator (extended_p_square)
for that.
The advantage of such estimators is that they execute in constant time
and with constant memory usage, where the constant depends only on the
required precision.
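
To make the comparison concrete, here is a minimal sketch of how a quantile
can be read back from a fixed-bin histogram. It is an illustration only: the
class FixedHistogram and its members fill, quantile, total and width are
hypothetical names, not the proposed library's API, and the method is plain
linear interpolation within a bin, not the P^2 algorithm behind
extended_p_square. Memory is constant (one counter per bin), and the
estimate is off by at most one bin width, provided the samples fall inside
the chosen range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: NOT the proposed histogram library's API and
// NOT the P^2 algorithm used by Boost.Accumulators' extended_p_square.
class FixedHistogram {
public:
    FixedHistogram(double lo, double hi, std::size_t bins)
        : lo_(lo), hi_(hi), counts_(bins, 0), total_(0) {
        assert(hi > lo && bins > 0);
    }

    // O(1) per sample. Values outside [lo, hi) are ignored in this sketch.
    void fill(double x) {
        if (x < lo_ || x >= hi_) return;
        std::size_t i = static_cast<std::size_t>((x - lo_) / width());
        if (i >= counts_.size()) i = counts_.size() - 1;  // rounding guard
        ++counts_[i];
        ++total_;
    }

    // Estimate the p-quantile (0 <= p <= 1): find the bin where the
    // cumulative count reaches p * total, then interpolate linearly in it.
    double quantile(double p) const {
        assert(total_ > 0 && p >= 0.0 && p <= 1.0);
        const double target = p * static_cast<double>(total_);
        double cumulative = 0.0;
        for (std::size_t i = 0; i < counts_.size(); ++i) {
            const double c = static_cast<double>(counts_[i]);
            if (c > 0.0 && cumulative + c >= target) {
                const double frac = (target - cumulative) / c;
                return lo_ + (static_cast<double>(i) + frac) * width();
            }
            cumulative += c;
        }
        return hi_;
    }

    std::size_t total() const { return total_; }

    double width() const {
        return (hi_ - lo_) / static_cast<double>(counts_.size());
    }

private:
    double lo_;
    double hi_;
    std::vector<std::size_t> counts_;  // one counter per bin: constant memory
    std::size_t total_;                // number of accepted samples
};
```

The trade-off this shows is that the histogram needs its range and bin count
fixed up front, and its precision is tied to the bin width, whereas a
streaming estimator tracks only the requested quantiles and adapts to the
data as it arrives.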
PS: I am aware that this is a non-trivial question, so I do not expect
an answer.