
Subject: Re: [boost] [proposed][histogram]
From: Bjorn Reese (breese_at_[hidden])
Date: 2017-04-12 10:34:09

On 04/12/2017 11:37 AM, Hans Dembinski via Boost wrote:

> The library implements a histogram class (a highly configurable policy-based template) for C++ and Python in C++11 code. Histograms are a standard tool to explore Big Data. They allow one to visualise and analyse distributions of random variables. A histogram provides a lossy compression of input data. GBytes of input can be put in a compact form which requires only a small fraction of the original memory. This makes histograms convenient for interactive data analysis and further processing.

Given that the compression is lossy, I am wondering how it compares with
a distribution estimator such as the following.

A common use case when collecting numerical data is to determine
quantiles. Boost.Accumulators contains an estimator for that
(extended_p_square).
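To show what such an estimator does, here is a minimal C++ sketch of the original single-quantile P² algorithm (Jain and Chlamtac). The extended P² algorithm behind extended_p_square generalizes it to several quantiles. This is not the Boost.Accumulators code; the class name P2Quantile and its add()/estimate() members are hypothetical. Five markers track the minimum, the target quantile, two intermediate quantiles, and the maximum. Each sample updates the marker positions, and when a middle marker drifts by a full position its height is corrected with a piecewise-parabolic formula.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the single-quantile P^2 estimator (Jain & Chlamtac).
// State is five marker heights and positions, independent of the sample count.
class P2Quantile {
public:
    explicit P2Quantile(double p) : p_(p) {
        assert(p > 0.0 && p < 1.0);
        dn_ = {0.0, p / 2.0, p, (1.0 + p) / 2.0, 1.0};  // desired-position increments
    }

    void add(double x) {
        if (count_ < 5) {  // bootstrap: the first five samples become the markers
            q_[count_++] = x;
            if (count_ == 5) {
                std::sort(q_.begin(), q_.end());
                n_ = {1, 2, 3, 4, 5};
                np_ = {1.0, 1.0 + 2.0 * p_, 1.0 + 4.0 * p_, 3.0 + 2.0 * p_, 5.0};
            }
            return;
        }
        ++count_;
        // 1. Find cell k with q_[k] <= x < q_[k+1], widening the extremes if needed.
        int k = 0;
        if (x < q_[0]) {
            q_[0] = x;
        } else if (x >= q_[4]) {
            q_[4] = x;
            k = 3;
        } else {
            while (x >= q_[k + 1]) ++k;
        }
        // 2. Markers above the cell move up one position; desired positions advance.
        for (int i = k + 1; i < 5; ++i) ++n_[i];
        for (int i = 0; i < 5; ++i) np_[i] += dn_[i];
        // 3. Move a middle marker by one position if it lags or leads by >= 1.
        for (int i = 1; i <= 3; ++i) {
            const double d = np_[i] - static_cast<double>(n_[i]);
            if ((d >= 1.0 && n_[i + 1] - n_[i] > 1) ||
                (d <= -1.0 && n_[i - 1] - n_[i] < -1)) {
                const int s = d > 0.0 ? 1 : -1;
                const double qp = parabolic(i, s);
                // Use the parabolic prediction only if it keeps the heights ordered.
                q_[i] = (q_[i - 1] < qp && qp < q_[i + 1]) ? qp : linear(i, s);
                n_[i] += s;
            }
        }
    }

    // Current estimate of the p-quantile (exact while fewer than five samples).
    double estimate() const {
        assert(count_ > 0);
        if (count_ < 5) {
            std::vector<double> v(q_.begin(), q_.begin() + count_);
            std::sort(v.begin(), v.end());
            return v[static_cast<std::size_t>(p_ * static_cast<double>(count_ - 1))];
        }
        return q_[2];
    }

private:
    // Piecewise-parabolic (P^2) height prediction for marker i moved by s.
    double parabolic(int i, int s) const {
        const double d = s;
        const double ni = static_cast<double>(n_[i]);
        const double nm = static_cast<double>(n_[i - 1]);
        const double np = static_cast<double>(n_[i + 1]);
        return q_[i] + d / (np - nm) *
               ((ni - nm + d) * (q_[i + 1] - q_[i]) / (np - ni) +
                (np - ni - d) * (q_[i] - q_[i - 1]) / (ni - nm));
    }
    // Linear fallback toward the neighbouring marker in direction s.
    double linear(int i, int s) const {
        return q_[i] + s * (q_[i + s] - q_[i]) /
                           static_cast<double>(n_[i + s] - n_[i]);
    }

    double p_;
    std::size_t count_ = 0;
    std::array<double, 5> q_{};     // marker heights
    std::array<long long, 5> n_{};  // actual marker positions
    std::array<double, 5> np_{};    // desired marker positions
    std::array<double, 5> dn_{};    // per-sample increments of desired positions
};
```

Feeding, say, 20000 uniform samples from [0, 1) to P2Quantile(0.5) should give an estimate close to 0.5, and every add() call does a bounded amount of work. The sketch tracks one quantile only; tracking several at once is what the extended algorithm adds.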

The advantage of such estimators is that they run in constant time per
sample and with constant memory usage, where the constants depend only
on the required precision.
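For the comparison asked about above, a quantile can also be read off a filled histogram. The sketch below assumes a hypothetical fixed-width histogram (FixedHistogram, not the proposed library's interface) and interpolates linearly inside the bin where the cumulative count reaches p times the total. Its memory grows with the number of bins and its resolution is limited to roughly one bin width. An estimator such as extended_p_square instead keeps a fixed amount of state, but can only answer the quantiles chosen up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width histogram over [lo, hi); not the proposed
// library's interface, only the minimum needed to estimate quantiles.
struct FixedHistogram {
    double lo, hi;
    std::vector<std::size_t> counts;
    std::size_t total = 0;

    FixedHistogram(double lo_, double hi_, std::size_t bins)
        : lo(lo_), hi(hi_), counts(bins, 0) {
        assert(bins > 0 && hi_ > lo_);
    }

    void fill(double x) {
        if (x < lo || x >= hi) return;  // this sketch simply drops out-of-range samples
        auto i = static_cast<std::size_t>((x - lo) / (hi - lo) *
                                          static_cast<double>(counts.size()));
        if (i >= counts.size()) i = counts.size() - 1;  // guard against rounding at hi
        ++counts[i];
        ++total;
    }

    // Approximate p-quantile: find the bin where the cumulative count reaches
    // p * total and interpolate linearly inside it, assuming samples are spread
    // uniformly within each bin.
    double quantile(double p) const {
        assert(total > 0 && p >= 0.0 && p <= 1.0);
        const double width = (hi - lo) / static_cast<double>(counts.size());
        const double target = p * static_cast<double>(total);
        double cum = 0.0;
        for (std::size_t i = 0; i < counts.size(); ++i) {
            const double next = cum + static_cast<double>(counts[i]);
            if (counts[i] > 0 && next >= target) {
                const double frac = (target - cum) / static_cast<double>(counts[i]);
                return lo + (static_cast<double>(i) + frac) * width;
            }
            cum = next;
        }
        return hi;  // not reached when total > 0 and p <= 1
    }
};
```

With 100 bins over [0, 1), the histogram stores 100 counters and any quantile can be queried after the fact. The answer is only as sharp as the 0.01 bin width, however, and relies on the within-bin uniformity assumption.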

PS: I am aware that this is a non-trivial question, so I do not expect
     an answer.
