Boost logo

Boost :

From: Joana Modesto Hrotko (joana.hrotko_at_[hidden])
Date: 2020-02-09 21:53:41


A 2020-02-09 20:09, Mateusz Loskot via Boost escreveu:
> On Sun, 9 Feb 2020 at 14:18, degski via Boost <boost_at_[hidden]>
> wrote:
>> On Sat, 8 Feb 2020 at 11:43, Joana Modesto Hrotko via Boost
>> <boost_at_[hidden]> wrote:
>>
>> > Dear Boost community,
>> >
>> > I am a Master's student and currently working on my Master thesis.
>> > My Master Thesis consists in implementing the k2-tree data structure in
>> > C++ and one of the additional proposals for my thesis would be to
>> > integrate my code with the boost graph library.I believe this
>> > integration
>> > would bring great value to the boost library and also it would provide a
>> > better validation for my work.
>> > So my question is if this would be something of interest for the library
>> > to integrate with boost graph library.
>> >
>> > INFO ABOUT K2-TREE:
>> > The k2-tree have been proved successful to represent in a very compact
>> > way different kinds of binary relations, such as web graphs, RDFs or
>> > raster data.
>> > Binary relations can be an abstraction to represent the relation between
>> > the objects of two collections of different nature: graphs, trees,
>> > strings, among others. They can be used in several low-level structures
>> > within a more complex information retrieval system, or even replace one
>> > of the most used one: an inverted index can be regarded as a binary
>> > relation
>> > between the vocabulary of terms and the documents where they appear.
>> > Moreover,
>> > it can represent the relation between terms and web pages, or even
>> > social
>> > networks or the connection between the web pages in the Web, which is
>> > normally
>> > represented as a web graph. k2-trees have been also used in other
>> > scenarios,
>> > such as geographical and RDF data, or images.
>> >
>> > Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary
>> > relations with applications." Information Systems 69 (2017): 106-123.
>> > Coimbra, Miguel E., et al. "On dynamic succinct graph representations."
>> > arXiv preprint arXiv:1911.03195 (2019).
>> >
>>
>> Great subject for a thesis. Munro, the man of the beap
>> <https://github.com/pfalcon/beap> (a dynamic implicit data structure
>> <https://github.com/pfalcon/awesome-implicit-data-structures>). I
>> don't
>> really understand why you would like a dynamic k2-tree, you can just
>> as
>> well have a static one
>> <https://github.com/degski/KD-Tree/blob/master/KD-Tree/ikdtree.hpp>
>
> By the way, I think k2-tree which is of Joana interest may be one of
> the special cases
> of the k-ary https://github.com/boostorg/graph/pull/139
>
> Best regards,

Hi

first of all thank you so much for the reply of all of you.

So the k2-tree is a succint data structure and it is equivalent to a web
graph,
so conceptually is somehow similar to the k-ary tree however,
implementation wise it is not
the same.

In the k2-tree it is meant to represent the relation between nodes in a
graph, based
on the matrix representation of the graph. In order to compress the
information in
the matrix (hopefully sparse), the method consists in subdividing the
binary matrix
into k rows and k columns, thus producing k2 submatrices of size n/k by
n/k. Each of
these submatriceswill be a child of the root node and its value will be
1 iff there
is at least one 1 in the cells of the submatrix. A 0 child means that
the submatrix
has all 0s and hence the tree decomposition ends there. Once the level
1, with the
children of the root, has been built, the method proceeds recursively
for each child
with value 1, until we reach submatrices full of 0s, or we reach the
cells of the
original matrix.
The k2-ary tree described is represented in a compact way with just two
bit
arrays: T (tree) and L(leaves). T stores all the bits of the k2-tree
except
those in the last level following a level-wise traversal. Bitmap L
stores the last
level of the tree.

 From the pull requests it seems there is no similar data structure to
the k2-tree
being implemented right now. However how hard is it to be able to
complete the pull request?
And how is the decision taken?

Best regards,
Joana


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk