Boost logo

Boost :

Subject: Re: [boost] [GSOC] proposal for Trie
From: Antony Polukhin (antoshkka_at_[hidden])
Date: 2013-04-26 13:20:47


2013/4/26 Hardy <hzithlony_at_[hidden]>:
> On 2013/4/26 18:47, Antony Polukhin wrote:
>>
>> 2013/4/26 Hardy <hzithlony_at_[hidden]>:
>>>
>>> Firstly, the 'key' here represent a string.
>>> The first 'get-key' function is to get a key which corresponds to the
>>> iterator(node in Trie), because in order to save space(which is one of
>>> Trie's advantages), the whole key string should not store in a single
>>> iterator(node), so it is necessary to provide such an interface under my
>>> design.
>>
>>
>> This looks bad.
>> Iterating through elements of 'string' may be a useful feature, but
>> not the main.
>> User will be putting in container 'string' and he will be expecting to
>> retrieve 'string' when dereferencing iterator.
>>
>>> Since the relationship between key and string makes you confused, I am
>>> now
>>> trying to explain more about it.
>>> In my current design, a key is a 'string' itself.
>>> My idea is to take sequences of anytype as an abstract string in the
>>> bottom
>>> layer of the lib, in order to extend the capability of Trie.
>>> And after thinking on it several times, I now prefer to use 'key' as an
>>> element of a 'string', for it is more clear to users.
>>> Since it is a crucial and essential problem of the whole Trie project, I
>>> want some advices of you on it.
>>
>>
>> 'key' is what we are going to store in container. User shall not care
>> that we store key in parts, so it looks right to treat 'key' close to
>> 'string'.
>>
>>> Ok, you mean that I should add this link of our discussion to the
>>> 'additional info' of my proposal in GSoC page?
>>
>>
>> Yes
>>
>> --
>> Best regards,
>> Antony Polukhin
>>
>> _______________________________________________
>> Unsubscribe & other changes:
>> http://lists.boost.org/mailman/listinfo.cgi/boost
>>
>
> I agree with you, that is why I design a layer of interfaces as TrieMap, and
> TrieSet, in my conceivation, it can be used as stl:map and stl:set, with the
> difference that they retrieve the keys using Trie data structure. So of
> course user can get 'key' when dereferencing iterator.
> Meanwhile, I still think that it is useful to provide some classes on lower
> layer to not store the keys in the iterators, so that users can have a more
> space-saving application.
> In conclusion, I adjust my design to the following form:
>
> bottom layer:
> class Trie<key_t, value_t>;
> The key and value are not in the for as map(pair<key_t, value_t>),
> so it is space-saving.
>
> second layer:
> typedef TrieMap<key_t, value_t> Trie<key_t, pair<key_t, value_t> >
>
> The two layers can satisfy different needs of users.

Duplicating data is bad. If we have an iterator pointing to leaf node
we can restore the whole key, without storing it one more time (leaf
node == final element of whole key).

In my head node looks like this:

template <class Element>
struct node {
node* next_on_same_level_;
node* deeper_or_parent_; // points to parent if next_on_same_level_ is NULL

Element value_;
};

If next_on_same_level_ is NULL - we are in leaf node. To retrieve key
we need to move to root, gathering elements.

And it looks like it is possible to get rid of one of the pointers
using http://en.wikipedia.org/wiki/XOR_linked_list technique.

Or am I missing something?

--
Best regards,
Antony Polukhin

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk