Boost logo

Boost :

Subject: Re: [boost] [GSOC] proposal for Trie
From: Antony Polukhin (antoshkka_at_[hidden])
Date: 2013-04-27 12:54:53


2013/4/27 Hardy <hzithlony_at_[hidden]>:
> On 2013/4/27 1:20, Antony Polukhin wrote:
>>
>> 2013/4/26 Hardy <hzithlony_at_[hidden]>:
>>>
>>> On 2013/4/26 18:47, Antony Polukhin wrote:
>>>>
>>>>
>>>> 2013/4/26 Hardy <hzithlony_at_[hidden]>:
>>>>>
>>>>>
>>>>> Firstly, the 'key' here represent a string.
>>>>> The first 'get-key' function is to get a key which corresponds to the
>>>>> iterator(node in Trie), because in order to save space(which is one of
>>>>> Trie's advantages), the whole key string should not store in a single
>>>>> iterator(node), so it is necessary to provide such an interface under
>>>>> my
>>>>> design.
>>>>
>>>>
>>>>
>>>> This looks bad.
>>>> Iterating through elements of 'string' may be a useful feature, but
>>>> not the main.
>>>> User will be putting in container 'string' and he will be expecting to
>>>> retrieve 'string' when dereferencing iterator.
>>>>
>>>>> Since the relationship between key and string makes you confused, I am
>>>>> now
>>>>> trying to explain more about it.
>>>>> In my current design, a key is a 'string' itself.
>>>>> My idea is to take sequences of anytype as an abstract string in the
>>>>> bottom
>>>>> layer of the lib, in order to extend the capability of Trie.
>>>>> And after thinking on it several times, I now prefer to use 'key' as an
>>>>> element of a 'string', for it is more clear to users.
>>>>> Since it is a crucial and essential problem of the whole Trie project,
>>>>> I
>>>>> want some advices of you on it.
>>>>
>>>>
>>>>
>>>> 'key' is what we are going to store in container. User shall not care
>>>> that we store key in parts, so it looks right to treat 'key' close to
>>>> 'string'.
>>>>
>>>>> Ok, you mean that I should add this link of our discussion to the
>>>>> 'additional info' of my proposal in GSoC page?
>>>>
>>>>
>>>>
>>>> Yes
>>>>
>>>> --
>>>> Best regards,
>>>> Antony Polukhin
>>>>
>>>> _______________________________________________
>>>> Unsubscribe & other changes:
>>>> http://lists.boost.org/mailman/listinfo.cgi/boost
>>>>
>>>
>>> I agree with you, that is why I design a layer of interfaces as TrieMap,
>>> and
>>> TrieSet, in my conceivation, it can be used as stl:map and stl:set, with
>>> the
>>> difference that they retrieve the keys using Trie data structure. So of
>>> course user can get 'key' when dereferencing iterator.
>>> Meanwhile, I still think that it is useful to provide some classes on
>>> lower
>>> layer to not store the keys in the iterators, so that users can have a
>>> more
>>> space-saving application.
>>> In conclusion, I adjust my design to the following form:
>>>
>>> bottom layer:
>>> class Trie<key_t, value_t>;
>>> The key and value are not in the for as map(pair<key_t, value_t>),
>>> so it is space-saving.
>>>
>>> second layer:
>>> typedef TrieMap<key_t, value_t> Trie<key_t, pair<key_t, value_t> >
>>>
>>> The two layers can satisfy different needs of users.
>>
>>
>> Duplicating data is bad. If we have an iterator pointing to leaf node
>> we can restore the whole key, without storing it one more time (leaf
>> node == final element of whole key).
>>
>
> In Trie a non-leaf node can be the ending of a string, so you thought here
> confused me. How will your design handle this situation? A pointer to a
> single leaf node from the non-leaf node to it?

How about a node, that has no value and only shows that this is the
end of substring:

struct node_base {
node* next_on_same_level_;
node* deeper_or_parent_; // points to parent if next_on_same_level_ is NULL
};

template <class Element>
struct node_nonleaf: public node_base {
Element value_;
};

struct node_leaf: public node_base {}; // next_on_same_level_ is always NULL

So sting 'abc' in trie would look like node_nonleaf('a') ->
node_nonleaf('b') -> node_nonleaf('c') -> node_leaf
Now if we add string 'ab' our structure will look like
node_nonleaf('a') -> node_nonleaf('b') -> [node_nonleaf('c'),
node_leaf] -> node_leaf

>> In my head node looks like this:
>>
>> template <class Element>
>> struct node {
>> node* next_on_same_level_;
>> node* deeper_or_parent_; // points to parent if next_on_same_level_ is
>> NULL
>>
>> Element value_;
>> };
>
>
>>
>> If next_on_same_level_ is NULL - we are in leaf node. To retrieve key
>> we need to move to root, gathering elements.
>>
>> And it looks like it is possible to get rid of one of the pointers
>> using http://en.wikipedia.org/wiki/XOR_linked_list technique.
>>
>
> That is an amzazing trick, it is good to see such a beautiful method
> implementing linked list.

There are some more tricks, that allow to hide a few bits of
information in XORed value. But generic implementation first,
optimizations later.

--
Best regards,
Antony Polukhin

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk