Boost logo

Boost :

Subject: Re: [boost] [GSOC] proposal for Trie
From: Hardy (hzithlony_at_[hidden])
Date: 2013-04-28 04:05:48


On 2013/4/28 0:54, Antony Polukhin wrote:
> 2013/4/27 Hardy <hzithlony_at_[hidden]>:
>> On 2013/4/27 1:20, Antony Polukhin wrote:
>>>
>>> 2013/4/26 Hardy <hzithlony_at_[hidden]>:
>>>>
>>>> On 2013/4/26 18:47, Antony Polukhin wrote:
>>>>>
>>>>>
>>>>> 2013/4/26 Hardy <hzithlony_at_[hidden]>:
>>>>>>
>>>>>>
>>>>>> Firstly, the 'key' here represent a string.
>>>>>> The first 'get-key' function is to get a key which corresponds to the
>>>>>> iterator(node in Trie), because in order to save space(which is one of
>>>>>> Trie's advantages), the whole key string should not store in a single
>>>>>> iterator(node), so it is necessary to provide such an interface under
>>>>>> my
>>>>>> design.
>>>>>
>>>>>
>>>>>
>>>>> This looks bad.
>>>>> Iterating through elements of 'string' may be a useful feature, but
>>>>> not the main.
>>>>> User will be putting in container 'string' and he will be expecting to
>>>>> retrieve 'string' when dereferencing iterator.
>>>>>
>>>>>> Since the relationship between key and string makes you confused, I am
>>>>>> now
>>>>>> trying to explain more about it.
>>>>>> In my current design, a key is a 'string' itself.
>>>>>> My idea is to take sequences of anytype as an abstract string in the
>>>>>> bottom
>>>>>> layer of the lib, in order to extend the capability of Trie.
>>>>>> And after thinking on it several times, I now prefer to use 'key' as an
>>>>>> element of a 'string', for it is more clear to users.
>>>>>> Since it is a crucial and essential problem of the whole Trie project,
>>>>>> I
>>>>>> want some advices of you on it.
>>>>>
>>>>>
>>>>>
>>>>> 'key' is what we are going to store in container. User shall not care
>>>>> that we store key in parts, so it looks right to treat 'key' close to
>>>>> 'string'.
>>>>>
>>>>>> Ok, you mean that I should add this link of our discussion to the
>>>>>> 'additional info' of my proposal in GSoC page?
>>>>>
>>>>>
>>>>>
>>>>> Yes
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Antony Polukhin
>>>>>
>>>>> _______________________________________________
>>>>> Unsubscribe & other changes:
>>>>> http://lists.boost.org/mailman/listinfo.cgi/boost
>>>>>
>>>>
>>>> I agree with you, that is why I design a layer of interfaces as TrieMap,
>>>> and
>>>> TrieSet, in my conceivation, it can be used as stl:map and stl:set, with
>>>> the
>>>> difference that they retrieve the keys using Trie data structure. So of
>>>> course user can get 'key' when dereferencing iterator.
>>>> Meanwhile, I still think that it is useful to provide some classes on
>>>> lower
>>>> layer to not store the keys in the iterators, so that users can have a
>>>> more
>>>> space-saving application.
>>>> In conclusion, I adjust my design to the following form:
>>>>
>>>> bottom layer:
>>>> class Trie<key_t, value_t>;
>>>> The key and value are not in the for as map(pair<key_t, value_t>),
>>>> so it is space-saving.
>>>>
>>>> second layer:
>>>> typedef TrieMap<key_t, value_t> Trie<key_t, pair<key_t, value_t> >
>>>>
>>>> The two layers can satisfy different needs of users.
>>>
>>>
>>> Duplicating data is bad. If we have an iterator pointing to leaf node
>>> we can restore the whole key, without storing it one more time (leaf
>>> node == final element of whole key).
>>>
>>
>> In Trie a non-leaf node can be the ending of a string, so you thought here
>> confused me. How will your design handle this situation? A pointer to a
>> single leaf node from the non-leaf node to it?
>
> How about a node, that has no value and only shows that this is the
> end of substring:
>
> struct node_base {
> node* next_on_same_level_;
> node* deeper_or_parent_; // points to parent if next_on_same_level_ is NULL
> };
>
> template <class Element>
> struct node_nonleaf: public node_base {
> Element value_;
> };
>
> struct node_leaf: public node_base {}; // next_on_same_level_ is always NULL
>
> So sting 'abc' in trie would look like node_nonleaf('a') ->
> node_nonleaf('b') -> node_nonleaf('c') -> node_leaf
> Now if we add string 'ab' our structure will look like
> node_nonleaf('a') -> node_nonleaf('b') -> [node_nonleaf('c'),
> node_leaf] -> node_leaf
>
>
>>> In my head node looks like this:
>>>
>>> template <class Element>
>>> struct node {
>>> node* next_on_same_level_;
>>> node* deeper_or_parent_; // points to parent if next_on_same_level_ is
>>> NULL
>>>
>>> Element value_;
>>> };
>>
>>
>>>
>>> If next_on_same_level_ is NULL - we are in leaf node. To retrieve key
>>> we need to move to root, gathering elements.
>>>
>>> And it looks like it is possible to get rid of one of the pointers
>>> using http://en.wikipedia.org/wiki/XOR_linked_list technique.
>>>
>>
>> That is an amzazing trick, it is good to see such a beautiful method
>> implementing linked list.
>
> There are some more tricks, that allow to hide a few bits of
> information in XORed value. But generic implementation first,
> optimizations later.
>
> --
> Best regards,
> Antony Polukhin
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
>

Ok, thanks, I will digest your idea, and continue optimizing my design.
BTW, I've submitted a version of my proposal which shows the result of
our discussions till now. And I will refine it carefully when we have
further discussion.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk