Boost logo

Boost :

Subject: Re: [boost] [GSOC] proposal for Trie
From: Hardy (hzithlony_at_[hidden])
Date: 2013-04-27 05:45:01


On 2013/4/27 1:20, Antony Polukhin wrote:
> 2013/4/26 Hardy <hzithlony_at_[hidden]>:
>> On 2013/4/26 18:47, Antony Polukhin wrote:
>>>
>>> 2013/4/26 Hardy <hzithlony_at_[hidden]>:
>>>>
>>>> Firstly, the 'key' here represent a string.
>>>> The first 'get-key' function is to get a key which corresponds to the
>>>> iterator(node in Trie), because in order to save space(which is one of
>>>> Trie's advantages), the whole key string should not store in a single
>>>> iterator(node), so it is necessary to provide such an interface under my
>>>> design.
>>>
>>>
>>> This looks bad.
>>> Iterating through elements of 'string' may be a useful feature, but
>>> not the main.
>>> User will be putting in container 'string' and he will be expecting to
>>> retrieve 'string' when dereferencing iterator.
>>>
>>>> Since the relationship between key and string makes you confused, I am
>>>> now
>>>> trying to explain more about it.
>>>> In my current design, a key is a 'string' itself.
>>>> My idea is to take sequences of anytype as an abstract string in the
>>>> bottom
>>>> layer of the lib, in order to extend the capability of Trie.
>>>> And after thinking on it several times, I now prefer to use 'key' as an
>>>> element of a 'string', for it is more clear to users.
>>>> Since it is a crucial and essential problem of the whole Trie project, I
>>>> want some advices of you on it.
>>>
>>>
>>> 'key' is what we are going to store in container. User shall not care
>>> that we store key in parts, so it looks right to treat 'key' close to
>>> 'string'.
>>>
>>>> Ok, you mean that I should add this link of our discussion to the
>>>> 'additional info' of my proposal in GSoC page?
>>>
>>>
>>> Yes
>>>
>>> --
>>> Best regards,
>>> Antony Polukhin
>>>
>>> _______________________________________________
>>> Unsubscribe & other changes:
>>> http://lists.boost.org/mailman/listinfo.cgi/boost
>>>
>>
>> I agree with you, that is why I design a layer of interfaces as TrieMap, and
>> TrieSet, in my conceivation, it can be used as stl:map and stl:set, with the
>> difference that they retrieve the keys using Trie data structure. So of
>> course user can get 'key' when dereferencing iterator.
>> Meanwhile, I still think that it is useful to provide some classes on lower
>> layer to not store the keys in the iterators, so that users can have a more
>> space-saving application.
>> In conclusion, I adjust my design to the following form:
>>
>> bottom layer:
>> class Trie<key_t, value_t>;
>> The key and value are not in the for as map(pair<key_t, value_t>),
>> so it is space-saving.
>>
>> second layer:
>> typedef TrieMap<key_t, value_t> Trie<key_t, pair<key_t, value_t> >
>>
>> The two layers can satisfy different needs of users.
>
> Duplicating data is bad. If we have an iterator pointing to leaf node
> we can restore the whole key, without storing it one more time (leaf
> node == final element of whole key).
>

In Trie a non-leaf node can be the ending of a string, so you thought
here confused me. How will your design handle this situation? A pointer
to a single leaf node from the non-leaf node to it?

> In my head node looks like this:
>
> template <class Element>
> struct node {
> node* next_on_same_level_;
> node* deeper_or_parent_; // points to parent if next_on_same_level_ is NULL
>
> Element value_;
> };

>
> If next_on_same_level_ is NULL - we are in leaf node. To retrieve key
> we need to move to root, gathering elements.
>
> And it looks like it is possible to get rid of one of the pointers
> using http://en.wikipedia.org/wiki/XOR_linked_list technique.
>

That is an amzazing trick, it is good to see such a beautiful method
implementing linked list.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk