Boost logo

Boost :

Subject: Re: [boost] [GSOC] proposal for Trie
From: Hardy (hzithlony_at_[hidden])
Date: 2013-04-26 05:48:49


On 2013/4/26 16:15, Antony Polukhin wrote:
> 2013/4/25 Hardy <hzithlony_at_[hidden]>:
>> On 2013/4/23 20:59, Antony Polukhin wrote:
>>>
>>> 2013/4/23 Hardy <hzithlony_at_[hidden]>:
>>>>
>>>> "A good addition to proposal will be some sketch of trie interface:
>>>> methods and constructors that Trie class will contain."
>>>> Could you give me some examples or more advices on that?
>>>> What I think out now are something like STL set, map, and with some
>>>> specific
>>>> methods and variables of Trie.
>>>
>>>
>>> You are right, it is better to be as close to STL interface as
>>> possible. You may also say something like:
>>>
>>> "
>>> Trie interface will have all the typedefs and methods of std::set
>>> excluding:
>>> iterator insert (const_iterator position, const value_type& val);
>>> ...
>>> It will additionally have the following methods:
>>>
>>> pair<const_iterator,const_iterator> prefix_range (const value_type&
>>> prefix) const; - returns range of elements starting from `prefix`.
>>> ...
>>> "
>>> You may also add some notes that you think will be interesting or just
>>> affect design, like "trie implementation will be storing it's size as
>>> a separate field to allow getting size in constant time" or "won't be
>>> storing it's size leading to liner complexity of size() function"...
>>> Some more examples of containers features:
>>>
>>> http://www.boost.org/doc/libs/1_53_0/doc/html/container/other_features.html
>>>
>>> --
>>> Best regards,
>>> Antony Polukhin
>>>
>>> _______________________________________________
>>> Unsubscribe & other changes:
>>> http://lists.boost.org/mailman/listinfo.cgi/boost
>>>
>>
>> Hi, the following is what I conceive of Trie, which I will put to the
>> 'description of work' in my proposal, because of its unmaturity, I will be
>> happy to discuss on it with you, please give me some advice on it so that I
>> can finish it and submit my application on the GSOC site.
>> Thank you very much.
>>
>> ==description==
>> Trie(Radix Tree) is an efficient in memory data structure to deal with
>> string storing and searching. It can be used in many applications, and in
>> comparison to hashtables, it has better expandability and stability. It is a
>> data structure of great importance and wide usage.
>> C++ has already had powerful common key-value containers such as set and
>> map(I will only use map to make the comparison later), but when the keys are
>> strings, map will take O(logN * length(string)) to lookup a key which has a
>> worse performance when handling long strings and large amount of data, and
>> stores every key as a full string which also wastes space. Besides, it is
>> hard to use map traverse some keys with common prefix.
>> Trie is good at handling strings especially the above tasks, so it is needed
>> to improve the performance of handling key-value storage with the type of
>> key being string.
>>
>>
>> The basic interface of Trie will have similar typedefs and methods to
>> stl::map, with some differences:
>>
>> pair<const_iterator,const_iterator> prefix_range (const key_type&
>> prefix) const; - returns range of elements starting from ‘prefix’
>> void prefix_erase(const key_type&
>> prefix) const; - erases elements with keys starting with ‘prefix’
>> void prefix_count(const key_type&
>> prefix) const; - returns the number of elements with keys starting with
>> ‘prefix’
>
> Pretty good!
>
>> key_type get_key(const_iterator position);
>> - returns the key of the iterator on position, because I think it is a
>> waste of space to store the complete key, so users cannot get the key
>> directly by iterator.
>> vector<key_type> get_keys(const Trie &trie);
>> - returns the keys of all elements in Trie
>> vector<key_type> get_keys(const key_type& prefix);
>> - returns the keys of all elements in starting with ‘prefix’
>> vector<key_type> get_keys(const_iterator left, const_iterator right);
>> - returns the keys between iterator left and right.
>
> Did not get the idea. Dereferencing an iterator what shall return? Do
> you mean 'key' here as a character of substring or 'key' here is a
> string itself?

Sorry for not stating my design clearly.
Firstly, the 'key' here represent a string.
The first 'get-key' function is to get a key which corresponds to the
iterator(node in Trie), because in order to save space(which is one of
Trie's advantages), the whole key string should not store in a single
iterator(node), so it is necessary to provide such an interface under my
design.
So I think it is not difficult to understand the last three function
prototypes here is to show all the keys in different ranges.

Since the relationship between key and string makes you confused, I am
now trying to explain more about it.
In my current design, a key is a 'string' itself.
My idea is to take sequences of anytype as an abstract string in the
bottom layer of the lib, in order to extend the capability of Trie.
And after thinking on it several times, I now prefer to use 'key' as an
element of a 'string', for it is more clear to users.
Since it is a crucial and essential problem of the whole Trie project, I
want some advices of you on it.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk