Boost logo

Boost :

Subject: Re: [boost] [GSOC] proposal for Trie
From: Antony Polukhin (antoshkka_at_[hidden])
Date: 2013-04-26 04:15:16


2013/4/25 Hardy <hzithlony_at_[hidden]>:
> On 2013/4/23 20:59, Antony Polukhin wrote:
>>
>> 2013/4/23 Hardy <hzithlony_at_[hidden]>:
>>>
>>> "A good addition to proposal will be some sketch of trie interface:
>>> methods and constructors that Trie class will contain."
>>> Could you give me some examples or more advices on that?
>>> What I think out now are something like STL set, map, and with some
>>> specific
>>> methods and variables of Trie.
>>
>>
>> You are right, it is better to be as close to STL interface as
>> possible. You may also say something like:
>>
>> "
>> Trie interface will have all the typedefs and methods of std::set
>> excluding:
>> iterator insert (const_iterator position, const value_type& val);
>> ...
>> It will additionally have the following methods:
>>
>> pair<const_iterator,const_iterator> prefix_range (const value_type&
>> prefix) const; - returns range of elements starting from `prefix`.
>> ...
>> "
>> You may also add some notes that you think will be interesting or just
>> affect design, like "trie implementation will be storing it's size as
>> a separate field to allow getting size in constant time" or "won't be
>> storing it's size leading to liner complexity of size() function"...
>> Some more examples of containers features:
>>
>> http://www.boost.org/doc/libs/1_53_0/doc/html/container/other_features.html
>>
>> --
>> Best regards,
>> Antony Polukhin
>>
>> _______________________________________________
>> Unsubscribe & other changes:
>> http://lists.boost.org/mailman/listinfo.cgi/boost
>>
>
> Hi, the following is what I conceive of Trie, which I will put to the
> 'description of work' in my proposal, because of its unmaturity, I will be
> happy to discuss on it with you, please give me some advice on it so that I
> can finish it and submit my application on the GSOC site.
> Thank you very much.
>
> ==description==
> Trie(Radix Tree) is an efficient in memory data structure to deal with
> string storing and searching. It can be used in many applications, and in
> comparison to hashtables, it has better expandability and stability. It is a
> data structure of great importance and wide usage.
> C++ has already had powerful common key-value containers such as set and
> map(I will only use map to make the comparison later), but when the keys are
> strings, map will take O(logN * length(string)) to lookup a key which has a
> worse performance when handling long strings and large amount of data, and
> stores every key as a full string which also wastes space. Besides, it is
> hard to use map traverse some keys with common prefix.
> Trie is good at handling strings especially the above tasks, so it is needed
> to improve the performance of handling key-value storage with the type of
> key being string.
>
>
> The basic interface of Trie will have similar typedefs and methods to
> stl::map, with some differences:
>
> pair<const_iterator,const_iterator> prefix_range (const key_type&
> prefix) const; - returns range of elements starting from ‘prefix’
> void prefix_erase(const key_type&
> prefix) const; - erases elements with keys starting with ‘prefix’
> void prefix_count(const key_type&
> prefix) const; - returns the number of elements with keys starting with
> ‘prefix’

Pretty good!

> key_type get_key(const_iterator position);
> - returns the key of the iterator on position, because I think it is a
> waste of space to store the complete key, so users cannot get the key
> directly by iterator.
> vector<key_type> get_keys(const Trie &trie);
> - returns the keys of all elements in Trie
> vector<key_type> get_keys(const key_type& prefix);
> - returns the keys of all elements in starting with ‘prefix’
> vector<key_type> get_keys(const_iterator left, const_iterator right);
> - returns the keys between iterator left and right.

Did not get the idea. Dereferencing an iterator what shall return? Do
you mean 'key' here as a character of substring or 'key' here is a
string itself?

> Besides, considering that traditional Trie has limited scale of using, I
> want to design my Trie in 3 layers:
> The bottom one can provide the user-definition key type, which uses like
> RawTrie<key_t, value_t>, in which the key can be a sequence of some
> type(maybe an array of anything can be a key), and every node of Trie can be
> taken as a root of sub RawTrie, which gives convenience to add Trie into it
> and even swap a sub Trie.
> And the second layer will be a specific Trie with the key of chars(when
> handling lookup and insert, both string and char[] can be adapted to it).
> And the feature of sub Trie is provided. The class is like Trie<value_t>.
> And the third layer will be a Trie without so much freedom of operating as
> the above two. I will just provide the basic interface as I list above.
> And the last two layers of Trie will have a version of compressed storage of
> strings.

Relations between layers are not clear.

Bay the way, student applications can now be submitted to
<https://google-melange.appspot.com/gsoc/homepage/google/gsoc2013>.
The deadline is 3 May.
Do not forget to submit your proposal and add link to this discussion
in proposal.

--
Best regards,
Antony Polukhin

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk