Boost logo

Boost :

Subject: Re: [boost] [GSOC] proposal for Trie
From: Hardy (hzithlony_at_[hidden])
Date: 2013-04-25 11:15:51


On 2013/4/23 20:59, Antony Polukhin wrote:
> 2013/4/23 Hardy <hzithlony_at_[hidden]>:
>> "A good addition to proposal will be some sketch of trie interface:
>> methods and constructors that Trie class will contain."
>> Could you give me some examples or more advices on that?
>> What I think out now are something like STL set, map, and with some specific
>> methods and variables of Trie.
>
> You are right, it is better to be as close to STL interface as
> possible. You may also say something like:
>
> "
> Trie interface will have all the typedefs and methods of std::set excluding:
> iterator insert (const_iterator position, const value_type& val);
> ...
> It will additionally have the following methods:
>
> pair<const_iterator,const_iterator> prefix_range (const value_type&
> prefix) const; - returns range of elements starting from `prefix`.
> ...
> "
> You may also add some notes that you think will be interesting or just
> affect design, like "trie implementation will be storing it's size as
> a separate field to allow getting size in constant time" or "won't be
> storing it's size leading to liner complexity of size() function"...
> Some more examples of containers features:
> http://www.boost.org/doc/libs/1_53_0/doc/html/container/other_features.html
>
> --
> Best regards,
> Antony Polukhin
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
>

Hi, the following is what I conceive of Trie, which I will put to the
'description of work' in my proposal, because of its unmaturity, I will
be happy to discuss on it with you, please give me some advice on it so
that I can finish it and submit my application on the GSOC site.
Thank you very much.

==description==
Trie(Radix Tree) is an efficient in memory data structure to deal with
string storing and searching. It can be used in many applications, and
in comparison to hashtables, it has better expandability and stability.
It is a data structure of great importance and wide usage.
C++ has already had powerful common key-value containers such as set and
map(I will only use map to make the comparison later), but when the keys
are strings, map will take O(logN * length(string)) to lookup a key
which has a worse performance when handling long strings and large
amount of data, and stores every key as a full string which also wastes
space. Besides, it is hard to use map traverse some keys with common
prefix.
Trie is good at handling strings especially the above tasks, so it is
needed to improve the performance of handling key-value storage with the
type of key being string.

The basic interface of Trie will have similar typedefs and methods to
stl::map, with some differences:

pair<const_iterator,const_iterator> prefix_range (const key_type&
prefix) const; - returns range of elements starting from ‘prefix’
void prefix_erase(const key_type&
prefix) const; - erases elements with keys starting with ‘prefix’
void prefix_count(const key_type&
prefix) const; - returns the number of elements with keys starting with
‘prefix’

key_type get_key(const_iterator position);
- returns the key of the iterator on position, because I think it is a
waste of space to store the complete key, so users cannot get the key
directly by iterator.
vector<key_type> get_keys(const Trie &trie);
- returns the keys of all elements in Trie
vector<key_type> get_keys(const key_type& prefix);
- returns the keys of all elements in starting with ‘prefix’
vector<key_type> get_keys(const_iterator left, const_iterator right);
- returns the keys between iterator left and right.

Besides, considering that traditional Trie has limited scale of using, I
want to design my Trie in 3 layers:
The bottom one can provide the user-definition key type, which uses like
RawTrie<key_t, value_t>, in which the key can be a sequence of some
type(maybe an array of anything can be a key), and every node of Trie
can be taken as a root of sub RawTrie, which gives convenience to add
Trie into it and even swap a sub Trie.
And the second layer will be a specific Trie with the key of chars(when
handling lookup and insert, both string and char[] can be adapted to
it). And the feature of sub Trie is provided. The class is like
Trie<value_t>.
And the third layer will be a Trie without so much freedom of operating
as the above two. I will just provide the basic interface as I list above.
And the last two layers of Trie will have a version of compressed
storage of strings.
More specifically, the classes of last two layers can include TrieSet
and TrieMap to give users more convenience. (The idea is inspired by the
patl http://code.google.com/p/patl/)


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk