Boost logo

Boost :

From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2008-03-22 20:07:47


chintan rao wrote:
> I was thinking of implementing trie and various variation of tries, namely
> Ternary trees, Suffix trees and DAWG(Directed Acyclic Word Graphs) , for
> Google Soc.

That's interesting. Do you have a motivating application?

I last thought about tries when Boost.Flyweight was being discussed.
Could your tries be used there?

At one time I was considering writing an Apache log file analysis
program. This would have used some sort of trie or similar structure
to store the URLs (the aim is to be able to analyse very large logs in
RAM). At the time I was considering PATRICIA tries.

I also have some code that stores large numbers of very similar filenames:

/photos/christmas_2007/img_3277.jpg
/photos/christmas_2007/img_3278.jpg
....

These are currently stored in a binary file which is mmap()ed. The
time to read the file is non-trivial and is bad for program start-up
time. So perhaps I need a trie that can be stored in the mmap()ed
file, i.e. not using pointers, like the Boost.Interprocess containers do.

> I was thinking something like this
>
> This is the abstract of the interface for trie
> Interface:
> trie<type_2> something;
> trie[string key]=type2 value; // O(key.size());
> value trie::delete(string key); // O(key.size());
> trie::iterator trie::find(string key); // O(key.size())'
> int trie::size() // O(1) time. Gives the total no of elements in
> the trie
> trie::iterator trie::begin() //gives the beginning ie the
> lexicographically least key :)
> trie::iterator trie::end() //gives a iterator which is same as
> x=x+trie.size();
> string trie::get_key(iterator x) gets the key pointed to by iterator x;
>
> trie::iterator x; //worst case O(height of tree)
> x++; // traverses the tree left to right and goes to the "next" node
> x--; // traverses the tree right to left and goes to "prev" node
>
> *x // referrences the value of the key x is pointing to.
> trie.delete(x) // deletes the key pointed to by x;
>
> trie.count_prefix(string key) // counts the no of keys with prefix key
>
> Examples:
> trie::interator x;
> trie<vector<int> > t;
> t["hello"].push_back(9);
> t["eat"].push_back(10);
>
> x=t.find("hello");
> x--; //or --x; x now points to "eat" since eat comes before "hello"
> lexicographically;
> cout<<(*x)[0]<<endl;
>
> x++;//or ++x, x now points to "hello";
>
> x=t.end(); // or x++ will point to t.end().
>
> do
> {
> x--;
> int size=(*x).size();
> cout<<t.get_key(x)<<endl;
> for(int j=0;j<size;j++)
> cout<<(*x)[j]<<" ";
> cout<<endl;
> }
> while(x!=t.begin());
>
> Please suggest changes to the interface and other things.

I think you should be able to make it (almost) identical to std::map's
interface. You should certainly use std::map as your starting point,
and only deviate from it when you have a good reason to do so.

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk