Boost logo

Boost :

Subject: Re: [boost] [ boost ] [ Trie ]
From: Antony Polukhin (antoshkka_at_[hidden])
Date: 2015-03-08 05:27:55

2015-03-07 20:10 GMT+03:00 Cosmin Boaca <boost.cosmin.boaca_at_[hidden]>:

> > I have thought of this task too. I will share with you some ideas in
> order
> > to get some feedback about them.
> >
> > 1. Value = void specialization for trie_node which would remove pointers
> to
> > value_list_head/tail, self_value_count.
> >
> >+1
> I have done this task. Also I have done refactoring and cleanup on the
> whole project.

Good work.

> I have decided to change a couple of things:
> 1. I have removed all the methods from trie class that created / destroyed
> nodes. I have replaced them with calls to new and delete.

It's better to restore allocator as soon as possible. Just replace
new|delete with allocate/deallocate and implicit calls to

> 2. I have removed the node_allocator / value_allocator from trie class. I
> think something like a node factory that would handle only creation /
> destruction of nodes is more appropriate if we plan to allow custom
> allocators.

Yep, methods create_node/destroy_node are required

> 3. I have added some methods to the trie_node class in order to share a
> common interface between template specializations. Now operations that
> query the node internal state (adding / removing values / no_value()) is
> now done using those methods.


> 4. Because the iterator needed to be specialized too in order to handle
> trie_node specialization I have decided to delegate the erasure of the
> iterator to the iterator itself. This was the only way that could handle
> the strucutral difference in terms of members between the specializations.

That's a bad idea: iterators must know nothing about erasure. Convert those
__erase_self_value_node() methods into a free functions in detail
namespace. Those functions will be accepting allocator and will be
overloaded by node pointers.

> 5. The iterator class now has another template parameter called Enable. I
> needed to specialize the iterator for both void / const void and i used
> boost::enable_if in order to achieve this with less code. const void
> specialization is needed too because of the erase methods from trie. In
> trie class there are two erase methods, erase(iterator),
> erase(const_iterator). The const iterator coresonding for iterator<Key,
> void> is iterator<Key, const void>. If they were the same the erase method
> from trie would have been redefined resulting in compile time errors.


> 6. I have added more tests for trie_set. Now all the features are tested,
> find, remove, insert, count_prefix, copy, iterators.


> 7. A friend of mine has implemented a build system based on cmake.


> Why is the trie_node noncopyable ?
To avoid erroneous attempts to copy node. It's a structure with multiple
pointers and iterators that is not meant for copying.

I have described above all the changes that I needed to do in order to
> implement the template specialization. Also, I think now the node and
> iterator are abstracted better. Please give me some feedback about the
> changes that I have made [0].

Good work. Now let's concentrate on the struct trie_node<Key, void>.

Many optimizations may be applied there:
* key_ends_here is a boolean value, that actually could be hidden in one of
the pointers. In all cases our nodes are at least 2 bytes aligned, so all
the parent, pred_node, next_node guaranteed to have first bit set to 0.
so we can hide that `key_ends_here` bit in pointer just like this: pointer
| (key_ends_here ? 1u : 0u). This will probably require addition of
set_parent(). parent(), <...> functions to all the trie_nodes.

* children_type is not good. We get problems (like having additional
child_iter_of_parent in node), because the key is not stored in node. So
the plan is following: make trie_node to be usable in intrusive set, and
use the nodes directly in children. Here are some steps: use
boost::intrusive::set for children_type, derive trie_node from set_base_hook
and make it intrusive-set compatible, remove child_iter_of_parent and store
key in trie_node directly. This will significally reduce memory usage by
nodes and memory allocations (unlike intrusive set, std::map allocates
memory for nodes).

Both tasks are hard to implement, so do not hesitate to ask questions!

Best regards,
Antony Polukhin

Boost list run by bdawes at, gregod at, cpdaniel at, john at