Boost logo

Boost :

Subject: Re: [boost] [ boost ] [ Trie ]
From: Antony Polukhin (antoshkka_at_[hidden])
Date: 2015-03-08 05:27:55


2015-03-07 20:10 GMT+03:00 Cosmin Boaca <boost.cosmin.boaca_at_[hidden]>:

> > I have thought of this task too. I will share with you some ideas in
> order
> > to get some feedback about them.
> >
> > 1. Value = void specialization for trie_node which would remove pointers
> to
> > value_list_head/tail, self_value_count.
> >
> >+1
>
> I have done this task. Also I have done refactoring and cleanup on the
> whole project.

Good work.

> I have decided to change a couple of things:
>
> 1. I have removed all the methods from trie class that created / destroyed
> nodes. I have replaced them with calls to new and delete.
>

It's better to restore allocator as soon as possible. Just replace
new|delete with allocate/deallocate and implicit calls to
constructor/destructor.

> 2. I have removed the node_allocator / value_allocator from trie class. I
> think something like a node factory that would handle only creation /
> destruction of nodes is more appropriate if we plan to allow custom
> allocators.
>

Yep, methods create_node/destroy_node are required

> 3. I have added some methods to the trie_node class in order to share a
> common interface between template specializations. Now operations that
> query the node internal state (adding / removing values / no_value()) is
> now done using those methods.
>

Ok

> 4. Because the iterator needed to be specialized too in order to handle
> trie_node specialization I have decided to delegate the erasure of the
> iterator to the iterator itself. This was the only way that could handle
> the strucutral difference in terms of members between the specializations.
>

That's a bad idea: iterators must know nothing about erasure. Convert those
__erase_self_value_node() methods into a free functions in detail
namespace. Those functions will be accepting allocator and will be
overloaded by node pointers.

> 5. The iterator class now has another template parameter called Enable. I
> needed to specialize the iterator for both void / const void and i used
> boost::enable_if in order to achieve this with less code. const void
> specialization is needed too because of the erase methods from trie. In
> trie class there are two erase methods, erase(iterator),
> erase(const_iterator). The const iterator coresonding for iterator<Key,
> void> is iterator<Key, const void>. If they were the same the erase method
> from trie would have been redefined resulting in compile time errors.
>

Ok

> 6. I have added more tests for trie_set. Now all the features are tested,
> find, remove, insert, count_prefix, copy, iterators.
>

+1

> 7. A friend of mine has implemented a build system based on cmake.
>

+1

> Why is the trie_node noncopyable ?
To avoid erroneous attempts to copy node. It's a structure with multiple
pointers and iterators that is not meant for copying.

I have described above all the changes that I needed to do in order to
> implement the template specialization. Also, I think now the node and
> iterator are abstracted better. Please give me some feedback about the
> changes that I have made [0].

Good work. Now let's concentrate on the struct trie_node<Key, void>.

Many optimizations may be applied there:
* key_ends_here is a boolean value, that actually could be hidden in one of
the pointers. In all cases our nodes are at least 2 bytes aligned, so all
the parent, pred_node, next_node guaranteed to have first bit set to 0.
so we can hide that `key_ends_here` bit in pointer just like this: pointer
| (key_ends_here ? 1u : 0u). This will probably require addition of
set_parent(). parent(), <...> functions to all the trie_nodes.

* children_type is not good. We get problems (like having additional
child_iter_of_parent in node), because the key is not stored in node. So
the plan is following: make trie_node to be usable in intrusive set, and
use the nodes directly in children. Here are some steps: use
boost::intrusive::set for children_type, derive trie_node from set_base_hook
and make it intrusive-set compatible, remove child_iter_of_parent and store
key in trie_node directly. This will significally reduce memory usage by
nodes and memory allocations (unlike intrusive set, std::map allocates
memory for nodes).

Both tasks are hard to implement, so do not hesitate to ask questions!

-- 
Best regards,
Antony Polukhin

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk