Boost logo

Boost :

From: vicente.botet (vicente.botet_at_[hidden])
Date: 2008-03-12 16:40:25


Hi everyone,

I'm starting a library to manage Locally Unique Identifiers (LUID).
I'd like to ask if there is interest in a such library?

I haven't found on the net any reference to such a library. Please could you
point me to some libraries addressing this kind of problem or whatever could
be related to this domain, algorithms, ...

If there is an interest, I would like to know which are the minimal features
such
a library should have for a first review, and which one could be left for a
second one.

Best regards

_____________________
Vicente Juan Botet Escriba

Next follows my first thoughts:

Boost.LUID simplifies the generation of locally unique identifiers in a
configurable context.

Locally Unique Identifier or LUID is a special type of identifier used in
software applications in order to provide a reference number which is unique
in a given context (hence, "Locally" in opposition to "Globally" or
"Universally"). Each generated LUID is guaranteed to be unique in its
context.

LUID could have many applications. Some examples follow: creating unique
identifiers in a database tables, network messages may be identified with a
LUID to ensure that this messages are associated to a given session,
transactions may be identified by LUIDs.

Note. Please let me know other applications of LUIDs you can have.

Reduced identifier size
An attractive feature of LUIDs when compared to alternatives is the minimal
size used by this identifiers, that depends on the number of instances a
given context can have (usually 4 bytes). This allows to use them in
protocols fixing the size of the references exchanged to less that 16
bytes(as it is the case of UUID and GUID). Usually LUIDs are represented by
unsigned integers and have an upper bound value.

Usable as random access index in constant time
Another aspect, and not less interesting is that the access to the
information they identify can be done in constant time.

Recovering unused identifiers
A first implementation could consists in a monotonic counter. The drawback
of having a small size is that the number of unique identifier is more
limited. This limit will finish by been reached if the application runs a
long time. To paliate to this, we need a mechanism to recover the unused
identifiers once the identified information is removed. Depending on the
application the recovered identifiers should be discarded, reused
immediately or frozen during a given duration or a number of times.

Usually they are reused in a first reusable first reused order (fifo), but
depending on the implementation this could not be always the more efficient
policy. So the user can either constraint this fifo order or let undefined
this aspect.

Managing the lack of identifiers
In any case we need to manage with the overflow case, i.e. when there are no
more free identifiers. The user could prefer an exception, return an invalid
value, use errno, or call to a user defined function which could try to
recover from this situation, changing the upper bound for example.

Ensuring coherency on reusable identifiers
All this seams to work up to the user do not release the same identifiers
twice. This is an expensive task, but usually the applications has its own
way to do it. If this is not the case the application could ask to the luid
generator to ensure that.

Synchronization on multi threaded or multi process applications
The generation of new LUIDs and the recovering of unused LUIDs must be
synchronized. So LUID are better adapted to systems where these operations
are done in a reduced scope. Depending on this scope: single thread,
multi-threaded process or a node,i.e. multiple process, the synchronization
mechanisms vary. In a single thread scope no synchronization is needed.
Synchronization on a multi-threaded process can be ensured using a thread
mutex and so one.

In addition the synchronization can be ensured internally by the generator,
or externally by the user, i.e. the user has already a synchronization
mechanism.

LUIDs are not adapted to distributed system (multiple nodes generating and
recovering luids) because it needs a coordination between the different
nodes which will take too much time. UUIDs and GUIDs are more adapted to
this situation. Anyway if the size constraint can not be avoided, one of the
nodes could play the master role and the others behaves as slaves. Master
and slaves need to work together. This library do not pretend to take in
account this use case but, could be the base of such approach.

Persistency
Another aspect is the lifetime of the information to be identified by the
luids. It can be the lifetime of the process, the host machine lifetime or
persists to a shut down of the host machine using the filesystem.

Optimization
The last aspect will be the optimization. Different application have
different needs, some have space constraints, and mot of the others have
speed constraints.

One possible design could request to declare a variable for each luid
generator
and use functions like make and release.

typedef boost::luid::luidg<uint32, ...> X_luid_generator;
X_luid_generator x_luidg;

////
uint32 uid = luidg.make();

...
luidg.release(uid);

As the uid have a meaning only associated to the information it identifies,
usually
the use of this functions should be done implicitly when the information is
stored
on a container is removed.

typedef boost::luid::luidg<uint32> X_luid_mgr;
typedef my_multi_index<X, ..., luid_generator<X_luid_mgr> > X_container;

// construct the container, constructing a X_luid_mgr
X_container x_cont(...);
X x;
...

// inserts the element in the container and
// returns the new luid associated to the inserted element (luidg.make())
unisgned int uid = x_cont.insert(x);
// this uid could be shared with other applications

...

// later on the application use the uid to obtain the context
X* x=x_cont.get(uid);

// do something
...

// removes the element identified by uid and release the uid.
(luidg.release(uid))
x_cont.remove(uid);

This library could use the following Boost libraries
- Boost.DynamicBitSet to store the recovered luids
- Boost.Integer to obtain the default bounds
- Boost.Interprocess to lock on inter_node scope
- Boost.Intrusive to store the recovered luids in fifo order
- Boost.Interprocess to store on shared memory when the lifetime is longer
than the process one.
- Boost.Parameters to pass the constructor parameters
- Boost.Parameters to configure the template parameters
- Boost.Threads to lock on multi_threaded scope

I wonder if this library could be used to extend the Boost.MultiIndex
library with luid unique keys.

The number of user functions provided by LUID generator are quite simple:
* make: make a new luid (constant time complexity)
* release: release an luid (could check that the luid is not already free)
(constant time complexity without ensuring coherency)
* error: the value returned when no more luids are available (constant time
complexity)
* set_upper_bound: sets the new upper bound (could have linear time
complexity when coherency is required)
* get_upper_bound: get the current upper bound (constant time complexity)
* count: get the number of used luids (chould be constant time complexity if
a counter is used)
* clear: release all the used ressources and reset them as they were after
construction (could have linear time complexity)


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk