From: Pavel Vozenilek (pavel_vozenilek_at_[hidden])
Date: 2004-11-23 12:28:41


Hello Ion,

"ION_G_M" wrote:
> Thank you Pavel for your comments
>
My interest was raised because once upon a time
I implemented a portable shared memory message queue
used for a high-speed control system.

While I was quite proud of it, had I had a
general purpose, working library I wouldn't have wasted
as much time as I did back then.

My guess is that people would like a very feature-rich
(and still easy to use) shared memory module.

Maybe the best way to think about it is how it could be used.
I see a few common scenarios:

1. As a message queue between producer(s) and consumer(s),
   carrying typed or untyped messages (with possible filtering
   of these messages).

2. Application A starts, reads from shared memory,
   does something, updates it and quits.
   Some time later application B starts and does something
   else with the shared memory.

   The shared memory may actually rest in a file
   in between.

3. Application 'A' is the main application and it exposes
   its internals for customization.
   'A' would work unchanged without shared memory.
   Then there are many small applications that can touch
   this or that part of 'A' internals.

4. Quick and dirty and suboptimal form of disk persistence.
   (Post++ on http://www.garret.ru/~knizhnik/
   is vaguely similar - it uses caching).

__________________________________________
>>4. Example in 3.2: the "alignement" parameter in
>> segment.create() isn't found in code.
>
> I can't find the error you mention, I've just downloaded the zip file
> and 3.2 does not have any example. In 4.2 segment.create is missing
> a ",".
>
Oops, example in 4.1, the line with
   8 /*alignment*/);

__________________________________________
>>5. Example in 4.2:
>> segment.named_new<MyType>
>> ("MyType instance", /*name of the object*/
>> 10 /*number of elements*/,
>> false /*if previously present, return error*/,
>> 0 /*ctor first argument*/,
>> 0 /*ctor second argument*/);
>
> a.It returns false. Exceptions are used only to indicate memory errors
> throwing bad_alloc. You are right there is info missing here. I will
> add more documentation in examples.
>
If I understand it correctly the function acts "like a constructor".
Then there are two ways to report an error:
 - bool return value if something goes wrong with shmem
 - exception from the actual object constructor

Handling both of these can get messy.

[two functions instead of bool parameter]
> b.If you find this approach more useful, I have no problem. I really
> don't like the boolean parameter, but I wanted to have a "find or
> create" functionality. If you like a find_or_named_new<>() additional
> function to indicate that approach I find it more clear than with a
> boolean parameters.
>
My complaint is that in:

 segment.named_new<MyType> ("MyType instance", 10, false, 0, 0);

the false doesn't give much of a clue what it is all about.

[syntax with separated arguments]
> c.The syntax you propose is better, no doubt. I don't have experience
> with it so to implement this I suppose named_new<> should return an
> proxy object with overloaded operator()() functions. Is that right? If
> you want to help me I'm open.
>
Yes. Maybe the technique from object factory (library in Files section)
written by Robert Geiman could be used here.

> If boosters prefer throwing exceptions
> instead or returning false no problem here.
>
I do, yes (explanation above).

It may be possible to create an overload
   bool b = segment.named_new<MyType, std::nothrow>(.....)
for when one doesn't like/use exceptions.
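
A rough sketch of such a pair of overloads follows. Everything in it is hypothetical: toy_segment is an ordinary in-process map standing in for a shmem segment, and it stores plain ints. Since std::nothrow is an object rather than a type, the sketch passes it as a tag argument instead of as a template argument.

```cpp
#include <cassert>
#include <map>
#include <new>        // std::nothrow, std::nothrow_t
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a shmem segment: named ints kept in an ordinary map.
class toy_segment {
    std::map<std::string, int> objects_;
public:
    // Non-throwing form, selected by passing std::nothrow as a tag.
    // Returns 0 when the name is already taken. (Running out of memory
    // inside the map would still throw in this toy.)
    int* named_new(const std::string& name, int value, const std::nothrow_t&) {
        assert(!name.empty());
        std::pair<std::map<std::string, int>::iterator, bool> r =
            objects_.insert(std::make_pair(name, value));
        return r.second ? &r.first->second : 0;
    }

    // Throwing form: the same failure is reported as an exception.
    int* named_new(const std::string& name, int value) {
        int* p = named_new(name, value, std::nothrow);
        if (!p) throw std::runtime_error("name already in use: " + name);
        return p;
    }
};
```

With this shape there is only one error channel per call: either the return value or the exception, chosen by the caller.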

> A problem I see is that my
> interface allows creating an array like new[]. Do you consider this
> necessary? You prefer a different function? Maybe the proxy object
> should have an operator[] that can be used to specify array allocation?
>
I would prefer a separate function
   named_new_array<type>(array_count)(....)

It would save one parameter where it is not needed.
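
The proxy idea could be sketched roughly like this. It is only an illustration under assumptions: object_store, construct_proxy and toy_segment are made-up names, the storage is ordinary heap memory instead of shared memory, the name is passed to named_new_array alongside the count, and variadic templates are used for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

typedef std::map<std::string, std::shared_ptr<void> > object_store;

// Proxy returned by named_new / named_new_array. Its operator() takes the
// constructor arguments, so they stay separate from name and count.
template <class T>
class construct_proxy {
    object_store& store_;
    std::string name_;
    std::size_t count_;
public:
    construct_proxy(object_store& s, const std::string& n, std::size_t c)
        : store_(s), name_(n), count_(c) {}

    template <class... Args>
    T* operator()(const Args&... args) const {
        assert(count_ > 0);
        // Every element is copy-constructed from one prototype T(args...).
        std::shared_ptr<std::vector<T> > v =
            std::make_shared<std::vector<T> >(count_, T(args...));
        store_[name_] = v;   // replaces any previous object of that name
        return v->data();
    }
};

// Hypothetical toy segment keeping its objects in ordinary memory.
class toy_segment {
    object_store objects_;
public:
    template <class T>
    construct_proxy<T> named_new(const std::string& name) {
        return construct_proxy<T>(objects_, name, 1);
    }
    template <class T>
    construct_proxy<T> named_new_array(const std::string& name, std::size_t count) {
        return construct_proxy<T>(objects_, name, count);
    }
};

struct MyType {
    int a, b;
    MyType(int a_, int b_) : a(a_), b(b_) {}
};
```

A call then reads segment.named_new<MyType>("MyType instance")(0, 0), and the element count appears only in the array form.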

__________________________________________
>> std::pair< MyType *, std::size_t> res =
>> segment.find_named_new< MyType > ("MyType instance");
>>Why do I need the "size"? Doesn't a type have
>>always the same size regardless?
>
> Size contains the number of elements in case you allocate an array. with
>
> segment.named_new<MyType>
> ("MyType instance", /*name of the object*/
> 10 /*number of elements*/, ...
>
> you allocate an array of 10 elements. so it will return 10.
>
Maybe separate functions could be used:

a)

type* = segment.find_named_new<MyType>(....)

which would throw if what you ask for is an array

b)

type* = segment.find_named_new<MyType, std::nothrow>(....)

which would return 0 if nothing is found OR the data is an array.

c)

type* = segment.find_named_array_new<MyType>(....)
type* = segment.find_named_array_new<MyType, std::nothrow>(....)

with similar behavior

d)

bool = segment.is_array<MyType>(...)

Btw maybe the names could be
    find_named_object<...>(...)
etc.
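
The lookup variants (a) to (d) might be sketched as below. This is a toy under assumptions, not the shmem interface: the segment is an ordinary map of int vectors, the names find_named_object and find_named_array are invented, the throwing form also throws when the name is missing, and the array form returns a pointer plus element count.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class toy_segment {
    struct entry { std::vector<int> data; bool array; };
    std::map<std::string, entry> objects_;

    entry* lookup(const std::string& name) {
        std::map<std::string, entry>::iterator it = objects_.find(name);
        return it == objects_.end() ? 0 : &it->second;
    }
public:
    void add_object(const std::string& name, int v) {
        entry e; e.data.assign(1, v); e.array = false;
        objects_[name] = e;
    }
    void add_array(const std::string& name, std::size_t n, int v) {
        assert(n > 0);
        entry e; e.data.assign(n, v); e.array = true;
        objects_[name] = e;
    }
    // (d) tells whether the named data was created as an array
    bool is_array(const std::string& name) {
        entry* e = lookup(name);
        return e != 0 && e->array;
    }
    // (b) returns 0 if nothing is found OR the data is an array
    int* find_named_object(const std::string& name, const std::nothrow_t&) {
        entry* e = lookup(name);
        return (e != 0 && !e->array) ? &e->data[0] : 0;
    }
    // (a) throws in the same situations (assumption: also when missing)
    int* find_named_object(const std::string& name) {
        int* p = find_named_object(name, std::nothrow);
        if (!p) throw std::runtime_error("no single object named " + name);
        return p;
    }
    // (c) array lookup: pointer plus element count (assumed return type)
    std::pair<int*, std::size_t> find_named_array(const std::string& name) {
        entry* e = lookup(name);
        if (e == 0 || !e->array) throw std::runtime_error("no array named " + name);
        return std::make_pair(&e->data[0], e->data.size());
    }
};
```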

__________________________________________
>>6. Could you use namespace shmem_detail or so
>> instead of "detail" to avoid possible clashes?
> No problem. I've seen detail namespace in some projects so I thought it
> was not a problem. detail namespace is inside boost::shmem namespace so
> it wouldn't be necessary. You find it necessary even if detail
> namespace is really boost::shmem::detail?
>
My mistake. I had misread it as boost::detail.

__________________________________________
>>8. offset_ptr.hpp: full_offset_ptr class
>>b) The flag could be eliminated completely.
>
> The offset indicates the distance between the pointee and the this
> pointer of offset_ptr, so m_offset == 0 indicates a pointer pointing to
> itself and this is quite common in STL containers when empty, since
> next pointer in the end node points to end node, resulting in a
> m_offset = 0. Obviously this is different from a null pointer. If I
> change the meaning of m_offset to offset from the beginning of the
> segment I need the base address stored somewhere (and the base address
> is different in each process), so that I can convert from offset_ptr<A>
> to A* using get() or the constructor.
>
Hmm, maybe a reserved offset value such as (ptrdiff_t)-1 could stand
for null (or a symbolic name for such a value, if there is one).

If one has a lot of pointers it could make a difference (and the pointer
could also be passed in one register as a function parameter).
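
A minimal sketch of a flag-free offset pointer along these lines is given below. It is not the library's full_offset_ptr; offset_ptr_sketch and list_node are invented names, and the sketch assumes that no valid pointee ever lives exactly one byte before the pointer object (only a one-byte object could), so that offset -1 is free to mean null while offset 0 still means "points to itself".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <class T>
class offset_ptr_sketch {
    std::ptrdiff_t m_offset;   // distance from this object to the pointee

    // Reserved value meaning "null"; 0 stays available for self-pointing.
    static std::ptrdiff_t null_offset() { return -1; }

    void set(T* p) {
        m_offset = p ? static_cast<std::ptrdiff_t>(
                           reinterpret_cast<std::intptr_t>(p) -
                           reinterpret_cast<std::intptr_t>(this))
                     : null_offset();
    }
public:
    offset_ptr_sketch(T* p = 0) { set(p); }
    // Copies must recompute the offset relative to their own address.
    offset_ptr_sketch(const offset_ptr_sketch& other) { set(other.get()); }
    offset_ptr_sketch& operator=(const offset_ptr_sketch& other) {
        set(other.get());
        return *this;
    }

    T* get() const {
        if (m_offset == null_offset()) return 0;
        return reinterpret_cast<T*>(
            reinterpret_cast<std::intptr_t>(this) + m_offset);
    }
    T& operator*() const { assert(get() != 0); return *get(); }
};

// An empty list's end node points to itself, giving m_offset == 0.
struct list_node {
    offset_ptr_sketch<list_node> next;
};
```

The object is then exactly one ptrdiff_t wide, which is where the size and register-passing benefit would come from.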

__________________________________________
>>9. Maybe the protection of mutex from shared ptr
>> lib could be worked around
>
> I don't understand your point.
>
My misunderstanding.

Maybe process-wide mutexes etc should be pushed
into Boost.Thread rather than here.

__________________________________________
>>12. The simple algorithm to find fitting memory
>> block may not be adequate for high-performance
>> apps (who are most likely to use shared
>> memory).
>
> You are right. The default algorithm is space-friendly, which I thought
> it was more important than performance for fixed size segments. You can
> write your own algorithm and use it since shmem_alloc is a typedef of
> basic_shmem_alloc<default_algo>. If you prefer another algo like
> segregated lists, I can try to write it, so that the user can choose
> the allocation algorithm. I've written the pooled allocator due to
> default algorithm slowness.
>
Maybe the library could have an interface to plug in a different
algorithm (no clue now what it would look like).
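
One possible shape for such a plug-in point is a policy template, in the spirit of the basic_shmem_alloc<default_algo> typedef mentioned in the quote. The sketch below uses invented names (bump_algo, basic_segment_alloc) and a deliberately trivial algorithm, so it only illustrates where a segregated-list or other algorithm could be slotted in.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical algorithm policy: bump allocation, fast but never reuses memory.
class bump_algo {
    char* begin_;
    std::size_t size_;
    std::size_t used_;
public:
    bump_algo(void* buffer, std::size_t size)
        : begin_(static_cast<char*>(buffer)), size_(size), used_(0) {
        assert(buffer != 0);
    }
    // Returns 0 when the request does not fit. Sizes are rounded up to
    // 8 bytes so later blocks stay aligned (buffer assumed 8-byte aligned).
    void* allocate(std::size_t n) {
        const std::size_t align = 8;
        if (n > size_ - used_) return 0;
        const std::size_t rounded = (n + align - 1) / align * align;
        if (rounded > size_ - used_) return 0;
        void* p = begin_ + used_;
        used_ += rounded;
        return p;
    }
    void deallocate(void*, std::size_t) {}  // bump allocation frees nothing
};

// The allocator front end only forwards to whatever algorithm it is given.
template <class Algo>
class basic_segment_alloc {
    Algo algo_;
public:
    basic_segment_alloc(void* buffer, std::size_t size) : algo_(buffer, size) {}
    void* allocate(std::size_t n) { return algo_.allocate(n); }
    void deallocate(void* p, std::size_t n) { algo_.deallocate(p, n); }
};

typedef basic_segment_alloc<bump_algo> bump_segment_alloc;
```

Any class with the same allocate/deallocate members and constructor could replace bump_algo without touching the front end.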

__________________________________________
>>13. Could it be possible to identify named objects
>> with something else than C string? Say
>> wchar_t* or numeral or other templated type?
>
> Do you think that the key type should be templatized?
>
Well, this is Boost ;-)

> I think that an integer key can speed up a lot searches but I have to
> think about which classes should be templatized. When storing other
> type of strings (for example std::string, I would need to build an
> allocator for strings in shared memory but also a std::string since
> it's probable that current STL won't work with Shmem STL allocator).
> The key meaning would be different since right now, I copy the string
> to shared memory but with configurable key type things are more
> complicated.
>
I think it is not critical, though nice to have.
Maybe there could be a limitation on the types allowed as keys.
const char* and integers should cover 90%.
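
Limiting the allowed key types could be done with a small traits class, roughly as sketched here. The names key_traits and name_index are hypothetical, and the index lives in ordinary memory; the string key is copied on insertion, loosely mirroring the copy into shared memory mentioned in the quote.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Primary template intentionally left undefined:
// unsupported key types simply do not compile.
template <class Key> struct key_traits;

template <> struct key_traits<const char*> {
    typedef std::string stored_type;          // the C string is copied
    static stored_type store(const char* k) {
        assert(k != 0);
        return stored_type(k);
    }
};

template <> struct key_traits<int> {
    typedef int stored_type;
    static stored_type store(int k) { return k; }
};

// Hypothetical name-to-object index parameterized on the key type.
template <class Key>
class name_index {
    typedef typename key_traits<Key>::stored_type stored_type;
    std::map<stored_type, void*> map_;
public:
    // Returns false if the key is already present.
    bool insert(Key k, void* obj) {
        return map_.insert(
            std::make_pair(key_traits<Key>::store(k), obj)).second;
    }
    // Returns 0 if the key is unknown.
    void* find(Key k) const {
        typename std::map<stored_type, void*>::const_iterator it =
            map_.find(key_traits<Key>::store(k));
        return it == map_.end() ? 0 : it->second;
    }
};
```

Something like name_index<double> fails at compile time because key_traits<double> is never defined.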

__________________________________________
>>a) avoiding shmem specific containers/mutexes/etc
>> as much as possible.
>
> I think you can't avoid mutexes if you want to guarantee atomic memory
> allocation, since I have no skills to write a lock-free memory
> allocator.
>
This reminds me, the newest STLport beta has a lock-free allocator
inside (I know just this, no more details).

> Regarding containers, it was not my intention to write
> them, but I needed some of them to store name-buffer mappings and a
> node container to test the pooled allocator in several systems. For
> now, I have only succeeded using shmem STL allocators in a modified
> Dinkum STL. STLport and libstdc++ suppose allocator::pointer to be a
> raw pointer, so I can't use STL containers. I've chosen internal
> containers to be public because I find them very useful, but if this is
> not accepted they can be used only for internal uses and removed from
> documentation.
>
Maybe the library could list those offending STLs and give instructions
on how to fix them manually to be usable with shmem. Such a fix should
not change the behaviour of applications, right?

__________________________________________
>>b) ability to "equalize" shared memory features
>
> I would need some help in this because my operating system knowledge is
> very limited. Mimic-ing UNIX way in windows can be very difficult, I
> think, unless you use a mapped file. I would need some serious help
> here.
>
The solution I used was to have an extra process keeping shared
memory alive on Windows. On Unix a process can periodically
check shmem usage and destroy it if needed.

I guess the library could just have an interface to plug in a helper
process, something like:
    errno = segment::use_helper_process(exe_filename);
Shmem could detect whether such a process already runs
and exec() it if it doesn't. The process could shut itself down
when it finds it is no longer needed.

But I think such feature isn't very critical and could
be done outside library.

__________________________________________
>>d) support for "transactions": I would like to
>
> My knowledge in transaction world is null so I can't help you with
> that. I suppose that a shared memory condition variable should be very
> interesting to notify events to other processes, but I'm afraid this is
> a work for more skilled programmers than me (people from
> boost::threads, perhaps?). As far as I know in windows it is difficult to
> implement a shared memory condition variable, pthreads-win32 does not
> support it and I don't know how cygwin solves this.
>
What I mean is roughly this (just an idea):

- transactions with isolation level 1:
    All changes to shmem are cached in normal memory.
    When the user queries shmem he will get the cached,
    not yet committed data (I think it should work
    because of offset_ptr<>).

    Effects of transactions from other apps would be visible.
    When the transaction gets committed all data are written at
    once into shmem. It should be possible to revert it when
    something goes wrong (e.g. by saving an old copy from shmem).

- transactions with isolation level 2:
    The data from shmem are copied into a buffer when the transaction
    starts and are used for the duration of the transaction.
    Otherwise it is the same as above.

Having transactions would add a *very* useful feature
to shmem (for scenarios [2] and [3]),
a feature quite hard to implement right.

It would be still possible for user to design his
own specific transaction mechanism.

Tables library (something as in-memory relational
database, in Files section), written by Arkadyi Vertelyeb
has transaction capability. Maybe its interface could
be reused.
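
To make the "isolation level 1" idea above a bit more concrete, here is a very reduced sketch. It makes several assumptions: a std::vector of bytes stands in for the shmem segment, write_cache_transaction is an invented name, writes are single bytes, and there is no cross-process locking around commit (a real version would need it).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

class write_cache_transaction {
    std::vector<unsigned char>& shared_;            // stand-in for the segment
    std::map<std::size_t, unsigned char> pending_;  // uncommitted writes
public:
    explicit write_cache_transaction(std::vector<unsigned char>& shared)
        : shared_(shared) {}

    // Writes go to the private cache only.
    void write(std::size_t offset, unsigned char value) {
        assert(offset < shared_.size());
        pending_[offset] = value;
    }

    // Reads see this transaction's own pending writes first; everything else
    // comes straight from the segment, so other apps' changes stay visible.
    unsigned char read(std::size_t offset) const {
        assert(offset < shared_.size());
        std::map<std::size_t, unsigned char>::const_iterator it =
            pending_.find(offset);
        return it != pending_.end() ? it->second : shared_[offset];
    }

    // All cached changes are written to the segment in one go.
    void commit() {
        for (std::map<std::size_t, unsigned char>::const_iterator it =
                 pending_.begin(); it != pending_.end(); ++it) {
            shared_[it->first] = it->second;
        }
        pending_.clear();
    }

    // Dropping the cache leaves the segment untouched.
    void rollback() { pending_.clear(); }
};
```

Isolation level 2 would differ mainly in read(): it would consult a snapshot taken when the transaction started instead of the live segment.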

__________________________________________
While I am at this, one can think about more features of shmem
  (if these things are technically feasible):

- C language binding (only the most primitive functionality
   such as get_data_block/put_data_block). This would allow
   apps written in other languages to interoperate
   with shmem (up to some point). Not everyone has a C++
   compiler or is willing to use it or dares to use Boost.

- The shmem segment could have the following functionality:

   - fixed location/any location in memory

   - fixed size/expandable/expandable and shrinkable

   - keep when not used/not keep when not used
     (as suggested above)

   - function to compact data inside

   - function to report how much memory is available/max
     free block size

   - something like a debug mode switched either at runtime
     or compile time - with sentinel guards, erasing freed memory etc.

     I guess commonly used memory checkers
     may not work with shared memory.

   - create_shmem_from_file(const char* file, unsigned begin_offset,
             unsigned size)

   - get_shmem_OS_handle()

   - events to report that shmem is getting exhausted:
     segment::report_exhausted(boost::function handler,
          unsigned high_threshold);
     Possibly a mechanism similar to set_new_handler of new.

- debug mode where offset_ptr<> contains a pointer to the shmem
  segment and verifies it doesn't point outside.

- some data inside shmem may carry with itself complete
  information on how to construct itself
  (DLL name + identification string for the object factory there).

  Shmem could read this info, load the DLL and return
  the result to the application (something like a primitive
  component system).

  It would (1) keep most of the class functionality
  in just one place and (2) if data structures change
  it could keep applications running w/o upgrade.
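
As an illustration of the exhaustion-reporting item in the list above, a handler hook might look roughly like this. It is a sketch under assumptions: toy_segment only counts bytes instead of managing real shared memory, std::function is used in place of boost::function to keep the example self-contained, and the handler receives the number of bytes still free.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

class toy_segment {
    std::size_t capacity_;
    std::size_t used_;
    std::size_t high_threshold_;
    std::function<void(std::size_t)> handler_;
public:
    explicit toy_segment(std::size_t capacity)
        : capacity_(capacity), used_(0), high_threshold_(capacity) {}

    // Registers a handler called whenever usage reaches high_threshold bytes.
    void report_exhausted(std::function<void(std::size_t)> handler,
                          std::size_t high_threshold) {
        assert(high_threshold <= capacity_);
        handler_ = handler;
        high_threshold_ = high_threshold;
    }

    // Bookkeeping only: returns false when the request does not fit.
    bool allocate(std::size_t n) {
        if (n > capacity_ - used_) return false;
        used_ += n;
        if (handler_ && used_ >= high_threshold_) {
            handler_(capacity_ - used_);   // pass the bytes still free
        }
        return true;
    }
};
```

Unlike set_new_handler, this toy calls the handler after a successful allocation that crosses the threshold, not after a failed one; either convention seems workable.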

/Pavel

