Boost logo

Boost :

Subject: Re: [boost] Interest in a simple buffer abstraction?
From: Cory Nelson (phrosty_at_[hidden])
Date: 2011-01-30 18:34:31


On Sat, Jan 29, 2011 at 2:59 PM, Vicente Botet <vicente.botet_at_[hidden]> wrote:
>
>
> Cory Nelson wrote:
>>
>> On Thu, Jan 27, 2011 at 6:49 AM, Vicente Botet <vicente.botet_at_[hidden]>
>> wrote:
>>> I'm working on a frame concept, which is something like a bidirectional
>>> buffer. It allows to prepend and append binary data, and obtain/remove
>>> the
>>> header and the trailer.
>>>
>>> In addition we can split a frame in fragments and join fragments to make
>>> a
>>> frame.
>>>
>>> All these features are needed when working with protocol stacks.
>>
>> You'll have a winner if you expand that to allow bidirectional
>> iteration of both bytes and fragments (and then a fragment is just a
>> raw array giving random-access iteration within them).  VERY useful
>> for high-performance incremental parsing, I would definitely give my
>> +1 for boost inclusion.
>>
>
>
> I wanted to hide whether the memory is continuous or not. Only when the user
> needs continuous memory the library will ensure it on the resulting data. Of
> course this continuous memory will have random access.
>
> Could you give an example of how you want to use the features you seem to
> need? This will help me to see if the current interface is enough or if it
> must be extended.

I've designed a similar buffer class before, with I/O and incremental
parsing in mind. In addition to what you describe, it also had
reserve(size_t), commit(size_t), release(size_t), and access to the
reserved area. I'm still not 100% sure if this type of access is
appropriate for a general buffer, but allowing it would not cost
anything to those who aren't using it and it would reduce a ton of
code duplication for a specific io::buffer, since they're basically
the same but with the internals a bit more exposed.

1) Incremental parsing.

enum result { ok, need_more };

template<typename Iterator>
result parse(Iterator first, Iterator last);

I can call parse() on the first fragment (just using pointers for
iterators) for high performance. Only if the first fragment doesn't
contain enough to parse (returning need_more) will I call parse() on
the entire buffer, which will be slower (like dequeue iterators) but
rare. If that returns need_more still, then I'll perform more I/O to
fill the buffer with more data. Think of, eg. HTTP or XML, where the
individual bits (like a header or <element>) are quite small but
you'll want to read in large page-sized fragments to reduce I/O calls.
 A single fragment will hold a lot of things and there's no reason to
use slow iterators to get them.

2) Scatter/gather I/O.

I will typically have a pool of page-sized fragments. A slow stream
might only work on one or two fragments at a time, but for a fast one
I might want to reduce I/O calls and work with a lot of fragments at
once. If a stream is coming in fast, I can reserve() 64KB worth of
fragments, request a scatter into the reserved area, then commit()
whatever amount of data it actually gave me. Whatever didn't get
committed will still be usable for the next I/O. I then release()
however much data I was able to parse.

This can be implemented pretty cleanly with hierarchical iterators, like so:

class fragment
{
   T* begin();
   T* end();
};

class iterator
{
   fragment& fragment();
};

class buffer
{
   iterator begin(); // go over committed range only.
   iterator end();

   iterator reserved_begin(); // go over reserved
(uninitialized/unconstructed) range only.
   iterator reserved_end();

   iterator to_iterator(fragment &f, T* iter); // creates a slower
iterator from a fragment iterator.

   size_type size(); // committed size.
   size_type reserved_size();
   size_type capacity(); // size() + reserved_size().

   size_type reserve(size_type minsize); // ensures there are at least
'minspace' elements reserved, and returns the new reserved_size().
   size_type commit(size_type size); // commits 'size' elements from
the reserved space, returns the new size().
   size_type release(size_type size); // releases 'size' elements from
committed space, returns the new size().
};

-- 
Cory Nelson
http://int64.org

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk