Boost :

From: Nathan Myers (ncm_at_[hidden])
Date: 2005-05-04 22:18:29


On Wed, May 04, 2005 at 06:20:06PM +0100, Iain K. Hanson wrote:
> > > Nathan Myers wrote:
> > >
> > > > Another goal is a zero-copy streambuf whose buffer is an mmap
> > > > page that can be read into or written from without actually
> > > > copying any bytes from kernel to user space, or back.
>
> You will still at a minimum have another kernel to device copy as
> previously stated. Another problem is that mmap files *I think* need
> to be seek()able.

When speaking of zero-copy I/O, it is conventional not to count the
act of moving bits between the wire and memory. In principle, it's
true, one could conceive of operating on the bytes in real time
without ever storing them. However, most people start and stop
counting copies at the point where the data has landed in a kernel
buffer, ready to DMA to or from a device.

To mmap a file, it must be seekable, but that's not what I was
describing. On NetBSD as on Linux, if a page of memory has been
obtained via "anonymous" mmap, it is not actually mapping a file,
it's just a page of physical memory handed over to the caller
to write in, and it may be returned to the system at any time,
independently of any other page. (On some systems, e.g. Solaris,
you pretend to map /dev/zero, but that's just for tidiness.)
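To make the distinction concrete, here is a minimal sketch of the two ways of obtaining such a page: an anonymous mapping (no file at all, so nothing needs to be seekable) and the /dev/zero idiom. The helper names map_anonymous_page, map_dev_zero_page, is_page_aligned and unmap_page are hypothetical and for illustration only; the calls they wrap (mmap, munmap, sysconf, open, close) are standard POSIX. It assumes a system that defines MAP_ANONYMOUS or the older BSD spelling MAP_ANON.

```cpp
#include <sys/mman.h>  // mmap, munmap, PROT_* and MAP_* flags
#include <fcntl.h>     // open, O_RDWR (for the /dev/zero variant)
#include <unistd.h>    // sysconf, close
#include <cassert>
#include <cstddef>
#include <cstdint>

// Older BSDs spell the flag MAP_ANON; Linux and newer BSDs accept MAP_ANONYMOUS.
#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

// Hypothetical helper: obtain one zero-filled page that is not backed by any
// file. No descriptor is involved (fd is -1), so nothing has to be seekable.
char* map_anonymous_page(std::size_t& len) {
    len = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<char*>(p);
}

// Hypothetical helper: the older idiom of privately mapping /dev/zero.
// The result behaves the same; the device file is only a formality.
char* map_dev_zero_page(std::size_t& len) {
    len = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0) return nullptr;
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : static_cast<char*>(p);
}

// True if p starts on a page boundary (mmap always returns such addresses).
bool is_page_aligned(const void* p) {
    auto page = static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
    return reinterpret_cast<std::uintptr_t>(p) % page == 0;
}

// Each mapping can be returned to the system on its own,
// independently of any other page.
void unmap_page(char* p, std::size_t len) {
    assert(p != nullptr);
    munmap(p, len);
}
```

Either way the caller gets a page-aligned, zero-filled region it can write into and later hand back page by page with munmap.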

Under UVM, if you have a page or run of pages mapped, and pass a
pointer to the beginning of it to a system call, the kernel can
claim those physical memory pages and map them into kernel space
as regular buffers. Or, it can pick kernel buffer pages and expose
them to that range in your address space, in place of whatever was
there, all without copying bytes. What you see there is what the
kernel wants you to see. It looks as if it copied from its buffer
to yours, but you are really seeing its actual buffer. This is what
is normally described as zero-copy I/O.
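Tying this back to the zero-copy streambuf goal quoted above, the following is a minimal sketch, under stated assumptions, of an output streambuf whose put area is a single anonymous-mmap page and which flushes with plain write(). The class name PageOutBuf is hypothetical. Note that the code only arranges the preconditions (a page-aligned buffer handed straight to the system call); whether the kernel actually loans or remaps the page instead of copying, as UVM can, is entirely up to the OS, and kernels that do this typically only consider whole, page-aligned pages, not the partial page a final flush usually writes.

```cpp
#include <sys/mman.h>  // mmap, munmap
#include <unistd.h>    // write, sysconf
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <ostream>     // the buffer is meant to be used through std::ostream
#include <stdexcept>
#include <streambuf>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

// Hypothetical class: output streambuf backed by one anonymous-mmap page.
class PageOutBuf : public std::streambuf {
public:
    explicit PageOutBuf(int fd) : fd_(fd) {
        len_ = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        void* p = mmap(nullptr, len_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        page_ = static_cast<char*>(p);
        setp(page_, page_ + len_);  // the whole page is the put area
    }
    ~PageOutBuf() override {
        flush_page();
        munmap(page_, len_);
    }
    PageOutBuf(const PageOutBuf&) = delete;
    PageOutBuf& operator=(const PageOutBuf&) = delete;

protected:
    // Called when the page is full: hand it to write(), then store ch.
    int_type overflow(int_type ch) override {
        if (!flush_page()) return traits_type::eof();
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            assert(pptr() < epptr());
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }
    int sync() override { return flush_page() ? 0 : -1; }

private:
    // Write [pbase, pptr) straight from the mapped page, retrying on
    // partial writes and EINTR, then reset the put area to the page start.
    bool flush_page() {
        const char* p = pbase();
        std::size_t n = static_cast<std::size_t>(pptr() - pbase());
        while (n > 0) {
            ssize_t w = ::write(fd_, p, n);
            if (w < 0) {
                if (errno == EINTR) continue;
                return false;
            }
            p += w;
            n -= static_cast<std::size_t>(w);
        }
        setp(page_, page_ + len_);
        return true;
    }

    int fd_;
    char* page_ = nullptr;
    std::size_t len_ = 0;
};
```

Usage is the ordinary iostream pattern: construct PageOutBuf on a descriptor, attach a std::ostream to it, and write; each full page goes to the kernel in one write() call from the mapped buffer.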

It's quite an elegant way to rescue the apparently archaic read()
and write() model of I/O from ignominy. The only problem is that
fooling around with page maps can itself be quite expensive on a
multiprocessor system.

Nathan Myers
ncm_at_[hidden]

