
Subject: Re: [boost] [endian] swap_in_place use case
From: Dave Handley (Dave.Handley_at_[hidden])
Date: 2010-06-04 18:07:57


Vicente Botet wrote:

> Couldn't the filter be adapted to the endianness of the file and work directly on the disk format?

Surely this violates the most important principle of endian swapping: you should always work in native endianness and swap at the boundaries. If I specifically wrote the algorithm to work in the disk endianness, I would need to adapt it to cope with both big- and little-endian situations, the algorithm would change depending on which platform I compiled on, and everyone working on the algorithm would need to write endian-aware code. This would be a maintenance nightmare.
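As an illustration of "swap at the boundaries", here is a minimal sketch assuming the input is a raw byte buffer of big-endian 32-bit integers; the names `load_be32` and `decode_be32_buffer` are hypothetical. Assembling each value with shifts gives the same native result on any host, so code downstream of this function never needs to know the file's byte order. It shows only where the conversion happens; whether to copy or to swap in place is the trade-off the rest of this message is about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode one big-endian 32-bit value from four bytes. Because the value is
// built arithmetically, the result is the same on little- and big-endian hosts.
inline std::int32_t load_be32(const unsigned char* p) {
    const std::uint32_t u = (std::uint32_t(p[0]) << 24) |
                            (std::uint32_t(p[1]) << 16) |
                            (std::uint32_t(p[2]) << 8) |
                             std::uint32_t(p[3]);
    return static_cast<std::int32_t>(u);
}

// Boundary conversion: everything after this call works on native integers.
std::vector<std::int32_t> decode_be32_buffer(const unsigned char* bytes,
                                             std::size_t n_bytes) {
    assert(n_bytes % 4 == 0);  // whole 32-bit values only
    std::vector<std::int32_t> out(n_bytes / 4);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = load_be32(bytes + 4 * i);
    return out;
}
```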

I'm not sure I've been totally clear about what I mean by a filter. I mean a filter in the signal-processing sense of the word. So let's set up a problem where we are reading an uncompressed 10000x10000 image with a 4-byte integer grayscale value at each pixel (so a 400 MB file). The file is in big-endian format. What I then want to do is convolve a number of kernels with the original image to perform various image transformations. The kernels I'm going to use are, say, the following (Sobel-type kernels, specifically the Scharr variant):

[ 3 10 3 ]
[ 0 0 0 ]
[ -3 -10 -3 ]

[ 3 0 -3 ]
[ 10 0 -10 ]
[ 3 0 -3 ]

And maybe a few more. Each will be convolved with the image, and since I'm using this for feature detection, all I want to do is create a list of points over a given (absolute) threshold. So I'm creating a resulting data object that is much smaller than my input object.
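The detection step described above could be sketched as follows, assuming the image has already been converted to a row-major buffer of native-endian 32-bit pixels; `detect` and `Point` are hypothetical names. Each kernel is applied at every interior pixel, and only points whose absolute response exceeds the threshold are kept, so the output is usually far smaller than the image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The two 3x3 kernels from the post.
constexpr int kHorizontal[3][3] = {{ 3,  10,  3},
                                   { 0,   0,  0},
                                   {-3, -10, -3}};
constexpr int kVertical[3][3]   = {{ 3, 0,  -3},
                                   {10, 0, -10},
                                   { 3, 0,  -3}};

struct Point { std::size_t row, col; };

// Apply a 3x3 kernel at every interior pixel and keep the points whose
// absolute response exceeds `threshold`. A 64-bit accumulator avoids overflow
// when nine 32-bit pixels are multiplied by coefficients up to 10 and summed.
std::vector<Point> detect(const std::vector<std::int32_t>& img,
                          std::size_t width, std::size_t height,
                          const int (&k)[3][3], std::int64_t threshold) {
    assert(img.size() == width * height);
    std::vector<Point> hits;
    for (std::size_t r = 1; r + 1 < height; ++r) {
        for (std::size_t c = 1; c + 1 < width; ++c) {
            std::int64_t acc = 0;
            for (std::size_t i = 0; i < 3; ++i)
                for (std::size_t j = 0; j < 3; ++j)
                    acc += std::int64_t(k[i][j]) *
                           img[(r + i - 1) * width + (c + j - 1)];
            if ((acc < 0 ? -acc : acc) > threshold)
                hits.push_back({r, c});
        }
    }
    return hits;
}
```

Strictly, this loop computes a correlation rather than a convolution. For these two kernels, flipping them for a true convolution only negates the response, which the absolute threshold ignores. A full run would call `detect` once per kernel on the same native-order image.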

Now here, if I swap on use and run this code on a little-endian machine, I will end up swapping each pixel 9 times for every 3x3 kernel that I convolve. If I swap up front into a copy and then convolve, on a big-endian machine I will end up using twice the memory (and copying the whole source file) even though no swap was needed.

If, on the other hand, I swap in place, I have an efficient algorithm on both little- and big-endian machines.

My pseudo-code for this is going to be:

Swap in place version:
1) Memory map big file
2) Endian swap in place
3) For each filter: Convolve filter with image, and output results

Swap and copy version:
1) Memory map big file
2) Copy entire big file whilst endian swapping
3) For each filter: convolve with copy of image, and output results
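For comparison, step 2 of the swap-and-copy version could be sketched like this, again with hypothetical names and assuming a read-only source of big-endian 32-bit pixels. The point it makes concrete is that the image-sized copy happens on every host; on a big-endian machine the swap loop is skipped but the copy and the doubled memory remain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

inline std::uint32_t byteswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Swap-and-copy: leave the source untouched and build a second, native-order
// buffer of the same size.
std::vector<std::uint32_t> big_to_native_copy(const unsigned char* src,
                                              std::size_t count) {
    assert(src != nullptr || count == 0);
    std::vector<std::uint32_t> dst(count);  // the extra image-sized allocation
    if (count != 0)
        std::memcpy(dst.data(), src, count * sizeof(std::uint32_t));
    if (host_is_little_endian())
        for (auto& w : dst)
            w = byteswap32(w);
    return dst;
}
```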

Swap on use version:
1) Memory map big file
2) For each filter: convolve with the disk-endian image. For each pixel in the image, swap that pixel and its 8 surrounding pixels, multiply the 9 pixels by the filter, and output the point if the result is over the threshold. This ends up swapping each pixel 9 times.
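The swap-on-use version can be sketched as a kernel that reads the disk-order bytes directly, assuming big-endian 32-bit pixels in row-major order. The names are hypothetical, and the `decodes` counter exists only to make the cost visible: every placement of the kernel decodes all nine neighbours again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode one big-endian 32-bit pixel; called on every access.
inline std::int32_t load_be32(const unsigned char* p) {
    return static_cast<std::int32_t>((std::uint32_t(p[0]) << 24) |
                                     (std::uint32_t(p[1]) << 16) |
                                     (std::uint32_t(p[2]) << 8) |
                                      std::uint32_t(p[3]));
}

// Swap-on-use: apply a 3x3 kernel at interior pixel (r, c) of the disk-order
// buffer, decoding each of the nine neighbours as it is read.
std::int64_t apply3x3_on_disk(const unsigned char* be_pixels, std::size_t width,
                              std::size_t r, std::size_t c,
                              const int (&k)[3][3], std::size_t& decodes) {
    assert(r >= 1 && c >= 1);  // interior pixels only
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j) {
            const std::size_t idx = (r + i - 1) * width + (c + j - 1);
            acc += std::int64_t(k[i][j]) * load_be32(be_pixels + 4 * idx);
            ++decodes;  // one decode per neighbour per kernel placement
        }
    return acc;
}
```

Optimizing compilers commonly reduce this shift-and-or pattern to a plain load on a big-endian host (and to a load plus a byte-swap instruction on a little-endian one), so the real per-access cost depends heavily on the platform.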

Terry Golubiewski wrote:

> I disagree. I have demonstrated that...
>
> 1) endian-on-access can be zero cost in the native-case.
> 2) swapping in place is not any safer because the C++ type system cannot
> help to determine which portions of an object have been swapped, nor
> document the endian properties of a data structure. endian<big, double>
> can, and should, be defined to provide portable floating-point transfer and
> persistent storage.
>

I, in turn, also disagree. I'm not quite sure what you demonstrated with your test case. If by "endian-on-access" you mean copying during the endian swap, then I don't think it is ever zero cost, since a copy is always required. It costs the same as the swap/swap-in-place code if and only if your code requires a copy to take place at some point anyway, in which case both approaches pay for a copy. To be honest, because your test case didn't separate reading the data in from performing the swap, I was really unsure what information it actually conveyed. I would agree with another commenter (Robert, I think) who asked for timing code to be placed around the actual endian swapping, to measure how long the swap takes on its own.
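Timing the swap on its own, as suggested, could look like this minimal sketch. It assumes the words have already been read into memory, so file I/O is outside the timed region; `time_swap_only` is a hypothetical name, and the swap is unconditional, i.e. it measures the little-endian-host case.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

inline std::uint32_t byteswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Measure only the in-place swap of data that is already in memory.
std::chrono::nanoseconds time_swap_only(std::vector<std::uint32_t>& words) {
    const auto start = std::chrono::steady_clock::now();
    for (auto& w : words)
        w = byteswap32(w);
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock is monotonic
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

Timing the file read separately with the same clock would then show how much of a combined benchmark is I/O and how much is the swap itself.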

I don't think the issue with endian swapping code is "safety", since you should really be endian swapping at the boundaries (when the data is read into the machine, you should be converting to machine endianness), and that will always be "safe". Actually, the big problem I have with the endian types is that you need two versions of every structure you read in (one with endian types and one without), and that you are tempted to defer endian swapping until very late in the code, which means you are no longer swapping at the boundaries. I would really dislike debugging a program that carried data around in the wrong endianness and only swapped it when it was used. I actually suspect the most common use of endian types would be as very short-lived temporary storage for wrong-endian data from disk/network/elsewhere, and as such I don't see what you gain from a "type-safe" interface.
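To make the "two versions of every structure" point concrete, here is a sketch using a hypothetical `be_u32` storage type written only for this example (it is not the proposed Boost.Endian interface) and hypothetical header fields. The on-disk struct uses the endian type, the native struct is what the rest of the program wants, and a conversion function has to keep the two in sync.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical big-endian storage type: four bytes kept in big-endian order,
// converted on access.
class be_u32 {
    unsigned char b_[4] = {0, 0, 0, 0};
public:
    std::uint32_t value() const {
        return (std::uint32_t(b_[0]) << 24) | (std::uint32_t(b_[1]) << 16) |
               (std::uint32_t(b_[2]) << 8) | std::uint32_t(b_[3]);
    }
    void set(std::uint32_t v) {
        b_[0] = static_cast<unsigned char>(v >> 24);
        b_[1] = static_cast<unsigned char>(v >> 16);
        b_[2] = static_cast<unsigned char>(v >> 8);
        b_[3] = static_cast<unsigned char>(v);
    }
};
static_assert(sizeof(be_u32) == 4, "be_u32 must match a 4-byte on-disk field");

// Version 1: the on-disk layout, using endian types.
struct ImageHeaderDisk {
    be_u32 width, height, bits_per_pixel;
};

// Version 2: the same record in native types.
struct ImageHeader {
    std::uint32_t width, height, bits_per_pixel;
};

// The boundary conversion that ties the two versions together.
inline ImageHeader to_native(const ImageHeaderDisk& d) {
    return ImageHeader{d.width.value(), d.height.value(), d.bits_per_pixel.value()};
}
```

Every field added to one struct has to be added to the other and to `to_native`, which is the maintenance cost described above.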

Dave Handley


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk