Subject: Re: [boost] [endian] swap_in_place use case
From: vicente.botet (vicente.botet_at_[hidden])
Date: 2010-06-04 12:12:50


----- Original Message -----
From: "Dave Handley" <Dave.Handley_at_[hidden]>
To: <boost_at_[hidden]>
Sent: Friday, June 04, 2010 5:47 PM
Subject: Re: [boost] [endian] swap_in_place use case

>
> Please don't drop attributions.

I did?
 
> Vicente Botet wrote:
>
>> Dave Handley wrote:
>>> Memory map a network endian file. Swap_in_place. Use.
>>>
>>> You definitely don't want a copy in this case since your file could easily be
>>> very big. Think of the case where your file is huge (say 10 GB): you really
>>> don't want to perform a copy and swap, since that puts your memory need at 20 GB
>>> instead of 10 GB.
>
>> Yes, this could be a use case. I'm not used to managing such big files. If I had
>> to work with one, I would never copy the whole file. But I'm not sure that I would
>> use swap_in_place on the whole file. This could take too much time. I would
>> try to split the work on the whole file into smaller parts.
>>
>> What will you do with this big file that makes swap in place the best choice?
>
> The file could be many things. It could be a day of market data for a given exchange. It could be image data or video data that I'm going to perform image analysis on (maybe run a filter over it, or something similar). The file doesn't even have to be that big. If I was memory mapping a 10MB file and needed to swap it, I wouldn't want to use 20MB instead of 10MB.

I'm not proposing to make a copy of the whole file.

> For pretty much anything I want to do to that file that involves looking at most or all of the data, I would be much better off using swap in place instead of any copying swap implementation. Examples of the sorts of things that you might want to do to large files include running filters or normalisers over image or video files.

Couldn't the filter be adapted to the endianness of the file and work directly on the disk format?
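
To make the question concrete, here is a minimal sketch of what I have in mind, assuming the file holds big-endian 16-bit samples. The names load_be16 and max_sample are hypothetical and are not part of Tom's library or of any proposed interface. The filter decodes each sample as it reads it, so the mapped bytes are never modified:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decode one big-endian 16-bit value from raw bytes.
// Works the same way on little-endian and big-endian hosts.
inline std::uint16_t load_be16(const unsigned char* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical filter: largest sample in a buffer of big-endian 16-bit values.
// The buffer is only read, so a read-only mapping would be sufficient.
inline std::uint16_t max_sample(const unsigned char* data,
                                std::size_t sample_count)
{
    std::uint16_t best = 0;
    for (std::size_t i = 0; i < sample_count; ++i) {
        const std::uint16_t v = load_be16(data + 2 * i);
        if (v > best)
            best = v;
    }
    return best;
}
```

Whether this is faster than swapping in place and then filtering would have to be measured; the sketch only shows that the filter does not need a swapped buffer.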

> I have lots of programs that have multiple threads constantly memory mapping files that range in size from relatively small to hundreds of MB or low numbers of GB. Given that memory allocation is a key component of the run time of these programs, they would run significantly slower if I had to allocate double the amount of memory.

I repeat: I'm not proposing to make a copy of the whole file. I'm just trying to see whether swap_in_place is the tool to apply in all cases or whether it is restricted to some specific uses.

> Don't forget, if you need the whole file to be swapped, then the fastest way to do it will be a swap in place of the whole file.

For example, if I have a file of records with several different fields and I want to count on one specific field, I don't need to swap the whole file. Iterating over the records and converting only that field should be much faster than doing a swap_in_place of the whole file and then iterating over the records to use that field.
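
A minimal sketch of that record case, assuming a made-up layout of three big-endian 32-bit fields per record (12 bytes). The names record_size, load_be32 and count_matching are hypothetical, chosen only for this illustration:

```cpp
#include <cstddef>
#include <cstdint>

// Assumed record layout: three big-endian 32-bit fields, 12 bytes per record.
const std::size_t record_size = 12;

// Hypothetical helper: decode one big-endian 32-bit value from raw bytes.
inline std::uint32_t load_be32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Count the records whose field number `field` (0, 1 or 2) equals `wanted`.
// Only that one field is converted; the other bytes are never read or written.
inline std::size_t count_matching(const unsigned char* data,
                                  std::size_t record_count,
                                  std::size_t field,
                                  std::uint32_t wanted)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < record_count; ++i) {
        const unsigned char* rec = data + i * record_size;
        if (load_be32(rec + 4 * field) == wanted)
            ++n;
    }
    return n;
}
```

With this approach a third of the bytes are converted and none are written back, whereas swap_in_place of the whole file would rewrite every field before the count even starts.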

> I will reiterate something I said in an earlier post. If boost accepts an endian library which does not provide an efficient swap in place, I will be unable to use it. The library will end up in the list of boost libraries which are too inefficient to use in performance sensitive production code.

I understand. And I see that you absolutely need the swap_in_place of Tom's library.

Best,
Vicente

