
Boost Users :

Subject: Re: [Boost-users] Compress to buffer
From: Kenneth Adam Miller (kennethadammiller_at_[hidden])
Date: 2014-01-14 14:24:05


Ah, ok. I apologize for posting too fast; I'll be sure to exhaust the
available resources before I post in the future. That said, I did find some
of the descriptions of the iostreams components and how they compose
(especially the examples, which were what I was really looking for) vague, IMHO.

No, I haven't measured performance, and I understand your concern about
misplaced attention to performance. The general advice is to make it work
first, then make it fast; I agree with that strongly, but I think you might
agree with me if I explain my context. If I see that a particular approach
will likely incur avoidable latency from unnecessary reallocations and
re-initializations (as when a vector grows and must copy its contents into
fresh contiguous memory), I will try to eliminate that while making it work
the first time. I'm writing a pintool, and you kind of have to understand
what that is to know where I'm coming from. Intel's Pin is a dynamic binary
instrumentation framework that lets programmers register callbacks at
different levels of granularity, as well as write their own instrumentation
functions. So if you wanted to gather data on the fly about every image
load, instruction execution, or routine call, you could do it.

Anyway, the context in which I am using this compression utility is one
where speed is particularly important. A set of dynamic analysis routines
steadily generates data from the target program being analyzed. The point
I'm asking about now is the hand-off: once the analysis threads have
generated enough data, they pass a handle to their buffer to a pool of
compression threads, and the compression threads drop the data they've been
handed directly into a fresh buffer (ideally) that goes through a
compressor. The problem is that, because I'm instrumenting at
instruction-level granularity, my analysis code has to synchronize a
Lamport clock between target application threads (now I'm really getting
out of hand with my explanation, lol). The whole program might produce a
400 MB XML file, and these buffers get flushed every time they accumulate
about 30 KB. That's a lot of flushing! Those unnecessary allocations and
copies would, in the worst case, result in a 2-3x slowdown of the entire
program, because it would literally be pausing to repeat work that's
unnecessary.
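
For concreteness, here's a minimal stdlib-only sketch of the buffer reuse I
have in mind. The pool class and its names are my own hypothetical
scaffolding, not anything from Boost or Pin; the point is just that each
buffer is allocated once, and clear() empties it without giving back its
capacity (true on every mainstream implementation, though the standard
doesn't strictly promise it), so refilling doesn't reallocate:

```cpp
#include <cstddef>
#include <list>
#include <mutex>
#include <string>

// Hypothetical pool sketch: buffers are preallocated up front and handed
// out empty; clear() keeps capacity, so the next fill never reallocates.
class StringPool {
public:
    StringPool(std::size_t count, std::size_t capacity) {
        for (std::size_t i = 0; i < count; ++i) {
            free_.emplace_back();
            free_.back().reserve(capacity);  // one allocation, up front
        }
    }

    // Returns an empty buffer with its capacity intact, or nullptr if the
    // pool is exhausted (real code would block here instead).
    std::string* acquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (free_.empty()) return nullptr;
        used_.splice(used_.begin(), free_, free_.begin());
        used_.front().clear();  // size 0, capacity retained
        return &used_.front();
    }

private:
    std::mutex m_;
    std::list<std::string> free_, used_;
};
```

An analysis thread would acquire(), fill the string, and hand the pointer
to a compression thread, which returns it to the pool when done.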

But actually, if this appends to my 30k string, I didn't know it. The
back_insert_device<string> object wasn't something I understood very well
from the description at
http://www.boost.org/doc/libs/1_45_0/libs/iostreams/doc/classes/back_inserter.html

From what I had thought, though, if my string was filled with zeros and I
appended to it, then I would end up with precisely my compressed data. The
idea of acquireStringFromPool is that it returns a large string that is
always cleared out for the filtering_ostream to fill up as though it were a
device. Is that incorrect? How can I get that functionality?

Perhaps, though, that reserve approach would be good. I apologize for the
email being so long, lol.

On Tue, Jan 14, 2014 at 12:16 PM, Krzysztof Czainski <1czajnik_at_[hidden]> wrote:

> 2014/1/14 Kenneth Adam Miller <kennethadammiller_at_[hidden]>
>
>> By the way, I'm working on a master's thesis, so I frequently skip sleep.
>> Sometimes after a lack of sleep, getting across precisely what is
>> needed/understood can take an iteration or two :)
>>
>
> Good luck with your thesis ;-)
>
> Btw, I think top-posting is inappropriate on this list.
>
> On Tue, Jan 14, 2014 at 10:15 AM, Kenneth Adam Miller <
>> kennethadammiller_at_[hidden]> wrote:
>>
>>> Pretty much on performance concerns. I know that there's at least going
>>> to be one copy performed while doing the compression, from uncompressed to
>>> compressed. Here's how I do it:
>>>
>>> filtering_ostream *fos = new filtering_ostream();
>>> fos->push(bzip2_compressor());
>>> string *x = acquireStringFromPool(); // A blocking call that returns a
>>> // pointer from a list of string*, each allocated with
>>> // new string(30000, 0); (it's multithreaded, ok lol :) )
>>> fos->push(boost::iostreams::back_insert_device<string>(*x)); // Note *x:
>>> // the device takes a reference to the container. This is what I was
>>> // searching for all along.
>>>
>>
> Doesn't this append your 30k string?
>
> If a preallocated chunk of memory is ok for you, check out
> vector::reserve() ;-)
>
> then later, when I want to write to fos I do,
>>>
>>> *fos << *doc; //I go straight from container to compression.
>>>
>>> Maybe my specifications that "I don't want to find that it's copying at
>>> all" were a bit weird, because obviously it has to move the data right? I'm
>>> just saying that most of the examples I would see would be something like
>>>
>>> compress(string x) {
>>>     stringstream ss(x); // unnecessary initialization in my case;
>>>                         // couldn't find an example without this
>>>     // something similar to what I did...
>>> }
>>>
>>
> OK, so did you measure performance of your solution compared to the above
> example? And then to some version with std::vector with reserved memory?
>
> My suggestions have nothing to do with compression and streams. But you
> might be optimizing prematurely here.
>
> HTH,
> Kris
>
> _______________________________________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/boost-users
>



Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net