Ah, ok. I apologize for posting too fast; I will be sure to exhaust the available resources more thoroughly before I post in the future. However, I did find some of the descriptions of the iostreams components and how they compose a bit vague, IMHO; concrete examples are what I was really looking for.

No, I haven't measured performance. And I understand your concern about misplaced attention to performance. The general advice is to make it work first, then make it work faster; I agree with that advice strongly, but I think you might agree with me once I explain my context. If I can see that a particular choice will cost latency through unnecessary reallocations and re-initializations (as when a vector grows and must copy everything it contains to keep its memory contiguous), I try to eliminate that cost while making it work the first time. I'm writing a pintool, and you kind of have to understand what that is to know where I'm coming from. Intel's Pin tool suite is a dynamic binary instrumentation framework designed to let programmers register callbacks at different levels of granularity and write their own instrumentation routines. So if you wanted to gather data about every image load, instruction execution, or routine call on the fly, you could do it.

Anyway, the context in which I'm using this compression utility is one where speed is particularly important. A set of dynamic analysis routines steadily generates data from the target program being analyzed, and the point I'm asking about is the handoff: once the analysis threads have generated enough data, they pass a handle to their buffer to a pool of compression threads, and the compression threads run the data they've been handed through a compressor and (ideally) straight into a fresh buffer. The problem is that, because I'm instrumenting at instruction-level granularity, my analysis code has to synchronize a Lamport clock between the target application's threads (now I'm really getting out of hand with my explanation lol); the whole run might produce a 400 MB XML file, and these buffers get flushed about every 30 KB of accumulated data. So that's a lot of flushing! In the worst case, those unnecessary allocations and copies would slow the entire program down by 2-3x, because it would literally be pausing to repeat work that isn't needed.
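To make that handoff concrete, here's a rough sketch of the kind of buffer pool I mean behind acquireStringFromPool; the class and member names are just made up for illustration, and I've used Boost.Thread primitives here, but the real pintool code is more involved:

#include <list>
#include <string>
#include <boost/thread.hpp>

// Hypothetical pool backing acquireStringFromPool(): buffers are allocated once,
// then recycled between the analysis and compression threads.
class StringPool {
    std::list<std::string*> free_;
    boost::mutex m_;
    boost::condition_variable cv_;
public:
    StringPool(std::size_t count, std::size_t capacity) {
        for (std::size_t i = 0; i < count; ++i) {
            std::string* s = new std::string;
            s->reserve(capacity);          // pay for the allocation once, up front
            free_.push_back(s);
        }
    }
    std::string* acquire() {               // blocks until a buffer is free
        boost::unique_lock<boost::mutex> lock(m_);
        while (free_.empty())
            cv_.wait(lock);
        std::string* s = free_.front();
        free_.pop_front();
        s->clear();                        // empty it, but keep its capacity
        return s;
    }
    void release(std::string* s) {         // called once the compressed data has been written out
        boost::lock_guard<boost::mutex> lock(m_);
        free_.push_back(s);
        cv_.notify_one();
    }
    // (destructor that deletes the pooled strings omitted for brevity)
};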

But actually, if this appends to my 30k string, I didn't know that. The back_insert_device<string> class wasn't something I understood very well from the description at http://www.boost.org/doc/libs/1_45_0/libs/iostreams/doc/classes/back_inserter.html

What I had thought, though, was that since my string started out as nothing but 0's, writing into it would leave me with precisely my compressed data. The idea behind acquireStringFromPool is that it returns a large string that is always cleared out, ready for the filtering_ostream to fill up as though it were a device. Is that incorrect? How can I get that functionality?
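To check my understanding, here's a minimal sketch of what I think is actually going on, plus the fix I think you're suggesting; compressInto and its arguments are just names I made up for this sketch, and I'm assuming x comes from one of those new string(30000, 0) buffers:

#include <string>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/device/back_inserter.hpp>

namespace io = boost::iostreams;

void compressInto(const std::string& doc, std::string* x)
{
    // If *x was built as string(30000, 0), it already has size() == 30000,
    // so back_insert_device would append *after* those 30000 zero bytes.
    x->clear();       // size() -> 0, but the 30000-byte capacity is retained
    // (or allocate the pooled strings empty and call x->reserve(30000) instead)

    {
        io::filtering_ostream fos;
        fos.push(io::bzip2_compressor());
        fos.push(io::back_inserter(*x));   // equivalent to back_insert_device<string>(*x)
        fos << doc;
    }   // destroying fos closes the chain, flushing the compressor's remaining output

    // *x now holds exactly the compressed bytes, with no leading zeros, and no
    // reallocation occurred as long as the output stayed within capacity().
}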

Perhaps that reserve approach would be the way to go, though. I apologize that the email was so long lol.


On Tue, Jan 14, 2014 at 12:16 PM, Krzysztof Czainski <1czajnik@gmail.com> wrote:
2014/1/14 Kenneth Adam Miller <kennethadammiller@gmail.com>
By the way, I'm working on a master's thesis, so I frequently skip sleep. Sometimes after a lack of sleep, getting across precisely what is needed/understood can take an iteration or two :)

Good luck with your thesis ;-) 

Btw, I think top-posting is inappropriate on this list.

On Tue, Jan 14, 2014 at 10:15 AM, Kenneth Adam Miller <kennethadammiller@gmail.com> wrote:
Pretty much, it's about performance concerns. I know there's going to be at least one copy performed during the compression, from uncompressed to compressed data. Here's how I do it:

filtering_ostream *fos = new filtering_ostream();
fos->push(bzip2_compressor());
string *x = acquireStringFromPool();   // A blocking call that returns a pointer from a list of string*, each allocated with new string(30000, 0); (it's multithreaded, ok lol :) )
fos->push(boost::iostreams::back_insert_device<string>(*x));   // This is what I was searching for all along.

Doesn't this append to your 30k string?

If a preallocated chunk of memory is ok for you, check out vector::reserve() ;-) 
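Something along these lines is what I mean (just a sketch, with a made-up size):

std::vector<char> buf;
buf.reserve(30000);   // one allocation up front; size() stays 0
// Appending through boost::iostreams::back_inserter(buf) then grows size()
// without reallocating until 30000 bytes have been written.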

Then later, when I want to write to fos, I do:

*fos << *doc;    //I go straight from container to compression.

Maybe my specification that "I don't want to find that it's copying at all" was a bit weird, because obviously it has to move the data, right? I'm just saying that most of the examples I saw looked something like this:

void compress(string x) {
   stringstream ss(x);  // unnecessary initialization (and a copy of x) in my case; couldn't find an example without this
   // ...then something similar to what I did above...
}

OK, so did you measure the performance of your solution compared to the above example? And then against some version using std::vector with reserved memory?

My suggestions have nothing to do with compression and streams. But you might be optimizing prematurely here.

HTH,
Kris

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users