Subject: Re: [Boost-users] [Boost][Serialization] CPU bottleneck
From: Vjekoslav Brajkovic (balkan_at_[hidden])
Date: 2008-09-14 18:44:12


On Sun, 14 Sep 2008, Robert Ramey wrote:

> I've reviewed the profile and found it interesting.
>
> Have you tried binary_archive? You would find it much, much faster in
> this case for a variety of reasons.

After modifying the code to use binary_archive, everything runs as
expected. I posted this email to make sure that this is not a bug
within the library.

> To maintain portability of text files, the library has to manipulate each
> character sent. This takes a lot of time and it adds up. You might
> experiment

I see. That explains everything.
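
For anyone finding this thread later, here is a minimal sketch of why
per-character handling adds up. It uses only the standard library and is
not Boost's actual implementation or archive format: encode_text,
decode_text, encode_binary and decode_binary are hypothetical helpers
written for illustration. The text-style pair formats and parses every
byte individually, while the binary-style pair copies the whole buffer in
one block behind a length prefix (native byte order, so it gives up the
portability that the text form keeps).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Text-style encoding (illustrative assumption, not Boost's format):
// the element count, then each byte written as a decimal number.
// Every byte goes through stream formatting, so the cost is per character.
std::string encode_text(const std::vector<char>& data) {
    std::ostringstream out;
    out << data.size();
    for (char c : data) {
        out << ' ' << static_cast<int>(static_cast<unsigned char>(c));
    }
    return out.str();
}

// Reverses encode_text: every byte is parsed back from its decimal form.
std::vector<char> decode_text(const std::string& encoded) {
    std::istringstream in(encoded);
    std::size_t count = 0;
    in >> count;
    std::vector<char> data;
    data.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        int value = 0;
        in >> value;
        data.push_back(static_cast<char>(static_cast<unsigned char>(value)));
    }
    return data;
}

// Binary-style encoding: an 8-byte length prefix in native byte order,
// followed by the raw bytes copied in a single block.
std::string encode_binary(const std::vector<char>& data) {
    const std::uint64_t count = data.size();
    std::string out(sizeof(count) + data.size(), '\0');
    std::memcpy(&out[0], &count, sizeof(count));
    if (!data.empty()) {
        std::memcpy(&out[sizeof(count)], data.data(), data.size());
    }
    return out;
}

// Reverses encode_binary: reads the length prefix, then copies the payload.
std::vector<char> decode_binary(const std::string& encoded) {
    std::uint64_t count = 0;
    assert(encoded.size() >= sizeof(count));
    std::memcpy(&count, encoded.data(), sizeof(count));
    assert(encoded.size() - sizeof(count) >= count);
    const char* first = encoded.data() + sizeof(count);
    return std::vector<char>(first, first + static_cast<std::size_t>(count));
}
```

Both pairs round-trip the same std::vector<char>. The difference is that
the text-style path does formatting or parsing work for every byte, which
for chunks of 1MB and above would plausibly account for the kind of CPU
load I was seeing, whereas the binary-style path is essentially one copy.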

> with creating a temporary array, wrapping it in binary_object and sending
> it that way. But still, the very fastest will be to use binary_?archive.

I was not aware of binary_object. Thanks for pointing this out. I will
consult the documentation.

> Robert Ramey

Thanks a bunch for helping out on such short notice. I really appreciate
it.

Best! ;)

-vjeko

> Vjekoslav Brajkovic wrote:
>> Hi,
>>
>> I am using the serialization library in my project and it has been
>> functioning perfectly thus far. However, when I scaled up the usage
>> requirements, I hit a very odd problem.
>>
>> Let me explain the use case. The serialization library is used in a DFS
>> framework, handling large data structures (in terms of size, not
>> complexity) such as std::vector<char> of size 1MB and above. The
>> framework consists of two major components: Chunkserver (server) and a
>> Client. Files are chunked, wrapped in a class, serialized and sent
>> over the wire. The same thing is done on the server side, but in
>> reverse order. The actual binary data is stored in a vector
>> (previously, I tried using a string instead, but I had some issues
>> with it and Robert suggested using an alternative container).
>>
>> When I was depositing large files to Chunkserver, disk utilization was
>> almost non-existent, whereas the CPU was maxed out. It is important to
>> realize that this problem occurred only on the server side, not
>> client.
>> Upon further investigation using gprof, I concluded that the
>> bottleneck was in the serialization library (it may also be the case
>> that I am misusing it). According to the profiler, more than 97% of the
>> CPU time was spent in a single function. Profiler results can be found
>> at this address:
>>
>> http://www.cs.washington.edu/homes/balkan/gprof.txt
>>
>> For reference, the signature of that function is:
>> boost::serialization::serialize_adl<
>> boost::archive::text_iarchive, std::vector...>
>> and I am using a text archive. As I mentioned before, this issue only
>> occurs on the server side.
>>
>> I would appreciate it if anybody could explain why this is happening
>> and, more importantly, how to circumvent the issue.
>>
>> Thank you!
>>
>> Vjeko