Subject: [Boost-users] [Boost][Serialization] CPU bottleneck
From: Vjekoslav Brajkovic (balkan_at_[hidden])
Date: 2008-09-14 16:58:07


Hi,

I am using the serialization library in my project and it has been
functioning perfectly thus far. However, when I scaled up the usage
requirements, I hit a very odd problem.

Let me explain the use case. The serialization library is used in a DFS
framework, handling large data structures (in terms of size, not
complexity) such as std::vector<char> of size 1MB and above. The
framework consists of two major components: Chunkserver (server) and a
Client. Files are chunked, wrapped in a class, serialized and sent over
the wire. The same thing is done on the server side, but in reverse order.
The actual binary data is stored in a vector (previously, I tried
using a string instead, but I had some issues with it and Robert suggested
using an alternative container).

When I was depositing large files to Chunkserver, disk utilization was
almost non-existent, whereas the CPU was maxed out. It is important to
realize that this problem occurred only on the server side, not on the client.

Upon further investigation using gprof, I have concluded that the
bottleneck is in the serialization library (it may also be the case
that I am misusing it). According to the profiler, over 97% of the CPU
time was spent in a single function. The profiler results can be found at
this address:

http://www.cs.washington.edu/homes/balkan/gprof.txt

For reference, the signature of that function is:
boost::serialization::serialize_adl<
         boost::archive::text_iarchive, std::vector...>
and I am using a text archive. As I mentioned before, this issue only
occurs on the server side.

I would appreciate it if anybody could explain why this is happening and,
more importantly, how to circumvent the issue.

Thank you!

Vjeko

