
Boost Users :

From: Robert Ramey (ramey_at_[hidden])
Date: 2007-03-10 02:54:22


Robert Ramey wrote:
> Bill Lear wrote:

>> I'm not sure I need to use a cache of strings when saving, in fact,
>> I think not, but I went ahead and more-or-less followed your
>> instructions.

I was thinking along the lines of the following:

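template<class Archive>
void save(Archive &ar, ll_string & l, const unsigned int version) const {
    // look up the string in the cache, adding it if it is not there yet
    // (m_cache must be declared mutable for this const function to insert)
    std::set<std::string>::iterator it = m_cache.find(l);
    if(m_cache.end() == it)
        it = m_cache.insert(l).first;
    // serialize a pointer to the cached value
    ar << & (* it); // or ar << it - serializing an iterator!!! that's a first
}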

template<class Archive>
void load(Archive &ar, ll_string & l, const unsigned int version){
    std::string * t;
    ar >> t; // the serialization library may emit a warning or error here - try & or a cast.
    // copy de-serialized string to "real" destination
    l = *t;
}

Thus, all the strings that are equal to each other would
have the same pointer. Serialization tracking ensures that
only one copy of the same object is in the archive. So when
you load, all the instances have their contents shared. This
could make the file much smaller and faster to load. For
example, suppose we were going to serialize the Bible (King
James version), and we have it stored as a long list
of words. In this system each distinct word would be stored
only once, plus one small integer each time the word is
stored again. Given a guess of 10,000 different words used in
this 250,000 (?) word work, the archive would be a lot
smaller. It might also be considerably faster to load.
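
To make the "stored once, plus a small integer" idea concrete, here is a
minimal sketch that does the same thing by hand, without the
serialization library. It is not how the library's tracking is
implemented; the names InternedText, intern_words and restore_words are
made up for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical deduplicated form: each distinct word appears once in
// `table`, and `ids` holds one small integer per occurrence in the text.
struct InternedText {
    std::vector<std::string> table;
    std::vector<std::size_t> ids;
};

// Build the deduplicated form from a plain list of words.
InternedText intern_words(const std::vector<std::string>& words) {
    InternedText out;
    std::map<std::string, std::size_t> seen;  // word -> index in out.table
    for (const std::string& w : words) {
        auto it = seen.find(w);
        if (it == seen.end()) {
            // first time we see this word: store it once
            it = seen.emplace(w, out.table.size()).first;
            out.table.push_back(w);
        }
        // every occurrence costs only an index
        out.ids.push_back(it->second);
    }
    return out;
}

// Rebuild the original word list from the deduplicated form.
std::vector<std::string> restore_words(const InternedText& t) {
    std::vector<std::string> words;
    words.reserve(t.ids.size());
    for (std::size_t id : t.ids) {
        assert(id < t.table.size());
        words.push_back(t.table[id]);
    }
    return words;
}
```

With the guess above, table would hold about 10,000 strings and ids
about 250,000 integers, instead of 250,000 separate strings.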

I'm not sure if this would really work - it's just an idea,
so you might want to take it with a grain of salt.

Robert Ramey


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net