Subject: Re: [boost] [compond_file_binary] Gauging interest in a possible library submission.
From: Alexander Voitenko (tarmik_at_[hidden])
Date: 2012-12-05 04:00:26
> How does your intended solution differ from (or compare to) simply
> writing documents as a (possibly renamed) .zip file?
Yes, uncompressed zip files are similar to compound files in some ways.
Both provide a hierarchy of directory entries and methods to iterate
through it, and both provide access to the stored data.
I am not an expert in zip archives, so my assumptions may be wrong.
Still, I can see several advantages over zip files:
- As I understand it, zip files work sequentially: you put one file in,
then another, and can later delete some files, but only as whole chunks.
With compound files you can open several streams and keep them open, some
for writing and some for reading, and make modifications simultaneously. Of
course this will lead to data fragmentation, but that is another story ;-)
With compound files you do not need to extract the internal content to
modify it; you can do it "in place", even if you want to extend the stored
data by writing a new chunk at the end of some stream. The compound file's
internal file system provides all the facilities needed to do such
operations at minimal cost.
- Faster entry lookup. All child entries of a directory are organized as a
red-black tree at the format level, so the entire directory hierarchy looks
like a tree of red-black trees.
- Faster entry deletion. As far as I can see, zip files explicitly remove
deleted files and then recalculate the CRC checksum. Compound files only
mark the sectors of a deleted entry as "unused" in the "File Allocation
Table". Yes, in the common case the actual data is not removed and can be
recovered with tools such as hex editors, but most file systems behave the
same way.
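
To make the points about in-place growth and cheap deletion a bit more
concrete, here is a minimal in-memory sketch of a sector allocation table.
It is not the API of the proposed library: SectorTable and its members are
hypothetical names chosen for illustration, and only the two marker values
follow the convention used by the compound file format. Appending links one
more sector to the end of a stream's chain, and deleting only marks the
chain's sectors as free without touching the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker values as used by the compound file format's allocation table.
constexpr std::uint32_t FREESECT   = 0xFFFFFFFF; // sector is unused
constexpr std::uint32_t ENDOFCHAIN = 0xFFFFFFFE; // last sector of a stream

// Hypothetical in-memory model: next[i] is the sector that follows
// sector i in its stream, or one of the markers above.
struct SectorTable {
    std::vector<std::uint32_t> next;

    // Reuse the first free sector, or grow the table by one sector.
    std::uint32_t allocate() {
        for (std::uint32_t i = 0; i < next.size(); ++i) {
            if (next[i] == FREESECT) {
                next[i] = ENDOFCHAIN;
                return i;
            }
        }
        next.push_back(ENDOFCHAIN);
        return static_cast<std::uint32_t>(next.size() - 1);
    }

    // Append one sector to the stream whose chain starts at `first`
    // (ENDOFCHAIN means an empty stream). Returns the chain start.
    std::uint32_t append(std::uint32_t first) {
        const std::uint32_t fresh = allocate();
        if (first == ENDOFCHAIN) return fresh;
        std::uint32_t last = first;
        while (next[last] != ENDOFCHAIN) last = next[last];
        next[last] = fresh; // existing sectors are left where they are
        return first;
    }

    // Delete a stream: mark each sector of its chain as free.
    // The sector contents themselves are not modified.
    void release(std::uint32_t first) {
        while (first != ENDOFCHAIN) {
            assert(first < next.size());
            const std::uint32_t following = next[first];
            next[first] = FREESECT;
            first = following;
        }
    }

    // Number of sectors in the chain starting at `first`.
    std::size_t length(std::uint32_t first) const {
        std::size_t count = 0;
        while (first != ENDOFCHAIN) {
            ++count;
            first = next[first];
        }
        return count;
    }
};
```

Growing two streams alternately with this model interleaves their sectors,
which is the fragmentation mentioned above. A sector released by a deleted
stream is picked up again by the next append.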