messing with compressed archives
x-posted from my journal:
i was just thinking to myself that we need a better universal open/free compressed archive format. tar.gz is just an archive compressed as a whole; we need something like winzip for all the platforms, but obviously much better and not like winzip at all.

what about gz.tar? it's not exactly what i had in mind, but think about it: instead of having to decompress the whole archive, you could just read through the tar! it'd be a normally formatted tar - directories, symlinks, device files, etc. - but the regular files inside would be gzipped individually. then the bottleneck is no longer CPU, it's I/O! of course you'd still need a way to handle the device files, symlinks, etc. that are part of the tar and can't be gzipped, but that's another story.

if all you've got to archive and compress are directories of regular files, gz.tar should allow for much quicker listing of files at a similar compression ratio. the catch is that gzip adds a fixed layer of overhead for each file it's applied to, and compressing each file independently means redundancy shared across files can't be exploited, so the gz.tar will be larger than the tar.gz almost every time.
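here's a rough sketch of how you could build a gz.tar by hand with stock tools (the directory and archive names are made up for the example; note that gzip renames each file to `file.gz` and skips symlinks/device files since `-type f` only matches regular files):

```shell
# work on a copy so the original directory stays intact
cp -a project project.gztar
# gzip every regular file in place (each becomes file.gz)
find project.gztar -type f -exec gzip -9 {} +
# plain, UNcompressed tar of the gzipped tree
tar -cf project.gz.tar project.gztar
# listing now needs no decompression at all
tar -tf project.gz.tar
```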
tar.gz format:
Thu May 6 01:38:24 EDT 2004
uncompressed tar size:  11M
compressed tar.gz size: 2.4M
Thu May 6 01:38:33 EDT 2004
gz.tar format:
Thu May 6 01:41:51 EDT 2004
uncompressed directory size: 12M
compressed directory size:   4.6M
compressed tar size:         3.3M
Thu May 6 01:41:59 EDT 2004
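part of that size gap is fixed per-file cost: every gzip member carries a 10-byte header and an 8-byte trailer (CRC32 plus original size), plus a couple of bytes for the deflate stream itself. you can see the floor by compressing nothing at all:

```shell
# even empty input costs 18 bytes of header/trailer plus ~2 bytes
# of empty deflate stream, for 20 bytes total
printf '' | gzip -c | wc -c    # -> 20
```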
do not trust the times; this was done during a recompile of gnucash. BUT, check this out:
listing archive - tar.gz format:
psypete@meatwad:~$ time tar -tzf tar.tar.gz >/dev/null

real    0m0.276s
user    0m0.250s
sys     0m0.020s
listing archive - gz.tar format:
psypete@meatwad:~$ time tar -tf gz.gz.tar >/dev/null

real    0m0.017s
user    0m0.020s
sys     0m0.000s
clearly, listing the uncompressed tar is much, much faster. so if compression ratio matters less to you than speed when listing or extracting files, go with the gz.tar format.
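extraction gets the same benefit: tar can pull a single member out without decompressing anything else, and then you gunzip just the one file you wanted. a hypothetical session (the archive and member names are made up, matching the creation sketch above):

```shell
# copy one member out of the plain tar - no decompression of the rest
tar -xf project.gz.tar project.gztar/a.txt.gz
# decompress only the file you asked for
gunzip project.gztar/a.txt.gz
cat project.gztar/a.txt
```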
