
badassgeek wrote in linux (mood: awake)

Listens: Dead Kennedys - Moral Majority

messing with compressed archives

x-posted from my journal:

i was just thinking to myself that we need a better universal open/free compressed archive format. tar.gz is just an archive that's been compressed; we need something like winzip for all the platforms, but obviously much better and not like winzip at all.

what about gz.tar? it's not really what i had in mind, but think about it: instead of having to decompress all the files you could just read through the tar! it'd be a normally formatted tar - directories, symlinks, device files, etc. - but the regular files would be gzipped. then the bottleneck is no longer CPU, it's I/O!!!

of course then you'd need a way to handle the device files, symlinks, etc. that tar records but that gzip can't compress, but that's another story. if all you've got to archive and compress are directories of files, gz.tar should allow for quicker listing of files with similar compression ratios. the catch is that gzip adds a layer of header overhead for each file it's applied to, so the gz.tar would be larger than the tar.gz almost every time.
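here's a quick sketch of how you'd build both formats. the directory and file names are just made up for illustration:

```shell
# set up a toy directory to play with (names are hypothetical)
mkdir -p data/sub
echo "hello hello hello" > data/a.txt
echo "world world world" > data/sub/b.txt

# tar.gz: tar the whole tree, then gzip the single stream
tar -czf data.tar.gz data

# gz.tar: gzip each regular file individually, then tar the results;
# directories, symlinks, etc. are left alone by the find/gzip pass
cp -a data data-gz
find data-gz -type f -exec gzip {} +
tar -cf data.gz.tar data-gz
```

note the gz.tar's outer layer is a plain uncompressed tar, so tools that read tar headers never have to touch zlib.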


tar.gz format:
Thu May  6 01:38:24 EDT 2004
uncompressed tar size: 11M
compressed tar.gz size: 2.4M
Thu May  6 01:38:33 EDT 2004

gz.tar format:
Thu May  6 01:41:51 EDT 2004
uncompressed directory size: 12M
compressed directory size: 4.6M
compressed tar size: 3.3M
Thu May  6 01:41:59 EDT 2004

do not trust the times; this was done during a recompile of gnucash. BUT, check this out:

listing archive - tar.gz format:
psypete@meatwad:~$ time tar -tzf tar.tar.gz >/dev/null
real    0m0.276s
user    0m0.250s
sys     0m0.020s

listing archive - gz.tar format:
psypete@meatwad:~$ time tar -tf gz.gz.tar >/dev/null
real    0m0.017s
user    0m0.020s
sys     0m0.000s



clearly, listing the archive whose outer layer is uncompressed is much, much faster. so if compression ratio isn't as important to you as the speed of listing/extracting files, go with the gz.tar format.
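extraction gets the same win: with a gz.tar you can pull out one member and only gunzip that one file, instead of decompressing the entire stream to reach it. a sketch (the archive and file names are hypothetical, and the setup here just builds a toy gz.tar to extract from):

```shell
# build a toy gz.tar so there's something to extract (hypothetical names)
mkdir -p data-gz
echo "some text" | gzip > data-gz/a.txt.gz
tar -cf data.gz.tar data-gz
rm -r data-gz

# extract a single member, then decompress just that one file;
# the rest of the archive is never gunzipped
tar -xf data.gz.tar data-gz/a.txt.gz
gunzip data-gz/a.txt.gz
```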