I recently encountered a zip file that could be recursively extracted infinitely, which got me intrigued in how one could achieve something like that. I thus dedicated my time this hackathon to learning about data compression and quines, in hopes I could create a compressed archive file (i.e. .zip, .tar.gz).

I learnt about the use of:

  • Lemepl-Ziv algorithms
  • Huffman coding
  • DEFLATE compression specification
  • Checksums
  • Quirks in file binary data (little-endian, headers and footers, etc.)

All of these account for the necessary items to understanding how popular data archives are stored and made.

In the end, I was nearly there, but the problem of having recursive checksums meant I needed to do some fancy algebra to find a CRC value that would equate appropriately in both the header and data section of an archive file. And so I ran out of time and couldn't get it working completely :(

I am however very happy with the progress I made and will attempt to create a streamlined solution, like a Python package that adds a layer of abstraction to the "recursive archive problem" that can be then applied to the specifics of the provided by file types like zip and rar.

Built With

  • deflate
Share this project:

Updates