Minuimus

Minuimus is a file optimiser utility script: you point it at a file, and it makes the file smaller without compromising its contents. File optimisers achieve this through a variety of file-specific optimisations, most of which involve decompressing data within a compressed file and recompressing it more efficiently. The process is directly comparable to extracting a ZIP file, then recompressing it at the most demanding setting. Minuimus uses many of the same techniques as other optimisers such as Papa's Best Optimizer.

The latest version is Minuimus 4.1. If running on Windows, the bundled Windows dependencies are also required. On Linux almost all of the dependencies are available in most distribution repositories, and the remaining ones are optional.

Minuimus does not carry out all of this optimisation by itself. It depends upon many other utilities, as well as some more specialised methods developed especially for it. Minuimus is a script which automates the process of calling all of these utilities: recursively processing container files and ensuring proper reassembly, detecting and handling the various errors that may occur, and running integrity checks on the optimised files to prevent damage.

Minuimus's optimisations are, by default, completely transparent: not a single pixel of an image changes in value, and no audio or video falls in quality. Even metadata is preserved unaltered. It also supports a number of format-converting or lossy optimisations, which must be explicitly enabled by command-line option.

Strictly speaking, this optimisation isn't lossless, because it may delete some essentially useless information within files, such as edit history in PDFs. But it is very close: with the default settings, all of the important information is preserved.

As Minuimus entirely automates the file optimisation process, nothing more is needed than to install the prerequisites and run a single command pointing Minuimus at the files to be optimised. It will optimise the files it can, and skip over those it cannot.

Minuimus consists of a Perl script and three optional supporting binaries, which are used for processing PDF, WOFF and SWF files. These are written for use on Ubuntu Linux, but should be adaptable to other Linux distributions with little if any alteration. Minuimus also works on Windows using Strawberry Perl, but has not been subjected to as much testing there. These utilities are released under the GPL v3 license, as is Minuimus itself.

Minuimus goes beyond just calling AdvanceCOMP. When faced with files which are ZIP containers - such as zip, epub or docx - it extracts them, recursively processes all of the files contained within, and puts them back together. In this manner it can make e-books and office documents substantially smaller.
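The container-handling loop can be sketched in a few lines of Python. This is an illustrative stand-in for what the Perl script does, not Minuimus's actual code: extract, (optionally) process the members, rebuild at maximum deflate, and keep the rebuilt archive only if it shrank.

```python
import os
import tempfile
import zipfile

def repack_zip(path):
    """Extract a ZIP, then rebuild it at maximum deflate level.
    In the real tool, each extracted member would first be handed to a
    format-specific optimiser; that step is omitted in this sketch."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(path) as zf:
            zf.extractall(tmp)
        rebuilt = path + ".new"
        with zipfile.ZipFile(rebuilt, "w", zipfile.ZIP_DEFLATED,
                             compresslevel=9) as zf:
            for root, _, files in os.walk(tmp):
                for name in files:
                    full = os.path.join(root, name)
                    zf.write(full, os.path.relpath(full, tmp))
        # keep the rebuilt archive only if it is actually smaller
        if os.path.getsize(rebuilt) < os.path.getsize(path):
            os.replace(rebuilt, path)
        else:
            os.remove(rebuilt)
```

In practice Minuimus hands the rebuilt ZIP to advzip for a further Zopfli pass; the keep-only-if-smaller check at the end mirrors its fail-safe behaviour.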

Use it to make your website faster, your game easier to distribute, or just to squeeze more holiday photographs onto your computer. It achieves an ongoing saving in storage and transmission at no cost beyond a one-time expenditure of processor time.

The exact space saving achieved is highly dependent upon the file being optimised; as with any file optimiser, even after extensive testing, the results are too inconsistent to quantify easily. A collection of PDF files sampled from the-eye.eu was reduced to 90% of its input size, while a half-terabyte sample from the archive.org 'computermagazine' collection was reduced with greater success, to 78%. A collection of EPUB files from Project Gutenberg was reduced only to 95%, as these files are light on images, and ZIP files containing nothing which can be recursively optimised shrink only slightly, typically to around 97%.

If the utility Leanify is installed, it will be invoked automatically on each file. As Leanify has some tricks that Minuimus does not, and vice versa, the two in conjunction achieve better compression than either alone.

Description of processing

Images

JPEG files are initially processed by jpegoptim. For most JPEG files, this is as much as is possible. In some rare cases Minuimus may find a colour JPEG which contains only grayscale values, in which case the effectively empty colour channels are removed for a further reduction in size.
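The grayscale-in-colour test reduces to checking whether any pixel carries chroma information. A minimal sketch of the idea, operating on decoded RGB triples rather than on the JPEG's actual colour channels (the function name is illustrative, not Minuimus's):

```python
def is_effectively_grayscale(pixels):
    """Return True if every (r, g, b) triple has identical channel
    values, i.e. the colour channels carry no information and the
    image could be stored as single-channel grayscale."""
    return all(r == g == b for r, g, b in pixels)
```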

PNG files are processed by optipng, followed by advpng - unless they are animated PNGs, as advpng is not animation-safe; those get advdef instead. Optionally, pngout is called on non-animated PNGs if installed, which sometimes saves another percent or so.

GIF files are processed by gifsicle. If the file is less than 100KiB and flexigif is installed, it is then processed by flexigif.

TIFF files are re-compressed at the highest setting supported by ImageMagick.

Format conversion is disabled by default, but if enabled it will convert GIF and PNG to WebP in an animation-safe manner, and convert JPEG to AVIF in a manner that is lossy, but only very slightly so. This can also be enabled for older image formats such as BMP or PCX.

Archives

JAR files are processed by advzip, as these files are too delicate to be safely manipulated beyond this. In particular, altering the files within would invalidate any signing.

PDF files are initially processed using qpdf to remove unused objects and object versions, ensure a consistent format and correct any minor errors. Following this pre-processing, all JPEG objects within the PDF are located and processed using jpegoptim. All objects compressed using DEFLATE are also identified and processed using a C helper binary, minuimus_def_helper - if this is not installed, DEFLATE processing is skipped. Certain unimportant internal metadata objects are deleted, but the main document metadata is not touched unless --discard-meta is specified. The processed PDF is then relinearised using qpdf.

As a fail-safe against anything going wrong in this process, both the original and compressed PDFs are rendered to bitmap form and compared - the original is only overwritten if the optimised file renders identically. This combination of qpdf, JPEG and DEFLATE processing makes Minuimus one of the most effective lossless PDF optimisation utilities available. Finally, if pdfsizeopt is installed, it will be called - as it includes some optimisations that Minuimus lacks, and vice versa, the two tools together can outperform either alone.

ZIP, DOCX, XLSX, ODT, ODS, ODP, EPUB and CBZ - the ZIP-derived archive formats - are processed by extracting to a temporary folder, processing all (non-archive) files within as independent files, then re-compressing into a ZIP file and compressing that with advzip. The 'mimetype' file is correctly placed first in the completed file using store-only compression, in accordance with the EPUB OCF. Additionally, a number of 'junk' files such as Thumbs.db and .DS_Store are deleted from ZIP files.
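The EPUB-specific reassembly rules can be demonstrated with Python's zipfile module. This is a simplified sketch, not Minuimus's actual implementation (which builds the ZIP externally and then runs advzip): the mimetype entry goes first and uncompressed, everything else is deflated, and known junk filenames are skipped.

```python
import zipfile

JUNK = {"Thumbs.db", ".DS_Store"}

def write_epub(path, members):
    """Write an EPUB-style ZIP from a {archive name: bytes} mapping.
    Per the EPUB OCF, 'mimetype' must be the first entry and must be
    stored without compression; junk files are dropped entirely."""
    with zipfile.ZipFile(path, "w") as zf:
        if "mimetype" in members:
            zf.writestr("mimetype", members["mimetype"],
                        compress_type=zipfile.ZIP_STORED)
        for name, data in members.items():
            if name == "mimetype" or name.rsplit("/", 1)[-1] in JUNK:
                continue
            zf.writestr(name, data,
                        compress_type=zipfile.ZIP_DEFLATED)
```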

For CBZ files, any GIF, BMP, PCX or TIFF files within are converted to PNG (in an animation-safe manner) in addition to the standard ZIP processing. The capability to convert PNG to WebP is present, but disabled by default due to limited viewer support for WebP. JPEG-to-WebP conversion is also supported (this is slightly lossy) but must likewise be expressly enabled.

GZ and TGZ files are simply processed by advdef.

7Z archives are extracted and their constituent files processed; recompression is then attempted using both the LZMA and PPMd compression algorithms at the highest practical settings. Whichever produces the smallest file is kept, unless the original file is smaller still. Solid compression is not used.

By default Minuimus will not change the type of an archive, but optionally ZIP and RAR files can be converted to 7z, which provides (much) better compression than ZIP and (slightly) better compression than RAR. There is also an option to convert 7z to ZPAQ, which offers around the best compression available - but has so little software support that it is rarely practical to use. In all cases the conversion is rejected if the resulting file is larger than the original.
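This keep-only-if-smaller rule applies to every conversion described above. A minimal sketch of that guard in Python; the helper name and the conversion callback are illustrative, not part of Minuimus:

```python
import os

def convert_if_smaller(src, dst, convert):
    """Run convert(src, dst) - any re-encoding step, e.g. ZIP to 7z -
    then keep the result only if it is smaller than the original.
    Otherwise the result is discarded and src is left untouched.
    Returns True if the conversion was accepted."""
    convert(src, dst)
    if os.path.exists(dst) and os.path.getsize(dst) < os.path.getsize(src):
        os.remove(src)
        return True
    if os.path.exists(dst):
        os.remove(dst)
    return False
```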

Media files

MP3 files are repacked if that makes them smaller. Usually it does not - but some very old MP3s are poorly packed and may shrink slightly. There is an option to convert high-bitrate MP3 to Opus, and a further option permitting this at much lower bitrate, intended for voice recordings.

FLAC files are re-encoded using the highest possible profile-compliant settings - slightly better than the regular -8. Metadata is preserved. FLAC files are also examined to determine whether they contain a mono audio track encoded as stereo; such files are converted to true mono, which achieves a substantial space saving.

WOFF files (web fonts) are processed using a Zopfli-based recompresser.

If the option is specified, files using legacy codecs (such as WMV or DivX) can be converted to AV1-encoded WebM files. This process automatically includes any SRT subtitle file into the WebM if it shares the same filename. The conversion is rejected and undone if it results in a larger file.

All operations that re-encode audio also check for 'fake stereo' files, where the two channels are identical, and convert these to true mono.
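The fake-stereo check amounts to comparing the two channels sample by sample. A sketch of the idea on decoded sample pairs (the function is illustrative; the real check happens during re-encoding):

```python
def is_fake_stereo(samples):
    """'samples' is a sequence of (left, right) PCM sample pairs.
    Returns True when both channels are bit-identical, meaning the
    stream can be folded to true mono with no loss whatsoever."""
    return all(left == right for left, right in samples)
```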

Other files

CAB files - the MS CAB format, not the InstallShield one - are repackaged if possible, but the savings from this are very small. Signed CABs are ignored.

HTML, CSS and SVG files are searched for any base64-encoded JPEG, PNG or WOFF resources, and these resources are optimised appropriately.
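The base64-resource pass can be sketched as a regex substitution: find each data: URI, decode the payload, run it through the appropriate optimiser, and re-embed the result. This is a simplified illustration (the regex and the optimise callback stand in for Minuimus's actual handling, which covers more MIME types):

```python
import base64
import re

DATA_URI = re.compile(
    r"data:(image/(?:png|jpeg)|font/woff);base64,([A-Za-z0-9+/=]+)")

def optimise_data_uris(text, optimise):
    """Find base64 data: URIs in HTML/CSS/SVG text, decode each
    payload, pass it through 'optimise' (a bytes -> bytes function
    standing in for the real JPEG/PNG/WOFF optimisers), and re-embed
    the optimised result in place."""
    def repl(match):
        mime, payload = match.group(1), match.group(2)
        data = optimise(base64.b64decode(payload))
        return "data:%s;base64,%s" % (mime,
                                      base64.b64encode(data).decode())
    return DATA_URI.sub(repl, text)
```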

SWF files will have their internal JPEG and PNG objects recompressed, and the outer DEFLATE wrapper run through Zopfli. However, as most SWF generation software is already focused on producing small files, savings are generally small.

STL models in ASCII form are converted to binary form. This makes them much smaller - though most STLs today are already binary. Optionally, explicit vertex normals can be removed; this makes the files more compressible, though no longer strictly standard-compliant. The normals are largely a relic of the format's early origins, and I have yet to find a program that actually makes use of this data.
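The ASCII-to-binary STL conversion is mechanical: binary STL is an 80-byte header, a 32-bit triangle count, then 50 bytes per triangle. A self-contained sketch (not Minuimus's code) showing both the conversion and the optional normal-zeroing:

```python
import struct

def ascii_stl_to_binary(text, drop_normals=False):
    """Convert ASCII STL text to the binary layout: 80-byte header,
    uint32 triangle count, then per triangle 12 little-endian floats
    (normal + three vertices) and a uint16 attribute word. With
    drop_normals=True the normal is zeroed - more compressible, but
    no longer strictly standard-compliant."""
    normals, triangles, vertices = [], [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "facet":      # "facet normal nx ny nz"
            normals.append(tuple(float(v) for v in parts[2:5]))
        elif parts[0] == "vertex":   # "vertex x y z"
            vertices.append(tuple(float(v) for v in parts[1:4]))
            if len(vertices) == 3:
                triangles.append(vertices)
                vertices = []
    out = [b"\0" * 80, struct.pack("<I", len(triangles))]
    for normal, tri in zip(normals, triangles):
        if drop_normals:
            normal = (0.0, 0.0, 0.0)
        out.append(struct.pack("<12fH", *normal,
                               *(c for v in tri for c in v), 0))
    return b"".join(out)
```

At 50 bytes per triangle versus several lines of decimal text, the binary form wins for any non-trivial model.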

CHD files - compressed disc images, mostly used by emulators - are recompressed at a larger, more efficient hunk size.