7z
7z is a compressed archive file format that supports several different data compression, encryption and pre-processing algorithms. The 7z format first appeared as implemented by the 7-Zip archiver. The 7-Zip program is publicly available under the terms of the GNU Lesser General Public License. The LZMA SDK 4.62 was placed in the public domain in December 2008. The latest stable version of 7-Zip and LZMA SDK is version 25.01.
The 7z file format specification has been distributed with 7-Zip's source code since 2015. The specification can be found in plain text format in the "doc" sub-directory of the source code distribution.
Features and enhancements
The 7z format provides the following main features:
- Open, modular architecture that allows any compression, conversion, or encryption method to be stacked.
- High compression ratios.
- AES-256 encryption.
- Zip 2.0 encryption.
- Large file support.
- Unicode file names.
- Support for solid compression, where multiple files of similar type are compressed within a single stream, in order to exploit the combined redundancy inherent in similar files.
- Compression and encryption of archive headers.
- Support for multi-part archives, e.g. xxx.7z.001, xxx.7z.002, and so on.
- Support for custom codec plugin DLLs.
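The effect of solid compression from the feature list above can be illustrated with a minimal sketch. It assumes Python's standard `lzma` module, which writes the .xz container rather than 7z, so the byte counts only approximate what a 7z archiver would produce; the sample "files" and the function name `compare_solid` are hypothetical.

```python
import lzma

def compare_solid(files):
    """Return (separate_total, solid_total) compressed sizes in bytes."""
    # Non-solid: every file gets its own independent compressed stream.
    separate = sum(len(lzma.compress(data)) for data in files)
    # Solid: all files are concatenated and compressed as a single stream,
    # so redundancy shared between files is visible to the compressor.
    solid = len(lzma.compress(b"".join(files)))
    return separate, solid

# Hypothetical sample data: 20 small files with very similar content.
files = [
    (f"record {i}: " + "status=ok; user=alice; region=eu-west; " * 5).encode()
    for i in range(20)
]
separate_total, solid_total = compare_solid(files)
print("separate streams:", separate_total, "bytes")
print("single solid stream:", solid_total, "bytes")
```

The trade-off is that reading one file out of a solid stream generally requires decompressing the data that precedes it.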
Compression methods
The following compression methods are currently defined:
- LZMA – A variation of the LZ77 algorithm, using a sliding dictionary up to 4 GB in length for duplicate string elimination. The LZ stage is followed by entropy coding using a Markov chain–based range coder and binary trees.
- LZMA2 – modified version of LZMA providing better multithreading support and less expansion of incompressible data.
- Bzip2 – The standard Burrows–Wheeler transform algorithm. Bzip2 uses two reversible transformations: BWT, then move-to-front with Huffman coding for symbol reduction.
- PPMd – Dmitry Shkarin's 2002 PPMdH (PPMII and cPPMII) with small changes. PPMII is an improved version of the 1984 PPM (prediction by partial matching) compression algorithm.
- DEFLATE – Standard algorithm based on 32 kB LZ77 and Huffman coding. Deflate is found in several file formats including ZIP, gzip, PNG and PDF. 7-Zip contains a from-scratch DEFLATE encoder that frequently beats the de facto standard zlib version in compression size, but at the expense of CPU usage.
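Three of the methods listed above have counterparts in Python's standard library, which allows a rough side-by-side comparison. This is only a sketch under stated assumptions: `lzma.compress` produces LZMA2 in an .xz container, `zlib` is the reference DEFLATE implementation rather than 7-Zip's own encoder, PPMd has no standard-library equivalent and is omitted, and the sample input is hypothetical.

```python
import bz2
import lzma
import zlib

def roundtrip_sizes(data):
    """Compress data with each codec, verify the round trip, return sizes."""
    codecs = {
        "LZMA2 (xz container)": (lzma.compress, lzma.decompress),
        "Bzip2": (bz2.compress, bz2.decompress),
        "DEFLATE (zlib)": (zlib.compress, zlib.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        # Every method is lossless, so decompression must restore the input.
        assert decompress(packed) == data
        results[name] = len(packed)
    return results

# Hypothetical, highly repetitive sample input.
sample = b"The quick brown fox jumps over the lazy dog. " * 2000
sizes = roundtrip_sizes(sample)
for name, size in sizes.items():
    print(f"{name}: {len(sample)} -> {size} bytes")
```

Relative results depend heavily on the input, so no ranking should be read into a single sample like this one.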
Pre-processing filters
The LZMA SDK comes with the BCJ and BCJ2 preprocessors included, so that later stages are able to achieve greater compression: For x86, ARM, PowerPC, IA-64 Itanium, and ARM Thumb processors, jump targets are "normalized" before compression by changing relative position into absolute values. For x86, this means that near jumps, calls and conditional jumps are converted from the machine language "jump 1655 bytes backwards" style notation to normalized "jump to address 5554" style notation; all jumps to 5554, perhaps a common subroutine, are thus encoded identically, making them more compressible.
- BCJ – Converter for 32-bit x86 executables. Normalises target addresses of near jumps and calls from relative distances to absolute destinations.
- BCJ2 – Pre-processor for x86 executables. BCJ2 is an improvement on BCJ, adding additional x86 jump/call instruction processing. Near jump, near call and conditional near jump targets are split out and compressed separately in another stream.
- BCJ-type filters for ARM64, ARM32, ARM-Thumb, PowerPC, SPARC.
- Delta encoding – delta filter, basic preprocessor for multimedia data.
- Swap2/Swap4 – endianness-swap filter.
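The address normalization and the delta filter described above can be sketched in a few lines. This is a deliberately simplified illustration, not the filter code from the LZMA SDK: it treats every 0xE8 (CALL rel32) or 0xE9 (JMP rel32) byte as an instruction without the heuristics a real BCJ filter applies, it ignores conditional jumps, and the names `bcj_x86_sketch`, `delta_encode` and `delta_decode` are hypothetical.

```python
import struct

def bcj_x86_sketch(data, encode=True):
    """Convert rel32 operands of E8/E9 to absolute offsets (or back)."""
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] in (0xE8, 0xE9):
            (value,) = struct.unpack_from("<I", buf, i + 1)
            next_ip = i + 5  # rel32 is relative to the following instruction
            if encode:
                value = (value + next_ip) & 0xFFFFFFFF  # relative -> absolute
            else:
                value = (value - next_ip) & 0xFFFFFFFF  # absolute -> relative
            struct.pack_into("<I", buf, i + 1, value)
            i += 5  # skip the operand so it is never scanned as an opcode
        else:
            i += 1
    return bytes(buf)

def delta_encode(data, distance=1):
    """Replace each byte with its difference from the byte `distance` back."""
    out = bytearray(data)
    for i in range(distance, len(data)):
        out[i] = (data[i] - data[i - distance]) & 0xFF
    return bytes(out)

def delta_decode(data, distance=1):
    """Invert delta_encode by accumulating the differences."""
    out = bytearray(data)
    for i in range(distance, len(out)):
        out[i] = (out[i] + out[i - distance]) & 0xFF
    return bytes(out)

# Two CALLs at offsets 0 and 16 that both target absolute offset 0x1000.
code = (
    b"\xE8" + struct.pack("<I", 0x1000 - 5)
    + b"\x90" * 11  # NOP padding
    + b"\xE8" + struct.pack("<I", 0x1000 - 21)
)
encoded = bcj_x86_sketch(code, encode=True)
print(code[1:5].hex(), code[17:21].hex())        # operands differ before
print(encoded[1:5].hex(), encoded[17:21].hex())  # operands match after
```

After the transform both call operands hold the same four bytes, which gives the later LZ stage a repeated string to match. Both filters are reversible, so the decoder restores the original bytes exactly.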
Encryption
The 7z format provides the option to encrypt the filenames of a 7z archive.
Limitations
The 7z format does not store filesystem permissions, and hence can be inappropriate for backup or archival purposes. A workaround on UNIX-like systems is to convert the data to a tar bitstream before compressing it with 7z. GNU tar can also compress with the LZMA2 algorithm natively, without the use of 7z, using the "-J" switch; the resulting file extension is ".tar.xz" or ".txz", not ".tar.7z". This method of compression has been adopted by many distributions for packaging, such as Arch, Debian, Fedora and Slackware. On the other hand, tar does not save the filesystem encoding, which means that filenames in a tar archive can become unreadable if it is extracted on a different computer.
The 7z format does not allow extraction of some "broken files": if one has only the first segment of a series of 7z files, 7z cannot give the start of the files within the archive and must wait until all segments are downloaded. The 7z format also lacks recovery records, making it vulnerable to data degradation unless used in conjunction with external solutions, like parchives, or within filesystems with robust error-correction. By way of comparison, zip files also lack a recovery feature while the rar format has one.
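The tar-based workaround for permissions mentioned above can be sketched with Python's standard `tarfile` module, whose "w:xz" mode produces the same kind of ".tar.xz" file as tar with "-J". The member name, payload and the helper names `pack_tar_xz` and `read_modes` are hypothetical, and the archive is built in memory for brevity.

```python
import io
import tarfile

def pack_tar_xz(name, payload, mode):
    """Build an in-memory .tar.xz holding one file with the given mode bits."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        info.mode = mode  # permission bits are stored in the tar header
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def read_modes(archive_bytes):
    """Return a {member name: permission bits} mapping from a .tar.xz."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:xz") as tar:
        return {member.name: member.mode for member in tar.getmembers()}

archive = pack_tar_xz("bin/tool.sh", b"#!/bin/sh\necho hi\n", 0o750)
modes = read_modes(archive)
for name, mode in modes.items():
    print(name, oct(mode))
```

The permission bits travel in the tar layer, so the same approach applies when the tar stream is placed inside a 7z archive instead of an .xz file.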