LZ4 (compression algorithm)

Last updated

Original author(s) Yann Collet
Developer(s) Yann Collet
Initial release24 April 2011 (2011-04-24)
Stable release
1.9.4 [1]   OOjs UI icon edit-ltr-progressive.svg / 16 August 2022;17 months ago (16 August 2022)
Repository
Written in C
Operating system Cross-platform
Platform Portable
Type Data compression
License Simplified BSD License
Website lz4.org OOjs UI icon edit-ltr-progressive.svg
LZ4 Frame Format
Magic number 04 22 4d 18 [2]
Type of format Data compression
Website https://github.com/lz4/lz4/blob/master/doc/lz4_Frame_format.md

LZ4 is a lossless data compression algorithm that is focused on compression and decompression speed. It belongs to the LZ77 family of byte-oriented compression schemes.

Contents

Features

The LZ4 algorithms aims to provide a good trade-off between speed and compression ratio. Typically, it has a smaller (i.e., worse) compression ratio than the similar LZO algorithm, which in turn is worse than algorithms like DEFLATE. However, LZ4 compression speed is similar to LZO and several times faster than DEFLATE, while decompression speed is significantly faster than LZO. [3]

Design

LZ4 only uses a dictionary-matching stage (LZ77), and unlike other common compression algorithms does not combine it with an entropy coding stage (e.g. Huffman coding in DEFLATE). [4] [5]

The LZ4 algorithm represents the data as a series of sequences. Each sequence begins with a one-byte token that is broken into two 4-bit fields. The first field represents the number of literal bytes that are to be copied to the output. The second field represents the number of bytes to copy from the already decoded output buffer (with 0 representing the minimum match length of 4 bytes). A value of 15 in either of the bitfields indicates that the length is larger and there is an extra byte of data that is to be added to the length. A value of 255 in these extra bytes indicates that yet another byte is to be added. Hence arbitrary lengths are represented by a series of extra bytes containing the value 255. The string of literals comes after the token and any extra bytes needed to indicate string length. This is followed by an offset that indicates how far back in the output buffer to begin copying. The extra bytes (if any) of the match-length come at the end of the sequence. [6] [7]

Compression can be carried out in a stream or in blocks. Higher compression ratios can be achieved by investing more effort in finding the best matches. This results in both a smaller output and faster decompression.

Implementation

The reference implementation in C by Yann Collet is licensed under a BSD license. There are ports and bindings in various languages including Java, C#, Rust, and Python. [8] The Apache Hadoop system uses this algorithm for fast compression. LZ4 was also implemented natively in the Linux kernel 3.11. [9] The FreeBSD, Illumos, ZFS on Linux, and ZFS-OSX implementations of the ZFS filesystem support the LZ4 algorithm for on-the-fly compression. [10] [11] [12] [13] Linux supports LZ4 for SquashFS since 3.19-rc1. [14] LZ4 is also supported in newer zstd command line utility by Yann Collet.

Related Research Articles

gzip GNU file compression/decompression tool

gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and intended for use by GNU. Version 0.1 was first publicly released on 31 October 1992, and version 1.0 followed in February 1993.

zlib DEFLATE codec library

zlib is a software library used for data compression as well as a data format. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also a crucial component of many software platforms, including Linux, macOS, and iOS. It has also been used in gaming consoles such as the PlayStation 4, PlayStation 3, Wii U, Wii, Xbox One and Xbox 360.

bzip2 File compression software

bzip2 is a free and open-source file compression program that uses the Burrows–Wheeler algorithm. It only compresses single files and is not a file archiver. It relies on separate external utilities for tasks such as handling multiple files, encryption, and archive-splitting.

LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as LZ1 and LZ2 respectively. These two algorithms form the basis for many variations including LZW, LZSS, LZMA and others. Besides their academic influence, these algorithms formed the basis of several ubiquitous compression schemes, including GIF and the DEFLATE algorithm used in PNG and ZIP.

In computing, Deflate is a lossless data compression file format that uses a combination of LZ77 and Huffman coding. It was designed by Phil Katz, for version 2 of his PKZIP archiving tool. Deflate was later specified in RFC 1951 (1996).

rzip is a huge-scale data compression computer program designed around initial LZ77-style string matching on a 900 MB dictionary window, followed by bzip2-based Burrows–Wheeler transform and entropy coding (Huffman) on 900 kB output chunks.

vmlinux Executable file containing the Linux kernel

vmlinux is a statically linked executable file that contains the Linux kernel in one of the object file formats supported by Linux, which includes Executable and Linkable Format (ELF) and Common Object File Format (COFF). The vmlinux file might be required for kernel debugging, symbol table generation or other operations, but must be made bootable before being used as an operating system kernel by adding a multiboot header, bootsector and setup routines.

Lempel–Ziv–Oberhumer (LZO) is a lossless data compression algorithm that is focused on decompression speed.

Snappy is a fast data compression and decompression library written in C++ by Google based on ideas from LZ77 and open-sourced in 2011. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. Compression speed is 250 MB/s and decompression speed is 500 MB/s using a single core of a circa 2011 "Westmere" 2.26 GHz Core i7 processor running in 64-bit mode. The compression ratio is 20–100% lower than gzip.

Lempel–Ziv–Storer–Szymanski (LZSS) is a lossless data compression algorithm, a derivative of LZ77, that was created in 1982 by James A. Storer and Thomas Szymanski. LZSS was described in article "Data compression via textual substitution" published in Journal of the ACM.

<span class="mw-page-title-main">HTTP compression</span> Capability that can be built into web servers and web clients

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.

The following tables compare general and technical information for a number of file systems.

zram, formerly called compcache, is a Linux kernel module for creating a compressed block device in RAM, i.e. a RAM disk with on-the-fly disk compression. The block device created with zram can then be used for swap or as general-purpose RAM disk. The two most common uses for zram are for the storage of temporary files and as a swap device. Initially, zram had only the latter function, hence the original name "compcache".

F2FS is a flash file system initially developed by Samsung Electronics for the Linux kernel.

<span class="mw-page-title-main">OpenZFS</span> Open-source implementation of the ZFS file system

OpenZFS is an open-source implementation of the ZFS file system and volume manager initially developed by Sun Microsystems for the Solaris operating system and now maintained by the OpenZFS Project. It supports features like data compression, data deduplication, copy-on-write clones, snapshots, and RAID-Z. It also supports the creation of virtual devices, which allows for the creation of file systems that span multiple disks.

LZFSE is an open source lossless data compression algorithm created by Apple Inc. It was released with a simpler algorithm called LZVN.

Zstandard is a lossless data compression algorithm developed by Yann Collet at Facebook. Zstd is the corresponding reference implementation in C, released as open-source software on 31 August 2016.

EROFS is a lightweight read-only file system initially developed by Huawei for the Linux kernel and now maintained by an open-source community from all over the world.

842, 8-4-2, or EFT is a data compression algorithm. It is a variation on Lempel–Ziv compression with a limited dictionary length. With typical data, 842 gives 80 to 90 percent of the compression of LZ77 with much faster throughput and less memory use. Hardware implementations also provide minimal use of energy and minimal chip area.

References

  1. "Release v1.9.4".
  2. Collet, Yann. "LZ4 Frame Format Description". GitHub . Retrieved 7 October 2020.
  3. Michael Larabel (28 January 2013). "Support For Compressing The Linux Kernel With LZ4". Phoronix . Retrieved 28 August 2015.
  4. Collet, Yann (30 March 2019). "LZ4 Block Format Description". GitHub. Retrieved 9 July 2020. There is no entropy encoder back-end nor framing layer.
  5. DEFLATE Compressed Data Format Specification version 1.3. IETF. doi: 10.17487/RFC1951 . RFC 1951 . Retrieved 9 July 2020.
  6. Yann Collet (26 May 2011). "RealTime Data Compression" . Retrieved 28 August 2015.
  7. ticki (25 October 2016). "How LZ4 works" . Retrieved 29 June 2017.
  8. Extremely Fast Compression algorithm http://www.lz4.org on GitHub
  9. Jonathan Corbet (19 July 2013). "Kernel development". LWN.net . Retrieved 28 August 2015.
  10. "FreeBSD 9.2-RELEASE Release Notes". FreeBSD. 13 November 2013. Retrieved 28 August 2015.
  11. "LZ4 Compression". illumos. Archived from the original on 9 October 2018. Retrieved 28 August 2015.
  12. Illumos #3035 LZ4 compression support in ZFS and GRUB on GitHub
  13. "Features: lz4 compression". OpenZFS . Retrieved 28 August 2015.
  14. Phillip Lougher (27 November 2014). "Squashfs: Add LZ4 compression configuration option" . Retrieved 28 August 2015.
  15. 7-zip-zstd