Original author(s)	Jeff Dean, Sanjay Ghemawat, Steinar H. Gunderson
Developer(s)	Google
Initial release	March 18, 2011

Stable release	1.1.10 / March 9, 2023^[1]

Repository	github.com/google/snappy
Written in	C++
Operating system	Cross-platform
Platform	Portable
Size	1.1 MiB
Type	data compression
License	Apache 2 (up to 1.0.1)/New BSD
Website	google.github.io/snappy/

Snappy framed^[2]
Filename extension	.sz
Internet media type	application/x-snappy-framed
Magic number	`ff 06 00 00 73 4e 61 50 70 59` (`FF 06 00 00 "sNaPpY"`)
Type of format	Data compression
Open format?	yes
Free format?	yes
Website	github.com/google/snappy/blob/main/framing_format.txt

Snappy (previously known as Zippy) is a fast data compression and decompression library written in C++ by Google, based on ideas from LZ77 and open-sourced in 2011.^[3]^[4] It does not aim for maximum compression or for compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. Compression speed is about 250 MB/s and decompression speed is about 500 MB/s using a single core of a circa 2011 "Westmere" 2.26 GHz Core i7 processor running in 64-bit mode. The compression ratio is 20–100% lower than that of gzip.^[5]

Snappy is widely used in Google projects such as Bigtable and MapReduce, and in compressing data for Google's internal RPC systems. It can also be used in open-source projects such as MariaDB ColumnStore,^[6] Cassandra, Couchbase, Hadoop, LevelDB, MongoDB, RocksDB, Lucene, Spark, InfluxDB,^[7] and Ceph.^[8] Firefox uses Snappy to compress data in localStorage.^[9] Decompression is tested to detect any errors in the compressed stream. Snappy does not use inline assembler (except for some optimizations^[10]) and is portable.

Stream format

Snappy encoding is not bit-oriented, but byte-oriented (only whole bytes are emitted or consumed from a stream). The format uses no entropy encoder, like Huffman coding or arithmetic coding.

The first bytes of the stream are the length of the uncompressed data, stored as a little-endian varint,^[11]^{: section 1} a variable-length encoding. The lower seven bits of each byte are used for data, and the high bit is a continuation flag: it is set when more bytes follow, so a byte with the high bit clear ends the length field.
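
The length preamble can be illustrated with a short Python sketch. The function names `encode_varint` and `decode_varint` are hypothetical and not part of the Snappy library; the sketch assumes the usual convention that a set high bit means another byte follows.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a little-endian base-128 varint."""
    if n < 0:
        raise ValueError("varint values must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # next seven data bits, least significant first
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: this is the last byte
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, index of the first byte after the varint)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
```

For an uncompressed length of 81, `encode_varint(81)` yields the single byte 51₁₆, because 81 fits in seven bits.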

The remaining bytes in the stream are encoded using one of four element types. The element type is encoded in the lower two bits of the first byte (tag byte) of the element:^[12]

00 – Literal – uncompressed data; the upper 6 bits store the length of the data minus one (len−1). Lengths larger than 60 are stored as len−1 in a 1-4 byte little-endian integer after the tag byte, indicated by a 6-bit value of 60 (1 byte) to 63 (4 bytes).
01 – Copy with length stored as 3 bits (len−4) and offset stored as 11 bits; the upper 3 bits of the tag byte hold the high bits of the offset, and the byte after the tag byte holds its low 8 bits.
10 – Copy with length stored as 6 bits of the tag byte (len−1) and offset stored as a two-byte little-endian integer after the tag byte.
11 – Copy with length stored as 6 bits of the tag byte (len−1) and offset stored as a four-byte little-endian integer after the tag byte.

The copy refers to the dictionary (just-decompressed data). The offset is the shift from the current position back to the already decompressed stream. The length is the number of bytes to copy from the dictionary. The size of the dictionary was limited by the 1.0 Snappy compressor to 32,768 bytes, and updated to 65,536 in version 1.1.
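
The element types and the copy semantics described above can be sketched as a minimal Python decoder for a raw (unframed) stream. The function name `snappy_decompress` is hypothetical, and the code is an illustration of the format rather than the library's implementation; it does little input validation. It assumes, as the format description allows, that a copy may overlap the bytes it is producing (offset smaller than length), which is why bytes are copied one at a time.

```python
def snappy_decompress(stream: bytes) -> bytes:
    """Decode a raw Snappy stream (illustrative sketch, not hardened)."""
    # Preamble: uncompressed length as a little-endian varint.
    expected_len, shift, pos = 0, 0, 0
    while True:
        byte = stream[pos]
        pos += 1
        expected_len |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7

    out = bytearray()
    while pos < len(stream):
        tag = stream[pos]
        pos += 1
        kind = tag & 0x03                      # element type: lower two bits
        if kind == 0:                          # 00: literal
            length = (tag >> 2) + 1
            if length > 60:                    # field 60..63: len-1 is in the next 1..4 bytes
                extra = length - 60
                length = int.from_bytes(stream[pos:pos + extra], "little") + 1
                pos += extra
            if pos + length > len(stream):
                raise ValueError("literal runs past the end of the stream")
            out += stream[pos:pos + length]
            pos += length
            continue
        if kind == 1:                          # 01: 3-bit length, 11-bit offset
            length = ((tag >> 2) & 0x07) + 4
            offset = ((tag >> 5) << 8) | stream[pos]
            pos += 1
        else:                                  # 10 / 11: 6-bit length, 2- or 4-byte offset
            width = 2 if kind == 2 else 4
            length = (tag >> 2) + 1
            offset = int.from_bytes(stream[pos:pos + width], "little")
            pos += width
        if offset == 0 or offset > len(out):
            raise ValueError("copy offset points outside the decoded data")
        for _ in range(length):                # byte by byte, so overlapping copies work
            out.append(out[-offset])

    if len(out) != expected_len:
        raise ValueError("decoded length does not match the preamble")
    return bytes(out)
```

As a small usage example, the seven-byte stream `07 08 78 61 62 01 02` (a three-byte literal "xab" followed by a copy of length 4 at offset 2) decodes to "xababab".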

The complete official description of the Snappy format can be found in Google's GitHub repository.^[11]

Example of a compressed stream

The text

Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project.

may be compressed to this, shown as hex data with explanations:

000000 51 f0 42 57 69 6b 69 70 65 64 69 61 20 69 73 20  >Q.BWikipedia is <
000010 61 20 66 72 65 65 2c 20 77 65 62 2d 62 61 73 65  >a free, web-base<
000020 64 2c 20 63 6f 6c 6c 61 62 6f 72 61 74 69 76 65  >d, collaborative<
000030 2c 20 6d 75 6c 74 69 6c 69 6e 67 75 61 6c 20 65  >, multilingual e<

The stream starts with the length of the uncompressed data as a varint^[11]^{: section 1} – thus the first byte, with the high bit clear, corresponds to a length of 51₁₆=81 bytes.

The first block must be a literal, and f042 corresponds thereto: the first byte is broken down as f0₁₆ ⇒ len−1=111100₂;type=00₂; type 0 signifies a literal, and a length−1 of 111100₂=60 means that the actual length−1 is read from the following byte, in this case 42₁₆=66, so the literal is 67 bytes long. The first 67 bytes of the text ("Wikipedia is a free, web-based, collaborative, multilingual encyclo") follow.^[11]^{: 2.1}

000040 6e 63 79 63 6c 6f 09 3f 1c 70 72 6f 6a 65 63 74  >ncyclo.?.project<
000050 2e                                               >.<

The next block's header consists of 093f, broken down as 09₁₆ ⇒ off_h=000₂,len−4=010₂;type=01₂: type 1 indicates a "copy with 1-byte offset": the length to copy works out to 010₂+4=6 bytes, and the offset is an 11-bit integer whose top bits are off_h and whose low bits are the next byte: 3f, so {off_h}{3f₁₆}=00000111111₂=63.^[11]^{: 2.2,2.2.1}

This means to copy 6 bytes, starting 63 bytes ago – since 67 bytes have already been output, this evaluates to copying 6 bytes starting at position 4 (from the fifth byte), which produces "pedia ".

This block has no other content, and thus the following block starts immediately after – 1c₁₆ ⇒ len−1=000111₂;type=00₂, i.e. a literal of length 000111₂+1=8.^[11]^: 2.1 The final part of the text ("project.") follows.
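
As a cross-check, the three elements of this example can be reassembled by hand. The Python sketch below is not a compressor: the helper names `literal` and `copy_1byte_offset` are hypothetical, they cover only the cases needed here, and the split points (67 literal bytes, a 6-byte copy at offset 63, then the rest) are taken from the walkthrough above.

```python
TEXT = (b"Wikipedia is a free, web-based, collaborative, "
        b"multilingual encyclopedia project.")


def literal(data: bytes) -> bytes:
    """Emit a literal element (type 00) for 1 to 256 bytes of data."""
    n = len(data) - 1                      # the format stores length minus one
    if not 0 <= n < 256:
        raise ValueError("this sketch handles literals of 1..256 bytes only")
    if n < 60:
        return bytes([n << 2]) + data      # length-1 fits in the tag byte
    return bytes([60 << 2, n]) + data      # field value 60: length-1 in the next byte


def copy_1byte_offset(length: int, offset: int) -> bytes:
    """Emit a copy element of type 01 (3-bit length, 11-bit offset)."""
    if not (4 <= length <= 11 and 0 < offset < 2048):
        raise ValueError("length must be 4..11 and offset 1..2047")
    tag = ((offset >> 8) << 5) | ((length - 4) << 2) | 0b01
    return bytes([tag, offset & 0xFF])


stream = (
    bytes([len(TEXT)])             # 81 < 128, so the varint is a single byte
    + literal(TEXT[:67])           # f0 42 + "Wikipedia ... encyclo"
    + copy_1byte_offset(6, 63)     # 09 3f, reproduces "pedia "
    + literal(TEXT[73:])           # 1c + "project."
)
print(stream.hex(" "))
```

The printed bytes can be compared against the hex dump shown in this section.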

In this example, all repeated substrings of four or more characters were eliminated by the compression process. More common compressors can compress this text better: unlike compression methods such as gzip and bzip2, Snappy uses no entropy encoding to pack the symbols of the alphabet into the bit stream.

Framing format

A raw Snappy stream supports inputs with an overall size of up to 4 GiB−1.^[11]^{: section 1} It may add significant overhead to sections that are incompressible or compress poorly, it is not self-identifying, and it has no data integrity mechanism beyond a simple output size check.

To address these issues, the Snappy framing format^[2] ("Snappy framed") may be used. It breaks the input into chunks of up to 64 KiB,^[2]^: 4.2,4.3 delimited by 4-byte headers (a one-byte identifier and a three-byte length):^[2]^{: section 1}

the "Stream identifier", with type FF₁₆, must start the stream, and must consist exclusively of "sNaPpY" in ASCII,^[2]^: 4.1
the "Compressed data", with type 0, contains a compressed Snappy stream,^[2]^: 4.2
the "Uncompressed data", with type 1, contains data to copy to the output verbatim.^[2]^: 4.3

Both types of data chunk also contain a CRC-32C checksum of the uncompressed data.

Chunks of types 2-7F₁₆ are reserved and must result in errors.^[2]^: 4.5 Those of types 80₁₆-FE₁₆ may be ignored by decompressors that do not understand them.^[2]^: 4.4,4.6
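
The chunk layout can be sketched as a small Python reader that walks the 4-byte headers. The name `iter_chunks` is hypothetical, and the sketch assumes the three-byte length is little-endian, consistent with the `06 00 00` in the magic number. It neither decompresses the payloads nor verifies the checksums, which are left inside the returned payload.

```python
STREAM_IDENTIFIER = bytes.fromhex("ff060000") + b"sNaPpY"


def iter_chunks(data: bytes):
    """Yield (chunk_type, payload) pairs from a Snappy-framed byte string."""
    if not data.startswith(STREAM_IDENTIFIER):
        raise ValueError("stream does not start with the stream identifier")
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated chunk header")
        chunk_type = data[pos]                                    # one-byte identifier
        length = int.from_bytes(data[pos + 1:pos + 4], "little")  # three-byte length
        payload = data[pos + 4:pos + 4 + length]
        if len(payload) != length:
            raise ValueError("truncated chunk")
        pos += 4 + length
        if 0x02 <= chunk_type <= 0x7F:
            raise ValueError(f"reserved chunk type {chunk_type:#04x}")
        if 0x80 <= chunk_type <= 0xFE:
            continue                                              # may be ignored
        yield chunk_type, payload                                 # types 0, 1 and FF
```

A caller would typically pass payloads of type 0 to a Snappy decompressor and copy payloads of type 1 to the output, after handling the checksum that both carry.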

Interfaces

Snappy distributions include C++ and C bindings. Third-party bindings and ports include^[13] C#, Common Lisp, Crystal, Erlang, Go, Haskell, Lua, Java, Nim, Node.js, Perl, PHP, Python, R, Ruby, Rust, Smalltalk, and OpenCL.^[14]^[15] A command-line interface program is also available.^[16]
