JBIG2
Internet media type	image/x-jbig2
Developed by	Joint Bi-level Image Experts Group
Latest release	2
Contained by	Portable Document Format, FAX
Standard	ITU-T T.88 & ISO/IEC 14492

JBIG2 is an image compression standard for bi-level images, developed by the Joint Bi-level Image Experts Group. It is suitable for both lossless and lossy compression. According to a press release^[1] from the Group, in its lossless mode JBIG2 typically generates files 3–5 times smaller than Fax Group 4 and 2–4 times smaller than JBIG, the previous bi-level compression standard released by the Group. JBIG2 was published in 2000 as the international standard ITU-T T.88,^[2] and in 2001 as ISO/IEC 14492.^[3]

Functionality

Ideally, a JBIG2 encoder will segment the input page into regions of text, regions of halftone images, and regions of other data. Regions that are neither text nor halftones are typically compressed using a context-dependent arithmetic coding algorithm called the MQ coder. Textual regions are compressed as follows: the foreground pixels in the regions are grouped into symbols. A dictionary of symbols is then created and encoded, typically also using context-dependent arithmetic coding, and the regions are encoded by describing which symbols appear where. Typically, a symbol will correspond to a character of text, but this is not required by the compression method. For lossy compression the difference between similar symbols (e.g., slightly different impressions of the same letter) can be neglected; for lossless compression, this difference is taken into account by compressing one similar symbol using another as a template. Halftone images may be compressed by reconstructing the grayscale image used to generate the halftone and then sending this image together with a dictionary of halftone patterns.^[4] Overall, the algorithm used by JBIG2 to compress text is very similar to the JB2 compression scheme used in the DjVu file format for coding binary images.

PDF files of version 1.4 and above may contain JBIG2-compressed data. Open-source decoders for JBIG2 include jbig2dec^[5] (AGPL), the Java-based jbig2-imageio^[6] (Apache-2), the JavaScript-based jbig2.js^[7] (Apache-2), and the decoder by Glyph & Cog LLC found in Xpdf and Poppler^[8] (both GPL). An open-source encoder is jbig2enc^[9] (Apache-2).

Technical details

Typically, a bi-level image consists mainly of a large amount of textual and halftone data, in which the same shapes appear repeatedly. The bi-level image is segmented into three types of regions: text, halftone, and generic regions. Each region type is coded differently, and the coding methods are described in the following sections.

Text image data

Text coding is based on the nature of human visual interpretation. A human observer cannot tell the difference between two instances of the same character in a bi-level image even though they may not match exactly pixel by pixel. Therefore, only the bitmap of one representative instance of each character needs to be coded, rather than the bitmap of every occurrence. The coded representative bitmaps are stored in a "symbol dictionary".^[10] There are two encoding methods for text image data: pattern matching and substitution (PM&S) and soft pattern matching (SPM).^[11]

Block diagrams of (left) pattern matching and substitution method and (right) soft pattern matching method

Pattern matching and substitution (PM&S) is the more classic coding method. The encoder performs image segmentation to isolate character-sized chunks. For each chunk, the encoder looks for a match in the bitmap dictionary. If a match exists, the encoder codes the index of the corresponding representative bitmap in the dictionary and the position of the character on the page. The position is usually relative to another previously coded character. If no match is found, the segmented pixel block is coded directly and added to the dictionary. The typical steps of the pattern matching and substitution algorithm are shown in the left block diagram of the figure above. Although PM&S can achieve outstanding compression, substitution errors can occur during the process if the image resolution is low.^[11]
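The PM&S loop described above can be illustrated with a minimal Python sketch. It is not taken from the standard or from any encoder: the names `hamming` and `pms_encode`, the pixel-count threshold `max_diff`, and the use of a plain Hamming distance as the matching criterion are all assumptions made for illustration. Bitmaps are tuples of '0'/'1' strings, and each position is stored relative to the previously coded chunk.

```python
from typing import List, Tuple

Bitmap = Tuple[str, ...]  # each row is a string of '0'/'1' pixels


def hamming(a: Bitmap, b: Bitmap) -> float:
    """Number of differing pixels; infinity if the shapes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return float("inf")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def pms_encode(chunks: List[Tuple[Bitmap, Tuple[int, int]]], max_diff: int):
    """Turn (bitmap, (x, y)) chunks into a symbol dictionary plus placements.

    max_diff is a hypothetical matching threshold, not a JBIG2 parameter.
    """
    dictionary: List[Bitmap] = []
    placements: List[Tuple[int, Tuple[int, int]]] = []
    prev_x, prev_y = 0, 0
    for bitmap, (x, y) in chunks:
        # Search the dictionary for a representative that is close enough.
        index = next(
            (i for i, sym in enumerate(dictionary)
             if hamming(sym, bitmap) <= max_diff),
            None,
        )
        if index is None:
            # No match: code the bitmap directly and add it to the dictionary.
            dictionary.append(bitmap)
            index = len(dictionary) - 1
        # Store the position relative to the previously coded chunk.
        placements.append((index, (x - prev_x, y - prev_y)))
        prev_x, prev_y = x, y
    return dictionary, placements


a1 = ("010", "101", "111", "101")
a2 = ("010", "101", "111", "100")  # same letter, one pixel differs
b = ("110", "101", "110", "111")
chunks = [(a1, (0, 0)), (b, (4, 0)), (a2, (8, 0))]
dictionary, placements = pms_encode(chunks, max_diff=1)
```

With `max_diff=1` the third chunk is replaced by the first dictionary entry, so only two bitmaps are stored. With `max_diff=0` all three bitmaps would be stored and no substitution would take place.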

JBIG2 improves on PM&S with optional soft pattern matching (SPM). The same segmentation and searching are performed, but for each match found, the encoder saves not only a reference to the corresponding dictionary entry but also refinement data describing the difference between the actual chunk and the dictionary chunk. Doing so greatly reduces substitution errors.^[10]^[a] Since a dictionary match requires the actual character and the dictionary character to be highly similar, SPM adds only a small amount of data.^[11]
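The idea of refinement data can be sketched as follows. This is a simplification under stated assumptions: the difference is stored here as a plain XOR mask, whereas a real JBIG2 encoder would entropy-code the refinement rather than store a raw mask. The function names are hypothetical.

```python
from typing import Tuple

Bitmap = Tuple[str, ...]  # each row is a string of '0'/'1' pixels


def xor_bitmaps(a: Bitmap, b: Bitmap) -> Bitmap:
    """Pixel-wise XOR of two bitmaps of identical shape."""
    return tuple(
        "".join("1" if p != q else "0" for p, q in zip(ra, rb))
        for ra, rb in zip(a, b)
    )


def make_refinement(actual: Bitmap, reference: Bitmap) -> Bitmap:
    """Encoder side: record where the actual chunk differs from the match."""
    return xor_bitmaps(actual, reference)


def apply_refinement(reference: Bitmap, refinement: Bitmap) -> Bitmap:
    """Decoder side: XOR is its own inverse, so this restores the chunk."""
    return xor_bitmaps(reference, refinement)


reference = ("010", "101", "111", "101")  # dictionary entry
actual = ("010", "101", "111", "100")     # chunk found on the page
refinement = make_refinement(actual, reference)
restored = apply_refinement(reference, refinement)
changed_pixels = sum(row.count("1") for row in refinement)
```

Because the matched bitmaps are nearly identical, the mask is almost entirely zero (one set pixel in this example), which is why the refinement compresses well while still letting the decoder recover the exact chunk.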

Halftones

Halftone images can be compressed using two methods. The first is similar to the context-based arithmetic coding algorithm and adaptively positions the template pixels in order to capture correlations between adjacent pixels. In the second method, descreening is performed on the halftone image so that it is converted back to grayscale. The converted grayscale values are then used as indexes into a halftone bitmap dictionary of small, fixed-size bitmap patterns. This allows the decoder to render a halftone image by placing the indexed dictionary patterns next to each other.^[10]
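The second method can be sketched in Python under simplifying assumptions: a hypothetical 2×2 cell size, a grayscale value equal to the number of black pixels in each cell, and a pattern dictionary built from an arbitrary fill order. None of these choices come from the standard.

```python
from typing import List

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel
CELL = 2                 # hypothetical cell size in pixels
ORDER = [(0, 0), (1, 1), (0, 1), (1, 0)]  # assumed fill order within a cell


def make_patterns() -> List[Image]:
    """Pattern k has k black pixels, filled in ORDER."""
    patterns = []
    for level in range(CELL * CELL + 1):
        cell = [[0] * CELL for _ in range(CELL)]
        for r, c in ORDER[:level]:
            cell[r][c] = 1
        patterns.append(cell)
    return patterns


def descreen(image: Image) -> List[List[int]]:
    """Convert each CELL x CELL tile into a grayscale index (black count)."""
    rows, cols = len(image) // CELL, len(image[0]) // CELL
    return [
        [
            sum(image[r * CELL + i][c * CELL + j]
                for i in range(CELL) for j in range(CELL))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def render(gray: List[List[int]], patterns: List[Image]) -> Image:
    """Decoder side: tile the indexed dictionary patterns side by side."""
    out = []
    for gray_row in gray:
        for i in range(CELL):
            out.append([px for g in gray_row for px in patterns[g][i]])
    return out


halftone = [
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
patterns = make_patterns()
gray = descreen(halftone)
rendered = render(gray, patterns)
```

In this sketch the rendered image keeps the dot density of every cell but not necessarily the exact dot positions, so the round trip is lossy at the pixel level while the grayscale indexes are preserved.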

Entropy coding

All three region types (text, halftone, and generic) may use either arithmetic coding or Huffman coding. For arithmetic coding, JBIG2 uses the MQ coder, the same entropy coder employed by JPEG 2000.
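Context-dependent coding can be illustrated by estimating how many bits an ideal adaptive arithmetic coder would need for a bi-level image. The sketch below is not the MQ coder: it uses a hypothetical four-pixel template (left, upper-left, upper, upper-right), Laplace-smoothed counts per context, and the ideal cost of −log2(p) bits per pixel. The standard's own templates and probability estimation differ.

```python
import math
from collections import defaultdict
from typing import List


def ideal_code_length(image: List[List[int]]) -> float:
    """Estimated bits for an adaptive, context-modelled binary coder."""
    height, width = len(image), len(image[0])
    # Per-context counts of 0s and 1s, starting at 1 each (Laplace smoothing).
    counts = defaultdict(lambda: [1, 1])

    def px(r: int, c: int) -> int:
        # Pixels outside the image are treated as white (0).
        return image[r][c] if 0 <= r < height and 0 <= c < width else 0

    bits = 0.0
    for r in range(height):
        for c in range(width):
            # The context uses only pixels already seen in raster order.
            ctx = (px(r, c - 1), px(r - 1, c - 1), px(r - 1, c), px(r - 1, c + 1))
            zeros_ones = counts[ctx]
            bit = image[r][c]
            bits += -math.log2(zeros_ones[bit] / sum(zeros_ones))
            zeros_ones[bit] += 1  # adapt the model after coding the pixel
    return bits


# 32x32 image of vertical stripes, two pixels wide.
stripes = [[(c // 2) % 2 for c in range(32)] for _ in range(32)]
estimated_bits = ideal_code_length(stripes)
raw_bits = 32 * 32
```

For a regular pattern such as these stripes, each pixel after the first row is almost fully predicted by its context, so the estimate comes out far below the 1,024 bits of the raw bitmap.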

Patents

Patents for JBIG2 are owned by IBM and Mitsubishi. Free licenses should be available upon request. JBIG and JBIG2 patents are not the same.^[13]^[14]^[15]

Character substitution errors in scanned documents

Some implementations of JBIG2 using lossy compression can potentially alter the characters in documents that are scanned to PDF. Unlike some other algorithms where compression artifacts are obvious, such as blurring^[16] or mosquito noise, JBIG2's "pattern matching" matches up similar-looking symbols. If the matching is implemented poorly, especially in low-resolution scans where characters are less clearly defined, similar characters may get erroneously swapped. But as noted by computer scientist David Kriesel, who discovered such a problem as described below, "the error cause is not JBIG2 itself".^[17]
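How a poorly chosen matching tolerance can swap characters is shown in the following sketch. The 3×5 glyphs, the `match` function, and the `max_diff` threshold are hypothetical; they are not the matcher of any actual product. At this resolution the glyphs for "6" and "8" differ in a single pixel.

```python
from typing import Dict, Optional, Tuple

Bitmap = Tuple[str, ...]  # each row is a string of '0'/'1' pixels

# Hypothetical low-resolution glyphs, 3 pixels wide and 5 pixels tall.
SIX: Bitmap = ("111", "100", "111", "101", "111")
EIGHT: Bitmap = ("111", "101", "111", "101", "111")


def pixel_diff(a: Bitmap, b: Bitmap) -> int:
    """Number of differing pixels between two same-sized glyphs."""
    return sum(p != q for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def match(glyph: Bitmap, dictionary: Dict[str, Bitmap],
          max_diff: int) -> Optional[str]:
    """Return the label of the first dictionary symbol within max_diff."""
    for label, symbol in dictionary.items():
        if pixel_diff(glyph, symbol) <= max_diff:
            return label
    return None


dictionary = {"8": EIGHT}              # an "8" was seen earlier on the page
loose = match(SIX, dictionary, max_diff=1)   # "6" is rendered as "8"
strict = match(SIX, dictionary, max_diff=0)  # no match, "6" is kept as new
```

With a tolerance of one pixel the scanned "6" is substituted by the stored "8", whereas an exact-match requirement leaves it as a new symbol. The output of a substitution looks like clean text, which is why such errors are hard to notice.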

In 2013, various substitutions (including replacing "6" with "8") were reported on many Xerox WorkCentre photocopier and printer machines. Numbers printed on scanned (but not OCR-ed) documents had potentially been altered. This was demonstrated on construction blueprints and some tables of numbers; the potential impact of such substitution errors in documents such as medical prescriptions was briefly mentioned.^[17]^[18]^[19] German computer scientist David Kriesel and Xerox investigated the issue.^[20]^[21]

Xerox subsequently acknowledged that this was a long-standing software defect and that its initial statements, which suggested that only non-factory settings could introduce the substitution, were incorrect. No attempt was made to recall the affected devices or to mandate updates, although the defect was acknowledged to affect more than a dozen product families. However, in August 2013 a software patch was made available that, when installed, automatically disabled pattern matching.^[22] Previously scanned documents may still contain errors, making their veracity difficult to substantiate. Following publicity about the potential for errors, authorities in some countries issued statements discouraging the use of JBIG2.^[23]

In Germany, the Federal Office for Information Security has issued a technical guideline that says the JBIG2 encoding "MUST NOT be used" for "replacement scanning".^[24] In Switzerland, the Coordination Office for the Permanent Archiving of Electronic Documents (Koordinationsstelle für die dauerhafte Archivierung elektronischer Unterlagen) has recommended against the use of JBIG2 when creating PDF documents.^[25]

Exploit

A vulnerability in the Xpdf implementation of JBIG2, re-used in Apple's iOS phone operating software, was used by the Pegasus spyware to implement a zero-click attack on iPhones by constructing an emulated computer architecture inside a JBIG2 stream. Apple fixed this "FORCEDENTRY" vulnerability in iOS 14.8 in September 2021.^[26]
