Packetized Elementary Stream (PES) is a specification in MPEG-2 Part 1 (Systems) (ISO/IEC 13818-1) and ITU-T H.222.0[1][2] that defines how elementary streams (usually the output of an audio or video encoder) are carried in packets within MPEG program streams and MPEG transport streams.[3] The elementary stream is packetized by encapsulating sequential data bytes from the elementary stream in PES packets, each of which begins with a PES packet header.

A typical method of transmitting elementary stream data from a video or audio encoder is to first create PES packets from the elementary stream data and then to encapsulate these PES packets inside Transport Stream (TS) packets or Program Stream (PS) packets. The TS packets can then be multiplexed and transmitted using broadcasting techniques, such as those used in ATSC and DVB.

Transport Streams and Program Streams are each logically constructed from PES packets. PES packets shall be used to convert between Transport Streams and Program Streams. In some cases the PES packets need not be modified when performing such conversions. PES packets may be much larger than the size of a Transport Stream packet.[3]
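
To illustrate the size relationship, the following minimal sketch splits one PES packet into the payload-sized pieces that successive TS packets would carry. It assumes the fixed 188-byte TS packet with a 4-byte header and no adaptation field; these figures and the helper name `split_pes_for_ts` are not taken from the text above and are used here only for illustration. A real multiplexer also writes the TS headers and pads the final packet, which this sketch omits.

```python
from typing import List

TS_PACKET_SIZE = 188  # assumed fixed TS packet size in bytes
TS_HEADER_SIZE = 4    # assumed minimal TS header, no adaptation field
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE


def split_pes_for_ts(pes_packet: bytes) -> List[bytes]:
    """Cut a PES packet into chunks that each fit one TS packet payload."""
    return [
        pes_packet[i:i + TS_PAYLOAD_SIZE]
        for i in range(0, len(pes_packet), TS_PAYLOAD_SIZE)
    ]


# A hypothetical 1000-byte PES packet spans several TS packets.
chunks = split_pes_for_ts(bytes(1000))
print(len(chunks), len(chunks[-1]))  # 6 80
```

Under these assumptions the last chunk is shorter than a full payload, which is why a real TS multiplexer has to fill the remainder of the final packet.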

PES packet header

| Name | Size | Description |
|---|---|---|
| Packet start code prefix | 3 bytes | 0x000001 |
| Stream id | 1 byte | Examples: audio streams (0xC0-0xDF), video streams (0xE0-0xEF) [4][5] |
| PES packet length | 2 bytes | Specifies the number of bytes remaining in the packet after this field. Can be zero. If the PES packet length is set to zero, the PES packet can be of any length. A value of zero can be used only when the PES packet payload is a video elementary stream.[6] |
| Optional PES header | variable length (length >= 3) | Not present for the padding stream and private stream 2 (navigation data) |
| Data | | See elementary stream. In the case of private streams, the first byte of the payload is the sub-stream number. |

Note: The first 4 bytes (packet start code prefix plus stream id) are together called the 32-bit start code.
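
The fixed part of the header can be read with a few lines of code. The sketch below is only an illustration of the layout in the table: the function names `parse_pes_start` and `classify_stream` are hypothetical, and only the two stream id ranges given as examples above are recognised.

```python
import struct

START_CODE_PREFIX = b"\x00\x00\x01"


def parse_pes_start(data: bytes):
    """Return (stream_id, pes_packet_length) from the first 6 bytes."""
    if len(data) < 6:
        raise ValueError("need at least 6 bytes of PES data")
    if data[:3] != START_CODE_PREFIX:
        raise ValueError("missing packet start code prefix 0x000001")
    # 1-byte stream id followed by a big-endian 2-byte packet length.
    stream_id, length = struct.unpack(">BH", data[3:6])
    return stream_id, length


def classify_stream(stream_id: int) -> str:
    """Map a stream id to the example ranges listed in the table."""
    if 0xC0 <= stream_id <= 0xDF:
        return "audio"
    if 0xE0 <= stream_id <= 0xEF:
        return "video"
    return "other"


# A video PES packet start with PES packet length 0 (unbounded length).
stream_id, length = parse_pes_start(bytes.fromhex("000001e00000"))
print(hex(stream_id), length, classify_stream(stream_id))  # 0xe0 0 video
```

A returned length of zero means the packet end cannot be found from this field alone, so a parser would have to look for the next start code instead.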

Optional PES header

| Name | Number of bits | Description |
|---|---|---|
| Marker bits | 2 | 10 binary or 0x2 hex |
| Scrambling control | 2 | 00 implies not scrambled |
| Priority | 1 | |
| Data alignment indicator | 1 | 1 indicates that the PES packet header is immediately followed by the video start code or audio syncword |
| Copyright | 1 | 1 implies copyrighted |
| Original or copy | 1 | 1 implies original |
| PTS DTS indicator | 2 | 11 = both present, 01 is forbidden, 10 = only PTS, 00 = no PTS or DTS |
| ESCR flag | 1 | |
| ES rate flag | 1 | |
| DSM trick mode flag | 1 | |
| Additional copy info flag | 1 | |
| CRC flag | 1 | |
| Extension flag | 1 | |
| PES header length | 8 | Gives the length of the remainder of the PES header in bytes |
| Optional fields | variable length | Presence is determined by the flag bits above |
| Stuffing bytes | variable length | 0xFF |
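
The first three bytes of the optional header pack all of the fixed-size fields above. As a minimal sketch, assuming the fields appear in the bit order listed in the table (most significant bit first), they could be unpacked as follows. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OptionalPesHeader:
    scrambling_control: int
    priority: bool
    data_alignment: bool
    copyright: bool
    original: bool
    pts_dts_indicator: int
    escr_flag: bool
    es_rate_flag: bool
    dsm_trick_mode_flag: bool
    additional_copy_info_flag: bool
    crc_flag: bool
    extension_flag: bool
    header_length: int


def parse_optional_header(buf: bytes) -> OptionalPesHeader:
    """Decode the 3 fixed bytes at the start of the optional PES header."""
    if len(buf) < 3:
        raise ValueError("optional PES header is at least 3 bytes")
    b0, b1, header_length = buf[0], buf[1], buf[2]
    if b0 >> 6 != 0b10:
        raise ValueError("marker bits must be 10")
    pts_dts = b1 >> 6
    if pts_dts == 0b01:
        raise ValueError("PTS DTS indicator 01 is forbidden")
    return OptionalPesHeader(
        scrambling_control=(b0 >> 4) & 0b11,
        priority=bool(b0 & 0x08),
        data_alignment=bool(b0 & 0x04),
        copyright=bool(b0 & 0x02),
        original=bool(b0 & 0x01),
        pts_dts_indicator=pts_dts,
        escr_flag=bool(b1 & 0x20),
        es_rate_flag=bool(b1 & 0x10),
        dsm_trick_mode_flag=bool(b1 & 0x08),
        additional_copy_info_flag=bool(b1 & 0x04),
        crc_flag=bool(b1 & 0x02),
        extension_flag=bool(b1 & 0x01),
        header_length=header_length,
    )


# Aligned, unscrambled packet carrying only a PTS (5 bytes of optional fields).
header = parse_optional_header(bytes([0x84, 0x80, 0x05]))
print(header.data_alignment, header.pts_dts_indicator, header.header_length)
# True 2 5
```

The elementary stream data starts `header_length` bytes after these three bytes, once the optional fields and any stuffing bytes have been skipped.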

The flags above indicate which values are present in the variable-length optional fields, but those values are not simply written out verbatim. For example, the 33-bit PTS (and likewise the DTS) is expanded to 5 bytes (40 bits). If only the PTS is present, it is encoded by concatenating the 4-bit prefix 0010, the 3 most significant bits of the PTS, a marker bit 1, the next 15 bits, a marker bit 1, the remaining 15 bits, and a final marker bit 1. If both PTS and DTS are present, the 4-bit prefix of the PTS is 0011 and that of the DTS is 0001. The other optional fields use similar, but not identical, encodings.
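
The bit layout described above can be sketched as a pair of helper functions. This is an illustrative implementation of that description only. The names `encode_timestamp` and `decode_timestamp` are hypothetical, and the decoder does not verify the prefix or the marker bits.

```python
def encode_timestamp(prefix: int, value: int) -> bytes:
    """Pack a 33-bit PTS or DTS into 5 bytes with a 4-bit prefix."""
    if not 0 <= value < 1 << 33:
        raise ValueError("timestamp must fit in 33 bits")
    return bytes([
        (prefix << 4) | (((value >> 30) & 0x07) << 1) | 1,  # prefix, bits 32..30, marker
        (value >> 22) & 0xFF,                               # bits 29..22
        (((value >> 15) & 0x7F) << 1) | 1,                  # bits 21..15, marker
        (value >> 7) & 0xFF,                                # bits 14..7
        ((value & 0x7F) << 1) | 1,                          # bits 6..0, marker
    ])


def decode_timestamp(buf: bytes) -> int:
    """Recover the 33-bit value from the 5-byte encoding."""
    if len(buf) < 5:
        raise ValueError("need 5 bytes")
    return (
        (((buf[0] >> 1) & 0x07) << 30)
        | (buf[1] << 22)
        | ((buf[2] >> 1) << 15)
        | (buf[3] << 7)
        | (buf[4] >> 1)
    )


PTS_ONLY = 0b0010  # prefix when only the PTS is present

print(encode_timestamp(PTS_ONLY, 0).hex())                    # 2100010001
print(decode_timestamp(encode_timestamp(PTS_ONLY, 123456)))  # 123456
```

When both timestamps are present, the same function would be called twice, with prefix 0b0011 for the PTS and 0b0001 for the DTS.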


  1. ITU-T (November 2014). "H.222.0 Summary". Retrieved 2015-11-17.
  2. ITU-T. "H.222.0 : Information technology - Generic coding of moving pictures and associated audio information: Systems". Retrieved 2010-06-03.
  3. "ISO/IEC 13818-1 — Information technology — Generic coding of moving pictures and associated audio information: Systems" (PDF) (second ed.). 2000-12-01. Retrieved 2009-07-25.
  4. "ETSI TS 101 154 - V1.9.1 - Digital Video Broadcasting (DVB); Specification for the use of Video and Audio Coding in Broadcasting Applications based on the MPEG-2 Transport Stream" (PDF). ETSI. September 2009.
  5. EP 1827030, "Method and apparatus for changing codec to reproduce video and/or audio data streams encoded by different codecs within a channel".
  6. "A guide to digital terrestrial television broadcasting in the VHF/UHF bands". 15 January 1996. sec. 4.4.