MPEG-4 Part 3 or MPEG-4 Audio (formally ISO/IEC 14496-3) is the third part of the ISO/IEC MPEG-4 international standard developed by Moving Picture Experts Group.^[1] It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.^[2]

The MPEG-4 Part 3 consists of a variety of audio coding technologies – from lossy speech coding (HVXC, CELP), general audio coding (AAC, TwinVQ, BSAC), lossless audio compression (MPEG-4 SLS, Audio Lossless Coding, MPEG-4 DST), a Text-To-Speech Interface (TTSI), Structured Audio (using SAOL, SASL, MIDI) and many additional audio synthesis and coding techniques.^[3]^[4]^[5]^[6]^[7]^[8]^[9]^[10]^[11]

MPEG-4 Audio does not target a single application such as real-time telephony or high-quality audio compression. It applies to every application which requires the use of advanced sound compression, synthesis, manipulation, or playback. MPEG-4 Audio is a new type of audio standard that integrates numerous different types of audio coding: natural sound and synthetic sound, low bitrate delivery and high-quality delivery, speech and music, complex soundtracks and simple ones, traditional content and interactive content.^[7]

Versions

MPEG-4 Audio versions and editions^[12]
Edition	Release date	Latest amendment	Standard	Description
First edition	1999	2001	ISO/IEC 14496-3:1999^[2]	also known as "MPEG-4 Audio Version 1"
		2000	ISO/IEC 14496-3:1999/Amd 1:2000^[13]	also known as "MPEG-4 Audio Version 2", an Amendment to first edition^[7]^[8]
Second edition	2001	2005	ISO/IEC 14496-3:2001^[14]
Third edition	2005	2008	ISO/IEC 14496-3:2005^[15]
Fourth edition	2009	2015 and under development^[12]	ISO/IEC 14496-3:2009^[1]^[16]
Fifth edition	2019		ISO/IEC 14496-3:2019^[17]	Current version

Subparts

MPEG-4 Part 3 contains following subparts:^[16]

Subpart 1: Main (list of Audio Object Types, Profiles, Levels, interface to ISO/IEC 14496-1, MPEG-4 Audio transport stream, etc.)
Subpart 2: Speech coding – HVXC (Harmonic Vector eXcitation Coding)
Subpart 3: Speech coding – CELP (Code Excited Linear Prediction)
Subpart 4: General Audio Coding (GA) (Time/Frequency Coding) – AAC, TwinVQ, BSAC
Subpart 5: Structured Audio (SA)
Subpart 6: Text to Speech Interface (TTSI)
Subpart 7: Parametric Audio Coding – HILN (Harmonic and Individual Line plus Noise)
Subpart 8: Technical description of parametric coding for high quality audio (SSC, Parametric Stereo)
Subpart 9: MPEG-1/MPEG-2 Audio in MPEG-4
Subpart 10: Technical description of lossless coding of oversampled audio (MPEG-4 DST – Direct Stream Transfer)
Subpart 11: Audio Lossless Coding (ALS)
Subpart 12: Scalable Lossless Coding (SLS)

MPEG-4 Audio Object Types

MPEG-4 Audio includes a system for handling a diverse group of audio formats in a uniform manner. Each format is assigned a unique Audio Object Type to represent it.^[18]^[19] Object Type is used to distinguish between different coding methods. It directly determines the MPEG-4 tool subset required to decode a specific object. The MPEG-4 profiles are based on the object types and each profile supports a different list of object types.^[19]

MPEG-4 Audio Object Types^[7]^[9]^[18]^[20]^[21]
Object Type ID	Audio Object Type	First public release date	Description
1	AAC Main	1999	contains AAC LC
2	AAC LC (Low Complexity)	1999	Used in the "AAC Profile". MPEG-4 AAC LC Audio Object Type is based on the MPEG-2 Part 7 Low Complexity profile (LC) combined with Perceptual Noise Substitution (PNS) (defined in MPEG-4 Part 3 Subpart 4).^[4]^[22]
3	AAC SSR (Scalable Sample Rate)	1999	MPEG-4 AAC SSR Audio Object Type is based on the MPEG-2 Part 7 Scalable Sampling Rate profile (SSR) combined with Perceptual Noise Substitution (PNS) (defined in MPEG-4 Part 3 Subpart 4).^[4]^[22]
4	AAC LTP (Long Term Prediction)	1999	contains AAC LC
5	SBR (Spectral Band Replication)	2003^[23]	used with AAC LC in the "High Efficiency AAC Profile" (HE-AAC v1)
6	AAC Scalable	1999
7	TwinVQ	1999	audio coding at very low bitrates
8	CELP (Code Excited Linear Prediction)	1999	speech coding
9	HVXC (Harmonic Vector eXcitation Coding)	1999	speech coding
10	(Reserved)
11	(Reserved)
12	TTSI (Text-To-Speech Interface)	1999
13	Main synthesis	1999	contains 'wavetable' sample-based synthesis^[24] and Algorithmic Synthesis and Audio Effects
14	'wavetable' sample-based synthesis	1999	based on SoundFont and DownLoadable Sounds,^[24] contains General MIDI
15	General MIDI	1999
16	Algorithmic Synthesis and Audio Effects	1999
17	ER AAC LC	2000	Error Resilient
18	(Reserved )
19	ER AAC LTP	2000	Error Resilient
20	ER AAC Scalable	2000	Error Resilient
21	ER TwinVQ	2000	Error Resilient
22	ER BSAC (Bit-Sliced Arithmetic Coding)	2000	It is also known as "Fine Granule Audio" or fine grain scalability tool. It is used in combination with the AAC coding tools and replaces the noiseless coding and the bitstream formatting of MPEG-4 Version 1 GA coder. Error Resilient
23	ER AAC LD (Low Delay)	2000	Error Resilient, used with CELP, ER CELP, HVXC, ER HVXC and TTSI in the "Low Delay Profile", (commonly used for real-time conversation applications)
24	ER CELP	2000	Error Resilient
25	ER HVXC	2000	Error Resilient
26	ER HILN (Harmonic and Individual Lines plus Noise)	2000	Error Resilient
27	ER Parametric	2000	Error Resilient
28	SSC (SinuSoidal Coding)	2004^[25]^[26]
29	PS (Parametric Stereo)	2004^[27] and 2006^[28]^[29]	used with AAC LC and SBR in the "HE-AAC v2 Profile". PS coding tool was defined in 2004 and Object Type defined in 2006.
30	MPEG Surround	2007^[30]	also known as MPEG Spatial Audio Coding (SAC), it is a type of spatial audio coding^[31]^[32] (MPEG Surround was also defined in ISO/IEC 23003-1 in 2007^[33])
31	(ESCAPE)
32	MPEG-1/2 Layer-1	2005^[34]
33	MPEG-1/2 Layer-2	2005^[34]
34	MPEG-1/2 Layer-3	2005^[34]	also known as "MP3onMP4"
35	DST (Direct Stream Transfer)	2005^[35]	lossless audio coding, used on Super Audio CD
36	ALS (Audio Lossless Coding)	2006^[29]	lossless audio coding
37	SLS (Scalable Lossless Coding)	2006^[36]	two-layer audio coding with lossless layer and lossy General Audio core/layer (e.g. AAC)
38	SLS non-core	2006	lossless audio coding without lossy General Audio core/layer (e.g. AAC)
39	ER AAC ELD (Enhanced Low Delay)	2008^[37]	Error Resilient
40	SMR (Symbolic Music Representation) Simple	2008	note: Symbolic Music Representation is also the MPEG-4 Part 23 standard (ISO/IEC 14496-23:2008)^[38]^[39]
41	SMR Main	2008
42	USAC (Unified Speech and Audio Coding)	2012	Unified Speech and audio Coding is defined in MPEG-D Part 3 (ISO/IEC 23003-3:2012)^[40]
43	SAOC (Spatial Audio Object Coding)	2010^[41]^[42]	note: Spatial Audio Object Coding is also the MPEG-D Part 2 standard (ISO/IEC 23003-2:2010)^[43]
44	LD MPEG Surround	2010^[44]	This object type conveys Low Delay MPEG Surround Coding side information (that was defined in MPEG-D Part 2 – ISO/IEC 23003-2^[43]) in the MPEG-4 Audio framework.
45	SAOC-DE	2013	Spatial Audio Object Coding Dialogue Enhancement
46	Audio Sync	2015	The audio synchronization tool provides capability of synchronizing multiple contents in multiple devices.

Audio Profiles

The MPEG-4 Audio standard defines several profiles. These profiles are based on the object types and each profile supports different list of object types. Each profile may also have several levels, which limit some parameters of the tools present in a profile. These parameters usually are the sampling rate and the number of audio channels decoded at the same time.

MPEG-4 Audio Profiles^[19]^[21]
Audio Profile	Audio Object Types	First public release date
AAC Profile	AAC LC	2003
High Efficiency AAC Profile	AAC LC, SBR	2003
HE-AAC v2 Profile	AAC LC, SBR, PS	2006
Main Audio Profile	AAC Main, AAC LC, AAC SSR, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, Main synthesis	1999
Scalable Audio Profile	AAC LC, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI	1999
Speech Audio Profile	CELP, HVXC, TTSI	1999
Synthetic Audio Profile	TTSI, Main synthesis	1999
High Quality Audio Profile	AAC LC, AAC LTP, AAC Scalable, CELP, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER CELP	2000
Low Delay Audio Profile	CELP, HVXC, TTSI, ER AAC LD, ER CELP, ER HVXC	2000
Natural Audio Profile	AAC Main, AAC LC, AAC SSR, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD, ER CELP, ER HVXC, ER HILN, ER Parametric	2000
Mobile Audio Internetworking Profile	ER AAC LC, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD	2000
HD-AAC Profile	AAC LC, SLS^[45]	2009^[46]
ALS Simple Profile	ALS	2010^[42]^[47]

Audio storage and transport

Multiplex, storage and transmission formats for MPEG-4 Audio^[16]
	Standard	Description
Multiplex	ISO/IEC 14496-1	MPEG-4 Multiplex scheme (M4Mux)^[48]
Multiplex	ISO/IEC 14496-3	Low Overhead Audio Transport Multiplex (LATM)
Storage	ISO/IEC 14496-3 (informative)	Audio Data Interchange Format (ADIF) – only for AAC
Storage	ISO/IEC 14496-12	MPEG-4 file format (MP4) / ISO base media file format
Transmission	ISO/IEC 14496-3 (informative)	Audio Data Transport Stream (ADTS) – only for AAC
Transmission	ISO/IEC 14496-3	Low Overhead Audio Stream (LOAS), based on LATM

There is no standard for transport of elementary streams over a channel, because the broad range of MPEG-4 applications have delivery requirements that are too wide to easily characterize with a single solution.

The capabilities of a transport layer and the communication between transport, multiplex, and demultiplex functions are described in the Delivery Multimedia Integration Framework (DMIF) in ISO/IEC 14496-6.^[16] A wide variety of delivery mechanisms exist below this interface, e.g., MPEG transport stream, Real-time Transport Protocol (RTP), etc.

Transport in Real-time Transport Protocol is defined in RFC 3016 (RTP Payload Format for MPEG-4 Audio/Visual Streams), RFC 3640 (RTP Payload Format for Transport of MPEG-4 Elementary Streams), RFC 4281 (The Codecs Parameter for "Bucket" Media Types) and RFC 4337 (MIME Type Registration for MPEG-4).

LATM and LOAS were defined for natural audio applications, which do not require sophisticated object-based coding or other functions provided by MPEG-4 Systems.

Bifurcation in the AAC technical standard

Main article: Advanced Audio Coding

The Advanced Audio Coding in MPEG-4 Part 3 (MPEG-4 Audio) Subpart 4 was enhanced relative to the previous standard MPEG-2 Part 7 (Advanced Audio Coding), in order to provide better sound quality for a given encoding bitrate.

It is assumed that any Part 3 and Part 7 differences will be ironed out by the ISO standards body in the near future to avoid the possibility of future bitstream incompatibilities. At present there are no known player or codec incompatibilities due to the newness of the standard.

The MPEG-2 Part 7 standard (Advanced Audio Coding) was first published in 1997 and offers three default profiles:^[49]^[50] Low Complexity profile (LC), Main profile and Scalable Sampling Rate profile (SSR).

The MPEG-4 Part 3 Subpart 4 (General Audio Coding) combined the profiles from MPEG-2 Part 7 with Perceptual Noise Substitution (PNS) and defined them as Audio Object Types (AAC LC, AAC Main, AAC SSR).^[4]

HE-AAC

Main article: HE-AAC

High-Efficiency Advanced Audio Coding is an extension of AAC LC using spectral band replication (SBR), and Parametric Stereo (PS). It is designed to increase coding efficiency at low bitrates by using partial parametric representation of audio.

AAC-SSR

AAC Scalable Sample Rate was introduced by Sony to the MPEG-2 Part 7 and MPEG-4 Part 3 standards.^{[citation needed]} It was first published in ISO/IEC 13818-7, Part 7: Advanced Audio Coding (AAC) in 1997.^[49]^[50] The audio signal is first split into 4 bands using a 4 band polyphase quadrature filter bank. Then these 4 bands are further split using MDCTs with a size k of 32 or 256 samples. This is similar to normal AAC LC which uses MDCTs with a size k of 128 or 1024 directly on the audio signal.

The advantage of this technique is that short block switching can be done separately for every PQF band. So high frequencies can be encoded using a short block to enhance temporal resolution, low frequencies can be still encoded with high spectral resolution. However, due to aliasing between the 4 PQF bands coding efficiencies around (1,2,3) * fs/8 is worse than normal MPEG-4 AAC LC.^{[citation needed]}

MPEG-4 AAC-SSR is very similar to ATRAC and ATRAC-3.

Why AAC-SSR was introduced

The idea behind AAC-SSR was not only the advantage listed above, but also the possibility of reducing the data rate by removing 1, 2 or 3 of the upper PQF bands. A very simple bitstream splitter can remove these bands and thus reduce the bitrate and sample rate.

Example:

4 subbands: bitrate = 128 kbit/s, sample rate = 48 kHz, f_lowpass = 20 kHz
3 subbands: bitrate ~ 120 kbit/s, sample rate = 48 kHz, f_lowpass = 18 kHz
2 subbands: bitrate ~ 100 kbit/s, sample rate = 24 kHz, f_lowpass = 12 kHz
1 subband: bitrate ~ 65 kbit/s, sample rate = 12 kHz, f_lowpass = 6 kHz

Note: although possible, the resulting quality is much worse than typical for this bitrate. So for normal 64 kbit/s AAC LC a bandwidth of 14–16 kHz is achieved by using intensity stereo and reduced NMRs. This degrades audible quality less than transmitting 6 kHz bandwidth with perfect quality.

BSAC

Bit Sliced Arithmetic Coding is an MPEG-4 standard (ISO/IEC 14496-3 subpart 4) for scalable audio coding. BSAC uses an alternative noiseless coding to AAC, with the rest of the processing being identical to AAC. This support for scalability allows for nearly transparent sound quality at 64 kbit/s and graceful degradation at lower bit rates. BSAC coding is best performed in the range of 40 kbit/s to 64 kbit/s, though it operates in the range of 16 kbit/s to 64 kbit/s. The AAC-BSAC codec is used in Digital Multimedia Broadcasting (DMB) applications.

Licensing

In 2002, the MPEG-4 Audio Licensing Committee selected the Via Licensing Corporation as the Licensing Administrator for the MPEG-4 Audio patent pool.^[3]^[51]^[52]

References

External links

Multimedia compression and container formats

Video
compression

ISO, IEC, MPEG	DV MJPEG Motion JPEG 2000 MPEG-1 MPEG-2 Part 2 MPEG-4 Part 2 / ASP Part 10 / AVC Part 33 / IVC MPEG-H Part 2 / HEVC MPEG-I Part 3 / VVC MPEG-5 Part 1 / EVC Part 2 / LCEVC
ITU-T, VCEG	H.120 H.261 H.262 H.263 H.264 / AVC H.265 / HEVC H.266 / VVC
SMPTE	VC-1 VC-2 VC-3 VC-5 VC-6
TrueMotion	TrueMotion S VP3 VP6 VP7 VP8 VP9 AV1
Others	Apple Video AVS Bink Cinepak Daala DVI FFV1 Huffyuv Indeo Lagarith Microsoft Video 1 MSU Lossless OMS Video Pixlet ProRes 422 4444 QuickTime Animation Graphics RealVideo RTVideo SheerVideo Smacker Sorenson Video/Spark Theora Thor Ut WMV XEB YULS

Audio
compression

ISO, IEC, MPEG	MPEG-1 Layer II Multichannel MPEG-1 Layer I MPEG-1 Layer III (MP3) AAC HE-AAC AAC-LD MPEG Surround MPEG-4 ALS MPEG-4 SLS MPEG-4 DST MPEG-4 HVXC MPEG-4 CELP MPEG-D USAC MPEG-H 3D Audio
ITU-T	G.711 A-law µ-law G.718 G.719 G.722 G.722.1 G.722.2 G.723 G.723.1 G.726 G.728 G.729 G.729.1
IETF	Opus iLBC Speex Vorbis
3GPP	AMR AMR-WB AMR-WB+ EVRC EVRC-B EVS GSM-HR GSM-FR GSM-EFR
ETSI	AC-3 AC-4 DTS
Bluetooth SIG	SBC LC3
Others	ACELP ALAC Asao ATRAC AVS CELT Codec 2 DRA FLAC iSAC Lyra MELP Monkey's Audio MT9 Musepack OptimFROG OSQ QCELP RCELP RealAudio RTAudio SD2 SHN SILK Siren SMV SVOPC TTA True Audio TwinVQ VMR-WB VSELP WavPack WMA MQA aptX aptX HD aptX Low Latency aptX Adaptive LDAC LHDC LLAC L2HC

Image
compression

IEC, ISO, IETF, W3C, ITU-T, JPEG	CCITT Group 4 GIF HEIC / HEIF HEVC JBIG JBIG2 JPEG JPEG 2000 JPEG-LS JPEG XL JPEG XR JPEG XS JPEG XT PNG TIFF TIFF/EP TIFF/IT
Others	APNG AV1 AVIF BPG DjVu EXR FLIF ICER MNG PGF QOI QTVR WBMP WebP

Containers

ISO, IEC	MPEG-ES MPEG-PES MPEG-PS MPEG-TS ISO/IEC base media file format MPEG-4 Part 14 (MP4) Motion JPEG 2000 MPEG-21 Part 9 MPEG media transport
ITU-T	H.222.0 T.802
IETF	RTP Ogg
SMPTE	GXF MXF
Others	3GP and 3G2 AMV ASF AIFF AVI AU BPG Bink Smacker BMP DivX Media Format EVO Flash Video HEIF IFF M2TS Matroska WebM QuickTime File Format RatDVD RealMedia RIFF WAV MOD and TOD VOB, IFO and BUP

Collaborations

Methods

Entropy
LPC
- ACELP
- CELP
- LSP
- WLPC
Lossless
Lossy
LZ
- DEFLATE
- LZW
PCM
- A-law
- µ-law
- ADPCM
- DPCM
Transforms
- DCT
- FFT
- MDCT
- Wavelet
  - Daubechies
  - DWT

Lists

See Compression methods for techniques and Compression software for codecs

MPEG (Moving Picture Experts Group)

MPEG (Moving Picture Experts Group)
MPEG-1 2 3 4 7 21 A B C D E G V M U H I 5
MPEG-1 Parts	Part 1: Systems Program stream Part 2: Video based on H.261 Part 3: Audio Layer I Layer II Layer III
MPEG-2 Parts	Part 1: Systems (H.222.0) Transport stream Program stream Part 2: Video (H.262) Part 3: Audio Layer I Layer II Layer III MPEG Multichannel Part 6: DSM CC Part 7: Advanced Audio Coding
MPEG-4 Parts	Part 2: Video based on H.263 Part 3: Audio Part 6: DMIF Part 10: Advanced Video Coding (H.264) Part 11: Scene description Part 12: ISO base media file format Part 14: MP4 file format Part 17: Streaming text format Part 20: LASeR Part 22: Open Font Format Part 33: Internet Video Coding
MPEG-7 Parts	Part 2: Description definition language
MPEG-21 Parts	Parts 2, 3 and 9: Digital Item Part 5: Rights Expression Language
MPEG-D Parts	Part 1: MPEG Surround Part 3: Unified Speech and Audio Coding
MPEG-G Parts	Part 1: Transport and Storage of Genomic Information Part 2: Coding of Genomic Information Part 3: APIs Part 4: Reference Software Part 5: Conformance
MPEG-H Parts	Part 1: MPEG media transport Part 2: High Efficiency Video Coding (H.265) Part 3: MPEG-H 3D Audio Part 12: High Efficiency Image File Format
MPEG-I Parts	Part 3: Versatile Video Coding (H.266)
MPEG-5 Parts	Part 1: Essential Video Coding Part 2: Low Complexity Enhancement Video Coding
Other	MPEG-DASH