Harmonic Vector Excitation Coding (HVXC) is a speech coding algorithm specified in the MPEG-4 Part 3 (MPEG-4 Audio) standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in fixed and variable bit rate modes, at a sampling frequency of 8 kHz. It also operates at lower bit rates, such as 1.2 to 1.7 kbit/s, using a variable bit rate technique.^[1] The total algorithmic delay of the encoder and decoder is 36 ms.^[2]

It was published as subpart 2 of ISO/IEC 14496-3:1999 (MPEG-4 Audio) in 1999.^[3] An extended version of HVXC was published in MPEG-4 Audio Version 2 (ISO/IEC 14496-3:1999/Amd 1:2000).^[4]^[5]

The MPEG-4 Natural Speech Coding Tool Set uses two algorithms: HVXC and CELP (Code Excited Linear Prediction). HVXC is used at the low bit rates of 2 and 4 kbit/s. CELP covers bit rates above 4 kbit/s, as well as 3.85 kbit/s.^[6]

Technology

Linear Predictive Coding

HVXC uses linear predictive coding (LPC) with block-wise adaptation every 20 ms.^[2] The LPC parameters are transformed into line spectral pair (LSP) coefficients, which are jointly quantized.^[2] The LPC residual signal is classified as either voiced or unvoiced. For voiced speech, the residual is coded in a parametric representation, so the codec operates as a vocoder. For unvoiced speech, the residual waveform itself is quantized, so the codec operates as a hybrid speech codec.
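
As a rough illustration of the analysis step described above, the following Python sketch estimates LPC coefficients for one 20 ms frame (160 samples at 8 kHz) using the autocorrelation method and the Levinson-Durbin recursion, then inverse-filters the frame to obtain the residual. The prediction order of 10, the absence of windowing, the synthetic test frame, and all function names are assumptions made for illustration. The conversion to LSPs and their quantization are not shown.

```python
import random

FRAME_LEN = 160   # 20 ms at 8 kHz, as stated in the text
LPC_ORDER = 10    # assumed order, chosen only for this illustration


def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of the frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err), where a[0] = 1 and A(z) = sum_j a[j] z^-j is the
    prediction-error filter, and err is the remaining prediction error.
    Assumes r[0] > 0 (a non-silent frame).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def lpc_residual(frame, a):
    """Inverse-filter the frame with A(z), assuming zero history."""
    return [sum(a[j] * frame[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(frame))]


# Synthetic test frame: a second-order autoregressive process driven by noise.
rng = random.Random(0)
frame = []
for n in range(FRAME_LEN):
    prev1 = frame[n - 1] if n >= 1 else 0.0
    prev2 = frame[n - 2] if n >= 2 else 0.0
    frame.append(1.3 * prev1 - 0.6 * prev2 + rng.gauss(0.0, 1.0))

r = autocorrelation(frame, LPC_ORDER)
coeffs, pred_err = levinson_durbin(r, LPC_ORDER)
residual = lpc_residual(frame, coeffs)
frame_energy = sum(s * s for s in frame)
residual_energy = sum(e * e for e in residual)
```

The residual carries less energy than the input frame, which is why the later stages code the residual rather than the waveform itself.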

Voiced (Harmonic) Residual Coding

In voiced segments, the residual signal is represented by two parameters: the pitch period and the spectral envelope.^[2] The pitch period is estimated from the peak values of the autocorrelation of the residual signal.^[2] In this process, the residual signal is compared against shifted copies of itself, and the shift that yields the greatest similarity, measured by linear dependence, is taken as the pitch period. The spectral envelope is represented by a set of amplitude values, one per harmonic.^[2] To extract these values, the LPC residual signal is transformed into the DFT domain.^[2] The DFT spectrum is segmented into bands, one band per harmonic. The frequency band for the m-th harmonic consists of the DFT coefficients from (m-1/2)ω₀ to (m+1/2)ω₀, where ω₀ is the pitch frequency.^[2] The amplitude value for the m-th harmonic is chosen to optimally represent these DFT coefficients.^[2] Phase information is discarded in this process. The spectral envelope is then coded using variable-dimension weighted vector quantization, a process also referred to as Harmonic VQ.
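
The two analysis steps in this paragraph can be sketched in Python as follows. This is a simplified illustration, not the procedure of the standard: the lag search range of 20 to 147 samples, the tie-break toward shorter lags, the 256-sample frame, and the use of band energy as the amplitude estimate (standing in for the optimal fit mentioned above) are all assumptions, as are the function names. A synthetic harmonic signal stands in for the LPC residual.

```python
import cmath
import math

FS = 8000  # sampling frequency in Hz, as stated in the text


def estimate_pitch_lag(residual, min_lag=20, max_lag=147):
    """Return (lag, score) maximizing the normalized autocorrelation.

    Near-ties are resolved in favour of the shorter lag, a simple guard
    against selecting a multiple of the true pitch period.
    """
    n = len(residual)
    best_lag, best_score = min_lag, -2.0
    for lag in range(min_lag, max_lag + 1):
        x = residual[:n - lag]
        y = residual[lag:]
        num = sum(a * b for a, b in zip(x, y))
        den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
        score = num / den if den > 0.0 else 0.0
        if score > best_score + 1e-9:
            best_lag, best_score = lag, score
    return best_lag, best_score


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 (direct DFT; phase is discarded)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def harmonic_amplitudes(frame, pitch_lag):
    """One amplitude per harmonic, from the band (m-1/2)w0 .. (m+1/2)w0."""
    n = len(frame)
    mags = dft_magnitudes(frame)
    w0_bins = n / pitch_lag            # pitch frequency expressed in DFT bins
    amps = []
    m = 1
    while (m + 0.5) * w0_bins <= n / 2:
        lo = math.ceil((m - 0.5) * w0_bins)
        hi = math.ceil((m + 0.5) * w0_bins)   # exclusive upper bin
        energy = sum(mags[k] ** 2 for k in range(lo, hi))
        amps.append(2.0 * math.sqrt(energy) / n)  # scaled to sinusoid amplitude
        m += 1
    return amps


# Synthetic "residual": three harmonics of a 64-sample pitch period.
N = 256
partials = ((1, 1.0), (2, 0.5), (3, 0.25))
frame = [sum(a * math.cos(2 * math.pi * m * n / 64 + 0.3 * m)
             for m, a in partials) for n in range(N)]

lag, score = estimate_pitch_lag(frame)
pitch_hz = FS / lag
amps = harmonic_amplitudes(frame, lag)
```

The number of amplitude values depends on the pitch period, since a longer period places more harmonics below half the sampling frequency. This is why the quantizer described above has to handle vectors of variable dimension.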

To make speech with a mixture of voiced and unvoiced excitation sound more natural and smooth, three modes of voiced speech (Mixed Voiced-1, Mixed Voiced-2, Full Voiced) are distinguished.^[2] The degree of voicing is determined by the value of the normalized autocorrelation function at a shift of one pitch period. Depending on the chosen mode, the decoder adds different amounts of band-pass Gaussian noise to the synthesized harmonic signal.
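
A minimal Python sketch of such a voicing decision is given below. The threshold values, the relative noise gains, and the assumption that Mixed Voiced-1 is the least voiced of the three modes are hypothetical and chosen only for illustration; the text does not give the actual decision rule. The band-pass filtering of the noise is not shown.

```python
import math

# Hypothetical thresholds on the normalized autocorrelation, highest first.
MODE_THRESHOLDS = [
    (0.85, "Full Voiced"),
    (0.65, "Mixed Voiced-2"),
    (0.45, "Mixed Voiced-1"),
]

# Hypothetical relative noise gains a decoder might apply for each mode.
NOISE_GAIN = {"Full Voiced": 0.0, "Mixed Voiced-2": 0.3, "Mixed Voiced-1": 0.6}


def normalized_autocorrelation(signal, lag):
    """Normalized correlation between the signal and its copy shifted by lag."""
    x = signal[:len(signal) - lag]
    y = signal[lag:]
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0


def mode_from_correlation(r):
    """Map a correlation value to a voicing mode using the thresholds above."""
    for threshold, mode in MODE_THRESHOLDS:
        if r >= threshold:
            return mode
    return "Unvoiced"


def classify_voicing(signal, pitch_lag):
    """Classify one frame from its correlation at a shift of one pitch period."""
    return mode_from_correlation(normalized_autocorrelation(signal, pitch_lag))


# A strictly periodic frame correlates almost perfectly at its pitch lag.
periodic = [math.sin(2 * math.pi * n / 40) for n in range(160)]
periodic_mode = classify_voicing(periodic, 40)

# A frame that is inverted after 40 samples has correlation close to -1.
inverted = [math.cos(2 * math.pi * n / 80) for n in range(160)]
inverted_mode = classify_voicing(inverted, 40)
```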

Voiceless (VXC) Residual Coding

Unvoiced segments are encoded according to the CELP scheme, which is also referred to as vector excitation coding (VXC).^[2] The CELP coding in HVXC is performed using only a stochastic codebook. Other CELP codecs additionally use a dynamic codebook to perform long-term prediction of voiced segments. Since HVXC does not use CELP for voiced segments, the dynamic codebook is omitted from the design.
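
The idea of a search over a stochastic codebook alone can be sketched in Python as follows. This is a simplified analysis-by-synthesis loop under stated assumptions: the codebook size of 64, the vector length of 40 samples, the first-order synthesis filter, and all function names are hypothetical, and perceptual weighting and gain quantization are omitted. There is no dynamic codebook, in line with the description above.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def synthesize(excitation, a):
    """All-pole filter 1/A(z), A(z) = sum_j a[j] z^-j with a[0] = 1, zero state."""
    out = []
    for n, x in enumerate(excitation):
        y = x - sum(a[j] * out[n - j] for j in range(1, len(a)) if n - j >= 0)
        out.append(y)
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain) of the entry whose synthesized, scaled version
    is closest to the target in the squared-error sense."""
    best = None
    for index, shape in enumerate(codebook):
        synth = synthesize(shape, a)
        energy = dot(synth, synth)
        if energy == 0.0:
            continue
        cross = dot(target, synth)
        gain = cross / energy            # optimal gain for this shape
        score = cross * cross / energy   # error reduction achieved
        if best is None or score > best[0]:
            best = (score, index, gain)
    _, index, gain = best
    return index, gain


# Hypothetical codebook of Gaussian noise vectors.
rng = random.Random(1)
CODEBOOK_SIZE, VECTOR_LEN = 64, 40
codebook = [[rng.gauss(0.0, 1.0) for _ in range(VECTOR_LEN)]
            for _ in range(CODEBOOK_SIZE)]
lpc = [1.0, -0.9]  # assumed synthesis filter coefficients for the demo

# Build a target from a known entry and gain, then recover both by searching.
target = [2.5 * s for s in synthesize(codebook[17], lpc)]
found_index, found_gain = search_codebook(target, codebook, lpc)
```

Only the index and the gain need to be transmitted for each vector, since the decoder holds the same codebook and synthesis filter.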

References

[1] ISO/IEC (2009-09-01), ISO/IEC 14496-3:2009 - Information technology -- Coding of audio-visual objects -- Part 3: Audio (PDF), IEC, retrieved 2009-10-07
[2] Masayuki Nishiguchi (2006-04-17), Harmonic vector excitation coding of speech (PDF), Acoustical Science and Technology, retrieved 2009-10-09
[3] ISO (1999), ISO/IEC 14496-3:1999 - Information technology -- Coding of audio-visual objects -- Part 3: Audio, ISO, retrieved 2009-10-09
[4] ISO (2000), ISO/IEC 14496-3:1999/Amd 1:2000 - Audio extensions, ISO, retrieved 2009-10-07
[5] ISO/IEC JTC 1/SC 29/WG 11 (July 1999), ISO/IEC 14496-3:/Amd.1 - Final Committee Draft - MPEG-4 Audio Version 2 (PDF), archived from the original (PDF) on 2012-08-01, retrieved 2009-10-07
[6] Karlheinz Brandenburg; Oliver Kunz; Akihiko Sugiyama, MPEG-4 Natural Audio Coding - Natural Speech Coding Tools (PDF), retrieved 2013-03-25

