In statistical signal processing, the goal of spectral density estimation (SDE) or simply spectral estimation is to estimate the spectral density (also known as the power spectral density) of a signal from a sequence of time samples of the signal.^[1] Intuitively speaking, the spectral density characterizes the frequency content of the signal. One purpose of estimating the spectral density is to detect any periodicities in the data, by observing peaks at the frequencies corresponding to these periodicities.

Some SDE techniques assume that a signal is composed of a limited (usually small) number of generating frequencies plus noise and seek to find the location and intensity of the generated frequencies. Others make no assumption on the number of components and seek to estimate the whole generating spectrum.

Overview


Spectrum analysis, also referred to as frequency domain analysis or spectral density estimation, is the technical process of decomposing a complex signal into simpler parts. As described above, many physical processes are best described as a sum of many individual frequency components. Any process that quantifies the various amounts (e.g. amplitudes, powers, intensities) versus frequency (or phase) can be called spectrum analysis.

Spectrum analysis can be performed on the entire signal. Alternatively, a signal can be broken into short segments (sometimes called frames), and spectrum analysis may be applied to these individual segments. Periodic functions (such as $\sin(t)$ ) are particularly well-suited for this sub-division. General mathematical techniques for analyzing non-periodic functions fall into the category of Fourier analysis.

The Fourier transform of a function produces a frequency spectrum which contains all of the information about the original signal, but in a different form. This means that the original function can be completely reconstructed (synthesized) by an inverse Fourier transform. For perfect reconstruction, the spectrum analyzer must preserve both the amplitude and phase of each frequency component. These two pieces of information can be represented as a 2-dimensional vector, as a complex number, or as magnitude (amplitude) and phase in polar coordinates (i.e., as a phasor). A common technique in signal processing is to consider the squared amplitude, or power; in this case the resulting plot is referred to as a power spectrum.
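The reversibility described above can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up two-tone test signal: it splits a discrete spectrum into magnitude and phase, rebuilds the signal from that polar form, and forms the power spectrum. The helper name `roundtrip` is hypothetical.

```python
import numpy as np

def roundtrip(x):
    """Return (magnitude, phase, reconstructed signal) for a real sequence x."""
    spectrum = np.fft.fft(x)                  # complex DFT coefficients
    magnitude = np.abs(spectrum)              # amplitude of each component
    phase = np.angle(spectrum)                # phase of each component
    # Rebuild the complex spectrum from its polar form and invert it.
    rebuilt = np.fft.ifft(magnitude * np.exp(1j * phase)).real
    return magnitude, phase, rebuilt

# Illustrative test signal: tones at DFT bins 4 and 9 of a 64-sample record.
t = np.arange(64)
x = np.sin(2 * np.pi * 4 * t / 64) + 0.5 * np.cos(2 * np.pi * 9 * t / 64)
magnitude, phase, rebuilt = roundtrip(x)
power = magnitude ** 2                        # power spectrum: the phase is discarded
```

Here `rebuilt` matches `x` to within rounding error, whereas `power` alone is not enough to recover `x`, because the phase has been dropped.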

Because of reversibility, the Fourier transform is called a representation of the function, in terms of frequency instead of time; thus, it is a frequency domain representation. Linear operations that could be performed in the time domain have counterparts that can often be performed more easily in the frequency domain. Frequency analysis also simplifies the understanding and interpretation of the effects of various time-domain operations, both linear and non-linear. For instance, only non-linear or time-variant operations can create new frequencies in the frequency spectrum.
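As an illustration of the last point, the sketch below (assuming NumPy; the signal, the threshold, and the helper `active_bins` are chosen only for this example) compares two linear time-invariant operations with a memoryless squaring of a single tone and lists which DFT bins carry energy afterwards.

```python
import numpy as np

N = 256
n = np.arange(N)
x = np.cos(2 * np.pi * 10 * n / N)      # single tone at DFT bin 10

def active_bins(signal, threshold=1e-6):
    """Indices of non-negative-frequency DFT bins holding non-negligible energy."""
    mag = np.abs(np.fft.rfft(signal)) / len(signal)
    return [int(k) for k in np.flatnonzero(mag > threshold)]

scaled = 3.0 * x            # linear, time-invariant: gain
delayed = np.roll(x, 5)     # linear, time-invariant: circular delay
squared = x ** 2            # non-linear: memoryless squaring

print(active_bins(scaled))   # [10]
print(active_bins(delayed))  # [10]
print(active_bins(squared))  # [0, 20]
```

Gain and delay leave the energy in bin 10, while squaring moves it to bins 0 and 20, in line with the identity $\cos^{2}\theta ={\tfrac {1}{2}}+{\tfrac {1}{2}}\cos 2\theta$.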

In practice, nearly all software and electronic devices that generate frequency spectra use a discrete Fourier transform (DFT), which operates on samples of the signal and provides a mathematical approximation to the full integral solution. The DFT is almost invariably implemented by an efficient algorithm called the fast Fourier transform (FFT). The array of squared-magnitude components of a DFT is a type of power spectrum called a periodogram, which is widely used for examining the frequency characteristics of noise-free functions such as filter impulse responses and window functions. However, the periodogram does not provide processing gain when applied to noise-like signals or even to sinusoids at low signal-to-noise ratios. In other words, the variance of its spectral estimate at a given frequency does not decrease as the number of samples used in the computation increases. This can be mitigated by averaging over time (Welch's method^[2]) or over frequency (smoothing). Welch's method is widely used for spectral density estimation (SDE). However, periodogram-based techniques introduce small biases that are unacceptable in some applications. Alternatives are presented in the next section.
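The variance behaviour described above can be sketched with simulated white noise. The example below assumes NumPy and uses a simplified averaging scheme (non-overlapping, unwindowed segments, closer to Bartlett's method than to Welch's method, which adds windowing and overlap). The function names, the seed, and the record lengths are illustrative choices.

```python
import numpy as np

def periodogram(x):
    """Squared-magnitude DFT, scaled so white noise of variance s2 averages to s2."""
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)

def averaged_periodogram(x, segment_length):
    """Average the periodograms of non-overlapping segments (Bartlett-style)."""
    n_segments = len(x) // segment_length
    segments = x[: n_segments * segment_length].reshape(n_segments, segment_length)
    return np.mean([periodogram(s) for s in segments], axis=0)

rng = np.random.default_rng(0)

def spread(n_samples, segment_length=None):
    """Standard deviation of the estimate across frequency for unit-variance white noise."""
    x = rng.standard_normal(n_samples)        # true spectrum is flat
    if segment_length is None:
        est = periodogram(x)
    else:
        est = averaged_periodogram(x, segment_length)
    return float(np.std(est[1:-1]))           # skip the DC and Nyquist bins

raw_short = spread(1024)                      # raw periodogram, short record
raw_long = spread(16384)                      # raw periodogram, 16 times more data
averaged = spread(16384, segment_length=1024) # 16 averaged segments
```

Under these assumptions the scatter of the raw periodogram stays about the same for the short and the long record, while averaging 16 segments reduces it by roughly a factor of four, at the cost of coarser frequency resolution.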

Techniques

Many other techniques for spectral estimation have been developed to mitigate the disadvantages of the basic periodogram. These techniques can generally be divided into non-parametric, parametric, and more recently semi-parametric (also called sparse) methods.^[3] The non-parametric approaches explicitly estimate the covariance or the spectrum of the process without assuming that the process has any particular structure. Some of the most common estimators in use for basic applications (e.g. Welch's method) are non-parametric estimators closely related to the periodogram. By contrast, the parametric approaches assume that the underlying stationary stochastic process has a certain structure that can be described using a small number of parameters (for example, using an auto-regressive or moving-average model). In these approaches, the task is to estimate the parameters of the model that describes the stochastic process. When using the semi-parametric methods, the underlying process is modeled using a non-parametric framework, with the additional assumption that the number of non-zero components of the model is small (i.e., the model is sparse). Similar approaches may also be used for missing data recovery^[4] as well as signal reconstruction.

Following is a partial list of spectral density estimation techniques:

Non-parametric methods for which the signal samples can be unevenly spaced in time (records can be incomplete):
- Least-squares spectral analysis, based on least squares fitting to known frequencies
- Lomb–Scargle periodogram, an approximation of least-squares spectral analysis
- Non-uniform discrete Fourier transform
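To make the idea of least-squares fitting at known frequencies concrete, here is a minimal sketch for unevenly spaced samples, assuming NumPy. It fits one sine and one cosine at each trial frequency and records the fraction of variance explained. It is a simplified illustration rather than a reference implementation of any of the methods listed above, and the sample times, the signal frequency of 1.3 Hz, and the function name `lssa_power` are made up for the example.

```python
import numpy as np

def lssa_power(t, y, freqs):
    """Fraction of the variance of y explained by a sinusoid fitted at each trial frequency."""
    y = y - y.mean()
    total = np.dot(y, y)
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        # Design matrix with one cosine and one sine column at frequency f.
        A = np.column_stack((np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)))
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        fitted = A @ coef
        power[i] = np.dot(fitted, fitted) / total
    return power

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 20.0, 200))          # irregular sample times (seconds)
y = np.sin(2 * np.pi * 1.3 * t) + 0.3 * rng.standard_normal(t.size)
freqs = np.linspace(0.05, 3.0, 591)               # trial frequencies, 0.005 Hz apart
power = lssa_power(t, y, freqs)
peak = freqs[np.argmax(power)]                    # frequency with the best fit
```

Because no FFT is involved, nothing in the computation requires the sample times `t` to lie on a regular grid.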

Non-parametric methods for which the signal samples must be evenly spaced in time (records must be complete):
- Periodogram, the modulus squared of the discrete Fourier transform
- Bartlett's method is the average of the periodograms taken of multiple segments of the signal to reduce variance of the spectral density estimate
- Welch's method, a windowed version of Bartlett's method that uses overlapping segments
- Multitaper is a periodogram-based method that uses multiple tapers, or windows, to form independent estimates of the spectral density to reduce variance of the spectral density estimate
- Singular spectrum analysis is a nonparametric method that uses a singular value decomposition of the covariance matrix to estimate the spectral density
- Short-time Fourier transform
- Critical filter is a nonparametric method based on information field theory that can deal with noise, incomplete data, and instrumental response functions

Parametric techniques (an incomplete list):
- Autoregressive model (AR) estimation, which assumes that the nth sample is correlated with the previous p samples.
- Moving-average model (MA) estimation, which assumes that the nth sample is correlated with noise terms in the previous p samples.
- Autoregressive moving-average (ARMA) estimation, which generalizes the AR and MA models.
- MUltiple SIgnal Classification (MUSIC) is a popular superresolution method.
- Estimation of signal parameters via rotational invariance techniques (ESPRIT) is another superresolution method.
- Maximum entropy spectral estimation is an all-poles method useful for SDE when singular spectral features, such as sharp peaks, are expected.

Semi-parametric techniques (an incomplete list):
- SParse Iterative Covariance-based Estimation (SPICE) estimation,^[3] and the more generalized $(r,q)$-SPICE.^[5]
- Iterative Adaptive Approach (IAA) estimation.^[6]
- Lasso, similar to least-squares spectral analysis but with a sparsity enforcing penalty.^[7]

Parametric estimation

In parametric spectral estimation, one assumes that the signal is modeled by a stationary process which has a spectral density function (SDF) $S(f;a_{1},\ldots ,a_{p})$ that is a function of the frequency $f$ and $p$ parameters $a_{1},\ldots ,a_{p}$.^[8] The estimation problem then becomes one of estimating these parameters.

The most common form of parametric SDF estimate uses as a model an autoregressive model $\text{AR}(p)$ of order $p$.^[8]^: 392 A signal sequence $\{Y_{t}\}$ obeying a zero mean $\text{AR}(p)$ process satisfies the equation

$$Y_{t}=\phi _{1}Y_{t-1}+\phi _{2}Y_{t-2}+\cdots +\phi _{p}Y_{t-p}+\epsilon _{t},$$

where the $\phi _{1},\ldots ,\phi _{p}$ are fixed coefficients and $\epsilon _{t}$ is a white noise process with zero mean and innovation variance $\sigma _{p}^{2}$. The SDF for this process is

$$S(f;\phi _{1},\ldots ,\phi _{p},\sigma _{p}^{2})={\frac {\sigma _{p}^{2}\Delta t}{\left|1-\sum _{k=1}^{p}\phi _{k}e^{-2i\pi fk\Delta t}\right|^{2}}}\qquad |f|<f_{N},$$

with $\Delta t$ the sampling time interval and $f_{N}$ the Nyquist frequency.
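The SDF formula above can be evaluated directly once the coefficients are known. The following sketch assumes NumPy. The function name `ar_sdf` and the AR(1) coefficient used in the example are illustrative and not taken from the source.

```python
import numpy as np

def ar_sdf(freqs, phi, sigma2, dt=1.0):
    """Evaluate the AR(p) spectral density at frequencies |f| < 1/(2*dt)."""
    freqs = np.asarray(freqs, dtype=float)
    phi = np.asarray(phi, dtype=float)
    k = np.arange(1, len(phi) + 1)
    # Denominator term: 1 - sum_k phi_k exp(-2 i pi f k dt), one value per frequency.
    transfer = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, k) * dt) @ phi
    return sigma2 * dt / np.abs(transfer) ** 2

# Illustrative AR(1) process with phi_1 = 0.5, unit innovation variance, dt = 1.
f = np.linspace(0.0, 0.5, 6)
spectrum = ar_sdf(f, phi=[0.5], sigma2=1.0)
```

For this AR(1) example the density equals $1/|1-0.5|^{2}=4$ at $f=0$ and $1/|1+0.5|^{2}\approx 0.444$ at $f=0.5$, so a positive coefficient concentrates power at low frequencies.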

There are a number of approaches to estimating the parameters $\phi _{1},\ldots ,\phi _{p},\sigma _{p}^{2}$ of the $\text{AR}(p)$ process and thus the spectral density:^[8]^: 452-453

- The Yule–Walker estimators are found by recursively solving the Yule–Walker equations for an $\text{AR}(p)$ process.
- The Burg estimators are found by treating the Yule–Walker equations as a form of ordinary least squares problem. The Burg estimators are generally considered superior to the Yule–Walker estimators.^[8]^: 452 Burg associated these with maximum entropy spectral estimation.^[9]
- The forward-backward least-squares estimators treat the $\text{AR}(p)$ process as a regression problem and solve that problem using a forward-backward method. They are competitive with the Burg estimators.
- The maximum likelihood estimators estimate the parameters using a maximum likelihood approach. This involves a nonlinear optimization and is more complex than the first three.
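As a rough sketch of the first approach, the code below estimates AR coefficients from the Yule–Walker equations, assuming NumPy. For brevity it solves the Toeplitz system directly with a general linear solver rather than recursively, and it uses the biased sample autocovariance. The simulated AR(2) parameters, the seed, and the function name `yule_walker` are made up for the example.

```python
import numpy as np

def yule_walker(y, p):
    """Estimate AR(p) coefficients and innovation variance from sample autocovariances."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = len(y)
    # Biased sample autocovariances r[0..p].
    r = np.array([np.dot(y[: n - k], y[k:]) / n for k in range(p + 1)])
    # Toeplitz system R phi = r[1..p], with R[i, j] = r[|i - j|].
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - np.dot(phi, r[1:])
    return phi, sigma2

# Simulate a zero-mean AR(2) process with known (made-up) parameters.
rng = np.random.default_rng(2)
true_phi = np.array([0.75, -0.5])
eps = rng.standard_normal(20000)              # unit innovation variance
y = np.zeros_like(eps)
for t in range(2, len(y)):
    y[t] = true_phi[0] * y[t - 1] + true_phi[1] * y[t - 2] + eps[t]

phi_hat, sigma2_hat = yule_walker(y, p=2)
```

Substituting `phi_hat` and `sigma2_hat` into the AR spectral density formula then gives the parametric spectral estimate.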

Alternative parametric methods include fitting to a moving-average model (MA) and to a full autoregressive moving-average model (ARMA).

Frequency estimation

Frequency estimation is the process of estimating the frequency, amplitude, and phase-shift of a signal in the presence of noise given assumptions about the number of the components.^[10] This contrasts with the general methods above, which do not make prior assumptions about the components.

Single tone

Multiple tones

A typical model for a signal $x(n)$ consists of a sum of $p$ complex exponentials in the presence of white noise, $w(n)$:

$$x(n)=\sum _{i=1}^{p}A_{i}e^{jn\omega _{i}}+w(n).$$

The power spectral density of $x(n)$ is composed of $p$ impulse functions in addition to the spectral density function due to noise.

The most common methods for frequency estimation involve identifying the noise subspace to extract these components. These methods are based on eigen decomposition of the autocorrelation matrix into a signal subspace and a noise subspace. After these subspaces are identified, a frequency estimation function is used to find the component frequencies from the noise subspace. The most popular methods of noise subspace based frequency estimation are Pisarenko's method, the multiple signal classification (MUSIC) method, the eigenvector method, and the minimum norm method.

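- Pisarenko's method: ${\hat {P}}_{\text{PHD}}\left(e^{j\omega }\right)={\frac {1}{\left|\mathbf {e} ^{H}\mathbf {v} _{\text{min}}\right|^{2}}}$
- MUSIC: ${\hat {P}}_{\text{MU}}\left(e^{j\omega }\right)={\frac {1}{\sum _{i=p+1}^{M}\left|\mathbf {e} ^{H}\mathbf {v} _{i}\right|^{2}}}$
- Eigenvector method: ${\hat {P}}_{\text{EV}}\left(e^{j\omega }\right)={\frac {1}{\sum _{i=p+1}^{M}{\frac {1}{\lambda _{i}}}\left|\mathbf {e} ^{H}\mathbf {v} _{i}\right|^{2}}}$
- Minimum norm method: ${\hat {P}}_{\text{MN}}\left(e^{j\omega }\right)={\frac {1}{\left|\mathbf {e} ^{H}\mathbf {a} \right|^{2}}};\ \mathbf {a} =\lambda \mathbf {P} _{n}\mathbf {u} _{1}$

Here $\mathbf {e}$ is the vector of complex exponentials $[1,e^{j\omega },\ldots ,e^{j(M-1)\omega }]^{T}$, $\mathbf {v} _{i}$ and $\lambda _{i}$ are the eigenvectors and eigenvalues of the $M\times M$ autocorrelation matrix, with $i=p+1,\ldots ,M$ indexing the noise subspace, $\mathbf {v} _{\text{min}}$ is the eigenvector belonging to the smallest eigenvalue, $\mathbf {P} _{n}$ is the projection onto the noise subspace, and $\mathbf {u} _{1}$ is the first unit vector.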
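The noise-subspace idea can be sketched for the MUSIC pseudospectrum as follows, assuming NumPy. The autocorrelation matrix is estimated by averaging outer products of length-$M$ snapshots of the data, which is one common choice among several. The tone frequencies, amplitudes, noise level, $M=12$, and the helper names `music_pseudospectrum` and `top_peaks` are all illustrative assumptions.

```python
import numpy as np

def music_pseudospectrum(x, p, M, omegas):
    """MUSIC pseudospectrum of x assuming p complex exponentials and M-sample snapshots."""
    x = np.asarray(x)
    K = len(x) - M + 1
    # Columns are the length-M snapshots [x(n), ..., x(n+M-1)].
    X = np.array([x[n : n + M] for n in range(K)]).T
    R = X @ X.conj().T / K                      # sample autocorrelation matrix (M x M)
    eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    noise = eigvecs[:, : M - p]                 # eigenvectors of the M - p smallest eigenvalues
    # Steering vectors e(omega) = [1, e^{j omega}, ..., e^{j (M-1) omega}]^T, one per column.
    E = np.exp(1j * np.outer(np.arange(M), omegas))
    proj = E.conj().T @ noise                   # e^H v_i for every omega and noise vector
    return 1.0 / np.sum(np.abs(proj) ** 2, axis=1)

def top_peaks(values, grid, count):
    """Grid locations of the `count` largest interior local maxima, sorted by location."""
    idx = [i for i in range(1, len(values) - 1)
           if values[i] > values[i - 1] and values[i] >= values[i + 1]]
    idx.sort(key=lambda i: values[i], reverse=True)
    return np.sort(grid[idx[:count]])

rng = np.random.default_rng(3)
n = np.arange(256)
true_omegas = np.array([0.9, 1.4])              # rad/sample, illustrative values
w = 0.1 * (rng.standard_normal(n.size) + 1j * rng.standard_normal(n.size))
x = np.exp(1j * true_omegas[0] * n) + 0.8 * np.exp(1j * (true_omegas[1] * n + 0.3)) + w

omegas = np.linspace(0.0, np.pi, 2048)
P = music_pseudospectrum(x, p=2, M=12, omegas=omegas)
estimated = top_peaks(P, omegas, count=2)       # estimated tone frequencies
```

The pseudospectrum is not a power estimate: its peaks only mark frequencies at which the steering vector is nearly orthogonal to the estimated noise subspace, so peak heights should not be read as tone powers.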

Example calculation

Suppose $x_{n}$, from $n=0$ to $N-1$, is a time series (discrete time) with zero mean. Suppose that it is a sum of a finite number of periodic components (all frequencies are positive):

$${\begin{aligned}x_{n}&=\sum _{k}A_{k}\sin(2\pi \nu _{k}n+\phi _{k})\\&=\sum _{k}A_{k}\left(\sin(\phi _{k})\cos(2\pi \nu _{k}n)+\cos(\phi _{k})\sin(2\pi \nu _{k}n)\right)\\&=\sum _{k}\left(\overbrace {a_{k}} ^{A_{k}\sin(\phi _{k})}\cos(2\pi \nu _{k}n)+\overbrace {b_{k}} ^{A_{k}\cos(\phi _{k})}\sin(2\pi \nu _{k}n)\right)\end{aligned}}$$

The variance of $x_{n}$ is, for a zero-mean function as above, given by

$${\frac {1}{N}}\sum _{n=0}^{N-1}x_{n}^{2}.$$

If these data were samples taken from an electrical signal, this would be its average power (power is energy per unit time, so it is analogous to variance if energy is analogous to the amplitude squared).

Now, for simplicity, suppose the signal extends infinitely in time, so we pass to the limit as $N\to \infty .$ If the average power is bounded, which is almost always the case in reality, then the following limit exists and is the variance of the data.

$$\lim _{N\to \infty }{\frac {1}{N}}\sum _{n=0}^{N-1}x_{n}^{2}.$$

Again, for simplicity, we will pass to continuous time, and assume that the signal extends infinitely in time in both directions. Then these two formulas become

$$x(t)=\sum _{k}A_{k}\sin(2\pi \nu _{k}t+\phi _{k})$$

and

$$\lim _{T\to \infty }{\frac {1}{2T}}\int _{-T}^{T}x(t)^{2}dt.$$

The root mean square of $\sin$ is $1/{\sqrt {2}}$, so the variance of $A_{k}\sin(2\pi \nu _{k}t+\phi _{k})$ is ${\tfrac {1}{2}}A_{k}^{2}.$ Hence, the contribution to the average power of $x(t)$ coming from the component with frequency $\nu _{k}$ is ${\tfrac {1}{2}}A_{k}^{2}.$ All these contributions add up to the average power of $x(t).$

Then the power as a function of frequency is ${\tfrac {1}{2}}A_{k}^{2},$ and its statistical cumulative distribution function $S(\nu )$ will be

$$S(\nu )=\sum _{k:\nu _{k}<\nu }{\frac {1}{2}}A_{k}^{2}.$$

$S$ is a step function, monotonically non-decreasing. Its jumps occur at the frequencies of the periodic components of $x$ , and the value of each jump is the power or variance of that component.
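The power bookkeeping above can be checked numerically in discrete time. The sketch below assumes NumPy and picks made-up amplitudes, phases, and frequencies. Each frequency completes a whole number of cycles in the record, so the finite average already equals the limiting value.

```python
import numpy as np

N = 1000
n = np.arange(N)
A = np.array([2.0, 1.0])          # amplitudes A_k (illustrative)
nu = np.array([0.05, 0.12])       # frequencies nu_k in cycles per sample
phase = np.array([0.4, 1.1])      # phases phi_k

# x_n = sum_k A_k sin(2 pi nu_k n + phi_k)
x = np.sum(A[:, None] * np.sin(2 * np.pi * nu[:, None] * n + phase[:, None]), axis=0)

average_power = np.mean(x ** 2)               # (1/N) sum x_n^2
component_power = A ** 2 / 2                  # contribution of each component

def S(v):
    """Cumulative power of all components with frequency below v."""
    return float(np.sum(component_power[nu < v]))

print(average_power, S(0.01), S(0.1), S(0.2))
```

With these values the component powers are 2.0 and 0.5, so the average power is 2.5 up to rounding, and `S` steps from 0 to 2.0 to 2.5 as its argument passes the two frequencies.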

The variance is the covariance of the data with itself. If we now consider the same data but with a lag of $\tau$, we can take the covariance of $x(t)$ with $x(t+\tau )$, and define this to be the autocorrelation function $c$ of the signal (or data) $x$:

$$c(\tau )=\lim _{T\to \infty }{\frac {1}{2T}}\int _{-T}^{T}x(t)x(t+\tau )dt.$$

If it exists, it is an even function of $\tau .$ If the average power is bounded, then $c$ exists everywhere, is finite, and is bounded by $c(0),$ which is the average power or variance of the data.

It can be shown that $c$ can be decomposed into periodic components with the same periods as $x$ :

$$c(\tau )=\sum _{k}{\frac {1}{2}}A_{k}^{2}\cos(2\pi \nu _{k}\tau ).$$

This is in fact the spectral decomposition of $c$ over the different frequencies, and is related to the distribution of power of $x$ over the frequencies: the amplitude of a frequency component of $c$ is its contribution to the average power of the signal.
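A discrete-time analogue of this decomposition can be verified directly. The sketch below assumes NumPy and replaces the infinite-time limit with a circular average over one record in which every component completes a whole number of cycles. The amplitudes, frequencies, and phases are made up for the example.

```python
import numpy as np

N = 1000
n = np.arange(N)
A = np.array([2.0, 1.0])          # amplitudes A_k (illustrative)
nu = np.array([0.05, 0.12])       # frequencies nu_k in cycles per sample
phase = np.array([0.4, 1.1])      # phases phi_k
x = np.sum(A[:, None] * np.sin(2 * np.pi * nu[:, None] * n + phase[:, None]), axis=0)

lags = np.arange(50)
# Circular sample autocorrelation: (1/N) sum_n x_n x_{n+tau}, indices taken modulo N.
c_empirical = np.array([np.mean(x * np.roll(x, -tau)) for tau in lags])
# Closed form: sum_k (A_k^2 / 2) cos(2 pi nu_k tau).
c_formula = np.sum((A[:, None] ** 2 / 2) * np.cos(2 * np.pi * nu[:, None] * lags), axis=0)

max_error = float(np.max(np.abs(c_empirical - c_formula)))
```

The two arrays agree to rounding error, and the phases drop out entirely: the autocorrelation retains the power at each frequency but not the phase of the original components.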

The power spectrum of this example is not continuous, and therefore does not have a derivative, and therefore this signal does not have a power spectral density function. In general, the power spectrum will usually be the sum of two parts: a line spectrum such as in this example, which is not continuous and does not have a density function, and a residue, which is absolutely continuous and does have a density function.

References
