In combinatorics, a branch of mathematics, the autocorrelation of a word is the set of periods of this word. More precisely, it is a sequence of values which indicate how much the end of a word looks likes the beginning of a word. This value can be used to compute, for example, the average value of the first occurrence of this word in a random string.

Definition

In this article, A is an alphabet, and a word on A of length n. The autocorrelation of can be defined as the correlation of with itself. However, we redefine this notion below.

Autocorrelation vector

The autocorrelation vector of is , with being 1 if the prefix of length equals the suffix of length , and with being 0 otherwise. That is indicates whether .

For example, the autocorrelation vector of is since, clearly, for being 0, 1 or 2, the prefix of length is equal to the suffix of length . The autocorrelation vector of is since no strict prefix is equal to a strict suffix. Finally, the autocorrelation vector of is 100011, as shown in the following table:

a a b b a a
a a b b a a 1
a a b b a a 0
a a b b a a 0
a a b b a a 0
a a b b a a 1
a a b b a a 1

Note that is always equal to 1, since the prefix and the suffix of length are both equal to the word . Similarly, is 1 if and only if the first and the last letters are the same.


Autocorrelation polynomial

The autocorrelation polynomial of is defined as . It is a polynomial of degree at most .

For example, the autocorrelation polynomial of is and the autocorrelation polynomial of is . Finally, the autocorrelation polynomial of is .

Property

We now indicate some properties which can be computed using the autocorrelation polynomial.

First occurrence of a word in a random string

Suppose that you choose an infinite sequence of letters of , randomly, each letter with probability , where is the number of letters of . Let us call the expectation of the first occurrence of ? in ? . Then equals . That is, each subword of which is both a prefix and a suffix causes the average value of the first occurrence of to occur letters later. Here is the length of v.

For example, over the binary alphabet , the first occurrence of is at position while the average first occurrence of is at position . Intuitively, the fact that the first occurrence of is later than the first occurrence of can be explained in two ways:

Ordinary generating functions

Autocorrelation polynomials allows to give simple equations for the ordinary generating functions (OGF) of many natural questions.

References