In the field of machine learning, and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix,^{[1]} is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one; in unsupervised learning it is usually called a matching matrix.
Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa – both variants are found in the literature.^{[2]} The name stems from the fact that it makes it easy to see whether the system is confusing two classes (i.e. commonly mislabeling one as another).
It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table).
Given a sample of 12 individuals, 8 that have been diagnosed with cancer and 4 that are cancer-free, where individuals with cancer belong to class 1 (positive) and non-cancer individuals belong to class 0 (negative), we can display that data as follows:
Individual Number      1  2  3  4  5  6  7  8  9  10  11  12
Actual Classification  1  1  1  1  1  1  1  1  0  0   0   0
Assume that we have a classifier that distinguishes between individuals with and without cancer in some way; we can take the 12 individuals and run them through the classifier. The classifier then makes 9 accurate predictions and misses 3: 2 individuals with cancer wrongly predicted as being cancer-free (samples 1 and 2), and 1 person without cancer wrongly predicted to have cancer (sample 9).
Individual Number         1  2  3  4  5  6  7  8  9  10  11  12
Actual Classification     1  1  1  1  1  1  1  1  0  0   0   0
Predicted Classification  0  0  1  1  1  1  1  1  1  0   0   0
Notice that if we compare the actual classification set to the predicted classification set, there are 4 different outcomes that could result for any particular column. First, if the actual classification is positive and the predicted classification is positive (1,1), this is called a true positive result because the positive sample was correctly identified by the classifier. Second, if the actual classification is positive and the predicted classification is negative (1,0), this is called a false negative result because the positive sample is incorrectly identified by the classifier as being negative. Third, if the actual classification is negative and the predicted classification is positive (0,1), this is called a false positive result because the negative sample is incorrectly identified by the classifier as being positive. Fourth, if the actual classification is negative and the predicted classification is negative (0,0), this is called a true negative result because the negative sample is correctly identified by the classifier.
We can then perform the comparison between actual and predicted classifications and add this information to the table, making correct results appear in green so they are more easily identifiable.
Individual Number         1   2   3   4   5   6   7   8   9   10  11  12
Actual Classification     1   1   1   1   1   1   1   1   0   0   0   0
Predicted Classification  0   0   1   1   1   1   1   1   1   0   0   0
Result                    FN  FN  TP  TP  TP  TP  TP  TP  FP  TN  TN  TN
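The outcome labels above can be assigned programmatically. The following is a minimal Python sketch of that mapping, using the 12-sample data from the tables (the function name `outcome` is illustrative, not from any particular library):

```python
# The 12-sample cancer example from the text (1 = cancer, 0 = cancer-free).
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

def outcome(a, p):
    """Map an (actual, predicted) pair to one of TP / FN / FP / TN."""
    if a == 1 and p == 1:
        return "TP"  # positive sample correctly identified
    if a == 1 and p == 0:
        return "FN"  # positive sample missed
    if a == 0 and p == 1:
        return "FP"  # negative sample wrongly flagged as positive
    return "TN"      # negative sample correctly rejected

results = [outcome(a, p) for a, p in zip(actual, predicted)]
print(results)
# ['FN', 'FN', 'TP', 'TP', 'TP', 'TP', 'TP', 'TP', 'FP', 'TN', 'TN', 'TN']
```

The printed list reproduces the Result row of the table above.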
The template for any binary confusion matrix uses the four kinds of results discussed above (true positives, false negatives, false positives, and true negatives) along with the positive and negative classifications. The four outcomes can be formulated in a 2×2 confusion matrix, as follows:
                            Predicted condition
Total population = P + N    Positive (PP)          Negative (PN)
Actual     Positive (P)     True positive (TP)     False negative (FN)
condition  Negative (N)     False positive (FP)    True negative (TN)
^{Sources: }^{[3]}^{[4]}^{[5]}^{[6]}^{[7]}^{[8]}^{[9]}^{[10]}
The color convention of the three data tables above was chosen to match this confusion matrix, to make the data easy to differentiate.
Now, we can simply total up each type of result, substitute into the template, and create a confusion matrix that will concisely summarize the results of testing the classifier:
                             Predicted condition
Total: 8 + 4 = 12            Cancer: 7    Non-cancer: 5
Actual     Cancer: 8             6              2
condition  Non-cancer: 4         1              3
In this confusion matrix, of the 8 samples with cancer, the system judged that 2 were cancer-free, and of the 4 samples without cancer, it predicted that 1 did have cancer. All correct predictions are located in the diagonal of the table (highlighted in green), so it is easy to visually inspect the table for prediction errors, as values outside the diagonal represent them. By summing the 2 rows of the confusion matrix, one can also deduce the total number of positive (P) and negative (N) samples in the original dataset, i.e. P = TP + FN = 6 + 2 = 8 and N = FP + TN = 1 + 3 = 4.
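The tallying and row-summing steps above can be sketched in a few lines of Python (a plain-dictionary illustration; the variable names are our own, not from any library):

```python
from collections import Counter

# The 12-sample cancer example (1 = cancer, 0 = cancer-free).
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# Count each of the four outcome types across the sample.
counts = Counter(
    ("TP" if a and p else "FN" if a else "FP" if p else "TN")
    for a, p in zip(actual, predicted)
)
tp, fn, fp, tn = counts["TP"], counts["FN"], counts["FP"], counts["TN"]

matrix = [[tp, fn],   # actual-positive row
          [fp, tn]]   # actual-negative row

# Summing each row recovers the actual class totals.
p_total = tp + fn  # 8 actual positives
n_total = fp + tn  # 4 actual negatives
print(matrix, p_total, n_total)
# [[6, 2], [1, 3]] 8 4
```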
In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of true positives, false negatives, false positives, and true negatives. This allows more detailed analysis than simply observing the proportion of correct classifications (accuracy). Accuracy will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly.
For example, if there were 95 cancer samples and only 5 non-cancer samples in the data, a particular classifier might classify all the observations as having cancer. The overall accuracy would be 95%, but in more detail the classifier would have a 100% recognition rate (sensitivity) for the cancer class but a 0% recognition rate for the non-cancer class. The F1 score is even more unreliable in such cases, and here would yield over 97.4%, whereas informedness removes such bias and yields 0 as the probability of an informed decision for any form of guessing (here always guessing cancer).
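The numbers in this example can be checked directly. Below is a short Python sketch of the 95/5 scenario; the metric variable names are ours, and the formulas follow the definitions used throughout this article:

```python
# A classifier that always predicts "cancer" on an imbalanced dataset:
# 95 cancer samples (1) and 5 non-cancer samples (0).
actual    = [1] * 95 + [0] * 5
predicted = [1] * 100  # always guess cancer

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # 95
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # 0
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # 5
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # 0

accuracy     = (tp + tn) / len(pairs)        # 0.95 - looks good, but misleads
sensitivity  = tp / (tp + fn)                # 1.0 for the cancer class
specificity  = tn / (tn + fp)                # 0.0 for the non-cancer class
f1           = 2 * tp / (2 * tp + fp + fn)   # ~0.974 - still misleading
informedness = sensitivity + specificity - 1  # 0.0 - exposes the blind guess
```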
According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews correlation coefficient (MCC).^{[11]}
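For reference, the MCC can be computed from the four cell counts as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal Python sketch (the convention of returning 0 when a marginal is empty is a common choice, not mandated by the source):

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when any marginal total is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Cancer example from this article: TP=6, FN=2, FP=1, TN=3.
print(round(mcc(6, 1, 2, 3), 3))  # 0.478

# Always-guessing classifier from the imbalanced example: TP=95, FP=5.
print(mcc(95, 5, 0, 0))  # 0.0 - MCC, like informedness, exposes the guess
```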
Other metrics can be included in a confusion matrix, each with its own significance and use.
                               Predicted condition
Total population = P + N       Predicted Positive (PP)                            Predicted Negative (PN)
Actual     Positive (P)^{[a]}  True positive (TP), hit^{[b]}                      False negative (FN), miss, underestimation
condition  Negative (N)^{[d]}  False positive (FP), false alarm, overestimation   True negative (TN), correct rejection^{[e]}
^{Sources: }^{[12]}^{[13]}^{[14]}^{[15]}^{[16]}^{[17]}^{[18]}^{[19]}

Derived metrics:
- True positive rate (TPR), recall, sensitivity (SEN), probability of detection, hit rate, power = TP/P = 1 − FNR
- False negative rate (FNR), miss rate, type II error^{[c]} = FN/P = 1 − TPR
- False positive rate (FPR), probability of false alarm, fall-out, type I error^{[f]} = FP/N = 1 − TNR
- True negative rate (TNR), specificity (SPC), selectivity = TN/N = 1 − FPR
- Prevalence = P/(P + N)
- Positive predictive value (PPV), precision = TP/PP = 1 − FDR
- False discovery rate (FDR) = FP/PP = 1 − PPV
- False omission rate (FOR) = FN/PN = 1 − NPV
- Negative predictive value (NPV) = TN/PN = 1 − FOR
- Positive likelihood ratio (LR+) = TPR/FPR
- Negative likelihood ratio (LR−) = FNR/TNR
- Diagnostic odds ratio (DOR) = LR+/LR−
- Accuracy (ACC) = (TP + TN)/(P + N)
- Balanced accuracy (BA) = (TPR + TNR)/2
- F_{1} score = 2 PPV × TPR/(PPV + TPR) = 2 TP/(2 TP + FP + FN)
- Informedness, bookmaker informedness (BM) = TPR + TNR − 1
- Markedness (MK), deltaP (Δp) = PPV + NPV − 1
- Prevalence threshold (PT) = (√(TPR × FPR) − FPR)/(TPR − FPR)
- Fowlkes–Mallows index (FM) = √(PPV × TPR)
- Matthews correlation coefficient (MCC) = √(TPR × TNR × PPV × NPV) − √(FNR × FPR × FOR × FDR)
- Threat score (TS), critical success index (CSI), Jaccard index = TP/(TP + FN + FP)
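Several of these derived metrics can be evaluated on the cancer example (TP=6, FN=2, FP=1, TN=3). A short Python sketch, following the formulas above (variable names are ours):

```python
# Counts from the worked cancer example in this article.
tp, fn, fp, tn = 6, 2, 1, 3
p,  n  = tp + fn, fp + tn   # actual class totals: 8 and 4
pp, pn = tp + fp, fn + tn   # predicted class totals: 7 and 5

tpr = tp / p                       # recall / sensitivity = 0.75
tnr = tn / n                       # specificity = 0.75
ppv = tp / pp                      # precision = 6/7
npv = tn / pn                      # = 0.6
acc = (tp + tn) / (p + n)          # = 0.75
f1  = 2 * tp / (2 * tp + fp + fn)  # = 0.8
informedness = tpr + tnr - 1       # = 0.5
markedness   = ppv + npv - 1       # = 6/7 - 0.4
```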
The confusion matrix is not limited to binary classification and can be used with multi-class classifiers as well.^{[20]} The confusion matrices discussed above have only two conditions: positive and negative. For example, the table below summarizes communication of a whistled language between two speakers, with zero values omitted for clarity.^{[21]}
Vowel                Perceived vowel
produced      i     e     a     o     u
   i         15     1
   e          1     1
   a                     79     5
   o                4    15           3
   u                            2     2
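A multi-class confusion matrix can be built with the same tallying idea, just over more than two classes. The sketch below uses a small made-up list of vowel pairs for illustration only (it is not the study's data from the table above):

```python
def confusion_matrix(actual, predicted, classes):
    """Return a nested dict m[actual_class][predicted_class] -> count."""
    m = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m

# Hypothetical produced/perceived vowel pairs, for illustration only.
produced  = ["i", "i", "e", "a", "o", "o", "u"]
perceived = ["i", "e", "i", "a", "o", "a", "u"]

m = confusion_matrix(produced, perceived, ["i", "e", "a", "o", "u"])
print(m["i"])  # {'i': 1, 'e': 1, 'a': 0, 'o': 0, 'u': 0}
```

As in the binary case, correct classifications lie on the diagonal (here `m[c][c]` for each class `c`), so confusions between any two classes are immediately visible off the diagonal.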