On Software Development: ConfusionMatrix

Let's look at the confusion matrix from one experiment:

[[259   0   7  30   1   8   0   0   7  13]
 [  1 311   0   8   0   0   2   0   0   0]
 [  3   0 288   5   1   0   8   0  13   0]
 [  0   0   0 318   2   0   0   0   0   0]
 [  6   0   0  10 297   2   0   0   1   0]
 [  1   0   0   0   0 320   0   0   0   0]
 [  0   0   0   1   0   0 308   0  11   0]
 [  0   0   0   0   0   0   1 255   0   0]
 [  0   0   7   1   0   0   1   3 234   8]
 [ 78   2   1  29   1   4   1   0   3 169]]

The X axis (columns) is the prediction. The Y axis (rows) is the true label (for the whole first row, the true label is 0).

Let's have a look at row 0.
259 in [0,0] is the true-positive count: the truth is 0 and we predicted 0.

30 in [0,3] means the truth is 0, but we predicted 3.

0 in [0,1] means we never wrongly predicted 1 when the truth was 0.

In total there are 259 correct predictions and 7+30+1+8+7+13=66 wrong predictions.

259/(259+66) = 259/325 = 0.80. This is the hit rate, or the recall.
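The row-0 arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the matrix is copied from the experiment, and the variable names (`cm`, `recall_0`, and so on) are chosen here for illustration.

```python
# Confusion matrix from the experiment: rows = true label, columns = prediction.
cm = [
    [259,   0,   7,  30,   1,   8,   0,   0,   7,  13],
    [  1, 311,   0,   8,   0,   0,   2,   0,   0,   0],
    [  3,   0, 288,   5,   1,   0,   8,   0,  13,   0],
    [  0,   0,   0, 318,   2,   0,   0,   0,   0,   0],
    [  6,   0,   0,  10, 297,   2,   0,   0,   1,   0],
    [  1,   0,   0,   0,   0, 320,   0,   0,   0,   0],
    [  0,   0,   0,   1,   0,   0, 308,   0,  11,   0],
    [  0,   0,   0,   0,   0,   0,   1, 255,   0,   0],
    [  0,   0,   7,   1,   0,   0,   1,   3, 234,   8],
    [ 78,   2,   1,  29,   1,   4,   1,   0,   3, 169],
]

row_0 = cm[0]                          # every sample whose true label is 0
true_positives = row_0[0]              # the diagonal cell [0,0]
missed = sum(row_0) - true_positives   # true 0s predicted as something else
recall_0 = true_positives / sum(row_0)

print(true_positives, missed, round(recall_0, 2))  # prints: 259 66 0.8
```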

Let's look at column 0.

78 in [9,0] means we predicted 0, although it is actually 9. This is the biggest mistake in a single cell.

If we sum the rest of the column, we see a total of 1+3+6+1+78=89 false-positive predictions. In total we are correct in 259/(259+89) = 0.74 of our "0" predictions. This is the precision.
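The column-0 arithmetic can be checked the same way. In this sketch only column 0 is typed in (read top to bottom from the matrix), and the variable names are illustrative.

```python
# Column 0 of the confusion matrix, read top to bottom:
# entry i is the number of samples with true label i that were predicted as 0.
column_0 = [259, 1, 3, 0, 6, 1, 0, 0, 0, 78]

true_positives = column_0[0]                        # truth 0, predicted 0
false_positives = sum(column_0) - true_positives    # predicted 0, truth was not 0
precision_0 = true_positives / sum(column_0)

# The single worst cell in the column, ignoring the diagonal entry.
worst_true_label = max(range(1, len(column_0)), key=lambda i: column_0[i])

print(false_positives, round(precision_0, 2), worst_true_label)  # prints: 89 0.74 9
```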

To reiterate on recall and precision: what will happen if we change the algorithm to a dumb "always return 0" algorithm? Column 0 will be filled with values. All other columns will be empty.

We will get 325 in [0,0] (all correct), and the rest of the diagonal will be all zeros.

The recall will be a full 1.00 for category 0: every true 0 is recalled correctly. For the other categories it will be 0.00.

The precision for category 0 will be very bad: 325/3040 = ~ 11%, because all 3040 samples are predicted as 0 and only 325 of them really are 0.
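A minimal sketch of this thought experiment, assuming the per-class sample counts are the "support" values of the experiment (the row sums of the confusion matrix). The `baseline` matrix is hypothetical, built here only to illustrate the "always return 0" classifier.

```python
# Number of true samples per class (row sums of the original matrix).
support = [325, 322, 318, 320, 316, 321, 320, 256, 254, 288]
n = len(support)

# "Always return 0": every sample of every class lands in column 0.
baseline = [[support[i] if j == 0 else 0 for j in range(n)] for i in range(n)]

# Recall per class: diagonal cell divided by the row sum.
recalls = [baseline[i][i] / sum(baseline[i]) for i in range(n)]

# Precision for class 0: diagonal cell divided by the column-0 sum.
predicted_as_0 = sum(row[0] for row in baseline)
precision_0 = baseline[0][0] / predicted_as_0

print(recalls[0], recalls[1:], predicted_as_0, precision_0)
```

Precision for the other classes is undefined in this baseline (0/0), since nothing is ever predicted as 1 to 9.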

                                precision    recall  f1-score   support

              0 normal driving       0.74      0.80      0.77       325
             1 texting - right       0.99      0.97      0.98       322
2 talking on the phone - right       0.95      0.91      0.93       318
              3 texting - left       0.79      0.99      0.88       320
 4 talking on the phone - left       0.98      0.94      0.96       316
         5 operating the radio       0.96      1.00      0.98       321
                    6 drinking       0.96      0.96      0.96       320
             7 reaching behind       0.99      1.00      0.99       256
             8 hair and makeup       0.87      0.92      0.89       254
        9 talking to passenger       0.89      0.59      0.71       288

                   avg / total       0.91      0.91      0.91      3040
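The per-class rows of this report can be recomputed directly from the confusion matrix. The sketch below uses plain Python rather than any library call; the f1-score is taken as the harmonic mean of precision and recall, and names such as `report` are chosen here for illustration.

```python
labels = [
    "0 normal driving", "1 texting - right", "2 talking on the phone - right",
    "3 texting - left", "4 talking on the phone - left", "5 operating the radio",
    "6 drinking", "7 reaching behind", "8 hair and makeup", "9 talking to passenger",
]
# Rows = true label, columns = prediction.
cm = [
    [259,   0,   7,  30,   1,   8,   0,   0,   7,  13],
    [  1, 311,   0,   8,   0,   0,   2,   0,   0,   0],
    [  3,   0, 288,   5,   1,   0,   8,   0,  13,   0],
    [  0,   0,   0, 318,   2,   0,   0,   0,   0,   0],
    [  6,   0,   0,  10, 297,   2,   0,   0,   1,   0],
    [  1,   0,   0,   0,   0, 320,   0,   0,   0,   0],
    [  0,   0,   0,   1,   0,   0, 308,   0,  11,   0],
    [  0,   0,   0,   0,   0,   0,   1, 255,   0,   0],
    [  0,   0,   7,   1,   0,   0,   1,   3, 234,   8],
    [ 78,   2,   1,  29,   1,   4,   1,   0,   3, 169],
]

report = []
for i, name in enumerate(labels):
    tp = cm[i][i]
    support = sum(cm[i])                    # row sum: all samples truly in class i
    predicted = sum(row[i] for row in cm)   # column sum: all samples predicted as i
    precision = tp / predicted              # assumes every class is predicted at least once
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    report.append((precision, recall, f1, support))
    print(f"{name:>30}  {precision:9.2f} {recall:9.2f} {f1:9.2f} {support:9d}")
```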

Let's go back and analyze the classification report.

We already talked about "0 - normal-driving".

We can see that "1 - texting-right" has a good recall of 0.97, and also a good precision of 0.99.

"3 - texting-left" has 0.99 recall, but only 0.79 precision (it is predicted too eagerly), which means there are many false positives. Let's look at the confusion matrix, at column 3: 30 of those predictions were actually 0-normal-driving and 29 were actually 9-talking-to-passenger.
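To see where the false positives of class 3 come from, column 3 can be broken down by true label. This is a small sketch with column 3 typed in from the confusion matrix; the names are illustrative.

```python
# Column 3 of the confusion matrix, read top to bottom:
# entry i is the number of samples with true label i that were predicted as 3.
column_3 = [30, 8, 5, 318, 10, 0, 1, 0, 1, 29]

# Keep only the wrong predictions (true label != 3), largest first.
false_positive_sources = sorted(
    ((true_label, count) for true_label, count in enumerate(column_3)
     if true_label != 3 and count > 0),
    key=lambda pair: pair[1],
    reverse=True,
)

total_false_positives = sum(count for _, count in false_positive_sources)
precision_3 = column_3[3] / sum(column_3)

print(false_positive_sources[:2])                        # prints: [(0, 30), (9, 29)]
print(total_false_positives, round(precision_3, 2))      # prints: 84 0.79
```

Classes 0 and 9 together account for 59 of the 84 false positives in that column.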

True Positive (TP): equivalent to a hit.

False Positive (FP): equivalent to a false alarm, or Type I error.

Sensitivity, or true positive rate (TPR): equivalent to hit rate, or recall.