Wednesday, July 13, 2016

ConfusionMatrix

Let's look at the confusion matrix from one experiment:

[[259   0   7  30   1   8   0   0   7  13]
 [  1 311   0   8   0   0   2   0   0   0]
 [  3   0 288   5   1   0   8   0  13   0]
 [  0   0   0 318   2   0   0   0   0   0]
 [  6   0   0  10 297   2   0   0   1   0]
 [  1   0   0   0   0 320   0   0   0   0]
 [  0   0   0   1   0   0 308   0  11   0]
 [  0   0   0   0   0   0   1 255   0   0]
 [  0   0   7   1   0   0   1   3 234   8]
 [ 78   2   1  29   1   4   1   0   3 169]]

The X axis (columns) is the prediction and the Y axis (rows) is the true label, so every count in the first row has true label 0.
Let's have a look at row 0.
259 in [0,0] is the true-positive count: the truth is 0 and we predicted 0.
30 in [0,3] means the truth is 0, but we predicted 3.
0 in [0,1] means we never wrongly predicted 1 when the truth was 0.
In total there are 259 correct predictions and 7+30+1+8+7+13=66 wrong predictions.
259/325 = 0.80. This is the hit rate, or recall.
Let's look at column 0.
78 in [9,0] means we predicted 0, although it is actually 9. This is the biggest mistake in a single cell.
If we sum the rest of the column, we see a total of 1+3+6+1+78=89 false-positive predictions. In total we are correct in 259/(259+89) = 0.74 of our 0 predictions; this is the precision.
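The row and column arithmetic above can be repeated for every class. Here is a minimal sketch in plain Python, assuming the matrix is typed in by hand as a list of rows; the helper names `recall` and `precision` are made up for this illustration and are not from any library.

```python
# Confusion matrix copied from the post: rows = true label, columns = prediction.
cm = [
    [259,   0,   7,  30,   1,   8,   0,   0,   7,  13],
    [  1, 311,   0,   8,   0,   0,   2,   0,   0,   0],
    [  3,   0, 288,   5,   1,   0,   8,   0,  13,   0],
    [  0,   0,   0, 318,   2,   0,   0,   0,   0,   0],
    [  6,   0,   0,  10, 297,   2,   0,   0,   1,   0],
    [  1,   0,   0,   0,   0, 320,   0,   0,   0,   0],
    [  0,   0,   0,   1,   0,   0, 308,   0,  11,   0],
    [  0,   0,   0,   0,   0,   0,   1, 255,   0,   0],
    [  0,   0,   7,   1,   0,   0,   1,   3, 234,   8],
    [ 78,   2,   1,  29,   1,   4,   1,   0,   3, 169],
]


def recall(cm, k):
    """Diagonal cell divided by the row sum (all samples whose true label is k)."""
    row_total = sum(cm[k])
    return cm[k][k] / row_total if row_total else 0.0


def precision(cm, k):
    """Diagonal cell divided by the column sum (all samples predicted as k)."""
    col_total = sum(row[k] for row in cm)
    return cm[k][k] / col_total if col_total else 0.0


# Print class index, precision and recall, rounded to two decimals.
for k in range(len(cm)):
    print(k, round(precision(cm, k), 2), round(recall(cm, k), 2))
```

For class 0 this gives back the 0.74 precision and 0.80 recall worked out by hand.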

To get more intuition for recall and precision: what will happen if we change the algorithm to a dumb "always return 0" algorithm? Column 0 will be filled with values. All other columns will be empty.
We will get 325 in [0,0] (all correct), and the rest of the diagonal will be 0 (all wrong).
The recall will be a full 1.00 for category 0, since we always recall this one correctly. For the rest it will be 0.00.
The precision for category 0 will be very bad: 325/3040 = ~11%.
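The thought experiment can be checked with a few lines of Python. This is only a sketch: the per-class counts are the row totals of the matrix (the "support" values), and `cm_dumb` is a hypothetical name for the matrix such a classifier would produce.

```python
# Number of samples per true class (row totals of the confusion matrix).
support = [325, 322, 318, 320, 316, 321, 320, 256, 254, 288]
n = len(support)

# "Always return 0": every sample of every true class lands in column 0.
cm_dumb = [[support[k] if j == 0 else 0 for j in range(n)] for k in range(n)]

# Recall for class 0: diagonal cell over its row sum.
recall_0 = cm_dumb[0][0] / sum(cm_dumb[0])

# Precision for class 0: diagonal cell over the sum of column 0.
precision_0 = cm_dumb[0][0] / sum(row[0] for row in cm_dumb)

# Recall for every other class: its diagonal cell is 0, so recall is 0.
recall_others = [cm_dumb[k][k] / sum(cm_dumb[k]) for k in range(1, n)]

print(recall_0, round(precision_0, 3), recall_others)
```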


                                precision    recall  f1-score   support

              0 normal driving       0.74      0.80      0.77       325
             1 texting - right       0.99      0.97      0.98       322
2 talking on the phone - right       0.95      0.91      0.93       318
              3 texting - left       0.79      0.99      0.88       320
 4 talking on the phone - left       0.98      0.94      0.96       316
         5 operating the radio       0.96      1.00      0.98       321
                    6 drinking       0.96      0.96      0.96       320
             7 reaching behind       0.99      1.00      0.99       256
             8 hair and makeup       0.87      0.92      0.89       254
        9 talking to passenger       0.89      0.59      0.71       288

                   avg / total       0.91      0.91      0.91      3040


Let's go back to the classification report.
We already talked about "0 - normal driving".
We can see that "1 - texting - right" has good recall, 0.97, and also good precision, 0.99.
"3 - texting - left" has 0.99 recall but only 0.79 precision (it is predicted too eagerly), which means there are many false positives. Let's look at column 3 of the confusion matrix: 30 of the predictions were actually 0 - normal driving and 29 were actually 9 - talking to passenger.
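To find which true classes feed a low-precision column, the column can be sorted by count. A small sketch, assuming column 3 is copied by hand from the matrix; the variable names are illustrative only.

```python
# Column 3 of the confusion matrix: index = true label, value = times predicted as 3.
col3 = [30, 8, 5, 318, 10, 0, 1, 0, 1, 29]
k = 3

# The diagonal entry is the true-positive count for class 3.
tp = col3[k]

# Every other non-zero entry is a false positive, keyed by its true label.
false_pos = {true: count for true, count in enumerate(col3) if true != k and count > 0}

# Largest sources of false positives first.
worst = sorted(false_pos.items(), key=lambda kv: kv[1], reverse=True)

precision_3 = tp / sum(col3)
print(worst[:2], round(precision_3, 2))
```

The top two entries are true classes 0 and 9, which matches the reading of the column above.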


True Positive (TP): equivalent to a hit
False Positive (FP): equivalent to a false alarm, Type I error


Sensitivity or true positive rate (TPR): equivalent to hit rate, recall

Precision or positive predictive value (PPV)

F1 score: the harmonic mean of precision and sensitivity
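As a quick check of these definitions, here is a sketch that recomputes the class 0 numbers from the counts read off the matrix earlier (259 hits, 89 false positives in column 0, 66 misses in row 0). The variable names are chosen for this example.

```python
# Counts for class 0, taken from row 0 and column 0 of the confusion matrix.
tp = 259  # true positives: truth 0, predicted 0
fp = 89   # false positives: predicted 0, truth is something else
fn = 66   # false negatives: truth 0, predicted something else

precision = tp / (tp + fp)  # positive predictive value (PPV)
recall = tp / (tp + fn)     # sensitivity, true positive rate (TPR)

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(f1, 2))
```

The three rounded values agree with the "0 normal driving" row of the classification report: 0.74, 0.80 and 0.77.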





