Let's look at the confusion matrix from one experiment:
[[259   0   7  30   1   8   0   0   7  13]
 [  1 311   0   8   0   0   2   0   0   0]
 [  3   0 288   5   1   0   8   0  13   0]
 [  0   0   0 318   2   0   0   0   0   0]
 [  6   0   0  10 297   2   0   0   1   0]
 [  1   0   0   0   0 320   0   0   0   0]
 [  0   0   0   1   0   0 308   0  11   0]
 [  0   0   0   0   0   0   1 255   0   0]
 [  0   0   7   1   0   0   1   3 234   8]
 [ 78   2   1  29   1   4   1   0   3 169]]
The X axis (columns) is the prediction; the Y axis (rows) is the true label. For example, every cell in the first row has true label 0.
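Such a matrix is built by counting (true label, predicted label) pairs. Here is a minimal sketch with toy labels (the actual dataset and model are not shown in this post); scikit-learn's `sklearn.metrics.confusion_matrix` does the same counting:

```python
import numpy as np

def build_confusion_matrix(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs: rows are true labels, columns are predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# toy example: 3 classes, 6 samples
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
print(build_confusion_matrix(y_true, y_pred, 3))
# [[1 1 0]
#  [0 1 0]
#  [1 0 2]]
```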
Let's have a look at row 0.
259 in [0,0] is the number of true positives: the truth is 0 and we correctly predicted 0.
30 in [0,3] means the truth is 0, but we predicted 3.
0 in [0,1] means we never (wrongly) predicted 1 when the truth was 0.
In total there are 259 correct predictions and 7+30+1+8+7+13 = 66 wrong predictions.
259/325 = 0.80. This is the hit rate, or recall.
Let's look at column 0.
78 in [9,0] means we predicted 0, although the truth is actually 9. This is the largest single-cell mistake in the matrix.
If we sum the rest of the column, we get 1+3+6+1+78 = 89 false-positive predictions. In total, we are correct in 259/(259+89) = 0.74 of our class-0 predictions; this is the precision.
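Both ratios are easy to check with a few lines of Python; the row and column values below are copied from the matrix above:

```python
row0 = [259, 0, 7, 30, 1, 8, 0, 0, 7, 13]   # row 0: all samples whose true label is 0
col0 = [259, 1, 3, 0, 6, 1, 0, 0, 0, 78]    # column 0: all samples we predicted as 0
tp = 259                                     # the shared [0,0] cell

recall_0 = tp / sum(row0)        # 259 / 325
precision_0 = tp / sum(col0)     # 259 / 348
print(round(recall_0, 2), round(precision_0, 2))  # 0.8 0.74
```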
To build intuition about recall and precision, what will happen if we change the algorithm to a dumb "always return 0" algorithm? Column 0 will be filled with values, and all other columns will be empty.
We will get 325 in [0,0] (all the true 0s), and the rest of the diagonal will be all zeros.
The recall will be a perfect 1.00 for category 0: we always recall this one correctly. For all the other categories it will be 0.00.
The precision for category 0 will be very bad: 325/3040 ≈ 10%.
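A quick sanity check of this degenerate baseline, using the per-class support counts (the row sums of the matrix):

```python
# per-class support (row sums of the confusion matrix)
supports = [325, 322, 318, 320, 316, 321, 320, 256, 254, 288]
total = sum(supports)                        # 3040 samples overall

# "always return 0": every sample lands in column 0
recall_class0 = supports[0] / supports[0]    # 325/325 = 1.0 -> perfect recall for class 0
precision_class0 = supports[0] / total       # 325/3040 -> terrible precision
print(total, round(precision_class0, 2))     # 3040 0.11
```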
                                 precision  recall  f1-score  support
0  normal driving                     0.74    0.80      0.77      325
1  texting - right                    0.99    0.97      0.98      322
2  talking on the phone - right       0.95    0.91      0.93      318
3  texting - left                     0.79    0.99      0.88      320
4  talking on the phone - left        0.98    0.94      0.96      316
5  operating the radio                0.96    1.00      0.98      321
6  drinking                           0.96    0.96      0.96      320
7  reaching behind                    0.99    1.00      0.99      256
8  hair and makeup                    0.87    0.92      0.89      254
9  talking to passenger               0.89    0.59      0.71      288
avg / total                           0.91    0.91      0.91     3040
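The per-class columns of this report can be reproduced directly from the confusion matrix. Here is a sketch with NumPy (scikit-learn's `classification_report` computes the same numbers from the raw label arrays):

```python
import numpy as np

cm = np.array([
    [259,   0,   7,  30,   1,   8,   0,   0,   7,  13],
    [  1, 311,   0,   8,   0,   0,   2,   0,   0,   0],
    [  3,   0, 288,   5,   1,   0,   8,   0,  13,   0],
    [  0,   0,   0, 318,   2,   0,   0,   0,   0,   0],
    [  6,   0,   0,  10, 297,   2,   0,   0,   1,   0],
    [  1,   0,   0,   0,   0, 320,   0,   0,   0,   0],
    [  0,   0,   0,   1,   0,   0, 308,   0,  11,   0],
    [  0,   0,   0,   0,   0,   0,   1, 255,   0,   0],
    [  0,   0,   7,   1,   0,   0,   1,   3, 234,   8],
    [ 78,   2,   1,  29,   1,   4,   1,   0,   3, 169],
])

tp = np.diag(cm)
recall = tp / cm.sum(axis=1)     # per-class recall    (row sums = support)
precision = tp / cm.sum(axis=0)  # per-class precision (column sums = predictions)
f1 = 2 * precision * recall / (precision + recall)

# e.g. class 9, "talking to passenger":
print(round(precision[9], 2), round(recall[9], 2), round(f1[9], 2))  # 0.89 0.59 0.71
```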
Let's go back to the classification report and analyze it.
We already discussed "0 - normal driving".
We can see that "1 - texting - right" has good recall (0.97) and also good precision (0.99).
"3 - texting - left" has 0.99 recall but only 0.79 precision: the classifier is over-eager to predict this class, which means many false positives. Looking at column 3 of the confusion matrix, 30 of those predictions were actually 0 (normal driving) and 29 were actually 9 (talking to passenger).
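The column-3 breakdown can be verified the same way (values copied from the matrix above):

```python
col3 = [30, 8, 5, 318, 10, 0, 1, 0, 1, 29]  # column 3: everything predicted as "texting - left"
tp = col3[3]                                # 318 correct predictions
fp = sum(col3) - tp                         # 84 false positives, mostly classes 0 and 9
print(fp, round(tp / sum(col3), 2))         # 84 0.79
```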
- True Positive (TP): eqv. with hit
- False Positive (FP): eqv. with false alarm, Type I error
- Sensitivity or true positive rate (TPR): eqv. with hit rate, recall
- Precision or positive predictive value (PPV)