Tuesday, July 12, 2016

StateFarm experiment 1

Let's start with a simple, quick-to-run model.

150x150 input -> Conv(32,3,3) -> Conv(32,3,3) -> Conv(64,3,3) -> Dense(3x200) -> Dropout(0.5) -> Dense(10)
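As a sanity check on model size, here is a plain-Python walk-through of the shapes and parameter counts of the architecture above. The pooling and padding are my assumptions (not stated in the post): RGB input, 'valid' 3x3 convolutions, and a 2x2 max-pool after each conv layer.

```python
# Hypothetical shape/parameter walk-through of the architecture above.
# Assumptions (not in the post): RGB input, 'valid' convolutions,
# 2x2 max-pool after each conv layer.

def conv_params(in_ch, out_ch, k=3):
    """Weights + biases of a k x k convolution."""
    return (k * k * in_ch + 1) * out_ch

def conv_pool(size, k=3, pool=2):
    """Spatial size after a 'valid' k x k conv then a pool x pool max-pool."""
    return (size - k + 1) // pool

size, channels = 150, 3
total = 0
for out_ch in (32, 32, 64):
    total += conv_params(channels, out_ch)
    size = conv_pool(size)
    channels = out_ch

flat = size * size * channels            # flattened feature vector
for units in (200, 200, 200, 10):        # Dense(3x200) -> Dense(10)
    total += (flat + 1) * units
    flat = units

print(size, channels, total)
```

Under these assumptions almost all of the ~3.8M parameters sit in the first dense layer, which is why dropout and regularization are applied there.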


Each epoch: train on 5*1024 samples, validate on 1*1024 samples, batch size 32.
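In batch terms, the setup above works out as follows (the ~20k total training-image count is taken from the data discussion later in the post):

```python
# Batches per "epoch" under the setup above.
batch_size = 32
train_samples = 5 * 1024                    # 5120 training samples per epoch
val_samples = 1 * 1024                      # 1024 validation samples per epoch
train_batches = train_samples // batch_size
val_batches = val_samples // batch_size
fraction = train_samples / 20000            # share of the ~20k training images
print(train_batches, val_batches, round(fraction, 3))
```

So each "epoch" is 160 training batches and 32 validation batches, i.e. roughly 1/4 of the training images.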

model_chapter3
epoch 0 699s - loss: 18.1189 - acc: 0.2369 - val_loss: 2.1892 - val_acc: 0.3574
epoch 1 765s - loss: 7.5443 - acc: 0.4570 - val_loss: 1.5257 - val_acc: 0.4697
epoch 2 689s - loss: 3.5896 - acc: 0.6488 - val_loss: 1.9699 - val_acc: 0.3590
epoch 3 697s - loss: 1.8959 - acc: 0.7616 - val_loss: 1.8912 - val_acc: 0.3887
epoch 4 707s - loss: 1.2178 - acc: 0.7992 - val_loss: 1.5978 - val_acc: 0.4756
epoch 5 710s - loss: 0.9396 - acc: 0.8277 - val_loss: 1.6677 - val_acc: 0.5829
epoch 6 702s - loss: 0.8008 - acc: 0.8520 - val_loss: 1.9146 - val_acc: 0.5781
epoch 7 707s - loss: 0.6810 - acc: 0.8798 - val_loss: 1.3611 - val_acc: 0.5752
epoch 8 707s - loss: 0.6647 - acc: 0.8748 - val_loss: 1.8251 - val_acc: 0.5314
epoch 9 706s - loss: 0.6234 - acc: 0.8936 - val_loss: 1.5517 - val_acc: 0.5908
epoch 10 709s - loss: 0.5812 - acc: 0.9054 - val_loss: 1.8407 - val_acc: 0.5225

Usually we would plot the loss, but here I plot the accuracy graph: training converges to ~95% while validation accuracy does not pass 58%.


Continuing until epoch 30 reduces the training loss a bit more and raises the training accuracy, but validation does not improve:
epoch 30 - loss: 0.3641 - acc: 0.9482 - val_loss: 1.3679 - val_acc: 0.6270



Notes on this run:
After epoch 5 (in this case an epoch is a sample of 1/4 of the images), we start to overfit. Further epochs do not help (validation stays the same while the training loss drops to be extremely small).

There could be two main reasons:
1. The model is too strong and not regularized enough - not the case here: it's small, with heavy regularization and dropout.
2. The model is too strong compared to the amount of data. I think this is the case.

The data
The number of training images is small (20k). Furthermore, they are taken from ~20 videos of 20 actors, cut into frames, while the test set comes from different videos of different actors.
20 actors are not enough to generalize to all the people in the world.

What can be done?
  • More data is the obvious solution, but there is none.
  • Pretrained models are allowed in the competition if they are public and can be used commercially. Great improvements were achieved using VGG-16 (10 times better), which can't be used commercially. What does a pretrained network give us?
    • Better visual filters in the lower layers.
    • Cellphone detection in the higher layers.
    • Probably good human detection, but it is not clear whether hand localization is good.
  • Or use a cascade of 2 pretrained models creating features, combine them into an image/new channel, and feed this to a small model.
    • A good one for humans exists, but runs at 17 s per image: 17 s x 20,000 images = 340,000 s / 86,400 ≈ 3.93 days.
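The runtime estimate in the last bullet is simple arithmetic, spelled out here for clarity:

```python
# Cost of running a 17 s/image human detector over all ~20,000 training images.
seconds_per_image = 17
n_images = 20_000
total_seconds = seconds_per_image * n_images   # 340,000 s
days = total_seconds / 86_400                  # 86,400 seconds in a day
print(total_seconds, round(days, 2))           # 340000 3.94
```

Roughly four days of compute just to preprocess the training set, before touching the test set.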

Further experiments with similar architectures



experiment 3
Dense 3x200, l2(0.01), BN on all layers except the 1st dense, Adam optimizer.
711s - loss: 0.4788 - acc: 0.9227 - val_loss: 2.0435 - val_acc: 0.5019
Saved model to disk model_chapter3_17epoc
#Validation : SCORE of model_chapter3_17epoc 0.290623311932 accuracy 0.434080421885
#  Leader-board score = 1.64778



experiment 4
Experiment 4 ran with: dense 200-100-50, full BN, pre-ReLU, SGD(lr=0.001, decay=1e-7, momentum=0.9) optimizer.
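For reference, a sketch of what that decay setting means. If I recall the Keras SGD of the time correctly, the base learning rate is multiplied by 1 / (1 + decay * iterations), where an iteration is one batch update; treat the exact formula as an assumption.

```python
# Effective learning rate under SGD(lr=0.001, decay=1e-7), assuming the
# schedule lr_t = lr / (1 + decay * t) with t counted in batch updates.
base_lr, decay = 0.001, 1e-7

def lr_at(iteration):
    return base_lr / (1.0 + decay * iteration)

# With 160 batches per epoch (5120 samples / batch 32), 30 epochs is
# 4800 iterations - with decay=1e-7 the schedule is essentially flat.
print(lr_at(0), lr_at(160 * 30))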


experiment 5


Experiment 5 ran with: dense 256-124-64, BN on all but the 1st dense, regular ReLU, Adam optimizer.

5120/5120 - 1012s - loss: 0.4410 - acc: 0.9084 - val_loss: 1.0536 - val_acc: 0.6631
Saved model to disk model_chapter5_18epoc


