Let's start with a simple model that is quick to run.
Architecture: 150x150 input -> Conv(32,3,3) -> Conv(32,3,3) -> Conv(64,3,3) -> Dense(3x200) -> Dropout(0.5) -> Dense(10)
Each epoch trains on 5*1024 samples and validates on 1*1024, with batch size 32.
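A minimal Keras sketch of this architecture (layer sizes and dropout taken from the notes above; the ReLU activations, max-pooling between conv blocks, and the Adam optimizer are assumptions, not stated in the notes):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Sketch of the 150x150 model described above. Pooling and activation
# choices are guesses; only the layer widths come from the notes.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    Flatten(),
    Dense(200, activation='relu'),   # 3 x Dense(200) as in the notes
    Dense(200, activation='relu'),
    Dense(200, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax'),  # 10 distraction classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```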
Continuing until epoch 30 reduces the training loss a bit and raises the training accuracy, but validation does not improve:
epoch 30 - loss: 0.3641 - acc: 0.9482 - val_loss: 1.3679 - val_acc: 0.6270
Notes on this run:
After epoch 5 (here an epoch is a sample of 1/4 of the images), we start to overfit. Further epochs do not help: validation stays the same while the training loss becomes extremely small.
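Since validation stops improving after epoch 5, one practical fix is to stop training automatically. A sketch using Keras callbacks (the patience value and filename are assumptions):

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop once val_loss has not improved for 3 epochs, and keep the best weights.
callbacks = [
    EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True),
    ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True),
]
# model.fit(x_train, y_train, batch_size=32, epochs=30,
#           validation_data=(x_val, y_val), callbacks=callbacks)
```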
There could be two main reasons:
1. The model is too strong and not regularized enough - not the case here: it is small, heavily regularized, and uses dropout.
2. The model is too strong relative to the amount of data. I think this is the case.
The data
The number of training images is small (~20k). Furthermore, they are taken from ~20 videos of 20 actors, cut into frames, while the test set comes from different videos of different actors.
20 actors are not enough to generalize to all the people in the world.
What can be done?
- More data is the obvious solution, but there is none.
- Pretrained models are allowed in the competition if they are public and can be used commercially. Great improvements were achieved using VGG-16 (roughly 10 times better), but it cannot be used commercially. What does a pretrained network give us?
  - Better visual filters in the lower layers.
  - Cellphone detection in the higher layers.
  - Probably good human detection, though it is not clear whether hand localization is good.
- Alternatively, use a cascade of two pretrained models to create features, combine them into an image/new channel, and feed that to a small model.
  - A good pretrained model for humans exists, but it is slow: 17 s x 20,000 images = 340,000 s / 86,400 = 3.93 days.
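The feature-extraction idea above could be sketched like this: freeze a pretrained convolutional base and train only a small head on top. VGG-16 is shown purely for illustration (the notes say its licence rules it out here), and `weights=None` is used in the sketch; in practice you would pass `weights='imagenet'`:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout

# Pretrained convolutional base used as a frozen feature extractor.
# Use weights='imagenet' in practice; weights=None here avoids a download.
base = VGG16(weights=None, include_top=False, input_shape=(150, 150, 3))
base.trainable = False

# Small trainable head on top of the extracted features.
model = Sequential([
    base,
    Flatten(),
    Dense(200, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Since the base is frozen, only the small head is trained, which suits the small dataset.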
Further experiments with similar architectures
experiment 3 |
Dense 3x200, l2(0.01), BN on all layers except the 1st dense, Adam optimizer.
711s - loss: 0.4788 - acc: 0.9227 - val_loss: 2.0435 - val_acc: 0.5019
Saved model to disk: model_chapter3_17epoc
# Validation: SCORE of model_chapter3_17epoc 0.290623311932, accuracy 0.434080421885
# Leader-board score = 1.64778
experiment 4 |
Experiment 4 ran with: Dense 200-100-50, full BN, pre-ReLU, SGD(lr=0.001, decay=1e-7, momentum=0.9) optimizer.
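The "full BN, pre-ReLU" head from experiment 4 might look like this (the exact layer ordering and the 512-dim input are assumptions; only the layer widths and optimizer settings come from the notes):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation
from tensorflow.keras.optimizers import SGD

# Dense head 200-100-50 with BatchNorm before each ReLU ("pre-ReLU").
head = Sequential()
head.add(Dense(200, input_shape=(512,)))  # input feature size is a guess
head.add(BatchNormalization())
head.add(Activation('relu'))
for units in (100, 50):
    head.add(Dense(units))
    head.add(BatchNormalization())
    head.add(Activation('relu'))
head.add(Dense(10, activation='softmax'))

# decay=1e-7 in the original run; omitted here for API portability.
opt = SGD(learning_rate=0.001, momentum=0.9)
head.compile(optimizer=opt, loss='categorical_crossentropy',
             metrics=['accuracy'])
```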
experiment 5 |
Experiment 5 ran with: Dense 256-124-64, BN on all but the 1st dense, regular ReLU, Adam optimizer.
5120/5120 - 1012s - loss: 0.4410 - acc: 0.9084 - val_loss: 1.0536 - val_acc: 0.6631
Saved model to disk: model_chapter5_18epoc