Thursday, August 11, 2016

statefarm - retrospect

Reviewing the results of the best teams in the competition.

1. How to train each single model

1.1 Synthesize/augment to generate a huge amount of data
  • Synthesize 5M new images by combining the left and right (almost) halves of two images from the same class. This worked so well that it was possible to train googlenet V3 from scratch to 0.15. ref
  • Synthesize images by combining images from the test set (in this competition they all came from the same video).
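The half-combining idea from the first bullet can be sketched as follows. This is a minimal illustration, not the team's actual code: the function name `synthesize_half_swap`, the `(N, H, W, C)` array layout, and the exact middle split are assumptions (the original used "almost" half, so the real split point may differ).

```python
import numpy as np

def synthesize_half_swap(images, labels, n_new, rng=None):
    """Build n_new images by joining the left half of one image with the
    right half of another image of the same class.

    images: array of shape (N, H, W, C); labels: array of shape (N,).
    Hypothetical helper: the split at exactly W // 2 is an assumption.
    """
    rng = np.random.default_rng(rng)
    images = np.asarray(images)
    labels = np.asarray(labels)
    mid = images.shape[2] // 2  # column where the two halves meet
    new_images = np.empty((n_new,) + images.shape[1:], dtype=images.dtype)
    new_labels = np.empty(n_new, dtype=labels.dtype)
    classes = np.unique(labels)
    for i in range(n_new):
        cls = rng.choice(classes)                 # pick a class at random
        idx = np.flatnonzero(labels == cls)       # all images of that class
        left, right = rng.choice(idx, size=2, replace=True)
        new_images[i, :, :mid] = images[left, :, :mid]
        new_images[i, :, mid:] = images[right, :, mid:]
        new_labels[i] = cls
    return new_images, new_labels
```

With a few thousand source images per class, the number of distinct left/right pairs grows quadratically, which is how a dataset in the millions becomes reachable.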

1.2 If that is not possible, use a pre-trained model. The stronger, the better.
Resnet-152 > VGG-19 > VGG-16 > googlenet 

1.3 Use semi-supervised learning
"dark-knowledge" - let an ensemble predict on the test set, then take the most confident predictions as extra training labels. Use 6-12K images; don't use too many.
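A minimal sketch of picking the most confident test predictions, assuming the ensemble output is already available as an array of class probabilities. The function name `select_pseudo_labels` and the default cap of 6000 are illustrative choices, not taken from any team's code.

```python
import numpy as np

def select_pseudo_labels(test_probs, max_images=6000):
    """Return (indices, labels) for the most confident test predictions.

    test_probs: array of shape (n_test, n_classes) from an ensemble.
    Confidence is taken as the highest class probability per image,
    which is an assumption about how "most confident" is measured.
    """
    test_probs = np.asarray(test_probs)
    confidence = test_probs.max(axis=1)      # top probability per image
    labels = test_probs.argmax(axis=1)       # predicted class per image
    order = np.argsort(-confidence, kind="stable")[:max_images]
    return order, labels[order]
```

The selected test images and their predicted labels can then be appended to the training set before retraining.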

Some numbers to compare, all from the same team and model (googlenet V3):
0.31 pre-trained, augmented (flip/rotate)
0.26 pre-trained, augmented + "dark-knowledge"/semi-supervised
0.15 from scratch, but with 5M synthesized images(!)

2. How to run a single model

If the test data can be clustered, use this fact (in this competition it could):
  • Hack the input and get 3rd place: as the input was a sequence of images, use NN for better training and testing (resnet 0.27->0.18).
  • Another approach is to run all images through VGG, take a mid-layer output, cluster it (1000 clusters), and use the mean result of each cluster.
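The cluster-mean step from the second bullet could look roughly like this. The sketch assumes the cluster assignment has already been computed elsewhere (for example by k-means on the mid-layer features); `cluster_mean_predictions` and its arguments are hypothetical names.

```python
import numpy as np

def cluster_mean_predictions(probs, cluster_ids):
    """Replace each image's prediction with the mean prediction of its cluster.

    probs: (n_images, n_classes) model outputs.
    cluster_ids: (n_images,) cluster label per image, computed beforehand.
    """
    probs = np.asarray(probs, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    # Map arbitrary cluster labels to 0..n_clusters-1
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    n_clusters = inverse.max() + 1
    sums = np.zeros((n_clusters, probs.shape[1]))
    np.add.at(sums, inverse, probs)                    # per-cluster sum
    counts = np.bincount(inverse, minlength=n_clusters)
    means = sums / counts[:, None]                     # per-cluster mean
    return means[inverse]                              # broadcast back to images
```

Averaging within a cluster smooths out single-frame mistakes when neighbouring frames of the same video share a label.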

The whole image or part of it?
  • Most ran the image as a whole (with or without clustering)
  • R-CNN (tuned differently from object detection) helped: VGG 0.22->0.175

3. How to choose models for the ensemble

  • Try to use different models, trained differently. For example, one VGG and one Resnet, or one trained with augmentation and the other without.
  • X-fold is common, but basic.

4. How to combine models

  • Use scipy's minimize function with a custom geometric-average function to find the model weights that minimize the log loss of the combined models.
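A rough sketch of that idea, assuming each model's validation predictions are stacked into one array. The helper names (`log_loss`, `geometric_blend`, `fit_blend_weights`), the SLSQP method, and the constraint that weights lie in [0, 1] and sum to 1 are my assumptions, not details from the winning solutions.

```python
import numpy as np
from scipy.optimize import minimize

def log_loss(y_true, probs, eps=1e-15):
    """Multi-class log loss; y_true holds integer class ids."""
    probs = np.clip(probs, eps, 1.0)
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

def geometric_blend(weights, model_probs, eps=1e-15):
    """Weighted geometric mean of predictions, renormalized per row.

    model_probs: (n_models, n_samples, n_classes).
    """
    weights = np.asarray(weights, dtype=float)
    log_p = np.log(np.clip(model_probs, eps, 1.0))
    blended = np.exp(np.tensordot(weights, log_p, axes=1))  # (n_samples, n_classes)
    return blended / blended.sum(axis=1, keepdims=True)

def fit_blend_weights(model_probs, y_true):
    """Find blend weights that minimize log loss on held-out data."""
    n_models = len(model_probs)
    start = np.full(n_models, 1.0 / n_models)   # begin with equal weights
    result = minimize(
        lambda w: log_loss(y_true, geometric_blend(w, model_probs)),
        start,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
    )
    return result.x
```

The weights should be fitted on validation predictions and then applied unchanged to the test predictions, otherwise the blend simply overfits.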

Statefarm - experiment 3 - VGG16 finetune

VGG: [conv (x2 or x3) -> max-pool] repeated a few times, then a classifier head (4096->4096->1000)

Finetune VGG16

I saw one approach, with great results(!), where the whole model was loaded and only the last softmax layer was changed from the original (1000) to the new target (10).
In that case fine-tuning was done on the whole model together, with a slow learning rate (SGD 1e-4).

I will use another approach:
We will replace the whole classifier head (4096->4096->1000).
1. [optional, to save time later] Load the model without the last dense part. Run it once on all the images and save to disk the intermediate output (512x7x7) per image for the train/validation/test sets. For reference, 10K files should take 1.9 GB of disk space.

2. Create an alternative classifier. I used a small one because my machine is a bit weak: 256->10
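    from keras.models import Sequential
    from keras.layers import Dense, Flatten
    model = Sequential()
    model.add(Flatten(input_shape=(512, 7, 7)))  # assumes the saved 512x7x7 features are fed in unflattened
    model.add(Dense(256, activation='relu'))
    model.add(Dense(10, activation='softmax'))   # 10 target classes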

Train it. I tried a few optimizers:
SGD(lr=1e-3, momentum=0.9, nesterov=True)

SGD(lr=1e-3, momentum=0.9, nesterov=False) - BEST. Saved model to disk: vgg_head_only1_out_0_epoc20

SGD(lr=1e-4, momentum=0.9, nesterov=True)

SGD(lr=1e-4, momentum=0.9, nesterov=False)