A review of the top teams' competition results.
1. How to train each single-model
1.1 Synthesize/augment to generate a huge amount of data
- Synthesize 5M new images: create them by stitching the left (almost-)half of one image to the right half of another image from the same class. This worked so well that it trained a from-scratch GoogLeNet V3 down to 0.15. ref
- Synthesize images by combining images from the test set (possible in this competition, since the test images all came from the same videos)
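The half-stitching trick above can be sketched as follows. This is a minimal sketch: the exact split point and the strategy for pairing same-class images are not specified in the notes, so `split_frac` and the random pairing are assumptions.

```python
import numpy as np

def synthesize_halves(img_a, img_b, split_frac=0.5):
    """Create a new training image by stitching the left part of img_a
    to the right part of img_b (both assumed to be from the same class).
    img_a, img_b: HxWxC arrays of identical shape."""
    h, w, c = img_a.shape
    split = int(w * split_frac)  # "(almost) half" -- split point is an assumption
    out = np.empty_like(img_a)
    out[:, :split] = img_a[:, :split]
    out[:, split:] = img_b[:, split:]
    return out
```

Generating 5M images then amounts to repeatedly sampling random pairs of images from the same class and stitching them.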
1.2 If that is not possible, use a pre-trained model. The stronger, the better.
ResNet-152 > VGG-19 > VGG-16 > GoogLeNet
1.3 Use semi-supervised learning
"Dark knowledge": let an ensemble predict on the test set and keep its most confident predictions as pseudo-labels. Use 6-12K images; don't use too many.
* Some numbers to compare, all from the same team and model (GoogLeNet V3):
0.31 pre-trained, augmented (flip/rotate)
0.26 pre-trained, augmented + "dark knowledge"/semi-supervised
0.15 from scratch, but on 5M synthesized images(!)
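The "dark knowledge" pseudo-labeling step in 1.3 can be sketched as follows. The function name and the confidence threshold are illustrative assumptions; the notes only specify keeping the 6-12K most confident test images.

```python
import numpy as np

def select_confident_pseudo_labels(test_probs, max_images=12000, min_conf=0.9):
    """test_probs: (N, num_classes) averaged ensemble predictions on the test set.
    Returns indices of the most confident test images and their hard labels,
    capped at max_images so pseudo-labels don't dominate training."""
    conf = test_probs.max(axis=1)            # confidence = top predicted probability
    order = np.argsort(-conf)                # most confident first
    keep = order[conf[order] >= min_conf][:max_images]
    return keep, test_probs[keep].argmax(axis=1)
```

The selected test images, with their predicted classes as labels, are then mixed into the training set for another round of fine-tuning.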
2. How to run a single model
If the test data can be clustered, exploit this fact (in this competition, it could be):
- Hack the input and get 3rd place: since the input was a sequence of video frames, use nearest-neighbor (NN) frames for better training and testing (ResNet 0.27 -> 0.18).
- Another approach: run all images through VGG, take a mid-layer output, cluster it (1000 clusters), and use each cluster's mean prediction.
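The feature-clustering approach can be sketched with scikit-learn's KMeans. A minimal sketch under stated assumptions: which mid-layer is used is not specified in the notes, and the 1000-cluster count from the notes is kept as a default.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_average_predictions(features, probs, n_clusters=1000, seed=0):
    """features: (N, D) mid-layer CNN activations for the test images.
    probs: (N, num_classes) per-image predictions (float array).
    Cluster images by feature similarity, then replace each image's
    prediction with the mean prediction of its cluster."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(features)
    smoothed = probs.copy()
    for k in range(n_clusters):
        idx = km.labels_ == k
        smoothed[idx] = probs[idx].mean(axis=0)
    return smoothed
```

The idea is that frames from the same video land in the same cluster, so averaging within a cluster suppresses per-frame noise.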
Run on whole images or on parts of them?
- Most ran the image as a whole (with or without clustering).
- R-CNN (tuned differently than for object detection) improved VGG from 0.22 to 0.175.
3. How to choose models for an ensemble
- Try to use different models, trained differently. For example, one VGG and one ResNet; one augmented, the other not...
- X-fold (cross-validation) ensembling is common, but basic.
4. How to combine models
- Use scipy's minimize function with a custom weighted geometric average to minimize the log loss of the combined models.
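A minimal sketch of that blend: fit per-model weights for a weighted geometric mean on a held-out validation set with scipy.optimize.minimize. The function name, bounds, and optimizer choice are assumptions; the notes only say a custom geometric average was optimized for log loss.

```python
import numpy as np
from scipy.optimize import minimize

def fit_geometric_weights(model_probs, y_true):
    """model_probs: list of (N, C) probability arrays, one per model.
    y_true: (N,) integer labels of a held-out validation set.
    Finds weights w in [0, 1] so the weighted geometric mean
    prod_i p_i ** w_i (renormalized per row) minimizes log loss."""
    logs = np.stack([np.log(np.clip(p, 1e-15, 1)) for p in model_probs])  # (M, N, C)
    n = len(y_true)

    def logloss(w):
        blend = np.einsum('m,mnc->nc', w, logs)        # weighted sum of log-probs
        blend -= blend.max(axis=1, keepdims=True)      # numeric stability
        p = np.exp(blend)
        p /= p.sum(axis=1, keepdims=True)              # renormalize rows
        return -np.log(np.clip(p[np.arange(n), y_true], 1e-15, 1)).mean()

    m = len(model_probs)
    res = minimize(logloss, x0=np.full(m, 1.0 / m),
                   bounds=[(0, 1)] * m, method='L-BFGS-B')
    return res.x
```

The fitted weights are then applied the same way to the models' test-set predictions.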