Dataset Culling: Towards Efficient Training Of Distillation-Based Domain Specific Models

Published in IEEE ICIP, 2019

Recommended citation: Yoshioka, Kentaro. (2019). "Dataset Culling: Towards Efficient Training Of Distillation-Based Domain Specific Models." IEEE ICIP.

Dataset Culling filters out images that are easy to classify, since they contribute little to improving accuracy. Difficulty is measured with our proposed confidence-loss metric, which adds little computational overhead. We show that the dataset size can be culled by a factor of 300×, reducing the total training time by 47× with no loss in accuracy, or even a slight improvement.
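
Below is a minimal sketch of the culling idea, not the paper's exact confidence-loss definition: each unlabeled image is scored by how uncertain the student model's detection confidences are, and only the hardest fraction of images is kept for distillation. The `student_model` interface, the entropy-style scoring function, and the `keep_ratio` default are illustrative assumptions.

```python
import torch

def confidence_loss(scores: torch.Tensor) -> float:
    """Proxy difficulty score computed from detection confidences in [0, 1].

    Confidences near 0 or 1 are "easy"; mid-range confidences suggest the
    student is unsure, so the image is likely useful for further training.
    """
    # Binary-entropy-style penalty: peaks at p = 0.5, vanishes at 0 and 1.
    p = scores.clamp(1e-6, 1 - 1e-6)
    return float(-(p * p.log() + (1 - p) * (1 - p).log()).sum())

def cull_dataset(images, student_model, keep_ratio=1 / 300):
    """Return the hardest `keep_ratio` fraction of `images` (a sketch)."""
    difficulties = []
    with torch.no_grad():
        for img in images:
            scores = student_model(img)  # assumed: per-detection confidences
            difficulties.append(confidence_loss(scores))
    n_keep = max(1, int(len(images) * keep_ratio))
    ranked = sorted(range(len(images)),
                    key=lambda i: difficulties[i], reverse=True)
    return [images[i] for i in ranked[:n_keep]]
```

Because the score needs only a forward pass of the compact student model and no ground-truth labels, ranking the full dataset is cheap relative to training on it.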

Download paper here

Code

Datasetculling