Title:
Unsupervised Representation Learning by Sorting Sequences
Abstract:
The success of deep neural networks (DNNs) is primarily driven by supervised learning on large-scale, manually annotated datasets (millions of images) such as ImageNet.
However, this approach substantially limits the scalability to new problem domains because manual annotations are often expensive
and in some cases scarce (e.g., labeling medical images). A new paradigm, called self-supervised learning, seeks to leverage the
vast amounts of freely available unlabeled images and videos to learn general-purpose representations. In such frameworks,
a surrogate supervisory task is defined that leverages the inherent structure of the data, together with a discriminative or reconstruction loss function, to train the network.
Examples include predicting the relative positions of image patches, reconstructing missing pixel values conditioned on the known surroundings, or predicting one subset of the data channels from another (e.g., predicting color channels from a grayscale image) [19, 42, 43]. In this
work, we propose a surrogate task for self-supervised learning using a large collection of unlabeled videos. Given a tuple of randomly
shuffled frames, we train a neural network to sort the images into chronological order. The learned representations are shown to be competitive with the state of the art on high-level recognition problems such as object detection, classification, and human action recognition.
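To make the surrogate task concrete, below is a minimal sketch (in PyTorch; the tuple length, frame size, and all names are illustrative assumptions, not the authors' released code) of how shuffled frame tuples and their permutation labels might be generated. It follows the common convention of treating a sequence and its reverse as the same ordering, which gives N!/2 classes for a tuple of N frames.

```python
import itertools
import random
import torch

# Assumed setup: tuples of N = 4 frames, so 4!/2 = 12 ordering classes
# (a sequence and its reverse are treated as the same class).
N = 4
perms = list(itertools.permutations(range(N)))
# Keep one canonical representative per {p, reversed(p)} pair.
classes = [p for p in perms if p <= tuple(reversed(p))]
class_index = {p: i for i, p in enumerate(classes)}

def make_training_example(frames):
    """frames: tensor of shape (N, C, H, W) in chronological order.
    Returns a shuffled tuple and the class index of the permutation
    that was applied (up to reversal)."""
    p = random.choice(perms)
    shuffled = frames[list(p)]  # reorder frames by permutation p
    canonical = min(p, tuple(reversed(p)))
    return shuffled, class_index[canonical]

# Usage with dummy data standing in for sampled video frames:
frames = torch.randn(N, 3, 80, 80)
x, y = make_training_example(frames)
print(x.shape, y)  # torch.Size([4, 3, 80, 80]) and a class in [0, 12)
```

A network trained to classify the shuffled tuple into one of these ordering classes must implicitly recover the temporal structure of the frames, which is the supervisory signal the sorting task provides.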