Matsushita Lab.

Multi-task Learning using Multi-modal Encoder-Decoder Networks with Shared Skip Connections

Multi-task learning is a promising approach for efficiently and effectively addressing multiple mutually related recognition tasks. Many scene understanding tasks, such as semantic segmentation and depth prediction, can be framed as cross-modal encoding/decoding, and hence most prior work has used multi-modal datasets for multi-task learning. However, inter-modal commonalities, such as those across image, depth, and semantic labels, have not been fully exploited. We propose multi-modal encoder-decoder networks to harness the multi-modal nature of multi-task scene recognition. In addition to the shared latent representation among encoder-decoder pairs, our model also has shared skip connections from different encoders. By combining these two representation-sharing mechanisms, the proposed method efficiently learns a shared feature representation among all modalities in the training data.
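The two sharing mechanisms described above can be illustrated with a toy sketch. This is not the paper's implementation: the layer sizes, the fully connected layers, the random untrained weights, and the names `Encoder`, `Decoder`, and `translate` are all assumptions made for illustration, and a real model would use trained convolutional networks. The sketch only shows the wiring: every encoder maps its modality to a latent vector of the same size, and every decoder accepts both that latent vector and the skip features of whichever encoder produced it.

```python
import random

MODALITIES = ["image", "depth", "label"]
# Hypothetical sizes, chosen only to keep the example small.
IN_DIM, SKIP_DIM, LATENT_DIM = 8, 6, 4


def make_linear(n_in, n_out, rng):
    """Return a random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def linear_relu(weights, x):
    """Apply a linear map followed by ReLU to a vector stored as a list."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


class Encoder:
    """Maps one modality to (latent, skip); both have modality-independent sizes."""

    def __init__(self, rng):
        self.to_skip = make_linear(IN_DIM, SKIP_DIM, rng)
        self.to_latent = make_linear(SKIP_DIM, LATENT_DIM, rng)

    def __call__(self, x):
        skip = linear_relu(self.to_skip, x)
        latent = linear_relu(self.to_latent, skip)
        return latent, skip


class Decoder:
    """Reconstructs one modality from a latent vector plus any encoder's skip features."""

    def __init__(self, rng):
        self.from_latent = make_linear(LATENT_DIM, SKIP_DIM, rng)
        self.to_output = make_linear(2 * SKIP_DIM, IN_DIM, rng)

    def __call__(self, latent, skip):
        hidden = linear_relu(self.from_latent, latent)
        # Skip connection: concatenate decoder features with encoder features.
        return linear_relu(self.to_output, hidden + skip)


rng = random.Random(0)
encoders = {m: Encoder(rng) for m in MODALITIES}
decoders = {m: Decoder(rng) for m in MODALITIES}


def translate(source, target, x):
    """Encode x with the source encoder, then decode it with the target decoder."""
    latent, skip = encoders[source](x)
    return decoders[target](latent, skip)


x = [rng.random() for _ in range(IN_DIM)]
outputs = {(s, t): translate(s, t, x) for s in MODALITIES for t in MODALITIES}
print(len(outputs))  # 9 encoder-decoder pairs
```

Because the latent and skip sizes do not depend on the modality, any of the three encoders can feed any of the three decoders in this sketch, which is one way to read "shared" in the description above.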


  • R. Kuga, A. Kanezaki, M. Samejima, Y. Sugano and Y. Matsushita, “Multi-task Learning Using Multi-modal Encoder-Decoder Networks with Shared Skip Connections,” 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, 2017, pp. 403-411.