00036-Xception:Deep Learning with Depthwise Separable Convolutions

Author: François Chollet (Google; creator of Keras, Google Brain)

Key Words:

Depthwise Separable Convolutions


1. Introduction

1.1 The Inception Hypothesis

In a regular convolution, a single kernel is tasked with simultaneously mapping cross-channel correlations and spatial correlations. The Inception hypothesis is that these two mappings are sufficiently decoupled that it is preferable not to map them jointly.
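One way to see what factoring the two mappings buys is a quick parameter count. The arithmetic below is my own illustration (the specific channel counts are arbitrary, not from the paper): a standard 3x3 convolution learns spatial and cross-channel structure jointly, while a depthwise separable convolution splits it into a per-channel spatial filter plus a 1x1 cross-channel mixer.

```python
# Parameter counts for a 3x3 convolution with 256 input and 256 output
# channels (bias terms omitted). Illustrative numbers, not from the paper.
k, c_in, c_out = 3, 256, 256

standard = k * k * c_in * c_out   # joint spatial + cross-channel mapping
depthwise = k * k * c_in          # one spatial filter per input channel
pointwise = c_in * c_out          # 1x1 convolution mixes across channels
separable = depthwise + pointwise

print(standard)   # 589824
print(separable)  # 67840 -- roughly 8.7x fewer parameters
```

The ratio grows with the number of channels, which is why the factorization pays off most in the deep, wide layers of a network.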

1.2 The continuum between convolutions and separable convolutions

Two minor differences between an "extreme" version of an Inception module and a depthwise separable convolution are:

  1. The order of the operations: depthwise separable convolutions as usually implemented (e.g. in TensorFlow) first perform the channel-wise spatial convolution and then the 1x1 convolution, whereas Inception performs the 1x1 convolution first.
  2. The presence or absence of a non-linearity after the first operation. In Inception, both operations are followed by a ReLU non-linearity, whereas depthwise separable convolutions are usually implemented without intermediate non-linearities.
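The first difference above can be made concrete with a minimal numpy sketch of a depthwise separable convolution in the usual (TensorFlow-style) order: channel-wise spatial filtering first, then a 1x1 pointwise mix. This is a naive loop implementation for clarity, not an optimized kernel; the function and argument names are my own.

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_k, pointwise_k):
    """Depthwise spatial convolution followed by a 1x1 pointwise convolution.

    x:            input feature map, shape (H, W, C_in)
    depthwise_k:  one spatial filter per channel, shape (kh, kw, C_in)
    pointwise_k:  cross-channel mixing matrix, shape (C_in, C_out)
    Stride 1, 'valid' padding, no intermediate non-linearity.
    """
    H, W, C = x.shape
    kh, kw, _ = depthwise_k.shape
    out_h, out_w = H - kh + 1, W - kw + 1

    # Depthwise step: each channel is convolved with its own spatial filter,
    # so no information crosses channels here.
    dw = np.zeros((out_h, out_w, C))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]            # (kh, kw, C)
            dw[i, j, :] = (patch * depthwise_k).sum(axis=(0, 1))

    # Pointwise step: a 1x1 convolution is a per-pixel matrix multiply
    # across channels -- this is where cross-channel mixing happens.
    return dw @ pointwise_k                              # (out_h, out_w, C_out)
```

Reversing the two steps (pointwise first, then depthwise) would give the operation order of the "extreme" Inception module.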

2. Prior work

  1. CNN, in particular the VGG-16
  2. Inception architecture
  3. Depthwise separable convolutions
  4. Residual connections

3. The Xception architecture

We make the following hypothesis: the mapping of cross-channel correlations and spatial correlations in the feature maps of convolutional neural networks can be entirely decoupled.


In short, the Xception architecture is a linear stack of depthwise separable convolution layers with residual connections.
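The "stack with residual connections" pattern can be sketched as follows. For brevity the separable convolutions are stood in for by 1x1 (pointwise) ops, and all names and shapes are my own illustration rather than the paper's exact block; the point is the residual structure: a transformed branch summed with a (possibly projected) shortcut.

```python
import numpy as np

def pointwise(x, w):
    # A 1x1 convolution is a per-pixel matrix multiply across channels.
    return x @ w

def residual_block(x, w1, w2, w_shortcut):
    """Residual block in the spirit of Xception's middle/entry flows.

    x: (H, W, C_in); w1: (C_in, C_mid); w2: (C_mid, C_out);
    w_shortcut: (C_in, C_out) -- a 1x1 projection so the shapes match.
    Stand-in pointwise ops replace the separable convolutions for brevity.
    """
    h = np.maximum(pointwise(x, w1), 0.0)   # conv + ReLU
    h = pointwise(h, w2)                    # conv
    shortcut = pointwise(x, w_shortcut)     # linear shortcut projection
    return h + shortcut                     # residual addition

# Stacking such blocks linearly mirrors "a linear stack of depthwise
# separable convolution layers with residual connections".
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
for _ in range(3):
    w1 = rng.standard_normal((16, 16)) * 0.1
    w2 = rng.standard_normal((16, 16)) * 0.1
    ws = np.eye(16)                          # identity shortcut when C_in == C_out
    x = residual_block(x, w1, w2, ws)
```

When input and output channel counts match, the shortcut can be the identity; otherwise a learned 1x1 projection aligns the shapes, as in standard residual networks.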

4 Experimental evaluation

4.1 The JFT dataset

JFT is an internal Google dataset for large-scale image classification, first introduced by Hinton et al. in [5], comprising over 350 million high-resolution images annotated with labels from a set of 17,000 classes. To evaluate the performance of a model trained on JFT, an auxiliary dataset, FastEval14k, is used.

4.2 Optimization configuration

PASS

4.3 Regularization configuration

PASS

4.4 Training infrastructure

4.5 Comparison with Inception V3

4.5.1 Classification performance


4.5.2 Size and speed


4.6 Effect of the residual connections


4.7 Effect of an intermediate activation after pointwise convolutions


5 Future directions

6 Conclusions