Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture
Authors:
David Eigen: Dept. of Computer Science, Courant Institute, New York University
Rob Fergus: Facebook AI Research
Abstract:
1. A single multi-scale convolutional network handles three tasks: depth prediction, surface normal estimation, and semantic labeling.
2. It refines predictions using a sequence of scales, and captures many image details without any superpixels or low-level segmentation.
1. Introduction
In this paper, we address three of these tasks: depth prediction, surface normal estimation, and semantic segmentation, all using a single common architecture.
First, adapting the model to a new task requires only an appropriate training set and loss function.
Second, a single architecture simplifies the implementation of systems that require multiple modalities.
Third, much of the computation can be shared between modalities, making the system more efficient.
2. Related Work
Most of these systems use ConvNets to find only local features, or generate descriptors of discrete proposal regions; by contrast, our network uses both local and global views to predict a variety of output types.
3. Model Architecture
The first scale in the network predicts a coarse but spatially varying set of features for the entire image area, based on a large, full-image field of view; this is accomplished through the use of two fully-connected layers.
The job of the second scale is to produce predictions at a mid-level resolution, by incorporating a more detailed but narrower view of the image along with the full-image information supplied by the coarse network.
The final scale of our model refines the predictions to higher resolution.
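A minimal PyTorch sketch of this three-scale design. All layer sizes, channel counts, and the 256x256 input resolution are illustrative assumptions for this sketch, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleNet(nn.Module):
    """Illustrative three-scale network; all sizes are assumptions."""
    def __init__(self, out_channels=1):
        super().__init__()
        # Scale 1: coarse, full-image view ending in two fully-connected layers.
        self.s1_conv = nn.Sequential(
            nn.Conv2d(3, 64, 5, stride=4, padding=2), nn.ReLU(),    # 256 -> 64
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
            nn.MaxPool2d(4),                                        # 32 -> 8
        )
        self.s1_fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 1024), nn.ReLU(),
            nn.Linear(1024, 64 * 16 * 16),  # reshaped into a coarse feature map
        )
        # Scale 2: mid-resolution predictions from the image plus coarse features.
        self.s2 = nn.Sequential(
            nn.Conv2d(3 + 64, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, out_channels, 5, padding=2),
        )
        # Scale 3: refines the scale-2 predictions at higher resolution.
        self.s3 = nn.Sequential(
            nn.Conv2d(3 + out_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_channels, 3, padding=1),
        )

    def forward(self, x):  # x: (B, 3, 256, 256) in this sketch
        f1 = self.s1_fc(self.s1_conv(x)).view(-1, 64, 16, 16)
        f1 = F.interpolate(f1, size=(128, 128), mode='bilinear', align_corners=False)
        x_half = F.interpolate(x, size=(128, 128), mode='bilinear', align_corners=False)
        p2 = self.s2(torch.cat([x_half, f1], dim=1))   # mid-resolution prediction
        p2_up = F.interpolate(p2, size=(256, 256), mode='bilinear', align_corners=False)
        p3 = self.s3(torch.cat([x, p2_up], dim=1))     # refined prediction
        return p2, p3
```

Only the output layer changes across tasks: out_channels=1 for depth, 3 for normals, and the number of classes for semantic labeling.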
4. Tasks
We apply the same architecture to each of the three tasks we investigate: depth, normals, and semantic labeling. Each makes use of a different loss function and target data defining the task.
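For depth, the model is trained in log-depth space with a scale-invariant error plus first-order gradient-matching terms. A minimal sketch; the term weights follow the paper's formulation, but treat the implementation details as assumptions:

```python
import torch

def depth_loss(pred_log, gt_log):
    """pred_log, gt_log: (B, 1, H, W) log-depth maps."""
    d = pred_log - gt_log
    # Scale-invariant term: penalizes the variance of the log error, so a
    # global depth offset costs less than errors in scene structure.
    si = (d ** 2).flatten(1).mean(1) - 0.5 * d.flatten(1).mean(1) ** 2
    # Gradient terms: encourage predicted depth edges to line up with
    # edges in the ground truth.
    dx = d[..., :, 1:] - d[..., :, :-1]
    dy = d[..., 1:, :] - d[..., :-1, :]
    return si.mean() + (dx ** 2).mean() + (dy ** 2).mean()
```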
To predict surface normals, we change the output from one channel to three, and predict the x, y and z components of the normal at each pixel.
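A sketch of a matching objective: predicted vectors are normalized to unit length per pixel and scored by their dot product with the ground truth (the reduction details are assumptions of this sketch):

```python
import torch.nn.functional as F

def normals_loss(pred, gt):
    """pred, gt: (B, 3, H, W); gt normals assumed unit length."""
    pred_unit = F.normalize(pred, dim=1)        # unit-length normal per pixel
    return -(pred_unit * gt).sum(dim=1).mean()  # negative mean dot product
```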
For semantic labeling, we use a pixelwise softmax classifier to predict a class label for each pixel.
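A minimal sketch of this loss; the 40-class count is illustrative (e.g. an NYUDepth-style label set):

```python
import torch
import torch.nn as nn

logits = torch.randn(2, 40, 64, 64)           # (B, C, H, W) per-pixel scores
labels = torch.randint(0, 40, (2, 64, 64))    # (B, H, W) integer class ids
loss = nn.CrossEntropyLoss()(logits, labels)  # softmax + NLL over every pixel
```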
5. Training
We train our model in two phases using SGD: First, we jointly train both Scales 1 and 2. Second, we fix the parameters of these scales and train Scale 3.
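Continuing the MultiScaleNet sketch above, the two-phase schedule might look as follows (learning rates and momentum are placeholders, and the attribute names belong to the sketch, not the paper's code):

```python
import torch

model = MultiScaleNet(out_channels=1)

# Phase 1: jointly train Scales 1 and 2; Scale 3 is left untouched.
phase1 = (list(model.s1_conv.parameters())
          + list(model.s1_fc.parameters())
          + list(model.s2.parameters()))
opt1 = torch.optim.SGD(phase1, lr=1e-3, momentum=0.9)

# Phase 2: freeze Scales 1 and 2, then train Scale 3 alone.
for p in phase1:
    p.requires_grad_(False)
opt2 = torch.optim.SGD(model.s3.parameters(), lr=1e-3, momentum=0.9)
```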
We augment the training data with random scaling, in-plane rotation, translation, color shifts, flips, and contrast adjustments.
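The key detail is that targets must be transformed consistently with the inputs; for example, zooming the image by a factor s makes the scene appear closer, so depth values are divided by s. A partial sketch covering scaling, flips, and color (ranges are illustrative assumptions):

```python
import random
import torch
import torch.nn.functional as F

def augment(img, depth):
    """img: (3, H, W) float image; depth: (H, W) metric depth."""
    s = random.uniform(1.0, 1.5)                          # random zoom factor
    img = F.interpolate(img[None], scale_factor=s,
                        mode='bilinear', align_corners=False)[0]
    depth = F.interpolate(depth[None, None], scale_factor=s,
                          mode='nearest')[0, 0] / s       # closer when zoomed in
    if random.random() < 0.5:                             # horizontal flip
        img, depth = img.flip(-1), depth.flip(-1)
    img = img * torch.empty(3, 1, 1).uniform_(0.8, 1.2)   # per-channel color
    return img, depth
```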
5.3 Combining Depth and Normals
6. Performance Experiments
SIFT Flow: dense correspondence across different scenes.
7. Probe Experiments
7.2 Effect of Depth and Normals Inputs
(i) How important are the depth and normals inputs relative to RGB in the semantic labeling task?
(ii) What might happen if we were to replace the true depth and normals inputs with the predictions made by our network?