2019.9.4 note

A Simple Theoretical Model of Importance for Summarization

  1. Define Redundancy, Relevance and Informativeness.

  2. Derive the formulation of the theoretical model of Importance from a set of assumptions.

  3. Conduct experiments to show that their model correlates well with human judgments.

LogicENN: A Neural Based Knowledge Graphs Embedding Model with Logical Rules

It encodes logical rules into knowledge graph embeddings by adding regularization terms to the optimization objective, as sketched below.
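A minimal sketch of the idea, assuming a margin-style soft constraint for an implication rule $r_1(h, t) \Rightarrow r_2(h, t)$; LogicENN derives rule-specific algebraic constraints, which this generic penalty does not reproduce exactly:

```python
import torch

def rule_regularizer(score_body, score_head, margin=0.0):
    """Soft penalty for an implication rule body(h, t) => head(h, t):
    whenever the body triple scores higher than the head triple, pay the
    difference. A generic rule-as-regularizer sketch, not LogicENN's exact
    per-rule constraints."""
    return torch.relu(score_body - score_head + margin).mean()

# Total loss: embedding loss plus weighted rule penalties (`lam` is a
# hypothetical hyperparameter, `model.score` a hypothetical scoring function):
# loss = kge_loss + lam * rule_regularizer(model.score(h, r1, t),
#                                          model.score(h, r2, t))
```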

Norm-Preservation

  1. It analyzes the effect of skip connections and proves that ResNet can be trained so deep because residual blocks preserve the norm of the gradient (norm preservation).
  2. It shows that stacking more residual blocks enhances norm preservation.
  3. It pushes norm preservation further by regularizing the singular values of the weights (see the sketch after this list).
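A hedged sketch of singular-value regularization toward norm preservation, using an exact SVD on the reshaped kernel; the paper uses more efficient machinery for convolutions, and the penalty weight below is a hypothetical hyperparameter:

```python
import torch

def singular_value_penalty(weight, target=1.0):
    """Encourage norm preservation by pulling the extreme singular values of
    the (flattened) weight matrix toward `target`. Exact SVD is fine for
    small layers but costly for large ones."""
    w = weight.flatten(1)          # e.g. conv kernel [out, in, k, k] -> [out, in*k*k]
    s = torch.linalg.svdvals(w)    # singular values in descending order
    return (s[0] - target) ** 2 + (s[-1] - target) ** 2

# Added to the training objective, e.g.:
# loss = task_loss + mu * sum(singular_value_penalty(p)
#                             for p in model.parameters() if p.dim() > 1)
```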

Squeeze-and-Excitation Networks

$\text{SE-block}(x) = x \odot \text{scale}(x)$, where $\text{scale}(x)$: $[H, W, C]$ -> global pooling -> $[1, 1, C]$ -> FC and ReLU -> $[1, 1, C/r]$ -> FC and sigmoid -> $[1, 1, C]$ -> broadcast -> $[H, W, C]$.

An SE-block can be placed before, after, or in parallel with other blocks.
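A minimal PyTorch sketch of the block as described above; `reduction` is the bottleneck ratio $r$ (commonly 16):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: SE-block(x) = x * scale(x)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # [C] -> [C/r]
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # [C/r] -> [C]
            nn.Sigmoid(),
        )

    def forward(self, x):                  # x: [N, C, H, W]
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))             # squeeze: global average pooling -> [N, C]
        s = self.fc(s).view(n, c, 1, 1)    # excitation: per-channel scale in (0, 1)
        return x * s                       # broadcast the scale over H and W
```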

ON THE VALIDITY OF SELF-ATTENTION AS EXPLANATION IN TRANSFORMER MODELS

  1. In transformers, the hidden state at position i is a mixture of all word embeddings, and word i itself plays only a small role in the hidden state at position i in intermediate layers.
  2. However, the contribution (defined in the paper via gradients) of word i to the hidden state at position i in intermediate layers is still the largest among all words (see the sketch below).
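A simplified sketch of a gradient-based contribution score, assuming access to the input embedding tensor used in the forward pass; the paper's precise definition of contribution differs in detail:

```python
import torch

def token_contributions(hidden_i, embeddings):
    """Per-token contribution to the hidden state h_i at position i.
    `embeddings` is a [seq_len, dim] tensor with requires_grad=True that was
    fed into the transformer to produce `hidden_i`. We back-propagate a
    scalar summary of h_i and take per-token gradient norms as a proxy."""
    score = hidden_i.norm()  # scalar summary of h_i
    (grads,) = torch.autograd.grad(score, embeddings, retain_graph=True)
    return grads.norm(dim=-1)  # one contribution score per input token
```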

ONE MODEL TO RULE THEM ALL

It presents a new flavor of Variational Auto-Encoder (VAE) that interpolates seamlessly between the unsupervised, semi-supervised, and fully supervised learning regimes.
The VAE model is $x \to (\pi, \mu, \sigma) \to x_{recon}$ with loss $L = L_{ELBO} + L_{cl}$, where $\pi$ is a one-hot vector for classification and $L_{cl}$ is the cross-entropy classification loss (computed only for labeled data). For semi-supervised learning, $(\pi, \mu, \sigma)$ is treated as the latent state. For a supervised classifier, $\pi$ is treated as the input and $x_{recon}$ as the output. For an unsupervised anomaly detector, $(\mu, \sigma)$ is treated as the latent state.
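A minimal sketch of the combined objective, assuming a standard Gaussian VAE for $L_{ELBO}$ and cross-entropy for $L_{cl}$; the encoder/decoder and the weighting of the terms are placeholders:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, log_var, pi_logits, y=None):
    """L = L_ELBO + L_cl. A sketch under the usual Gaussian-latent assumptions;
    term weights and the reconstruction likelihood are simplifications."""
    recon = F.mse_loss(x_recon, x, reduction="sum")  # reconstruction term
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())  # KL(q(z|x) || N(0, I))
    l_elbo = recon + kl
    # classification term, only when a label is available (labeled data)
    l_cl = F.cross_entropy(pi_logits, y) if y is not None else 0.0
    return l_elbo + l_cl
```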

Smaller Models, Better Generalization

It analyzes network complexity based on an upper bound on the VC dimension. It extends the ideas of minimal complexity machines and learns the weights of a neural network by minimizing the empirical error together with an upper bound on the VC dimension. It also proposes a pruning method and analyzes quantization. The authors observe that pruning and then quantizing the models achieves comparable or better weight sparsity and allows for better generalization.
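A generic sketch of a prune-then-quantize pipeline (plain magnitude pruning plus uniform quantization); the paper's method is driven by the VC-dimension bound rather than raw magnitudes:

```python
import torch

def prune_then_quantize(weight, sparsity=0.9, n_bits=4):
    """Zero out the smallest-magnitude fraction of weights, then quantize the
    survivors to 2^n_bits uniform levels. Assumes a nonzero weight tensor."""
    # prune: keep only weights above the sparsity-th magnitude quantile
    k = max(1, int(weight.numel() * sparsity))
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    pruned = weight * mask
    # quantize: uniform symmetric quantization of the remaining weights
    scale = pruned.abs().max() / (2 ** (n_bits - 1) - 1)
    quantized = torch.round(pruned / scale) * scale
    return quantized, mask
```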

Testing Robustness Against Unforeseen Adversaries

It proposes some novel adversaries: $L_p$-JPEG, FOG, Gabor, and Snow.

A targeted adversarial attack means that, for a target label $y' \ne y$, one finds $x'$ under some constraint such that $l(f(x'), y')$ is minimized. (The optimization method is interesting.)
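For illustration, a standard targeted $L_\infty$ PGD sketch of this optimization; the paper's novel adversaries use other distortion sets, but the targeted objective is the same. Inputs are assumed to lie in $[0, 1]$:

```python
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, y_target, eps=8/255, alpha=2/255, steps=40):
    """Minimize l(f(x'), y') subject to ||x' - x||_inf <= eps."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # descend the loss toward the target label
        x_adv = (x_adv - alpha * grad.sign()).detach()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                 # stay in the valid input range
    return x_adv
```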

It also proposes a methodology for assessing robustness against unforeseen adversaries, including an adversarial robustness metric named UAR, and it analyzes adversarial training against a single distortion type as well as joint adversarial training.