2019.9.4 note
A Simple Theoretical Model of Importance for Summarization
- Defines Redundancy, Relevance and Informativeness.
- Proves the formulation of the theoretical model of importance under some assumptions.
- Conducts experiments showing that their model correlates well with human judgments.
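The note doesn't reproduce the paper's formulas, but the three quantities are information-theoretic. A minimal sketch, under the assumption that a summary and a source are treated as unigram distributions, with redundancy tied to low entropy and relevance tied to low cross-entropy against the source (the names `unigram_dist`, `redundancy`, `relevance` are mine, not the paper's notation):

```python
import math
from collections import Counter

def unigram_dist(tokens):
    """Normalized unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def entropy(p):
    return -sum(v * math.log(v) for v in p.values())

def cross_entropy(p, q, eps=1e-12):
    # smooth q so words unseen in q do not send the sum to infinity
    return -sum(v * math.log(q.get(w, eps)) for w, v in p.items())

summary = "cats are small cats are cute".split()
source = "cats are small furry animals and cats are often cute pets".split()

p_s = unigram_dist(summary)
p_d = unigram_dist(source)

redundancy = -entropy(p_s)            # a less diverse summary scores higher
relevance = -cross_entropy(p_s, p_d)  # a summary closer to the source scores higher
```

Informativeness would be the analogous cross-entropy against a background-knowledge distribution; the paper combines these terms into a single importance score.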
LogicENN: A Neural Based Knowledge Graphs Embedding Model with Logical Rules
It encodes logic rules into knowledge graph embeddings by adding regularization terms to the optimization objective.
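As a rough illustration of rules-as-regularizers (not LogicENN's actual network, which scores triples with a shared neural function), a toy sketch: a symmetry rule r(h,t) => r(t,h) becomes a penalty that pushes the two scores together, added on top of the usual ranking loss. The `score` function and embedding shapes here are placeholders I made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy embeddings: 5 entities, 2 relations, dimension 4
E = rng.normal(size=(5, 4))   # entity embeddings
R = rng.normal(size=(2, 4))   # relation embeddings

def score(h, r, t):
    """Hypothetical triple score (a stand-in for the paper's shared network)."""
    return float(E[h] @ (R[r] * E[t]))

def symmetry_penalty(r, pairs):
    """Regularizer for a symmetric relation r: score(h,r,t) should match score(t,r,h)."""
    return sum((score(h, r, t) - score(t, r, h)) ** 2 for h, t in pairs)

pen = symmetry_penalty(0, [(1, 2), (3, 4)])
# in training: loss = ranking_loss + lam * pen   (the rule enters only as a penalty term)
```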
Norm-Preservation
- It analyzes the effect of skip connections, arguing that ResNets can be so deep because residual blocks are norm-preserving, and proves it.
- Norm preservation is enhanced by stacking more layers.
- Extra norm preservation is pushed by regularizing the singular values.
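A quick numerical illustration (mine, not the paper's proof): with a small random branch, a plain layer shrinks the input norm, while adding the identity back keeps the output norm close to the input norm:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64
W = 0.05 * rng.normal(size=(d, d))   # small random "residual branch" weight

def relu(z):
    return np.maximum(z, 0.0)

x = rng.normal(size=d)
branch = relu(W @ x)

plain_ratio = np.linalg.norm(branch) / np.linalg.norm(x)         # plain layer: far from 1
residual_ratio = np.linalg.norm(x + branch) / np.linalg.norm(x)  # with skip: close to 1
```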
Squeeze-and-Excitation Networks
The SE block rescales channels as x * scale(x), where scale(x): [H,W,C] -> global pooling -> [1,1,C] -> FC and ReLU -> [1,1,C/r] -> FC and sigmoid -> [1,1,C] -> copy -> [H,W,C].
SE blocks can be placed before, after, or in parallel to other blocks.
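The scale(x) pipeline above can be sketched directly in NumPy (weights here are random placeholders for the learned FC layers; `r` is the reduction ratio):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, W1, W2):
    """Squeeze-and-Excitation on x of shape [H, W, C].
    W1: [C, C//r] reduction FC, W2: [C//r, C] expansion FC."""
    s = x.mean(axis=(0, 1))        # squeeze: global average pooling -> [C]
    s = np.maximum(s @ W1, 0.0)    # FC + ReLU -> [C//r]
    s = sigmoid(s @ W2)            # FC + sigmoid -> [C], per-channel gates in (0, 1)
    return x * s                   # broadcast: rescale each channel of x

rng = np.random.default_rng(0)
H, W, C, r = 4, 4, 8, 2
x = rng.normal(size=(H, W, C))
out = se_block(x, rng.normal(size=(C, C // r)), rng.normal(size=(C // r, C)))
```

Since the gates are sigmoid outputs, each channel is attenuated, never amplified.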
ON THE VALIDITY OF SELF-ATTENTION AS EXPLANATION IN TRANSFORMER MODELS
- In Transformers, the hidden state at position i is a mixture of all word embeddings, and word i itself plays only a small role in the intermediate-layer hidden state at position i.
- However, the contribution of word i (defined in the paper via gradients) to the hidden state at position i is still the largest among all words.
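The mixing effect in the first bullet can be seen in a single attention head (random projections below stand in for learned weights; this is an illustration, not the paper's analysis):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 8                      # 5 positions, model dimension 8
X = rng.normal(size=(n, d))      # word embeddings

# single self-attention head with random projection matrices
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d))  # [n, n] attention weights
H = A @ (X @ Wv)                                  # hidden states: each row mixes ALL positions

self_weight = A[2, 2]  # attention of position 2 on its own embedding; generally well below 1
```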
ONE MODEL TO RULE THEM ALL
It presents a new flavor of Variational Auto-Encoder (VAE) that interpolates seamlessly between the unsupervised, semi-supervised and fully supervised learning domains.
In the VAE model, the label is a one-hot vector for classification, and a CE classification loss is used (only for labeled data). For semi-supervision, the label is treated as a latent state. For a supervised classifier, the observation is treated as input and the label as output. For an unsupervised anomaly detector, the label is treated as a latent state.
Smaller Models, Better Generalization
It analyzes network complexity via an upper bound on the VC dimension. It extends the ideas of minimal complexity machines and learns the weights of a neural network by minimizing the empirical error plus an upper bound on the VC dimension. It proposes a pruning method and analyzes quantization. They observe that pruning and then quantizing the models achieves comparable or better sparsity in terms of weights and allows for better generalization.
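The prune-then-quantize pipeline can be sketched with the two standard operations, magnitude pruning followed by uniform symmetric quantization (a generic sketch, not the paper's specific method):

```python
import numpy as np

def prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def quantize(w, bits=8):
    """Uniform symmetric quantization to at most 2^bits - 1 levels."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    if scale == 0:
        return w.copy()
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16))
w_pq = quantize(prune(w, sparsity=0.5), bits=8)  # sparse AND low-precision
```

Pruning first means the surviving weights set the quantization range, and the zeros stay exactly zero after rounding.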
Testing Robustness Against Unforeseen Adversaries
It proposes some novel adversaries: JPEG, Fog, Gabor and Snow.
An adversarial attack here means: for a target label, find a perturbation under some constraint such that the loss on the target label is minimized. (The optimization method is interesting.)
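The generic targeted formulation can be sketched as signed gradient descent on the target-label loss with projection onto an L-infinity ball (a toy linear classifier below stands in for a network; this is the standard PGD-style recipe, not the paper's specific distortion attacks):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, k = 10, 3
W = rng.normal(size=(k, d))   # toy linear classifier (stand-in for a network)
x = rng.normal(size=d)
target = 1                    # the target label
eps, step = 0.5, 0.1          # L_inf budget and step size

delta = np.zeros(d)
clean_p = softmax(W @ x)[target]
for _ in range(50):
    p = softmax(W @ (x + delta))
    # gradient of -log p_target w.r.t. the input: W^T (p - one_hot(target))
    g = W.T @ (p - np.eye(k)[target])
    delta = np.clip(delta - step * np.sign(g), -eps, eps)  # descend, project to the ball

adv_p = softmax(W @ (x + delta))[target]  # target-label probability after the attack
```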
It also proposes a methodology for assessing robustness against unforeseen adversaries, along with an adversarial robustness metric. It also analyzes adversarial training against a single distortion type versus joint adversarial training.