deeplearning.ai - Sequence Models and Attention Mechanism

Sequence Models
Andrew Ng (吴恩达)

Sequence models & Attention mechanism

Various sequence to sequence architectures

Basic Models

  • sequence to sequence model
  • image captioning

Picking the most likely sentence

  • conditional language model

  • picking the single most likely word at each step (greedy search) is not always optimal

  • approximate search algorithm

Beam Search

  • encoder, decoder

  • beam width (B): the number of candidates kept at each step

  • keep the top B most likely partial sentences

  • steps: at each step, extend every kept candidate by one word, then keep the B highest-probability results
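
The beam search steps can be sketched roughly as follows. This is a minimal illustration, not the course's implementation: `next_log_probs` is a hypothetical callback standing in for the decoder network, and `toy_model` is a made-up probability table chosen so that greedy search (B = 1) and beam search (B = 2) disagree.

```python
import math


def beam_search(next_log_probs, beam_width, max_len, eos="<eos>"):
    """Return (sequence, log probability) pairs found by beam search, best first.

    next_log_probs(prefix) is a hypothetical callback that returns a dict
    mapping each possible next token to log P(token | x, prefix).
    """
    beams = [((), 0.0)]  # each entry: (tuple of tokens, summed log probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Keep only the B most likely extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that hit max_len without ending
    return sorted(finished, key=lambda c: c[1], reverse=True)


# Made-up conditional probabilities, for illustration only.
TOY_TABLE = {
    (): {"a": math.log(0.6), "b": math.log(0.4)},
    ("a",): {"<eos>": math.log(0.55), "c": math.log(0.45)},
    ("b",): {"<eos>": math.log(0.9), "c": math.log(0.1)},
    ("a", "c"): {"<eos>": 0.0},
    ("b", "c"): {"<eos>": 0.0},
}


def toy_model(prefix):
    return TOY_TABLE[prefix]


greedy_best = beam_search(toy_model, beam_width=1, max_len=3)[0]
beam_best = beam_search(toy_model, beam_width=2, max_len=3)[0]
```

With this table, greedy search commits to "a" because it is the most likely first word, while B = 2 also keeps "b" alive and ends up with the more probable full sentence.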

Refinements to Beam Search

  • $\frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P(y^{<t>} \mid x, y^{<1>}, \ldots, y^{<t-1>})$

    • length normalization

    • summing log probabilities instead of multiplying probabilities avoids numerical underflow and rounding error

    • α is a hyperparameter

    • normalized log-likelihood objective

  • large B: better result but computationally slower

  • small B: worse result but faster

  • Beam Search is not guaranteed to find the exact maximum, $\arg\max_y P(y \mid x)$
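
A small sketch of the normalized log-likelihood objective from this section. The function name and the default `alpha=0.7` are assumptions for illustration (α is a tunable hyperparameter); the per-word probabilities in the example are made up.

```python
import math


def normalized_log_likelihood(token_probs, alpha=0.7):
    """Compute (1 / T_y**alpha) * sum_t log P(y<t> | x, y<1>, ..., y<t-1>).

    token_probs holds the conditional probability of each output word.
    alpha = 0 means no normalization, alpha = 1 means a full average.
    """
    if not token_probs:
        raise ValueError("token_probs must contain at least one probability")
    t_y = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return log_sum / (t_y ** alpha)


# Made-up example: a one-word output versus a three-word output.
short_sentence = [0.4]
long_sentence = [0.7, 0.7, 0.7]

unnormalized = (
    normalized_log_likelihood(short_sentence, alpha=0.0),
    normalized_log_likelihood(long_sentence, alpha=0.0),
)
normalized = (
    normalized_log_likelihood(short_sentence, alpha=1.0),
    normalized_log_likelihood(long_sentence, alpha=1.0),
)
```

Without normalization the short output scores higher simply because it multiplies fewer probabilities; with α = 1 the longer output wins because its average per-word log probability is higher.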

Bleu Score (optional)

  • gives a score that measures how good a machine translation is
  • Bleu score on unigrams and bigrams
  • BP: brevity penalty
  • single real-number evaluation metric
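
The pieces listed above can be sketched as follows. This is a simplified illustration rather than a reference implementation: the function names are made up, the n-gram precisions are combined with a plain geometric mean, and using the closest reference length in the brevity penalty is an assumption.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each n-gram is credited at most as many
    times as it appears in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """BP = 1 if the candidate is longer than the reference, else exp(1 - r / c)."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Assumption: compare against the reference whose length is closest to c.
    r = min((len(ref) for ref in references), key=lambda n: (abs(n - c), n))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=2):
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


references = ["the cat is on the mat".split(),
              "there is a cat on the mat".split()]
repeated = "the the the the the the the".split()
unigram_precision = modified_precision(repeated, references, 1)
```

The clipping step is what keeps a degenerate output such as a sentence of repeated "the" from scoring well: only two of its seven unigrams are credited.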

Attention Model Intuition

  • neural networks have difficulty memorizing long sentences

  • translate the sentence one part at a time

Attention Model
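
A minimal sketch of one attention step, under stated assumptions: the relevance scores for each input position are taken as given (in a full model they would come from a small network over the decoder and encoder states), and `attention_context` is an illustrative name. The scores are turned into weights that sum to 1 with a softmax, and the context is the weighted sum of the encoder activations.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(scores, encoder_states):
    """Return (attention weights, context vector) for one output step.

    scores[i] is the unnormalized relevance of input position i.
    encoder_states[i] is the encoder activation at position i (list of floats).
    """
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[k] for w, state in zip(weights, encoder_states))
        for k in range(dim)
    ]
    return weights, context


# Made-up example: three input positions with 2-dimensional activations.
weights, context = attention_context(
    scores=[2.0, 0.0, 0.0],
    encoder_states=[[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
)
```

Here the first input position gets most of the weight, so the context vector is dominated by its activation.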

Speech recognition - Audio data

Speech recognition

  • false blank outputs

  • phonemes: hand-engineered basic units of sound

  • end-to-end network, input an audio clip and directly output a transcript

  • Connectionist Temporal Classification cost function

    collapse repeated characters not separated by “blank”
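
The collapsing rule can be sketched in a few lines. The function name and the use of "_" as the blank symbol are assumptions for illustration; this only shows the decoding rule, not the CTC cost function itself.

```python
from itertools import groupby


def ctc_collapse(output, blank="_"):
    """Collapse repeated characters not separated by blank, then drop blanks."""
    merged = (ch for ch, _ in groupby(output))  # merge runs of identical characters
    return "".join(ch for ch in merged if ch != blank)


transcript = ctc_collapse("ttt_h_eee___ ___qqq__")
```

A blank between two identical characters keeps them distinct, so "hel_lo" collapses to "hello" while "hello" without a blank would collapse to "helo".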

Trigger Word Detection

label
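
One common way to build the labels, sketched under assumptions: the target is 0 everywhere except for a short run of 1s right after each trigger word ends. The function name, the `ones_after` parameter, and the example numbers are hypothetical.

```python
def trigger_labels(num_steps, trigger_end_steps, ones_after=50):
    """Build a 0/1 target sequence for trigger word detection.

    trigger_end_steps lists the time steps at which a trigger word ends.
    The ones_after steps following each end are labeled 1 (assumed scheme);
    using several 1s instead of a single 1 makes the labels less imbalanced.
    """
    labels = [0] * num_steps
    for end in trigger_end_steps:
        for t in range(end + 1, min(end + 1 + ones_after, num_steps)):
            labels[t] = 1
    return labels


# Made-up example: 10 output steps, trigger word ends at step 3.
example_labels = trigger_labels(10, [3], ones_after=2)
```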

Conclusion

Conclusion and thank you

Make the world a better place.