[Information Technology] [2015.03] Speech Recognition Based on Deep Neural Networks

This post covers the PhD dissertation by Andrew Lee Maas of Stanford University (USA), 191 pages in total.

Spoken language is an increasingly pervasive interface choice as computing devices permeate many aspects of daily life. Automatically understanding spoken language poses significant challenges because it requires both converting a speech signal into words and extracting meaning from the words themselves. Spoken language understanding tasks can roughly be broken into distinct components which perform (1) low-level processing of the audio signal, (2) speech transcription, and (3) natural language understanding. We describe approaches to improving individual components for each sub-task associated with spoken language understanding. Our methods primarily rely on machine-learning-based approaches to replace hand-engineered approaches, and we consistently find that learning from data with minimal assumptions about a problem results in improved performance. In particular, we focus on neural network approaches to problems. Neural networks have seen a recent resurgence of interest thanks to their ability to scale to learn increasingly complex functions when more data becomes available. Neural networks have recently driven tremendous progress in the field of computer vision, where many tasks easily translate into classification and regression problems. In spoken language understanding, however, it is more difficult to define tasks which are easily formalized into problems for a neural network to solve. Our work integrates with these complex systems and shows that, as in computer vision, neural networks can significantly improve spoken language understanding systems.

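  1. Introduction
  2. Background
  3. The TREPAN Algorithm
  4. Empirical Evaluation of TREPAN
  5. Analytical Evaluation of TREPAN
  6. The MOFN-SWS Algorithm: A Local Method for Extracting M-of-N Rules
  7. A Boosting-Based Perceptron Learning Algorithm
  8. Additional Related Work
  9. Conclusions
    Appendix A: Representation Trees Extracted with TREPAN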
