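[Paper Reading] Input Method Papers, Part 2: LONG SHORT TERM MEMORY NEURAL NETWORK
The main idea of the paper is to use an LSTM network architecture to decode gesture (swipe) input, with CTC as the loss.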
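To make the CTC loss concrete, here is a minimal sketch of the standard CTC forward recursion in plain Python. It is not taken from the paper: the function names, the choice of index 0 for the blank symbol, and the input layout (`log_probs[t][k]` is the log probability of symbol `k` at frame `t`) are assumptions for illustration only.

```python
import math

NEG_INF = float("-inf")

def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the arguments."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """Negative log probability of `target` summed over all CTC alignments.

    log_probs: list of frames, each a list of per-symbol log probabilities.
    target:    list of label ids (must not contain `blank`).
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_states = len(ext)

    # alpha[s] = log probability of all partial alignments ending in state s
    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new_alpha = [NEG_INF] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]                  # stay in the same state
            if s >= 1:
                candidates.append(alpha[s - 1])      # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(*candidates) + frame[ext[s]]
        alpha = new_alpha

    # Valid alignments end in the last label or in the trailing blank.
    if num_states > 1:
        return -logsumexp(alpha[-1], alpha[-2])
    return -alpha[-1]
```

With two frames that are uniform over {blank, a}, the alignments "a_", "_a", and "aa" all collapse to "a", so the target has total probability 0.75 and the loss is -log(0.75). In practice one would use a framework implementation rather than this loop.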
1. What are the features of a swipe input, i.e., what is the input to the LSTM? TODO
Features: each input contains the (x, y) position, the time since the last gesture, and the gesture type (move, up, down).
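The feature list above could be turned into a per-frame input vector roughly as follows. This is only a sketch: the `TouchPoint` type, the one-hot encoding of the gesture type, the normalization by keyboard width and height, and reading "time since last gesture" as the time elapsed since the previous touch point are all assumptions, since the notes do not say how the paper encodes these values.

```python
from dataclasses import dataclass

GESTURE_TYPES = ("move", "up", "down")

@dataclass
class TouchPoint:
    """Hypothetical raw touch event."""
    x: float
    y: float
    t_ms: float   # absolute timestamp in milliseconds
    kind: str     # one of GESTURE_TYPES

def to_features(points, width=1.0, height=1.0):
    """Map raw touch points to [x, y, dt, is_move, is_up, is_down] vectors."""
    features = []
    prev_t = None
    for p in points:
        # Time since the previous point; 0 for the first point (assumption).
        dt = 0.0 if prev_t is None else p.t_ms - prev_t
        one_hot = [1.0 if p.kind == g else 0.0 for g in GESTURE_TYPES]
        features.append([p.x / width, p.y / height, dt] + one_hot)
        prev_t = p.t_ms
    return features
```

Each gesture then becomes a sequence of 6-dimensional vectors, one per touch point, which is the shape an LSTM expects as input.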
2. LSTM + FST: what kind of process is the FST? TODO
In the paper's figure, y is the LSTM output, and its log probability is used as the transition probability of the FST.
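One way to picture log probabilities acting as FST transition weights is the toy decoder below. It is a sketch under stated assumptions, not the paper's decoder: each lexicon word is turned into a small hand-built acceptor (a hypothetical topology with optional blanks between letters, mirroring CTC), every arc that consumes a frame costs the negative LSTM log probability of its label, and the word with the cheapest path wins. A real system would presumably use an FST toolkit and may also compose a language model; none of that is shown here.

```python
INF = float("inf")

def word_acceptor(word, blank="_"):
    """Build (start, arcs, finals) for one word; arcs are (src, dst, label)."""
    labels = [blank]
    for ch in word:
        labels += [ch, blank]
    start = -1
    arcs = [(start, 0, labels[0])]
    if len(labels) > 1:
        arcs.append((start, 1, labels[1]))
    for s, lab in enumerate(labels):
        arcs.append((s, s, lab))              # repeat the same symbol
        if s >= 1:
            arcs.append((s - 1, s, lab))      # move to the next symbol
        if s >= 2 and lab != blank and lab != labels[s - 2]:
            arcs.append((s - 2, s, lab))      # skip the optional blank
    if len(labels) > 1:
        finals = {len(labels) - 1, len(labels) - 2}
    else:
        finals = {0}
    return start, arcs, finals

def shortest_path_cost(log_probs, fst):
    """Cheapest path cost; each arc costs -log y_t[label] at frame t."""
    start, arcs, finals = fst
    cost = {start: 0.0}
    for frame in log_probs:                   # frame: dict label -> log prob
        new_cost = {}
        for src, dst, lab in arcs:
            if src in cost:
                c = cost[src] - frame[lab]
                if c < new_cost.get(dst, INF):
                    new_cost[dst] = c
        cost = new_cost
    return min((cost[s] for s in finals if s in cost), default=INF)

def decode(log_probs, lexicon):
    """Return the lexicon word whose acceptor has the cheapest path."""
    return min(lexicon, key=lambda w: shortest_path_cost(log_probs, word_acceptor(w)))
```

Working with negative log probabilities means path costs add up along arcs and the best hypothesis is a shortest path, which is the usual reason for feeding log probabilities rather than raw probabilities into an FST.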
3. Datasets:
| Data | total words | unique words | collected from | other |
|---|---|---|---|---|
| Salt | 14,500 / 13,000 / 1,500 | 120 | Collected from 40 opt-in individuals in a Wizard of Oz fashion, modelled after the Pepper study | Small data |
| ALK | 50,000 / 45,000 / 5,000 | 5,450 | Anonymized real gestures collected from opt-in Google employees | Due to heavy preprocessing, this dataset contained roughly 70% words that had a single down and up tap and 30% that had multiple taps. This is particularly of interest, as we find the baseline system to be unable to capture the multi-tap gestures |
| SYS | 138,000 / 124,000 / 14,000 | 8,256 | The third dataset was synthetically generated from the Enron dataset [20] | Generated by connecting the characters within a word using an algorithm that minimizes jerk, which closely fits human motor control. We also allowed for variability in the length of the sequences |
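The synthetic dataset is described as connecting the characters of a word with a jerk-minimizing algorithm. The sketch below shows one simple way to do something similar, using the well-known point-to-point minimum-jerk position profile s(τ) = 10τ³ − 15τ⁴ + 6τ⁵ between consecutive key centers. The key coordinates are made up, and coming to rest at every key is a simplifying assumption; the paper's actual generator is not specified in these notes and may differ.

```python
# Hypothetical key centers in arbitrary keyboard units.
KEY_CENTERS = {"c": (3.5, 2.0), "a": (0.5, 1.0), "t": (4.0, 0.0)}

def min_jerk_segment(p0, p1, n):
    """Return n + 1 points from p0 to p1 along a minimum-jerk profile."""
    points = []
    for i in range(n + 1):
        tau = i / n                                   # normalized time in [0, 1]
        s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5    # fraction of distance covered
        points.append((p0[0] + (p1[0] - p0[0]) * s,
                       p0[1] + (p1[1] - p0[1]) * s))
    return points

def synth_gesture(word, keys=KEY_CENTERS, points_per_segment=10):
    """Connect the key centers of `word` with minimum-jerk segments."""
    path = [keys[word[0]]]
    for a, b in zip(word, word[1:]):
        # Drop the first point of each segment; it repeats the previous key.
        path += min_jerk_segment(keys[a], keys[b], points_per_segment)[1:]
    return path
```

Varying `points_per_segment` per word or per segment is one way to get sequences of different lengths, in the spirit of the variability the dataset description mentions.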
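4. Results:
Analysis of the experimental results:
1) BLSTM consistently outperforms ULSTM, although BLSTM does not perform well on a small training set such as Salt.
2) Comparing ALK and Enron, the larger the training set, the better the results. What I do not understand is why such good results can also be reached on Salt.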
3) The baseline performs well on clean data, but it performs poorly on noisy data such as multi-tap gestures.
4) Learning rate decay helps improve the results. For the best-performing model, BLSTM-34-400*, using decay improved the results by about 5% on the ALK dataset.
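As a reminder of what learning rate decay looks like, here is a minimal sketch of an exponential schedule. The notes do not say which schedule the paper uses, so the exponential form, the function name, and all numeric values below are assumptions chosen purely for illustration.

```python
def decayed_lr(initial_lr, decay_rate, step, decay_steps):
    """Exponential decay: the rate is multiplied by decay_rate every decay_steps."""
    return initial_lr * decay_rate ** (step / decay_steps)

# Hypothetical values: start at 0.01 and halve every 1000 training steps.
schedule = [decayed_lr(0.01, 0.5, step, 1000) for step in (0, 1000, 2000)]
```

Large steps early in training make fast progress, while smaller steps later help the model settle, which is the usual motivation for decaying the rate.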