[Paper Reading] Input Method Papers, Part 2: LONG SHORT TERM MEMORY NEURAL NETWORK

The main idea of the paper is to decode keyboard gestures with an LSTM network architecture, trained with CTC loss.
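To make the CTC loss concrete, here is a minimal pure-Python sketch of the standard CTC forward recursion for a single sequence. It is not the paper's training code: the function names, the choice of index 0 for the blank label, and the list-of-lists input format are all assumptions made for illustration.

```python
import math

def log_sum_exp(*values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf inputs."""
    finite = [v for v in values if v != -math.inf]
    if not finite:
        return -math.inf
    m = max(finite)
    return m + math.log(sum(math.exp(v - m) for v in finite))

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss (negative log likelihood) for one sequence.

    log_probs: list of T frames, each a list of per-label log probabilities.
    target:    list of label ids without blanks.
    blank:     index of the blank label (assumed to be 0 here).
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    # alpha[s] = log probability of all partial alignments ending in ext[s]
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        prev = alpha
        alpha = [-math.inf] * S
        for s in range(S):
            candidates = [prev[s]]
            if s >= 1:
                candidates.append(prev[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(prev[s - 2])
            alpha[s] = log_sum_exp(*candidates) + frame[ext[s]]

    # A valid alignment ends on the last label or on the trailing blank.
    total = log_sum_exp(alpha[S - 1], alpha[S - 2]) if S > 1 else alpha[0]
    return -total
```

For example, with two frames, two labels (blank and "a") and uniform probabilities, the three alignments "a-", "-a" and "aa" all collapse to "a", so the loss is -log(0.75).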

1. Features of gesture (swipe) input: what is the input to the LSTM? TODO

Features: each input frame contains the (x, y) position, the time since the last gesture, and the gesture type (move, up, down).
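A rough sketch of how such frames could be assembled is shown below. The `TouchPoint` class, the one-hot encoding of the gesture type, the slot order, and the millisecond time unit are assumptions for illustration; the notes above only list which quantities go into the input.

```python
from dataclasses import dataclass
from typing import List

# Order of the one-hot slots is an assumption, not taken from the paper.
GESTURE_TYPES = ("move", "up", "down")

@dataclass
class TouchPoint:
    x: float      # horizontal position (assumed normalized to the keyboard width)
    y: float      # vertical position (assumed normalized to the keyboard height)
    t_ms: float   # absolute timestamp in milliseconds (hypothetical unit)
    kind: str     # one of GESTURE_TYPES

def to_feature_frames(points: List[TouchPoint]) -> List[List[float]]:
    """Turn raw touch points into per-frame vectors [x, y, dt, move, up, down]."""
    frames = []
    prev_t = None
    for p in points:
        # Time since the previous touch event; 0 for the first event.
        dt = 0.0 if prev_t is None else p.t_ms - prev_t
        one_hot = [1.0 if p.kind == k else 0.0 for k in GESTURE_TYPES]
        frames.append([p.x, p.y, dt] + one_hot)
        prev_t = p.t_ms
    return frames
```

Each gesture then becomes a variable-length sequence of 6-dimensional frames, which is the shape of input an LSTM consumes one step at a time.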

2. LSTM + FST: what kind of process is the FST? TODO

In the figure from the paper, y is the output of the LSTM, and its log probability serves as the transition probability (arc weight) of the FST.
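One way to picture this step is a simplified lexicon-constrained decoder: each word is treated as a linear chain of states (a very small acceptor), the arcs are weighted by the per-frame LSTM log probabilities, and a Viterbi search picks the best-scoring word. This is only a sketch under those assumptions. A real FST decoder would compose lexicon and language model transducers, which is not reproduced here, and the names `best_path_score` and `decode_with_lexicon` are hypothetical.

```python
import math

def best_path_score(log_probs, labels, blank=0):
    """Viterbi log score of the best CTC alignment of `labels` to the frames."""
    # Linear chain of states: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    score = [-math.inf] * S
    score[0] = log_probs[0][ext[0]]
    if S > 1:
        score[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        prev = score
        score = [-math.inf] * S
        for s in range(S):
            best = prev[s]                        # stay in the same state
            if s >= 1:
                best = max(best, prev[s - 1])     # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                best = max(best, prev[s - 2])     # skip the blank in between
            # The arc weight is the LSTM log probability of the emitted label.
            score[s] = best + frame[ext[s]]

    return max(score[-1], score[-2]) if S > 1 else score[0]

def decode_with_lexicon(log_probs, lexicon, alphabet, blank=0):
    """Return (word, log score) of the best-scoring lexicon word.

    alphabet: string whose position `blank` is a placeholder for the blank label.
    """
    best_word, best = None, -math.inf
    for word in lexicon:
        labels = [alphabet.index(ch) for ch in word]
        s = best_path_score(log_probs, labels, blank)
        if s > best:
            best_word, best = word, s
    return best_word, best
```

Restricting the search to lexicon words is what lets the decoder recover from noisy per-frame outputs: a character sequence that is not a word never gets a path at all.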

3. Datasets:

Dataset	total words	unique words	collected from	notes
Salt	14,500 13,000 1,500	120	Collected from 40 opt-in individuals in a Wizard of Oz fashion, modelled after the Pepper study.	Small data.
ALK	50,000 45,000 5,000	5,450	Anonymized real gestures collected from opt-in Google employees.	Due to heavy preprocessing, this dataset contained roughly 70% words that had a single down and up tap and 30% which had multiple. This is particularly of interest, as we find the baseline system to be unable to capture the multi-tap gestures.
SYS	138,000 124,000 14,000	8,256	The third dataset was synthetically generated from the Enron dataset [20].	Generated by connecting the characters within a word using an algorithm that minimizes jerk, which closely fits human motor control. We also allowed for variability in the length of the sequences.
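The synthetic data is described as connecting the characters of a word with a jerk-minimizing algorithm. The notes do not give the exact algorithm, so the sketch below uses the classic point-to-point minimum-jerk profile s(τ) = 10τ³ − 15τ⁴ + 6τ⁵ as a stand-in. The `key_centers` mapping, the function names, and the fixed number of points per segment are assumptions for illustration.

```python
def minimum_jerk_path(start, end, n_points):
    """Sample n_points on the segment start -> end with a minimum-jerk time profile."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    (x0, y0), (x1, y1) = start, end
    path = []
    for i in range(n_points):
        tau = i / (n_points - 1)  # normalized time in [0, 1]
        # Smooth 0 -> 1 profile with zero velocity and acceleration at both ends.
        s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
        path.append((x0 + (x1 - x0) * s, y0 + (y1 - y0) * s))
    return path

def synthesize_gesture(word, key_centers, points_per_segment=10):
    """Chain minimum-jerk segments between the key centers of consecutive characters."""
    centers = [key_centers[ch] for ch in word]
    gesture = [centers[0]]
    for a, b in zip(centers, centers[1:]):
        # Drop the first sample of each segment; it duplicates the previous end point.
        gesture.extend(minimum_jerk_path(a, b, points_per_segment)[1:])
    return gesture
```

Varying `points_per_segment` per word or per segment would be one simple way to get the variability in sequence length that the dataset description mentions.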

4. Results:

Analysis of the experimental results:

1) BLSTM consistently outperforms ULSTM, although BLSTM does not perform well on a small training set such as Salt.

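2) Comparing ALK with Enron (the SYS data), the larger the training set, the better the results. What I do not understand is why such good results can also be reached on Salt.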

3) The baseline performs well on clean data, but not on noisy data such as multi-tap gestures, which the baseline system is unable to capture.

4) Learning rate decay helps. For the best-performing model, BLSTM-34-400*, using decay improved the results on the ALK dataset by roughly 5%.
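The notes do not say which decay schedule was used, so the following is only an illustration of the general idea, assuming a simple exponential schedule. The function name and all parameter values are hypothetical.

```python
def exponential_decay(initial_lr, decay_rate, step, decay_steps):
    """Learning rate after `step` updates: multiplied by decay_rate every decay_steps."""
    return initial_lr * decay_rate ** (step / decay_steps)

# Hypothetical values: start at 0.01 and halve the rate every 1000 steps.
schedule = [exponential_decay(0.01, 0.5, step, 1000) for step in (0, 1000, 2000)]
```

A large initial rate speeds up early training, while the smaller later rates let the model settle, which is the usual explanation for gains of this kind.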
