Kaldi Study Notes (5): Online Chinese Speech Recognition with CVTE's Pretrained SR Model

For downloading and compiling Kaldi, see: http://blog.csdn.net/snowdroptulip/article/details/78896915


CVTE has open-sourced a TDNN model that it trained, and we can use this model for online recognition.


1. Download


First, download the model from http://kaldi-asr.org/models.html.


2. Extract


Extract the downloaded model under the egs directory, so that it ends up at egs/cvte.
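
The extraction step, together with the soft links that the README quoted in the last section asks for (s5/steps, s5/utils, and s5/local/score.sh), can be sketched as a small shell function. This is only a sketch under several assumptions: the archive is taken to be a gzip-compressed tarball that unpacks to a top-level cvte/ directory, the link targets follow the common Kaldi convention of pointing at egs/wsj/s5, and linking local/score.sh to steps/score_kaldi.sh is a guess, since the README does not name the target. The function name setup_cvte is made up for illustration.

```shell
# Hypothetical helper: unpack the CVTE archive into <kaldi-root>/egs and create
# the soft links the README mentions.
# Assumptions: gzip tarball with a top-level cvte/ directory; link targets follow
# the usual egs/wsj/s5 convention; the score.sh target is a guess.
setup_cvte() {
    kaldi_root=$1
    archive=$2
    tar -xzf "$archive" -C "$kaldi_root/egs" || return 1
    s5_dir=$kaldi_root/egs/cvte/s5
    mkdir -p "$s5_dir/local"
    # Relative targets are resolved from the directory that holds each link.
    ln -sfn ../../wsj/s5/steps "$s5_dir/steps"
    ln -sfn ../../wsj/s5/utils "$s5_dir/utils"
    ln -sfn ../steps/score_kaldi.sh "$s5_dir/local/score.sh"
}
```

It would be called with the Kaldi checkout and the downloaded file, for example `setup_cvte ~/kaldi ~/Downloads/model.tar.gz` (both paths are placeholders).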


3. Run


From the src/online2bin directory, run the following command:

 ./online2-wav-nnet3-latgen-faster --do-endpointing=false --online=false --feature-type=fbank --fbank-config=../../egs/cvte/s5/conf/fbank.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=../../egs/cvte/s5/exp/chain/tdnn/graph/words.txt ../../egs/cvte/s5/exp/chain/tdnn/final.mdl ../../egs/cvte/s5/exp/chain/tdnn/graph/HCLG.fst 'ark:echo utter1 utter1|' 'scp:echo utter1 ../../egs/cvte/s5/data/wav/00030/2017_03_07_16.57.22_1175.wav|' ark:/dev/null
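
The three quoted arguments at the end of the command are the least obvious part. As far as I understand the tool's usage, after the model and the HCLG graph it expects a spk2utt read specifier, a WAV read specifier, and a lattice write specifier. A trailing `|` tells Kaldi to read the table from the output of a command, so `'ark:echo utter1 utter1|'` maps speaker utter1 to utterance utter1, and `'scp:echo utter1 <wav>|'` maps utterance utter1 to a WAV path. Writing lattices to `ark:/dev/null` discards them, so the recognized text is read from the program's log output. The helper functions below are hypothetical names introduced here only to make that structure explicit.

```shell
# Hypothetical helpers that build the two read specifiers used in the command.

# 'ark:echo <speaker-id> <utterance-id>|' : one speaker with one utterance.
spk2utt_rspecifier() {
    printf 'ark:echo %s %s|' "$1" "$2"
}

# 'scp:echo <utterance-id> <wav-path>|' : one utterance mapped to its WAV file.
wav_rspecifier() {
    printf 'scp:echo %s %s|' "$1" "$2"
}

# Example: print the specifiers for a single file (the path is a placeholder).
spk2utt_rspecifier utter1 utter1; echo
wav_rspecifier utter1 /path/to/my.wav; echo
```

The printed strings can be passed, each inside double quotes, in place of the two quoted `echo` arguments of the command; the utterance ID must be the same in both.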


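To test your own speech, replace ../../egs/cvte/s5/data/wav/00030/2017_03_07_16.57.22_1175.wav with the path to your own audio file. Note that the WAV file must be sampled at 16 kHz with 16 bits per sample.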
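
A quick way to check the 16 kHz, 16-bit requirement before decoding is to read the two fields straight from the WAV header. The sketch below assumes a canonical 44-byte PCM header (sample rate at byte offset 24, bits per sample at offset 34) and a little-endian machine, because od decodes integers in host byte order; files with extra header chunks would need a real audio tool instead. The function names are made up for illustration.

```shell
# Print "<sample-rate> <bits-per-sample>" from a canonical 44-byte PCM WAV header.
# Assumes a little-endian host and no extra chunks before the "fmt " fields.
wav_info() {
    rate=$(od -An -t u4 -j 24 -N 4 "$1" | tr -d '[:space:]')
    bits=$(od -An -t u2 -j 34 -N 2 "$1" | tr -d '[:space:]')
    echo "$rate $bits"
}

# Succeed only if the file is 16 kHz with 16 bits per sample.
is_16k_16bit() {
    [ "$(wav_info "$1")" = "16000 16" ]
}
```

If the check fails, a resampling tool such as sox can usually convert the file, for example `sox in.wav -r 16000 -b 16 out.wav`.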


4. Results


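Because I ran this in a virtual machine, the process was killed by the system shortly after it started, probably because its memory or CPU usage was too high. I recommend running the model on a machine with better specifications.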


5. Other notes


Below are some notes from the README. They show that the model's overall recognition accuracy is quite good:


Copyright 2017- CVTE (http://www.cvte.com)
Author: Yanqiang Lei
Email: [email protected]
QQ:415198468

=============================================================================================
This archive is provided by CVTE and contains the following features:
1) Acoustic model (chain,tdnn) trained with several thousand hours of data;
2) Support online cmvn, since "apply-cmvn-online" is used during the training;
3) A 3-gram LM model is trained with 1000 GB text;
4) It is created by kaldi's master branch which is on May 02, 2017.

Files in this archive:
1) you should un-tar this inside the egs/ directory of kaldi;
2) create the soft-links, i.e., s5/steps, s5/utils, and s5/local/score.sh;
3) "conf" contains the fbank.conf used for feature extraction;
4) "data" contains ten utterances for testing;
5) "exp/chain/tdnn" includes the model;

Some results:
CVTE201701(1000 utts): ppl 340; cer: 4.55%
CVTE201703(10000 utts): ppl 313; cer: 4.5%
CVTE201705(5000 utts): ppl 200; cer: 15.7%
CVTE201705_02(7000 utts): ppl 1000+; cer: 5.58%
THCHS30(2496 utts): ppl 2000+; cer: 8.25%

Note: CVTE201705 is a very challenging test set with various noise and strong accent,
and the other CVTE sets are all recorded by mobile phones or high-performance mics with standard Mandarin in office or quiet room.

=============================================================================================
How to use:
It is very easy to use these models, you can refer to "run.sh" in s5/ directory.