The LPC (Linear Predictive Coding) Model

Linear Predictive Coding (LPC) is a fundamental technique that appears in digital signal processing textbooks; it is not exclusive to speech signal processing. This article reviews the basics of LPC, because LPC-based methods have shown excellent performance in speech signal processing research, especially in speech dereverberation [1,2,3].

Speech is produced by our vocal apparatus, which can be modeled by a simple source-filter system. The source is the vocal cords, which supply the vocal tract with an excitation signal that can be either periodic or aperiodic. When the vocal cords vibrate, voiced sounds (e.g. vowels) are produced; when they do not, unvoiced sounds (e.g. certain consonants) are produced. The vocal tract can be viewed as a filter that spectrally shapes the excitation from the vocal cords to produce the various sounds of speech.

Figure 1: Speech production model

 

Figure 1 shows a practical engineering model of speech production, and LPC is a speech-processing technique based on this model. In this model, the speech signal is generated by passing an excitation signal $e(n)$ through a time-varying all-pole filter. The coefficients of the all-pole filter depend on the shape of the vocal tract for the particular sound being produced. The excitation signal $e(n)$ is either an impulse train (for voiced speech) or random noise (for unvoiced speech). The generated speech signal $s(n)$ can be written as

$$ s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + e(n) \tag{1} $$

where $p$ is the order of the filter and $a_k$ are the filter coefficients. LPC is the problem of estimating the coefficients $a_k$ given the signal $s(n)$.

 

The most common way to obtain the $a_k$ is to minimize the mean squared error (MSE) between the true signal and the predicted signal. The MSE cost function can be written as

$$ E = \sum_{n} \left[ s(n) - \sum_{k=1}^{p} a_k\, s(n-k) \right]^2 \tag{2} $$

Next, we take the partial derivative of $E$ with respect to each filter coefficient and set it to zero, which gives

$$ \frac{\partial E}{\partial a_i} = -2 \sum_{n} \left[ s(n) - \sum_{k=1}^{p} a_k\, s(n-k) \right] s(n-i) = 0, \quad i = 1, \ldots, p \tag{3} $$

Expanding (3), we obtain

$$ \sum_{k=1}^{p} a_k\, R(i-k) = R(i) \tag{4} $$

where $R(j) = \sum_{n} s(n)\, s(n-j)$ is the autocorrelation of the signal. Substituting $i = 1, 2, \ldots, p$ into (4) yields a system of $p$ linear equations in the filter coefficients; solving this system gives the coefficients. The most common and efficient method for solving it is the Levinson-Durbin algorithm.
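As a concrete illustration, the normal equations (4) can be solved with the Levinson-Durbin recursion in a few lines of Python. This is a minimal sketch; the function name `lpc` and the synthetic AR(2) test signal are invented for this example:

```python
import numpy as np

def lpc(s, p):
    """Estimate the coefficients a_1..a_p of equation (4),
    sum_k a_k R(i-k) = R(i), with the Levinson-Durbin recursion."""
    # Autocorrelation R(0..p):  R(j) = sum_n s(n) s(n-j)
    R = np.array([np.dot(s[: len(s) - j], s[j:]) for j in range(p + 1)])
    a = np.zeros(p)      # prediction coefficients; a[0] holds a_1
    err = R[0]           # prediction error energy
    for i in range(p):
        # reflection coefficient for order i+1
        k = (R[i + 1] - np.dot(a[:i], R[i:0:-1])) / err
        a[:i] -= k * a[:i][::-1]   # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Example (synthetic): samples from s(n) = 0.9 s(n-1) - 0.5 s(n-2) + e(n)
rng = np.random.default_rng(0)
exc = rng.standard_normal(20000)
s = np.zeros_like(exc)
for n in range(2, len(s)):
    s[n] = 0.9 * s[n - 1] - 0.5 * s[n - 2] + exc[n]
a, err = lpc(s, 2)  # a should be close to [0.9, -0.5]
```

With enough samples, the recovered coefficients converge to those of the generating all-pole model, which is exactly what equation (4) promises.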



Introduction to CELP Coding

 

Speex is based on CELP, which stands for Code Excited Linear Prediction. This section attempts to introduce the principles behind CELP, so if you are already familiar with CELP, you can safely skip to section. The CELP technique is based on three ideas:

 

  1. The use of a linear prediction (LP) model to model the vocal tract
  2. The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model
  3. The search performed in closed-loop in a ``perceptually weighted domain''

This section describes the basic ideas behind CELP. This is still a work in progress.

 

Source-Filter Model of Speech Prediction

The source-filter model of speech production assumes that the vocal cords are the source of spectrally flat sound (the excitation signal), and that the vocal tract acts as a filter to spectrally shape the various sounds of speech. While still an approximation, the model is widely used in speech coding because of its simplicity. Its use is also the reason why most speech codecs (Speex included) perform badly on music signals. The different phonemes can be distinguished by their excitation (source) and spectral shape (filter). Voiced sounds (e.g. vowels) have an excitation signal that is periodic and can be approximated by an impulse train in the time domain or by regularly spaced harmonics in the frequency domain. On the other hand, fricatives (such as the "s", "sh" and "f" sounds) have an excitation signal that is similar to white Gaussian noise. So-called voiced fricatives (such as "z" and "v") have an excitation signal composed of a harmonic part and a noisy part.

The source-filter model is usually tied to the use of linear prediction. The CELP model is based on the source-filter model, as can be seen from the CELP decoder illustrated in Figure 1.

 

Figure 1: The CELP model of speech synthesis (decoder)


 

 


Linear Prediction (LPC)

Linear prediction is at the base of many speech coding techniques, including CELP. The idea behind it is to predict the signal $x[n]$ using a linear combination of its past samples:

 

 

$$ y[n] = \sum_{i=1}^{N} a_i\, x[n-i] $$

 

where $y[n]$ is the linear prediction of $x[n]$. The prediction error is thus given by:

 

$$ e[n] = x[n] - y[n] = x[n] - \sum_{i=1}^{N} a_i\, x[n-i] $$

 

The goal of the LPC analysis is to find the best prediction coefficients $a_i$ which minimize the quadratic error function:

 

$$ E = \sum_{n=0}^{L-1} e[n]^2 = \sum_{n=0}^{L-1} \left( x[n] - \sum_{i=1}^{N} a_i\, x[n-i] \right)^2 $$

 

That can be done by making all derivatives $\partial E / \partial a_i$ equal to zero:

 

$$ \frac{\partial E}{\partial a_i} = -2 \sum_{n=0}^{L-1} \left( x[n] - \sum_{j=1}^{N} a_j\, x[n-j] \right) x[n-i] = 0 $$

 

For an order $N$ filter, the filter coefficients $a_i$ are found by solving the $N \times N$ linear system $\mathbf{R}\mathbf{a} = \mathbf{r}$, where

 

$$ \mathbf{R} = \begin{bmatrix} R(0) & R(1) & \cdots & R(N-1) \\ R(1) & R(0) & \cdots & R(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(N-1) & R(N-2) & \cdots & R(0) \end{bmatrix} $$

 

 

$$ \mathbf{r} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(N) \end{bmatrix} $$

 

with $R(m)$, the auto-correlation of the signal $x[n]$, computed as:

 

 

$$ R(m) = \sum_{i=0}^{L-1} x[i]\, x[i-m] $$

 

Because $\mathbf{R}$ is Toeplitz and Hermitian, the Levinson-Durbin algorithm can be used, reducing the cost of solving the system from $O(N^3)$ to $O(N^2)$. Also, it can be proven that all the roots of $A(z)$ lie within the unit circle, which means that $1/A(z)$ is always stable. This holds in theory; in practice, because of finite precision, two techniques are commonly used to guarantee a stable filter. First, we multiply $R(0)$ by a number slightly above one (such as 1.0001), which is equivalent to adding noise to the signal. Second, we can apply a window to the auto-correlation, which is equivalent to filtering in the frequency domain, reducing sharp resonances.
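The two stabilization tricks can be sketched as follows. The function name, the sampling rate, and the Gaussian lag-window bandwidth are illustrative choices for this example, not values taken from Speex:

```python
import numpy as np

def stabilized_autocorrelation(x, N, fs=8000.0, bw=40.0):
    """Autocorrelation R(0..N) with two common stability fixes:
    scale R(0) slightly up (adds a tiny noise floor) and apply a
    Gaussian lag window (expands the bandwidth of sharp resonances)."""
    x = np.asarray(x, dtype=float)
    R = np.array([np.dot(x[: len(x) - m], x[m:]) for m in range(N + 1)])
    R[0] *= 1.0001                                  # white-noise correction
    lags = np.arange(N + 1)
    # Gaussian lag window; tapering R(m) smooths the spectral envelope
    w = np.exp(-0.5 * (2.0 * np.pi * bw * lags / fs) ** 2)
    return R * w
```

The windowed autocorrelation is then fed to Levinson-Durbin exactly as before; only the conditioning of the problem changes, not the algorithm.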

 


Pitch Prediction

During voiced segments, the speech signal is periodic, so it is possible to take advantage of that property by approximating the excitation signal $e[n]$ by a gain times the past of the excitation:

 

 

$$ e[n] \approx p[n] = \beta\, e[n-T] $$

 

where $T$ is the pitch period and $\beta$ is the pitch gain. We call this long-term prediction since the excitation is predicted from $e[n-T]$ with $T \gg N$.
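As a sketch of how $T$ and $\beta$ might be chosen, the open-loop search below computes, for each candidate period, the least-squares optimal gain and keeps the pair with the smallest error. This is a toy example with invented names and range defaults; real CELP codecs perform this search in the perceptually weighted domain:

```python
import numpy as np

def long_term_predictor(e, start, L, t_min=20, t_max=147):
    """Open-loop pitch search for the frame e[start:start+L]:
    for each candidate period T, fit beta = <frame, past> / <past, past>
    and keep the T with the smallest residual energy."""
    frame = np.asarray(e[start : start + L], dtype=float)
    best_T, best_beta, best_err = t_min, 0.0, np.inf
    for T in range(t_min, t_max + 1):
        past = np.asarray(e[start - T : start - T + L], dtype=float)
        denom = float(np.dot(past, past))
        if denom <= 0.0:
            continue
        beta = float(np.dot(frame, past)) / denom   # optimal gain for this T
        resid = frame - beta * past
        err = float(np.dot(resid, resid))
        if err < best_err:
            best_T, best_beta, best_err = T, beta, err
    return best_T, best_beta
```

On a perfectly periodic excitation the search recovers the true period with gain one; on real speech the residual error is merely reduced, and the innovation codebook described next covers what remains.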

 

Innovation Codebook

The final excitation $e[n]$ will be the sum of the pitch prediction and an innovation signal $c[n]$ taken from a fixed codebook, hence the name Code Excited Linear Prediction. The final excitation is given by:

 

 

$$ e[n] = p[n] + c[n] = \beta\, e[n-T] + c[n] $$

 

The quantization of $c[n]$ is where most of the bits in a CELP codec are allocated. It represents the information that could not be obtained either from linear prediction or pitch prediction. In the z-domain we can represent the final signal $X(z)$ as

 

$$ X(z) = \frac{C(z)}{A(z)\left(1 - \beta z^{-T}\right)} $$
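Putting the long-term predictor, the innovation, and the synthesis filter together, a bare-bones decoder loop might look like the sketch below. The function name and frame handling are invented for illustration; `a` holds the coefficients $a_1, \ldots, a_p$ of $1/A(z)$ with $A(z) = 1 - \sum_k a_k z^{-k}$:

```python
def celp_decode_frame(c, a, beta, T, e_hist, s_hist):
    """Decode one frame of length len(c).
    c:       innovation samples from the fixed codebook
    a:       coefficients a_1..a_p of the synthesis filter 1/A(z)
    beta, T: pitch gain and pitch period
    e_hist:  past excitation (at least T samples)
    s_hist:  past output (at least len(a) samples)"""
    e = list(e_hist)
    s = list(s_hist)
    for n in range(len(c)):
        en = beta * e[-T] + c[n]            # e[n] = beta e[n-T] + c[n]
        e.append(en)
        # all-pole synthesis: s[n] = e[n] + sum_k a_k s[n-k]
        sn = en + sum(a[k] * s[-(k + 1)] for k in range(len(a)))
        s.append(sn)
    return e[len(e_hist):], s[len(s_hist):]
```

The histories carried across calls are exactly the filter memories of $1/(1 - \beta z^{-T})$ and $1/A(z)$, mirroring the cascade in the z-domain expression above.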

 

 


Noise Weighting

Most (if not all) modern audio codecs attempt to ``shape'' the noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder, and vice versa. In order to maximize speech quality, CELP codecs minimize the mean square of the error (noise) in the perceptually weighted domain. This means that a perceptual noise weighting filter $W(z)$ is applied to the error signal in the encoder. In most CELP codecs, $W(z)$ is a pole-zero weighting filter derived from the linear prediction coefficients (LPC), generally using bandwidth expansion. Let the spectral envelope be represented by the synthesis filter $1/A(z)$; CELP codecs typically derive the noise weighting filter as:

 

$$ W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (1) $$

 

 

where $\gamma_1 = 0.9$ and $\gamma_2 = 0.6$ in the Speex reference implementation. If a filter $A(z)$ has (complex) poles at $p_i$ in the $z$-plane, the filter $A(z/\gamma)$ will have its poles at $p_i' = \gamma p_i$, making it a flatter version of $A(z)$.

The weighting filter is applied to the error signal used to optimize the codebook search through analysis-by-synthesis (AbS). This results in a spectral shape of the noise that tends towards $1/W(z)$. While the simplicity of the model has been an important reason for the success of CELP, it remains that $1/W(z)$ is a very rough approximation of the perceptually optimal noise weighting function. Fig. 2 illustrates the noise shaping that results from Eq. 1. Throughout this paper, we refer to $W(z)$ as the noise weighting filter and to $1/W(z)$ as the noise shaping filter (or curve).
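Since replacing $z$ by $z/\gamma$ just scales the $i$-th LPC coefficient by $\gamma^i$, the weighting filter of Eq. 1 can be built directly from the prediction coefficients. A minimal sketch, assuming $A(z) = 1 - \sum_i a_i z^{-i}$ and using $\gamma_1 = 0.9$, $\gamma_2 = 0.6$ as defaults:

```python
import numpy as np

def weighting_filter_coeffs(a, g1=0.9, g2=0.6):
    """Build W(z) = A(z/g1) / A(z/g2) from prediction coefficients a,
    where A(z) = 1 - sum_i a_i z^-i.  Replacing z by z/g scales the
    i-th coefficient by g**i, which moves every pole p to g*p."""
    A = np.concatenate([[1.0], -np.asarray(a, dtype=float)])
    scale = np.arange(len(A))
    num = A * g1 ** scale     # numerator:   A(z/g1)
    den = A * g2 ** scale     # denominator: A(z/g2)
    return num, den
```

Because $\gamma_2 < \gamma_1$, the denominator is the flatter of the two, so $1/W(z)$ follows the spectral envelope but with its peaks softened, which is exactly the shaping behavior described above.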

 

Figure 2: Standard noise shaping in CELP. Arbitrary y-axis offset.


 

 

Analysis-by-Synthesis

One of the main principles behind CELP is called Analysis-by-Synthesis (AbS), meaning that the encoding (analysis) is performed by perceptually optimising the decoded (synthesis) signal in a closed loop. In theory, the best CELP stream would be produced by trying all possible bit combinations and selecting the one that produces the best-sounding decoded signal. This is obviously not possible in practice for two reasons: the required complexity is beyond any currently available hardware and the ``best sounding'' selection criterion implies a human listener.

In order to achieve real-time encoding using limited computing resources, the CELP optimisation is broken down into smaller, more manageable, sequential searches using the perceptual weighting function described earlier.
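In miniature, one such search amounts to filtering every codebook candidate through the weighted synthesis filter and keeping the entry with the smallest weighted error. This toy sketch uses invented names and represents the weighted synthesis filter by a truncated impulse response `h_w`:

```python
import numpy as np

def search_codebook(target, codebook, h_w):
    """Toy AbS codebook search: filter each candidate through the
    weighted synthesis filter (impulse response h_w), fit the optimal
    gain, and return the (index, gain) with the smallest weighted MSE."""
    best_idx, best_gain, best_err = -1, 0.0, np.inf
    for idx, c in enumerate(codebook):
        y = np.convolve(c, h_w)[: len(target)]   # weighted candidate
        denom = float(np.dot(y, y))
        if denom <= 0.0:
            continue
        gain = float(np.dot(target, y)) / denom  # closed-form optimal gain
        resid = target - gain * y
        err = float(np.dot(resid, resid))
        if err < best_err:
            best_idx, best_gain, best_err = idx, gain, err
    return best_idx, best_gain
```

Real codecs make this affordable by exploiting codebook structure (e.g. sparse algebraic codebooks) rather than filtering every entry, but the closed-loop principle is the same.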

References

[1] Yoshioka T, Nakatani T, Miyoshi M. An integrated method for blind separation and dereverberation of convolutive audio mixtures. European Signal Processing Conference (EUSIPCO), 2008: 1-5.

[2] Nakatani T, Yoshioka T, Kinoshita K, et al. Blind speech dereverberation with multi-channel linear prediction based on short-time Fourier transform representation. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008: 85-88.

[3] Nakatani T, Yoshioka T, Kinoshita K, et al. Speech dereverberation based on variance-normalized delayed linear prediction. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(7): 1717-1731.