
Speech Processing Project

Linear Predictive coding using Voice excited

Vocoder

ECE 5525

Osama Saraireh

Fall 2005

Dr. Veton Kepuska

The basic form of a pitch-excited LPC vocoder is shown below.

The speech signal is filtered to no more than one half the system sampling frequency and then A/D conversion is performed. The speech is processed on a frame-by-frame basis, where the analysis frame length can be variable. For each frame a pitch period estimation is made along with a voicing decision. A linear predictive coefficient analysis is performed to obtain an inverse model of the speech spectrum, A(z). In addition, a gain parameter G, representing some function of the speech energy, is computed. An encoding procedure is then applied for transforming the analyzed parameters into an efficient set of transmission parameters, with the goal of minimizing the degradation in the synthesized speech for a specified number of bits. Knowing the transmission frame rate and the number of bits used for each transmission parameter, one can compute a noise-free channel transmission bit rate.
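For example (illustrative numbers only, not necessarily the rates used in this project): at 50 frames per second, with 10 predictor coefficients at 5 bits each, 6 bits for the pitch period, 1 bit for the voicing decision and 5 bits for the gain, each frame requires 62 bits, giving a transmission rate of 50 x 62 = 3100 bits/s over a noise-free channel.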

At the receiver, the transmitted parameters are decoded into quantized versions of the coefficient analysis and pitch estimation parameters. An excitation signal for synthesis is then constructed from the transmitted pitch and voicing parameters. The excitation signal then drives a synthesis filter 1/A(z) corresponding to the analysis model A(z). The digital samples s^(n) are then passed through a D/A converter and low-pass filtered to generate the synthetic speech s(t). Either before or after synthesis, the gain is used to match the synthetic speech energy to the actual speech energy. The digital samples are then converted to an analog signal and passed through a filter similar to the one at the input of the system.
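As a hedged illustration of this synthesis step (placeholder variable names, not the project's synlpc function), one frame of synthetic speech can be generated in MATLAB by driving the all-pole filter 1/A(z) with the chosen excitation and applying the gain:

% 'A' is assumed to be the frame's LPC polynomial [1 -a1 ... -ap] and
% 'excitation' a unit-energy impulse train (voiced) or white noise (unvoiced).
synthFrame = G * filter(1, A, excitation);   % all-pole synthesis filter 1/A(z)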

Linear predictive coding (LPC) of speech

The linear predictive coding (LPC) method for speech analysis and synthesis is based on modeling the vocal tract as a linear all-pole (IIR) filter having the system transfer function

H(z) = G / A(z) = G / (1 - Σ_{k=1}^{p} a_k z^{-k})

where p is the prediction order, G is the gain, and the a_k are the model parameters (see the simple speech production model figure).

Minimizing the mean-squared prediction error leads to the normal equations

Σ_{k=1}^{p} a_k R(m - k) = R(m),   where m = 1, 2, ..., p

and R(m) represents the autocorrelation of the speech sequence s(n), defined as

R(m) = Σ_n s(n) s(n + m).

The equations above can be expressed in matrix form as

R a = r

where R is a p x p autocorrelation matrix, r is a p x 1 autocorrelation vector, and a is a p x 1 vector of model parameters.
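The project code solves these equations with the Levinson-Durbin recursion shown below, which exploits the Toeplitz structure of R. Purely for illustration (placeholder names, not the project's code), the system can also be solved directly in MATLAB:

% 'rlags' is assumed to be a column vector of autocorrelation lags R(0)...R(p)
% of one analysis frame, and p the prediction order.
R = toeplitz(rlags(1:p));   % p-by-p symmetric Toeplitz autocorrelation matrix
r = rlags(2:p+1);           % p-by-1 right-hand side [R(1) ... R(p)]'
a = R \ r;                  % model parameters a(1) ... a(p)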

[row col] = size(data); if col==1 data=data'; end

nframe = 0;
msfr = round(sr/1000*fr);                  % Convert frame increment from ms to samples
msfs = round(sr/1000*fs);                  % Convert frame size from ms to samples
duration = length(data);
speech = filter([1 -preemp], 1, data)';    % Preemphasize speech
msoverlap = msfs - msfr;
ramp = [0:1/(msoverlap-1):1]';             % Compute part of the overlap window

for frameIndex=1:msfr:duration-msfs+1      % frame rate = 20 ms
    frameData = speech(frameIndex:(frameIndex+msfs-1));   % frame size = 30 ms
    nframe = nframe+1;
    autoCor = xcorr(frameData);            % Compute the autocorrelation of the frame
    autoCorVec = autoCor(msfs+[0:L]);      % Keep lags 0 through L

These equations can be solved in MATLAB using the Levinson-Durbin algorithm.

    % Levinson's method
    err(1) = autoCorVec(1);
    k(1) = 0;
    A = [];
    for index=1:L
        numerator = [1 A.']*autoCorVec(index+1:-1:2);
        denominator = -1*err(index);
        k(index) = numerator/denominator;           % PARCOR (reflection) coeffs
        A = [A+k(index)*flipud(A); k(index)];       % update predictor coefficients
        err(index+1) = (1-k(index)^2)*err(index);   % updated prediction error
    end
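As an optional cross-check (assuming the Signal Processing Toolbox is available; this is not part of the project code), MATLAB's built-in levinson() solves the same Toeplitz system from the autocorrelation lags, returning the prediction-error polynomial, the final prediction error and the reflection coefficients. Its sign conventions for the reflection coefficients may differ from those of the hand-coded loop above.

% Cross-check of the recursion above using the Signal Processing Toolbox.
[Apoly, E, K] = levinson(autoCorVec, L);   % polynomial, final error, reflection coeffs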

The gain parameter of the filter can be obtained from the input-output relationship

s(n) = Σ_{k=1}^{p} a_k s(n - k) + G x(n)

where x(n) represents the input (excitation) sequence. We can further manipulate this equation: in terms of the prediction-error sequence e(n) = s(n) - Σ_{k=1}^{p} a_k s(n - k) we have

e(n) = G x(n)

and then

Σ_n e(n)² = G² Σ_n x(n)².

If the input excitation is normalized to unit energy by design, then

G² = Σ_n e(n)²

where G² is set equal to the residual energy resulting from the least-squares optimization.
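As a hedged illustration using the quantities already computed in the Levinson loop above, the final prediction error err(L+1) corresponds to this residual energy, so the frame gain can be taken as:

G = sqrt(err(L+1));   % gain from the final (order-L) prediction error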

% filter response
if 0
    gain=0;
    cft=0:(1/255):1;
    for index=1:L
        gain = gain + aCoeff(index,nframe)*exp(-i*2*pi*cft).^index;
    end
    gain = abs(1./gain);
    spec(:,nframe) = 20*log10(gain(1:128))';
    plot(20*log10(gain));
    title(nframe);
    drawnow;
end

if 0
    impulseResponse = filter(1, aCoeff(:,nframe), [1 zeros(1,255)]);
    freqResp = 20*log10(abs(fft(impulseResponse)));
    plot(freqResp);
end

Once the LPC coefficients are computed, we can determine whether the input speech frame is voiced and, if it is indeed a voiced sound, what its pitch is. We can determine the pitch by computing the autocorrelation of the LPC residual in MATLAB and locating its strongest peak (a sketch is given below). If the frame is unvoiced, white noise is used to represent it and a pitch period of T = 0 is transmitted. Therefore, either white noise or an impulse train becomes the excitation of the LPC synthesis filter.
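A minimal sketch of this voiced/unvoiced and pitch decision (placeholder names and thresholds chosen for illustration; this is not the report's listing) could look as follows in MATLAB:

% 'residFrame' is assumed to hold the LPC residual of one analysis frame and
% 'Fs' the sampling rate; the 50-400 Hz search range and the 0.3 threshold
% are illustrative choices.
minLag = round(Fs/400);                         % shortest pitch period searched
maxLag = round(Fs/50);                          % longest pitch period searched
rAuto  = xcorr(residFrame);                     % autocorrelation of the residual
rAuto  = rAuto(length(residFrame):end);         % keep lags 0, 1, 2, ...
[peakVal, idx] = max(rAuto(minLag+1:maxLag+1)); % rAuto(l+1) corresponds to lag l
pitchPeriod = minLag + idx - 1;                 % lag (in samples) of strongest peak
if peakVal < 0.3*rAuto(1)                       % weak periodicity -> unvoiced frame
    pitchPeriod = 0;                            % transmit T = 0 (noise excitation)
end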

Two types of LPC vocoders were implemented in MATLAB.

The plain LPC vocoder diagram is shown below:

% LPC vocoder

function [ outspeech ] = speechcoder1( inspeech )

% Parameters:
%   inspeech  : wave data with sampling rate Fs
%               (Fs can be changed below if necessary)
% Returns:
%   outspeech : wave data with sampling rate Fs
%               (coded and resynthesized)

if (nargin ~= 1)
    error('argument check failed');
end

Fs = 16000;    % sampling rate in Hertz (Hz)
Order = 10;    % order of the model used by LPC

% encode the speech using LPC
[aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);

% decode/synthesize speech using LPC and impulse trains as excitation
outspeech = synlpc(aCoeff, pitch, Fs, G);

Results:

Residual plot:

Voice-excited LPC vocoder (utilizing the DCT for a high compression rate / low bit rate)

The input speech signal in each frame is filtered with the estimated transfer function of the LPC analyzer (the inverse filter A(z)). This filtered signal is called the residual.
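As a hedged illustration (placeholder names, not the project's proclpc code), the residual for one frame is obtained by running the frame through the analysis filter A(z):

% 'A' is assumed to hold the frame's prediction-error polynomial [1 -a1 ... -ap].
residFrame = filter(A, 1, frameData);   % inverse filtering yields the LPC residual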


MATLAB files:

clear all;
% Osama Saraireh
% Speech processing
% Dr. Veton Kepuska
% FIT Fall 2005
a = input('please load the speech signal as a .wav file ', 's');
Inputsoundfile = a;
[inspeech, Fs, bits] = wavread(Inputsoundfile);   % read the wave file
outspeech1 = speechcoder1(inspeech);              % plain LPC vocoder
outspeech2 = speechcoder2(inspeech);              % voice-excited LPC vocoder

% plot results
figure(1);
subplot(3,1,1); plot(inspeech); grid;
subplot(3,1,2); plot(outspeech1); grid;
subplot(3,1,3); plot(outspeech2); grid;
disp('Press any key to play the original sound file');
pause;
soundsc(inspeech, Fs);
disp('Press any key to play the LPC compressed file!');
pause;
soundsc(outspeech1, Fs);
disp('Press a key to play the voice-excited LPC compressed sound!');
pause;
soundsc(outspeech2, Fs);

function [aCoeff,resid,pitch,G,parcor,stream] = proclpc(data,sr,L,fr,fs,preemp)

% L       - The order of the analysis (defaults to 10).
% fr      - Frame time increment, in ms (defaults to 20 ms).
% fs      - Frame size, in ms (defaults to 30 ms).
% preemp  - Preemphasis coefficient (defaults to 0.9378).
% aCoeff  - The LPC analysis results (one column of coefficients per frame).
% resid   - The LPC residual.
% pitch   - Calculated by finding the peak in the residual's autocorrelation
%           for each frame.
% G       - The LPC gain for each frame.
% parcor  - The PARCOR (reflection) coefficients.
% stream  - The LPC analysis' residual or excitation signal as one long vector.

if (nargin<3), L = 10; end
if (nargin<4), fr = 20; end
if (nargin<5), fs = 30; end
if (nargin<6), preemp = .9378; end

[row col] = size(data); if col==1 data=data'; end

nframe = 0;
msfr = round(sr/1000*fr);                  % Convert frame increment from ms to samples
msfs = round(sr/1000*fs);                  % Convert frame size from ms to samples
duration = length(data);
speech = filter([1 -preemp], 1, data)';    % Preemphasize speech
msoverlap = msfs - msfr;
ramp = [0:1/(msoverlap-1):1]';             % Compute part of the overlap window

for frameIndex=1:msfr:duration-msfs+1      % frame rate = 20 ms
    frameData = speech(frameIndex:(frameIndex+msfs-1));   % frame size = 30 ms
    nframe = nframe+1;
    autoCor = xcorr(frameData);            % Compute the autocorrelation of the frame
    autoCorVec = autoCor(msfs+[0:L]);      % Keep lags 0 through L

    % Levinson's method
    err(1) = autoCorVec(1);
    k(1) = 0;
    A = [];

    % ...

        stream = [stream; resid(msfr+1:msfs,nframe)];
    else
        overlap = resid(msfr+1:msfs,nframe).*flipud(ramp);
    end
end
stream = filter(1, [1 -preemp], stream)';  % Undo the preemphasis on the residual stream

Speech model one: LPC vocoder

function [ outspeech ] = speechcoder1( inspeech )

% Parameters:
%   inspeech  : wave data with sampling rate Fs

% Outputs:
%   outspeech : wave data with sampling rate Fs
%               (coded and resynthesized)

if (nargin ~= 1)
    error('argument check failed');
end

Fs = 8000;     % sampling rate in Hertz (Hz)
Order = 10;    % order of the model used by LPC

% encode the speech using LPC
[aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);

% decode/synthesize speech using LPC and impulse trains as excitation
outspeech = synlpc(aCoeff, pitch, Fs, G);

% Voice-excited LPC vocoder

function [ outspeech ] = speechcoder2( inspeech )

% Parameters:
%   inspeech  : wave data with sampling rate Fs
%               (Fs can be changed below if necessary)
% Output:
%   outspeech : wave data with sampling rate Fs
%               (coded and resynthesized)

if (nargin ~= 1)
    error('argument check failed');
end

Fs = 16000;    % sampling rate in Hertz (Hz)
Order = 10;    % order of the model used by LPC

% encode the speech using LPC
[aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);

% perform a discrete cosine transform on the residual
resid = dct(resid);
[a,b] = size(resid);
% only use the first 50 DCT coefficients; this can be done because
% most of the energy of the signal is concentrated in these coefficients
resid = [ resid(1:50,:); zeros(430,b) ];

% quantize the data to 4 bits
resid = uencode(resid,4);
resid = udecode(resid,4);

% perform an inverse DCT
resid = idct(resid);

% add some noise to the signal to make it sound better
noise = [ zeros(50,b); 0.01*randn(430,b) ];
resid = resid + noise;

% decode/synthesize speech using LPC and the compressed residual as excitation
outspeech = synlpc2(aCoeff, resid, Fs, G);
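As a rough, illustrative estimate of the excitation bit rate implied by these parameters: at Fs = 16000 a 30 ms analysis frame contains 480 samples, of which only the first 50 DCT coefficients are kept and quantized to 4 bits each, i.e. 200 bits per 20 ms frame advance, or about 10 kbit/s for the residual before the bits for the LPC coefficients, gain and pitch are added.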