
Speech Processing Project

Linear Predictive coding using Voice excited

Vocoder

ECE 5525

Osama Saraireh

Fall 2005

Dr. Veton Kepuska

The basic form of a pitch-excited LPC vocoder is shown below.

The speech signal is filtered to no more than one half the system sampling frequency and then A/D conversion is performed. The speech is processed on a frame-by-frame basis, where the analysis frame length can be variable. For each frame a pitch period estimation is made along with a voicing decision. A linear predictive coefficient analysis is performed to obtain an inverse model of the speech spectrum, A(z). In addition, a gain parameter G, representing some function of the speech energy, is computed. An encoding procedure is then applied for transforming the analyzed parameters into an efficient set of transmission parameters, with the goal of minimizing the degradation in the synthesized speech for a specified number of bits. Knowing the transmission frame rate and the number of bits used for each transmission parameter, one can compute a noise-free channel transmission bit rate.
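For example (illustrative numbers only, not necessarily the rates used in this project): at 50 frames per second, with 10 predictor coefficients at 5 bits each, 6 bits for the pitch period, 1 bit for the voicing decision and 5 bits for the gain, each frame requires 62 bits, giving a transmission rate of 50 x 62 = 3100 bits/s over a noise-free channel.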

At the receiver, the transmitted parameters are decoded into quantized versions of the coefficient analysis and pitch estimation parameters. An excitation signal for synthesis is then constructed from the transmitted pitch and voicing parameters. The excitation signal then drives a synthesis filter 1/A(z) corresponding to the analysis model A(z). The digital samples s^(n) are then passed through a D/A converter and low-pass filtered to generate the synthetic speech s(t). Either before or after synthesis, the gain is used to match the synthetic speech energy to the actual speech energy. The digital samples are then converted to an analog signal and passed through a filter similar to the one at the input of the system.
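As a hedged illustration of this synthesis step (placeholder variable names, not the project's synlpc function), one frame of synthetic speech can be generated in MATLAB by driving the all-pole filter 1/A(z) with the chosen excitation and applying the gain:

% 'A' is assumed to be the frame's LPC polynomial [1 -a1 ... -ap] and
% 'excitation' a unit-energy impulse train (voiced) or white noise (unvoiced).
synthFrame = G * filter(1, A, excitation);   % all-pole synthesis filter 1/A(z)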

Linear predictive coding (LPC) of speech

The linear predictive coding (LPC) method for speech analysis and synthesis is based on modeling the vocal tract as a linear all-pole (IIR) filter having the system transfer function

H(z) = G / A(z) = G / (1 - Σ_{k=1}^{p} a_k z^{-k})

where p is the prediction order, G is the gain, and the a_k are the model parameters (see the simple speech production model figure).

Minimizing the mean-squared prediction error leads to the normal equations

Σ_{k=1}^{p} a_k R(m - k) = R(m),   where m = 1, 2, ..., p

and R(m) represents the autocorrelation of the speech sequence s(n), defined as

R(m) = Σ_n s(n) s(n + m).

The equations above can be expressed in matrix form as

R a = r

where R is a p x p autocorrelation matrix, r is a p x 1 autocorrelation vector, and a is a p x 1 vector of model parameters.
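The project code solves these equations with the Levinson-Durbin recursion shown below, which exploits the Toeplitz structure of R. Purely for illustration (placeholder names, not the project's code), the system can also be solved directly in MATLAB:

% 'rlags' is assumed to be a column vector of autocorrelation lags R(0)...R(p)
% of one analysis frame, and p the prediction order.
R = toeplitz(rlags(1:p));   % p-by-p symmetric Toeplitz autocorrelation matrix
r = rlags(2:p+1);           % p-by-1 right-hand side [R(1) ... R(p)]'
a = R \ r;                  % model parameters a(1) ... a(p)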

[row col] = size(data); if col==1 data=data'; end

nframe = 0;
msfr = round(sr/1000*fr);                  % Convert frame increment from ms to samples
msfs = round(sr/1000*fs);                  % Convert frame size from ms to samples
duration = length(data);
speech = filter([1 -preemp], 1, data)';    % Preemphasize speech
msoverlap = msfs - msfr;
ramp = [0:1/(msoverlap-1):1]';             % Compute part of the overlap window

for frameIndex=1:msfr:duration-msfs+1      % frame rate = 20 ms
    frameData = speech(frameIndex:(frameIndex+msfs-1));   % frame size = 30 ms
    nframe = nframe+1;
    autoCor = xcorr(frameData);            % Compute the autocorrelation of the frame
    autoCorVec = autoCor(msfs+[0:L]);      % Keep lags 0 through L

These equations can be solved in MATLAB using the Levinson-Durbin algorithm.

    % Levinson's method
    err(1) = autoCorVec(1);
    k(1) = 0;
    A = [];
    for index=1:L
        numerator = [1 A.']*autoCorVec(index+1:-1:2);
        denominator = -1*err(index);
        k(index) = numerator/denominator;           % PARCOR (reflection) coeffs
        A = [A+k(index)*flipud(A); k(index)];       % update predictor coefficients
        err(index+1) = (1-k(index)^2)*err(index);   % updated prediction error
    end
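As an optional cross-check (assuming the Signal Processing Toolbox is available; this is not part of the project code), MATLAB's built-in levinson() solves the same Toeplitz system from the autocorrelation lags, returning the prediction-error polynomial, the final prediction error and the reflection coefficients. Its sign conventions for the reflection coefficients may differ from those of the hand-coded loop above.

% Cross-check of the recursion above using the Signal Processing Toolbox.
[Apoly, E, K] = levinson(autoCorVec, L);   % polynomial, final error, reflection coeffs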

The gain parameter of the filter can be obtained from the input-output relationship

s(n) = Σ_{k=1}^{p} a_k s(n - k) + G x(n)

where x(n) represents the input (excitation) sequence. We can further manipulate this equation: in terms of the prediction-error sequence e(n) = s(n) - Σ_{k=1}^{p} a_k s(n - k) we have

e(n) = G x(n)

and then

Σ_n e(n)² = G² Σ_n x(n)².

If the input excitation is normalized to unit energy by design, then

G² = Σ_n e(n)²

where G² is set equal to the residual energy resulting from the least-squares optimization.
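As a hedged illustration using the quantities already computed in the Levinson loop above, the final prediction error err(L+1) corresponds to this residual energy, so the frame gain can be taken as:

G = sqrt(err(L+1));   % gain from the final (order-L) prediction error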

% filter response
if 0
    gain=0;
    cft=0:(1/255):1;
    for index=1:L
        gain = gain + aCoeff(index,nframe)*exp(-i*2*pi*cft).^index;
    end
    gain = abs(1./gain);
    spec(:,nframe) = 20*log10(gain(1:128))';
    plot(20*log10(gain));
    title(nframe);
    drawnow;
end

if 0
    impulseResponse = filter(1, aCoeff(:,nframe), [1 zeros(1,255)]);
    freqResp = 20*log10(abs(fft(impulseResponse)));
    plot(freqResp);
end

Once the LPC coefficients are computed, we can determine whether the input speech frame is voiced and, if it is indeed a voiced sound, what its pitch is. We can determine the pitch by computing the autocorrelation of the LPC residual in MATLAB and locating its strongest peak (a sketch is given below). If the frame is unvoiced, white noise is used to represent it and a pitch period of T = 0 is transmitted. Therefore, either white noise or an impulse train becomes the excitation of the LPC synthesis filter.
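A minimal sketch of this voiced/unvoiced and pitch decision (placeholder names and thresholds chosen for illustration; this is not the report's listing) could look as follows in MATLAB:

% 'residFrame' is assumed to hold the LPC residual of one analysis frame and
% 'Fs' the sampling rate; the 50-400 Hz search range and the 0.3 threshold
% are illustrative choices.
minLag = round(Fs/400);                         % shortest pitch period searched
maxLag = round(Fs/50);                          % longest pitch period searched
rAuto  = xcorr(residFrame);                     % autocorrelation of the residual
rAuto  = rAuto(length(residFrame):end);         % keep lags 0, 1, 2, ...
[peakVal, idx] = max(rAuto(minLag+1:maxLag+1)); % rAuto(l+1) corresponds to lag l
pitchPeriod = minLag + idx - 1;                 % lag (in samples) of strongest peak
if peakVal < 0.3*rAuto(1)                       % weak periodicity -> unvoiced frame
    pitchPeriod = 0;                            % transmit T = 0 (noise excitation)
end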

Two types of LPC vocoders were implemented in MATLAB.

The plain LPC vocoder diagram is shown below:

% LPC vocoder

function [ outspeech ] = speechcoder1( inspeech )

% Parameters:
%   inspeech  : wave data with sampling rate Fs
%               (Fs can be changed below if necessary)
% Returns:
%   outspeech : wave data with sampling rate Fs
%               (coded and resynthesized)

if (nargin ~= 1)
    error('argument check failed');
end

Fs = 16000;    % sampling rate in Hertz (Hz)
Order = 10;    % order of the model used by LPC

% encode the speech using LPC
[aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);

% decode/synthesize speech using LPC and impulse trains as excitation
outspeech = synlpc(aCoeff, pitch, Fs, G);

Results:

Residual plot:

Voice-excited LPC vocoder (utilizing the DCT for a high compression rate / low bit rate)

The input speech signal in each frame is filtered with the estimated transfer function of the LPC analyzer (the inverse filter A(z)). This filtered signal is called the residual.
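As a hedged illustration (placeholder names, not the project's proclpc code), the residual for one frame is obtained by running the frame through the analysis filter A(z):

% 'A' is assumed to hold the frame's prediction-error polynomial [1 -a1 ... -ap].
residFrame = filter(A, 1, frameData);   % inverse filtering yields the LPC residual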


MATLAB files:

clear all;
% Osama Saraireh
% Speech processing
% Dr. Veton Kepuska
% FIT Fall 2005
a = input('please load the speech signal as a .wav file ', 's');
Inputsoundfile = a;
[inspeech, Fs, bits] = wavread(Inputsoundfile);   % read the wave file
outspeech1 = speechcoder1(inspeech);              % plain LPC vocoder
outspeech2 = speechcoder2(inspeech);              % voice-excited LPC vocoder

% plot results
figure(1);
subplot(3,1,1); plot(inspeech); grid;
subplot(3,1,2); plot(outspeech1); grid;
subplot(3,1,3); plot(outspeech2); grid;
disp('Press any key to play the original sound file');
pause;
soundsc(inspeech, Fs);
disp('Press any key to play the LPC compressed file!');
pause;
soundsc(outspeech1, Fs);
disp('Press a key to play the voice-excited LPC compressed sound!');
pause;
soundsc(outspeech2, Fs);

function [aCoeff,resid,pitch,G,parcor,stream] = proclpc(data,sr,L,fr,fs,preemp)

% L       - The order of the analysis (defaults to 10).
% fr      - Frame time increment, in ms (defaults to 20 ms).
% fs      - Frame size, in ms (defaults to 30 ms).
% preemp  - Preemphasis coefficient (defaults to 0.9378).
% aCoeff  - The LPC analysis results (one column of coefficients per frame).
% resid   - The LPC residual.
% pitch   - Calculated by finding the peak in the residual's autocorrelation
%           for each frame.
% G       - The LPC gain for each frame.
% parcor  - The PARCOR (reflection) coefficients.
% stream  - The LPC analysis' residual or excitation signal as one long vector.

if (nargin<3), L = 10; end
if (nargin<4), fr = 20; end
if (nargin<5), fs = 30; end
if (nargin<6), preemp = .9378; end

[row col] = size(data); if col==1 data=data'; end

nframe = 0;
msfr = round(sr/1000*fr);                  % Convert frame increment from ms to samples
msfs = round(sr/1000*fs);                  % Convert frame size from ms to samples
duration = length(data);
speech = filter([1 -preemp], 1, data)';    % Preemphasize speech
msoverlap = msfs - msfr;
ramp = [0:1/(msoverlap-1):1]';             % Compute part of the overlap window

for frameIndex=1:msfr:duration-msfs+1      % frame rate = 20 ms
    frameData = speech(frameIndex:(frameIndex+msfs-1));   % frame size = 30 ms
    nframe = nframe+1;
    autoCor = xcorr(frameData);            % Compute the autocorrelation of the frame
    autoCorVec = autoCor(msfs+[0:L]);      % Keep lags 0 through L

    % Levinson's method
    err(1) = autoCorVec(1);
    k(1) = 0;
    A = [];

    % ...

        stream = [stream; resid(msfr+1:msfs,nframe)];
    else
        overlap = resid(msfr+1:msfs,nframe).*flipud(ramp);
    end
end
stream = filter(1, [1 -preemp], stream)';  % Undo the preemphasis on the residual stream

Speech model one: LPC vocoder

function [ outspeech ] = speechcoder1( inspeech )

% Parameters:
%   inspeech  : wave data with sampling rate Fs

% Outputs:
%   outspeech : wave data with sampling rate Fs
%               (coded and resynthesized)

if (nargin ~= 1)
    error('argument check failed');
end

Fs = 8000;     % sampling rate in Hertz (Hz)
Order = 10;    % order of the model used by LPC

% encode the speech using LPC
[aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);

% decode/synthesize speech using LPC and impulse trains as excitation
outspeech = synlpc(aCoeff, pitch, Fs, G);

% Voice-excited LPC vocoder

function [ outspeech ] = speechcoder2( inspeech )

% Parameters:
%   inspeech  : wave data with sampling rate Fs
%               (Fs can be changed below if necessary)
% Output:
%   outspeech : wave data with sampling rate Fs
%               (coded and resynthesized)

if (nargin ~= 1)
    error('argument check failed');
end

Fs = 16000;    % sampling rate in Hertz (Hz)
Order = 10;    % order of the model used by LPC

% encode the speech using LPC
[aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);

% perform a discrete cosine transform on the residual
resid = dct(resid);
[a,b] = size(resid);
% only use the first 50 DCT coefficients; this can be done because
% most of the energy of the signal is concentrated in these coefficients
resid = [ resid(1:50,:); zeros(430,b) ];

% quantize the data to 4 bits
resid = uencode(resid,4);
resid = udecode(resid,4);

% perform an inverse DCT
resid = idct(resid);

% add some noise to the signal to make it sound better
noise = [ zeros(50,b); 0.01*randn(430,b) ];
resid = resid + noise;

% decode/synthesize speech using LPC and the compressed residual as excitation
outspeech = synlpc2(aCoeff, resid, Fs, G);
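As a rough, illustrative estimate of the excitation bit rate implied by these parameters: at Fs = 16000 a 30 ms analysis frame contains 480 samples, of which only the first 50 DCT coefficients are kept and quantized to 4 bits each, i.e. 200 bits per 20 ms frame advance, or about 10 kbit/s for the residual before the bits for the LPC coefficients, gain and pitch are added.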