
ELEC 484 Project Time Segment Processing

Darko Stelkic / Mike Blarowski University of Victoria, BC, Canada

In this report we will discuss several time domain algorithms used to produce effects of time stretching and pitch manipulation of audio signals. We will introduce basic audio effects such as variable speed replay and pitch controlled resampling which are based on delay line modulation and amplitude modulation. Next, we will explain two approaches for changing duration of audio signals (time stretching). In the last section of this report we will discuss three different techniques of pitch shifting: block processing based on time stretching and resampling, delay line modulation and pitch-synchronous block processing.

0 Introduction
Time segment processing plays an important role in the field of digital audio effects. In this paper we explore various methods of time stretching and pitch shifting, implementing each of them in MatLab. The results of each effect are illustrated in the attached graphs and, most importantly, in the accompanying sound files.

1 Variable Speed Replay (7.2 DAFX)

Like the old audio tape recorders, which could play audio over a wide range of speeds, variable speed replay allows faster or slower playback. During this procedure, however, the pitch of the audio changes as well. Variable speed replay was accomplished by modifying the sampling frequency while writing the output file, according to the following equation:

$f_{s,\mathrm{replay}} = f_{s,\mathrm{in}} \cdot v$

where v < 1 is used for time expansion and v > 1 is used for time compression. Figure 1 illustrates the effect in both the time and frequency domains.

Fig. 1: Variable speed replay leading to time and spectral envelope compression / expansion. [DAFX]

The method described above was implemented in MatLab using the following code.
Matlab Code: VariableSpeed.m

The following audio files demonstrate the results of variable speed replay.
Original: Bass_skit.wav
Faster: VSR_Bass_skit_faster.wav
Slower: VSR_Bass_skit_slower.wav
Original: VSR_fretless1.wav
Slower: VSR_fretless1_slower.wav
Faster: VSR_fretless1_faster.wav
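As a rough numerical illustration of the replay equation, the sketch below computes how the duration and pitch of a test tone change with v. The sampling rate, tone frequency and duration are assumptions chosen for illustration; they are not taken from the report's audio files.

```matlab
% Variable speed replay: same samples, new nominal sampling rate.
% All values below are illustrative assumptions, not the report's data.
fs_in = 44100;              % original sampling rate in Hz
f0    = 440;                % pitch of a test tone in Hz
dur   = 2;                  % duration of the test tone in seconds
n     = (0:dur*fs_in-1)';   % sample index
x     = sin(2*pi*f0*n/fs_in);

v = [0.7 1.5];              % v < 1: time expansion, v > 1: time compression
fs_replay = fs_in * v;                % f_s,replay = f_s,in * v
new_dur   = length(x) ./ fs_replay;   % playback duration in seconds
new_pitch = f0 * v;                   % every frequency is scaled by v

for k = 1:numel(v)
    fprintf('v = %.2f: %.3f s -> %.3f s, %.0f Hz -> %.0f Hz\n', ...
            v(k), dur, new_dur(k), f0, new_pitch(k));
end
```

Duration scales by 1/v and pitch by v, which is why this effect cannot change one without the other.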

2 Time Stretching (7.3 DAFX)

A more desirable way of manipulating playback time would be to do so without changing the pitch. This effect can be performed using many different techniques, some of which are described here.

2.1 Synchronous Overlap and Add (SOLA) (7.3.2 DAFX)

This simple time stretching technique, based on correlation, was proposed in [RW85, MEJ86]. The Synchronous Overlap and Add algorithm is performed in the following steps:
1. The input signal is divided into overlapping segments of fixed length, as shown in Fig. 2.
2. The overlapping segments are shifted according to the desired time scaling factor.
3. The overlap intervals are searched for the discrete-time lag of maximum similarity. At the point of maximum similarity, the overlapping segments are weighted by a fade-in or fade-out function to eliminate abrupt changes. The segments are then added together to give an audio signal of changed duration.

Fig. 2: SOLA time stretching [DAFX]

The algorithm described above was implemented in MatLab using the TimeScaleSOLA.m file (DAFX, p.209-211).
Matlab Code: TimeScaleSOLA.m

The following audio files demonstrate time stretching by the SOLA technique.
Original: brass_jazz.wav
Shrink: SOLA_brass_jazz_shrink.wav
Stretch: SOLA_brass_jazz_stretch.wav

2.2 Pitch-synchronous Overlap and Add (PSOLA) (7.3.3 DAFX)

A variation of the SOLA algorithm is the PSOLA technique developed by Moulines et al. [HMC89, MC90], which is especially useful for voice processing. It is based on the assumption that the input sound is characterized by a pitch. Pitch-synchronous Overlap and Add (PSOLA) consists of two phases:

A. Analysis:
1. Determination of the pitch period. The signal is divided into small blocks for which the pitch is considered constant, and pitch detection is performed for each block.
2. Extraction of a segment (block) centered over each pitch mark, using a Hanning window [BJ95-a] with a length of two pitch periods to allow for a smooth transition between the segments (fade in, fade out).

Fig. 3: PSOLA pitch analysis [DAFX]

B. Synthesis:
1. Choice of the corresponding analysis segment, identified by the time mark.
2. Overlap and add of the selected segment. At this point it is decided whether the signal is going to be shrunk or stretched, based on the scaling factor. If the scaling factor is less than 1, some segments are discarded (time compression); if it is greater than 1, some segments are repeated (time expansion).
3. Determination of the time instant where the next synthesis segment will be centered, in order to preserve the pitch.
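The PSOLA analysis and synthesis phases can be sketched on a synthetic signal whose pitch is known in advance, so that no pitch detector is needed. This is a minimal sketch under stated assumptions, not the report's psola.m: the pitch period P, the regular grid of pitch marks and the hand-computed Hann window are all chosen for illustration.

```matlab
% PSOLA time stretching on a synthetic signal with a known, constant pitch.
% Assumptions (not from the report): pitch period P, marks on a regular
% grid, Hann window computed by hand to avoid toolbox dependencies.
P     = 100;                          % pitch period in samples
N     = 40*P;                         % input length in samples
n     = (0:N-1)';
x     = sin(2*pi*n/P) + 0.5*sin(4*pi*n/P);   % periodic test signal
m     = (P+1:P:N-P)';                 % analysis pitch marks, one per period
alpha = 1.5;                          % time stretching factor (> 1 stretches)

w    = 0.5*(1 - cos(2*pi*(0:2*P)'/(2*P)));   % Hann window, two periods long
Lout = ceil(N*alpha);                 % output length
y    = zeros(Lout,1);

tk = P + 1;                           % first synthesis pitch mark
while tk + P <= Lout
    [~, i] = min(abs(alpha*m - tk));  % analysis mark closest to this output time
    grain  = x(m(i)-P:m(i)+P) .* w;   % windowed segment centered on the mark
    y(tk-P:tk+P) = y(tk-P:tk+P) + grain;     % overlap and add
    tk = tk + P;                      % next mark one pitch period later
end
```

Because the synthesis marks stay one pitch period apart, the output is 1.5 times longer while its period is still P samples. Segments are reused whenever alpha > 1 and skipped whenever alpha < 1.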

Fig. 4: PSOLA Synthesis [DAFX]

Pitch detection itself was a fairly difficult task. As Figure 5 shows, our pitch detector worked well on some of the sound files, but we ran into problems while processing files with a broad range of vocals and instruments. Most likely this was caused by the numerous fundamental frequencies present in those sound files.

Fig. 5: Output of the pitch marker program

There are also limitations on the stretching factor, which has a limited range (0.25 to 2) for speech, and the sound exhibits some buzziness due to the regular repetition of identical input segments.

The PSOLA algorithm was implemented in MatLab as per the psola.m file (DAFX, p.213-214). It requires the pitch marks, which were obtained using a pitch marker function written in MatLab. The following code was used:
Pitch Marker: PitchMarker.m
Psola: psola.m
TimeStretchPsola: TimeStretchPSOLA.m

The following audio files demonstrate time stretching by the PSOLA technique.
Original: brass_jazz.wav
Shrink: PSOLA_brass_jazz-shrink.wav
Stretch: PSOLA_brass_jazz-stretch.wav

3 Pitch Shifting (7.4 DAFX)

The aim of pitch shifting is to change the frequencies of the original file while preserving all the harmonic ratios, so that the processed file has a different pitch.

3.1 Pitch Shifting by Time Stretching and Re-sampling (7.4.2 DAFX)

This technique builds on variable speed replay. In variable speed replay the pitch of the signal is changed, but its duration is altered as well. Since we want to change the pitch and preserve the original duration, a time stretching algorithm needs to be applied in addition to variable speed replay. Pitch shifting followed by time stretching is illustrated in the following figure.

Fig. 6: Pitch shifting & time correction [DAFX]

Fig. 7: Pitch shifting by time scaling and resampling [DAFX]

Pitch shifting by time stretching and re-sampling was implemented using the psola.m function (DAFX, p.213-214) and the MatLab code included below.
Psola: psola.m
Pitch Shifting by Time Stretch and re-sample: PitchShiftingByTimeStretchingResampling.m

The following audio files demonstrate the Pitch Shifting by Time Stretching and Re-sampling technique.
Original: brass_jazz.wav
Higher: pitch_brass_jazz_higher.wav
Lower: pitch_brass_jazz_lower.wav

3.2 Pitch Shifting by Delay Line Modulation (7.4.3 DAFX)

This technique was presented in several publications. In Using Multiple Processors for real-time audio effects (1989), K. Bogdanowicz and R. Blecher [BB89] proposed a method of pitch shifting based on an overlap-add scheme with two time-varying delay lines. The outputs of the two delay lines are combined in a cross-fade block according to a cross-fade function. The signal is divided into small blocks, and these blocks are read faster or slower to produce higher or lower pitches. Blocks are read simultaneously in sets of two, with a time delay of one half of the block length, in order to produce a continuous usable output.

Fig. 8: Pitch shifting by delay line [DAFX]

A sawtooth-type function is used to control the delay line modulation. A similar approach is proposed in [Dat87]. A more advanced method is presented in [DZ99], where an overlap-add scheme was proposed. This method does not need any fundamental frequency estimation. Instead of overlapping two segments, it uses three parallel time-varying delay lines, all overlapping each other. Blocks overlap by 2/3 of the block length. The following figure illustrates this method.

Fig. 9: Pitch shifting by overlap-add scheme [DAFX]

The following figure shows the original signal against the pitch-shifted signal.

Fig. 10: Plot of original signal (blue) and the pitch shifted signal (red)

As can be seen from the following figure, the frequency content of the higher-pitched signal is raised compared to the original signal and its harmonics are spaced further apart. Similarly, the lower-pitched signal shows lower frequency content with the harmonics closer together.

Fig. 11: Spectrogram of original signal (top), higher pitch (middle), and lower pitch (bottom)

The algorithm described above was implemented using the vibrato.m file (DAFX, p.68-69) as the starting point. The file was modified to use a sawtooth function instead of a sine wave.
Delayline: Delayline.m
PitchShiftDelayLineModulation: PitchShiftDelayLineModulation.m

The following audio files demonstrate the delay line modulation technique.
Original: x1.wav
Higher: x1DelayLine_HIGH40.wav
Lower: x1DelayLine_LOW40.wav
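Why a sawtooth-controlled delay shifts the pitch can be checked on a single ramp: if the delay changes by a constant amount per sample, the delay line is read at a rate of (1 - slope) and every frequency is scaled by that factor. The sketch below illustrates this under assumed values (tone frequency, ramp length, slope and starting delay are all hypothetical); it uses one ramp only and omits the cross-fade between the two delay lines.

```matlab
% A linearly ramping delay (one sawtooth segment) resamples the input and
% therefore shifts its pitch. Illustrative values only, not the report's.
fs    = 44100;                 % sampling rate in Hz
f0    = 440;                   % input tone in Hz
N     = 4096;                  % number of output samples (one ramp)
slope = -0.2;                  % change of delay per sample (negative raises pitch)
D0    = 1000;                  % starting delay in samples

n   = (0:N-1)';
t   = (-D0:N-1)';              % input time axis with D0 samples of history
x   = sin(2*pi*f0*t/fs);
D   = D0 + slope*n;            % delay in samples at each output instant
pos = n - D;                   % fractional read position on the time axis
y   = interp1(t, x, pos, 'linear');   % linear interpolation between samples

% Estimate the output frequency from positive-going zero crossings
k     = find(y(1:end-1) < 0 & y(2:end) >= 0);
zc    = k + y(k) ./ (y(k) - y(k+1));  % refine each crossing linearly
f_est = fs * (numel(zc)-1) / (zc(end) - zc(1));
fprintf('pitch ratio %.3f, expected %.3f\n', f_est/f0, 1 - slope);
```

By the same reasoning, the appendix settings Width = 0.01 and freq_sawtooth = -30 should give a delay slope of roughly Width*freq_sawtooth = -0.3 samples per sample, that is, a pitch ratio of about 1.3. This is an estimate derived from the code and was not measured on the report's audio files.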

3.3 Pitch Shifting by PSOLA and Formant Preservation (7.4.4 DAFX)

This technique is similar to resampling in the time domain. The difference is that the short-time spectral envelope is now resampled. The spectral envelope is a line going through the amplitudes of all the harmonics, as can be seen in the following figure.

Fig. 12: Pitch shifting by PSOLA method: frequency resampling the spectral envelope [DAFX]

The harmonics are scaled according to the scaling factor, but their amplitudes are determined by sampling the spectral envelope. The PSOLA algorithm can therefore be used to pitch shift a voice signal while preserving its formants. By preserving the formants of the signal we effectively preserve the voice identity [ML95, BJ95].

As can be seen from the following figures, PSOLA analysis for pitch shifting is identical to the analysis for time stretching. The difference appears in the synthesis part: instead of simply adding or removing segments, and thereby stretching the time, we now add or remove segments by overlapping the windows, thereby preserving the duration of the signal while changing its pitch.

Fig. 14: PSOLA Synthesis for pitch shifting [DAFX]

The algorithm described above was implemented using the psolaF.m function (DAFX, p. 225).
Psola_Format: psola_format.m
PsolaF: psolaF.m

The following audio files demonstrate the results of Pitch Shifting by PSOLA and Formant Preservation.
Original: brass_jazz.wav
PSOLA Formant: brass_jazz_PSOLA.wav
Higher: brass_jazz_PSOLA_high.wav
Lower: brass_jazz_PSOLA_low.wav
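The role of the gamma parameter in psolaF.m can be illustrated on a single grain: reading a windowed segment gamma times faster shortens it and moves its spectral peak up by the same factor. The sketch below is an illustration under assumptions, not the report's code. The grain is a windowed cosine standing in for one formant, and pit, gamma and fc are hypothetical values.

```matlab
% Resampling one PSOLA grain by gamma moves its spectral peak by gamma.
% Assumed toy grain: a Hann-windowed cosine standing in for a formant.
pit   = 100;                           % pitch period in samples
gamma = 2;                             % newFormantFreq/oldFormantFreq
fc    = 0.05;                          % "formant" frequency in cycles/sample
k     = (-pit:pit)';
w     = 0.5*(1 + cos(pi*k/pit));       % Hann window centered on the pitch mark
gr    = cos(2*pi*fc*k) .* w;           % windowed grain, two periods long

pitStr = floor(pit/gamma);             % half-length of the resampled grain
grStr  = interp1(k, gr, (-pitStr*gamma:gamma:pit)');   % read gamma times faster

Nfft = 4096;                           % zero-padded FFT for a fine frequency grid
G    = abs(fft(gr, Nfft));
Gs   = abs(fft(grStr, Nfft));
[~, i1] = max(G(1:Nfft/2));            % spectral peak of the original grain
[~, i2] = max(Gs(1:Nfft/2));           % spectral peak of the resampled grain
f1 = (i1-1)/Nfft;
f2 = (i2-1)/Nfft;
fprintf('peak moved from %.4f to %.4f cycles/sample\n', f1, f2);
```

Since the grains are still placed at synthesis marks spaced by pit/beta, the pitch and the spectral envelope can be controlled separately. With gamma = 1 the grain is left untouched.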

4 Conclusions
The time stretching and pitch shifting effects described in this report are all based on small segments of the signal, which are processed using methods such as time scaling by resampling or amplitude multiplication by an envelope. The main point is that the waveform of each segment is not changed, which is the key to preserving the characteristics of the source signal. By applying these effects to actual audio samples, we were able to confirm the theory behind all of these algorithms. The methods described in this report offer a basic tool for time and pitch manipulation and, due to their low computational complexity, are efficient tools for real-time signal processing. However, the quality of the results limits their scope of application. More advanced methods of time stretching and pitch shifting are available where higher quality of the final product is desired.

Fig. 13: PSOLA Analysis for pitch shifting (same as for time stretching)


5 References
[BB89] K. Bogdanowicz and R. Blecher. Using Multiple Processors for real-time audio effects. In AES 7th International Conference, pp. 337-342, 1989.
[BJ95] R. Bristow-Johnson. A detailed analysis of a time-domain formant-corrected pitch shifting algorithm. J. Audio Eng. Soc., 43(5):340-352, 1995.
[BJ95-a] R. Bristow-Johnson. A detailed analysis of a time-domain formant-corrected pitch shifting algorithm. J. Audio Eng. Soc., 43(5):347, 1995.
[DAFX] U. Zölzer. Digital Audio Effects. John Wiley and Sons, pp. 202-225, 2005.
[Dat87] J. Dattorro. Using Digital Signal Processor Chips in a Stereo Audio Time Compressor / Expander. In Proc. 83rd AES Convention, Preprint 2500, 1987.
[DZ99] S. Disch and U. Zölzer. Modulation and delay line based digital audio effects. In Proc. DAFX-99 Digital Audio Effects Workshop, pp. 5-8, Trondheim, December 1999.
[HMC89] C. Hamon, E. Moulines and F. Charpentier. A diphone synthesis system based on time-domain prosodic modifications of speech. In Proc. ICASSP, pp. 238-241, 1989.
[MC90] E. Moulines and F. Charpentier. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication, 9(5/6):453-467, 1990.
[MEJ86] J. Makhoul and A. El-Jaroudi. Time-scale modification in medium to low rate speech coding. In Proc. ICASSP, pp. 1705-1708, 1986.
[ML95] E. Moulines and J. Laroche. Non-parametric techniques for pitch-scale and time-scale modification of speech. Speech Communication, 16:175-205, 1995.
[RW85] S. Roucos and A.M. Wilgus. High quality time-scale modification for speech. In Proc. ICASSP, pp. 493-496, 1985.


Appendix A MatLab code

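% VariableSpeed.m
% Mike Blarowski
% Darko Stelkic
% (7.2) Variable Speed Replay
% Note: wavread/wavwrite were removed from recent MATLAB releases;
% audioread/audiowrite are their replacements.
[x,f_s,nbits] = wavread('Bass_skit.wav');
v_f = 1.5;    % v > 1: time compression (faster playback, higher pitch)
v_s = 0.7;    % v < 1: time expansion (slower playback, lower pitch)

% write the same samples with a scaled sampling rate f_s*v
wavwrite(x, f_s*v_f, '7_2_VSR_Bass_skit_faster');
wavwrite(x, f_s*v_s, '7_2_VSR_Bass_skit_slower');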


% TimeScaleSOLA.m
% Time Scaling with Synchronized Overlap and Add
%
% analysis hop size     Sa = 5000
% block length          N  = 10000
% time scaling factor   0.25 <= alpha <= 2 (0.8 shrinks, 1.2 stretches)
% overlap interval      L  = 10

clear all,close all
[signal,Fs] = wavread('brass_jazz.wav');
DAFx_in     = signal';

Sa = input('Analysis hop size Sa in samples = ');
N  = input('Analysis block size N in samples = ');
if Sa > N
    disp('Sa must be less than N !!!')
end
M = ceil(length(DAFx_in)/Sa);
% Segmentation into blocks of length N every Sa samples
% leads to M segments

alpha = input('Time stretching factor alpha = ');
Ss    = round(Sa*alpha);      % synthesis hop size
L     = input('Overlap in samples (even) = ');

if Ss >= N
    disp('alpha is not correct, Ss is >= N')
elseif Ss > N-L
    disp('alpha is not correct, Ss is > N-L')
end

DAFx_in(M*Sa+N)=0;            % zero-pad so the last grain is complete
Overlap = DAFx_in(1:N);

% **** Main TimeScaleSOLA loop ****
for ni=1:M-1
    grain=DAFx_in(ni*Sa+1:N+ni*Sa);
    % lag of maximum similarity within the overlap interval
    XCORRsegment=xcorr(grain(1:L),Overlap(1,ni*Ss:ni*Ss+(L-1)));
    [xmax(1,ni),index(1,ni)]=max(XCORRsegment);
    % cross-fade weights over the overlapping region
    fadeout=1:(-1/(length(Overlap)-(ni*Ss-(L-1)+index(1,ni)-1))):0;
    fadein=0:(1/(length(Overlap)-(ni*Ss-(L-1)+index(1,ni)-1))):1;
    Tail=Overlap(1,(ni*Ss-(L-1))+ ...
        index(1,ni)-1:length(Overlap)).*fadeout;
    Begin=grain(1:length(fadein)).*fadein;
    Add=Tail+Begin;
    Overlap=[Overlap(1,1:ni*Ss-L+index(1,ni)-1) ...
        Add grain(length(fadein)+1:N)];
end;
% **** end TimeScaleSOLA loop ****

% Output in WAV file (play back at the file's own sampling rate)
sound(Overlap,Fs);
wavwrite(Overlap,Fs,'7_3_2_SOLA_brass_jazz_shrink.wav');
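The similarity search and cross-fade at the heart of the SOLA loop can be tried in isolation. The sketch below is a toy illustration under assumptions, not a replacement for TimeScaleSOLA.m: it uses a sine with a known period, a deliberately misaligned second segment, and an explicit normalized-correlation loop in place of xcorr so that no toolbox is required.

```matlab
% SOLA-style splice: find the lag of maximum similarity, then cross-fade.
% Assumed toy setup: a sine with a 50-sample period and a second segment
% that starts 12 samples out of phase; values are not from the report.
P    = 50;                             % period of the test tone in samples
n    = (0:999)';
s    = sin(2*pi*n/P);
L    = 100;                            % overlap length in samples
tail = s(401:400+L);                   % end of the already-written output
seg  = s(413:1000);                    % next segment, misaligned by 12 samples

maxLag = P;                            % search range for the alignment
score  = zeros(maxLag+1,1);
for lag = 0:maxLag
    cand = seg(1+lag:L+lag);           % candidate overlap region
    score(lag+1) = (tail'*cand) / (norm(tail)*norm(cand));  % normalized correlation
end
[~, best] = max(score);
lag = best - 1;                        % lag of maximum similarity

fadeout = linspace(1,0,L)';            % weights for the old segment
fadein  = 1 - fadeout;                 % weights for the new segment
overlap = tail.*fadeout + seg(1+lag:L+lag).*fadein;
fprintf('best lag = %d samples\n', lag);
```

At the best lag the two segments are in phase, so the cross-faded region follows the original waveform and no discontinuity is introduced at the splice.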

% PitchMarker.m
% Finds all the pitch marks in the input file and returns the
% markings in a matrix
function [ pitch ] = PitchMarker(section)
%clear all
%close all
%[x,fs,bit]=wavread('brass_jazz.wav');
%section=x;

% initial settings
mark=zeros(1,length(section));
last_pos=1;
place=1;
blocksize=300;
i=1;

while last_pos+floor(blocksize*1.7) < length(section)
    % grabs the next block to examine
    temp=section(last_pos+50:last_pos+floor(blocksize*1.7));
    % finds the high point in the block
    [mag,place]=max(temp);
    % checks to see if there is really a signal in this block
    if mag < 0.01
        place=length(temp);
        mode = 0;
        mark(place+last_pos+50)=1;
        pitch(i)=place+last_pos+50;
    else
        mode = 1;
    end
    % checks to see if there is a pitch mark before the current pitch mark
    while mode == 1
        % finds largest point in block from beginning to current pitch mark
        [mag2,place2]=max(temp(1:place-50));
        % checks to see if high mark is of sufficient size to be a pitch mark
        if mag2 > 0.90*mag
            mag=mag2;
            place=place2;
        else
            mode = 0;
            mark(place+last_pos+50)=1;
            pitch(i)=place+last_pos+50;
        end
    end
    % starts the next block to be examined 50 samples after this block
    blocksize=place+50;
    % makes sure next blocksize is of sufficient size
    if blocksize < 150
        blocksize=150;
    end
    last_pos=place+last_pos+50;
    i=i+1;
end

% plot(mark)
% hold on
% plot(section,'r')
% hold off

% psola.m
function out=psola(in,m,alpha,beta)
%     in    input signal
%     m     pitch marks (from PitchMarker.m function)
%     alpha time stretching factor
%     beta  pitch shifting factor

P = diff(m);                     %compute pitch periods

if m(1)<=P(1),                   %remove first pitch mark
    m=m(2:length(m));
    P=P(2:length(P));
end

if m(length(m))+P(length(P))>length(in)   %remove last pitch mark
    m=m(1:length(m)-1);
else
    P=[P P(length(P))];
end

Lout=ceil(length(in)*alpha);
out=zeros(1,Lout);               %output signal

tk = P(1)+1;                     %output pitch mark

while round(tk)<Lout
    [minimum i] = min( abs(alpha*m - tk) );   %find analysis segment
    pit=P(i);
    st=m(i)-pit;
    en=m(i)+pit;
    gr = in(st:en) .* hanning(2*pit+1);
    iniGr=round(tk)-pit;
    endGr=round(tk)+pit;
    if endGr>Lout, break; end
    out(iniGr:endGr) = out(iniGr:endGr)+gr';  %overlap new segment
    tk=tk+pit/beta;
end %while

% time_stretch_PSOLA.m
% Mike Blarowski
% Darko Stelkic
% (7.3.3) Time stretching using Pitch Synchronous Overlap and Add (PSOLA)
clear all
close all
[x,f_s,nbits]=wavread('brass_jazz.wav');
y=zeros(1,length(x));
m=PitchMarker(x);

% stretch
alpha=1.6;
beta=1;
y=psola(x,m,alpha,beta);
wavwrite(y, f_s, '7_3_3_PSOLA_brass_jazz-stretch.wav');

% shrink
alpha=0.6;
beta=1;
y=psola(x,m,alpha,beta);
wavwrite(y, f_s, '7_3_3_PSOLA_brass_jazz-shrink.wav');

% Pitch Shifting by Time Stretching and Resampling (7.4.2)
[x,f_s,nbits]=wavread('brass_jazz.wav');
y=zeros(1,length(x));
m=PitchMarker(x);                      % pitch marks (PitchMarker.m)

% higher pitch: stretch by alpha, then resample back to the original length
alpha=1.5;
beta=1;
y=psola(x,m,alpha,beta);
y=resample(y,length(x),length(y));
wavwrite(y, f_s, '7_4_2_brass_jazz-high.wav');

% lower pitch: shrink by alpha, then resample back to the original length
alpha=0.75;
beta=1;
y=psola(x,m,alpha,beta);
y=resample(y,length(x),length(y));
wavwrite(y, f_s, '7_4_2_brass_jazz-low.wav');
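The resampling step alone can be checked numerically: a signal of length alpha*N that is read back in N samples has every frequency multiplied by alpha. The sketch below makes two simplifying assumptions that differ from the script above. A synthetic tone stands in for the PSOLA output, and plain linear interpolation with interp1 replaces resample, so there is no anti-aliasing filter.

```matlab
% Resampling a time-stretched signal back to its original length.
% Stand-in assumption: y_stretched is a tone that already has the stretched
% length alpha*N and the original pitch f0 (no real PSOLA output is used).
fs    = 8000;                          % sampling rate in Hz
f0    = 200;                           % original pitch in Hz
N     = 8000;                          % original length in samples
alpha = 1.5;                           % time stretching factor
Ns    = round(alpha*N);                % stretched length
y_stretched = sin(2*pi*f0*(0:Ns-1)'/fs);

% Read Ns samples in N steps (linear interpolation, no anti-alias filter)
pos = linspace(1, Ns, N)';
y   = interp1((1:Ns)', y_stretched, pos, 'linear');

% Estimate the new pitch from positive-going zero crossings
k     = find(y(1:end-1) < 0 & y(2:end) >= 0);
zc    = k + y(k) ./ (y(k) - y(k+1));   % refine each crossing linearly
f_est = fs * (numel(zc)-1) / (zc(end) - zc(1));
fprintf('%d samples, pitch %.1f Hz (expected about %.1f Hz)\n', ...
        length(y), f_est, alpha*f0);
```

The output has the original length N but a pitch of about alpha*f0, which is why alpha = 1.5 produces the higher file and alpha = 0.75 the lower one.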

% Delayline.m (based on vibrato.m)
function y=Delayline(x,SAMPLERATE,Modfreq,Width,BLen)
ya_alt=0;
Delay=Width;                    % basic delay of input sample in sec
DELAY=round(Delay*SAMPLERATE);  % basic delay in # samples
WIDTH=round(Width*SAMPLERATE);  % modulation width in # samples
if WIDTH>DELAY
    error('delay greater than basic delay !!!');
end
MODFREQ=Modfreq/SAMPLERATE;     % modulation frequency in # samples
LEN=length(x);                  % # of samples in WAV-file
L=2+DELAY+WIDTH*2;              % length of the entire delay
Delayline=zeros(L,1);           % memory allocation for delay
y=zeros(size(x));               % memory allocation for output vector
j=1;
for n=1:(LEN-1)
    %M=MODFREQ;
    if j>BLen                   % restart the sawtooth every BLen samples
        j=1;
    end
    MOD=j*Modfreq/SAMPLERATE;   % sawtooth ramp
    ZEIGER=1+DELAY+WIDTH*MOD;   % fractional read pointer
    i=floor(ZEIGER);
    frac=ZEIGER-i;
    Delayline=[x(n);Delayline(1:L-1)];
    %---Linear Interpolation-----------------------------
    y(n,1)=Delayline(i)*frac+Delayline(i-1)*(1-frac);
    %---Allpass Interpolation------------------------------
    %y(n,1)=(Delayline(i+1)+(1-frac)*Delayline(i)-(1-frac)*ya_alt);
    %ya_alt=y(n,1);
    %---Spline Interpolation-------------------------------
    %y(n,1)=Delayline(i+1)*frac^3/6
    %....+Delayline(i)*((1+frac)^3-4*frac^3)/6
    %....+Delayline(i-1)*((2-frac)^3-4*(1-frac)^3)/6
    %....+Delayline(i-2)*(1-frac)^3/6;
    %3rd-order Spline Interpolation
    j=j+1;
end

% Pitch shift delay line modulation
close all
clear all
BLen = 1024;              %block length
freq_sawtooth=-30;        %pitch change variable (negative raises pitch)
Width=0.01;
[x,fs,bit]=wavread('Brass_Jazz.wav');

% creating the window
ind = (1:length(x))'*2*pi/BLen;
Wa =(1-cos(ind))/2;
%plot(w)

% makes 2 copies of input file, offset by half a block
An=x(1:length(x));
Bn=x(1+BLen/2:length(x));

% applies varying delay
An_y=Delayline(An,fs,freq_sawtooth,Width,BLen);
Bn_y=Delayline(Bn,fs,freq_sawtooth,Width,BLen);

% applies windowing to modified input
An_Win=An_y.*Wa(1:length(An_y));
Bn_Win=Bn_y.*Wa(1:length(Bn_y));

% stores modified input in buffer
y_a=An_Win;
y_b=zeros(1,length(x));
y_b(1+BLen/2:length(x))=Bn_Win;

% combines the two buffers
y=y_a+y_b';
%wavwrite(y,fs,'7_4_3_DelayLine_HIGH.wav');
%wavwrite(y,fs,'7_4_3_DelayLine_LOW.wav');

% figure
% plot(abs(fft(x)),'b')
% hold
% plot(abs(fft(y)),'r')
% hold off

figure (2)
plot(x,'b')
hold on
plot(y,'r')
hold off
% figure
% plot(y_a)
% figure
% plot(y_b)

% Pitch Shifting by PSOLA and Formant Preservation (7.4.4)
clear all
close all
[x,f_s,nbits]=wavread('brass_jazz.wav');
y=zeros(1,length(x));

alpha=1;
beta=1;
gamma=2;
m=PitchMarker(x);
y=psolaF(x,m,alpha,beta,gamma);
wavwrite(y, f_s, '7_4_4_brass_jazz_psola_formant.wav');

alpha=1;
beta=1.5;
gamma=1;
y=psolaF(x,m,alpha,beta,gamma);
wavwrite(y, f_s, '7_4_4_brass_jazz_psola_formant_high.wav');

alpha=1;
beta=0.75;
gamma=1;
y=psolaF(x,m,alpha,beta,gamma);
wavwrite(y, f_s, '7_4_4_brass_jazz_psola_formant_low.wav');

% psolaF.m
function out=psolaF(in,m,alpha,beta,gamma)
% . . .
%     gamma newFormantFreq/oldFormantFreq
% . . .
% the internal loop as

P = diff(m);                     %compute pitch periods

if m(1)<=P(1),                   %remove first pitch mark
    m=m(2:length(m));
    P=P(2:length(P));
end

if m(length(m))+P(length(P))>length(in)   %remove last pitch mark
    m=m(1:length(m)-1);
else
    P=[P P(length(P))];
end

Lout=ceil(length(in)*alpha);
out=zeros(1,Lout);               %output signal

tk = P(1)+1;                     %output pitch mark
while round(tk)<Lout
    [minimum i]=min(abs(alpha*m-tk) );        % find analysis segment
    pit=P(i);
    pitStr=floor(pit/gamma);
    gr=in(m(i)-pit:m(i)+pit).*hanning(2*pit+1);
    gr=interp1(-pit:1:pit,gr,-pitStr*gamma:gamma:pit);   % stretch segm.
    iniGr=round(tk)-pitStr;
    endGr=round(tk)+pitStr;
    if endGr>Lout, break; end
    out(iniGr:endGr)=out(iniGr:endGr)+gr;     % overlap new segment
    tk=tk+pit/beta;
end % end of while