Phase Vocoder
Since the last project update I've been trying to get a better understanding of the phase vocoder and how to use it for automatic pitch correction. I have some basic code running.
Figure 1: The top plot is the input signal and the bottom plot is the time stretched version
Figure 2: The top plot is the input signal and the bottom plot is the time compressed version
The non-trivial aspect of these stretch and compression transformations is that the frequencies of the input are preserved in the output signal. This is a crucial intermediate step before doing pitch correction. The only remaining step is to resample the output back to the same time duration as the input signal, which scales every frequency by the stretch factor.
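This resampling step can be sketched with SciPy. Here the time-stretched signal is simulated directly as a longer tone at the same frequency (a stand-in for the vocoder output, since plain resampling on its own would not preserve pitch); the numbers are illustrative:

```python
import numpy as np
from scipy.signal import resample

fs = 44100
f0 = 440.0
r = 1.25  # hypothetical stretch factor applied by the vocoder

# Stand-in for phase-vocoder output: the same 440 Hz tone, but r times longer.
n_stretched = int(fs * r)
stretched = np.sin(2 * np.pi * f0 * np.arange(n_stretched) / fs)

# Resample back to the original duration: the pitch scales up by r.
shifted = resample(stretched, fs)

# Estimate the dominant frequency of the result.
spectrum = np.abs(np.fft.rfft(shifted * np.hanning(len(shifted))))
peak_hz = np.argmax(spectrum) * fs / len(shifted)
print(round(peak_hz))  # -> 550 (= 440 * 1.25)
```

So a pitch shift by factor r falls out of a time stretch by r followed by resampling to the original length.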
I implemented the basic Phase Vocoder as covered in the paper "Improved Phase Vocoder Time-Scale Modification of Audio" by Jean Laroche and Mark Dolson.
My understanding of the algorithm is the following:
- Divide the input signal into frames. These frames are generally of short duration (I used 512 samples). Each frame's start time overlaps with the previous frame, creating a form of signal redundancy.
- Each frame is transformed to the frequency domain using the DFT/FFT.
- When generating the frequency-domain representation of the output signal, the magnitude of each FFT bin is preserved, but the phase is calculated from the estimated frequency of the bin and the amount of time stretching/compression being applied.
- Each output frequency-domain frame is transformed back to a time-domain frame using the IFFT/IDFT.
- The output frames are assembled into a time stream using an overlap-add method.
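The steps above can be sketched in NumPy. This is a minimal version of the basic algorithm only, without the phase-locking improvements the Laroche/Dolson paper adds; the `n_fft` and `hop` values are illustrative and amplitude normalization is omitted:

```python
import numpy as np

def phase_vocoder(x, stretch, n_fft=512, hop=128):
    """Time-stretch x by `stretch` (>1 lengthens) without changing pitch."""
    win = np.hanning(n_fft)
    ana_hop = hop / stretch                        # analysis hop (may be fractional)
    bins = np.arange(n_fft // 2 + 1)
    # Phase each bin's centre frequency advances over one analysis hop
    expected = 2.0 * np.pi * bins * ana_hop / n_fft

    starts = np.arange(0, len(x) - n_fft, ana_hop)
    out = np.zeros(len(starts) * hop + n_fft)
    out_phase = None
    prev_phase = None

    for i, s in enumerate(starts):
        frame = x[int(s):int(s) + n_fft] * win
        spec = np.fft.rfft(frame)                  # frame -> frequency domain
        phase = np.angle(spec)
        if out_phase is None:
            out_phase = phase                      # first frame: copy phases
        else:
            # The deviation from the expected advance gives each bin's true
            # frequency; accumulate it at the synthesis hop to stay coherent.
            delta = phase - prev_phase - expected
            delta -= 2.0 * np.pi * np.round(delta / (2.0 * np.pi))
            out_phase = out_phase + (expected + delta) * stretch
        prev_phase = phase
        # Same magnitudes, recomputed phases; back to time, then overlap-add
        out_frame = np.fft.irfft(np.abs(spec) * np.exp(1j * out_phase))
        out[i * hop : i * hop + n_fft] += win * out_frame
    return out
```

Feeding a sine tone through this should give an output roughly `stretch` times longer with the dominant frequency unchanged, which is exactly the property the plots above show.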

