Description

In my final project for Parallel Computer Architecture and Programming (15-418), I worked with Melinda Chen to write a GPU audio convolution routine. To handle sequences too long to fit on the GPU, we used the overlap-add algorithm and the cuFFT library. The input is partitioned into blocks. Each block is transformed to the frequency domain, where the convolution is computed as a pointwise multiplication, and the result is transformed back to the time domain. To further reduce the memory-transfer bottleneck, we used CUDA streams to pipeline the processing of the partitions: while the GPU computed the convolution of one partition, the next partition was already being copied over. This hides most of the time that would otherwise be spent waiting on CPU-to-GPU data transfers. Read more in our report below.
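The overlap-add idea can be illustrated with a small CPU-only sketch. It is not our GPU implementation. A hand-written radix-2 FFT stands in for cuFFT, there are no CUDA streams, and the names `fft`, `ifft`, `overlap_add`, `direct_convolve` and the `block_size` parameter are hypothetical, chosen only for illustration. It assumes a short kernel (for example an impulse response) convolved with a long signal. Each block is zero-padded so that its full linear convolution fits in one FFT, and the overlapping tails of neighbouring blocks are summed into the output.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X):
    """Inverse FFT via the conjugation trick: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    y = fft([v.conjugate() for v in X])
    return [v.conjugate() / n for v in y]


def overlap_add(signal, kernel, block_size):
    """Convolve a long signal with a short kernel, one block at a time."""
    m = len(kernel)
    # The FFT must hold a full linear convolution of one block: block_size + m - 1.
    n_fft = 1
    while n_fft < block_size + m - 1:
        n_fft *= 2
    # The kernel's spectrum is computed once and reused for every block.
    H = fft(list(kernel) + [0.0] * (n_fft - m))
    out = [0.0] * (len(signal) + m - 1)
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        X = fft(list(block) + [0.0] * (n_fft - len(block)))
        # Pointwise product in frequency = linear convolution in time
        # (the zero padding prevents circular wrap-around).
        y = ifft([a * b for a, b in zip(X, H)])
        # The "add" step: this block's tail overlaps the start of the next block.
        for i in range(len(block) + m - 1):
            out[start + i] += y[i].real
    return out


def direct_convolve(signal, kernel):
    """Reference O(N*M) time-domain convolution, used only for checking."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


# Usage: the blocked FFT result matches direct convolution up to rounding error.
signal = [float(v % 7) - 3.0 for v in range(50)]
kernel = [0.5, -1.0, 0.25, 2.0]
blocked = overlap_add(signal, kernel, block_size=8)
reference = direct_convolve(signal, kernel)
assert max(abs(p - q) for p, q in zip(blocked, reference)) < 1e-9
```

Because each block is processed independently and only the kernel's spectrum is shared, the per-block loop is the natural unit to pipeline. In a streamed GPU version, one block's host-to-device copy could overlap with the FFT work on the previous block.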

GPU streamed audio convolution

Ruslana Fogler and Melinda Chen

Description