Implementing a Spectrogram in Python

I recently decided to move completely from Matlab to Python, and one problem I ran into was how to convert speech into a spectrogram.

The code is as follows:

import numpy, wave

# filename: path to the WAV file
# window_length_ms: window length in milliseconds
# window_shift_times: frame shift as a fraction of the window length,
#   e.g. with a 20 ms window, a shift of 0.5 means 10 ms

def getSpectrum(filename, window_length_ms, window_shift_times):
    # Open the audio file
    wav_file = wave.open(filename, 'rb')

    # Read the audio file's parameters
    params = wav_file.getparams()
    nchannels, sampwidth, framerate, wav_length = params[:4]

    # readframes() returns the raw bytes, so they still have to be converted
    # to short (16-bit) samples with numpy; this assumes 16-bit mono audio
    str_data = wav_file.readframes(wav_length)
    wav_file.close()
    wave_data = numpy.frombuffer(str_data, dtype=numpy.short)

    # Convert the window length from milliseconds to samples
    window_length = int(framerate * window_length_ms / 1000)
    window_shift = int(window_length * window_shift_times)

    # Compute the total number of frames and create an empty matrix
    nframe = (wav_length - (window_length - window_shift)) // window_shift
    spec = numpy.zeros((window_length // 2, nframe))

    # Loop over the frames and compute the FFT of each window
    for i in range(nframe):
        start = i * window_shift
        end = start + window_length

        # [:window_length // 2] keeps only the first half of the FFT bins
        spec[:, i] = numpy.log(numpy.abs(numpy.fft.fft(wave_data[start:end])))[:window_length // 2]
    return spec
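
To make the framing arithmetic concrete, here is a small sketch that reproduces only the window, shift, and frame-count calculation from the function above. The helper name `frame_layout` and the 16 kHz, one-second input are assumptions chosen for illustration; they are not part of the original code.

```python
def frame_layout(n_samples, framerate, window_length_ms, window_shift_times):
    """Return (window_length, window_shift, nframe), all in samples/frames.

    Hypothetical helper: it mirrors the arithmetic in getSpectrum but reads no audio.
    """
    # Window length: milliseconds -> samples
    window_length = int(framerate * window_length_ms / 1000)
    # Frame shift: a fraction of the window length
    window_shift = int(window_length * window_shift_times)
    # Number of complete frames that fit into the signal
    nframe = (n_samples - (window_length - window_shift)) // window_shift
    return window_length, window_shift, nframe


# One second of audio at an assumed 16 kHz, 20 ms window, 50% shift
layout = frame_layout(16000, 16000, 20, 0.5)
print(layout)  # (320, 160, 99)
```

With these numbers the last frame starts at sample 98 * 160 = 15680 and ends exactly at sample 16000, so no frame runs past the end of the signal.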

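The per-frame step can also be sketched on its own. For real-valued input the second half of the FFT mirrors the first, which is why only half of the bins are kept; `numpy.fft.rfft` computes just that half directly. The version below is a variation, not the code above: the Hann window and the small constant added before the logarithm are my additions, and the helper name `frame_log_spectrum` and the synthetic 1 kHz tone are assumptions used for illustration.

```python
import numpy


def frame_log_spectrum(frame):
    """Log-magnitude spectrum of one real-valued frame, first half of the bins only."""
    n = len(frame)
    # Hann window to reduce spectral leakage (an addition, not in the original code)
    windowed = frame * numpy.hanning(n)
    # rfft returns n // 2 + 1 bins; drop the last one to match the original shape
    magnitude = numpy.abs(numpy.fft.rfft(windowed))[:n // 2]
    # The small constant avoids log(0) on silent frames (also an addition)
    return numpy.log(magnitude + 1e-10)


# Synthetic test frame: a 1 kHz tone, 20 ms at an assumed 16 kHz sample rate
framerate = 16000
n = 320
t = numpy.arange(n) / framerate
frame = numpy.sin(2 * numpy.pi * 1000 * t)

spectrum = frame_log_spectrum(frame)
peak_bin = int(numpy.argmax(spectrum))
peak_hz = peak_bin * framerate / n
print(spectrum.shape, peak_bin, peak_hz)
```

Each bin covers framerate / n = 50 Hz here, so the 1 kHz tone should peak at bin 20.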

Test code:

from getSpectrum import getSpectrum
import numpy, matplotlib.pyplot as plt

# 20 ms window; the window shift is 0.5 times the window length
speech_spectrum = getSpectrum('speech.wav', 20, 0.5)
noise_spectrum = getSpectrum('noise.wav', 20, 0.5)
noised_speech_spectrum = speech_spectrum[:, :300] + noise_spectrum[:, :300]

plt.subplot(311)
plt.imshow(speech_spectrum)

plt.subplot(312)
plt.imshow(noise_spectrum)

plt.subplot(313)
plt.imshow(noised_speech_spectrum)

plt.show()
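
One thing to keep in mind about the test code: getSpectrum returns log magnitudes, so adding two of its outputs adds logarithms, which corresponds to multiplying the magnitudes rather than adding them. If the goal is to approximate speech with additive noise, one option is to add the magnitudes in the linear domain and take the logarithm again. The sketch below does this with `numpy.logaddexp`; the helper name `mix_log_magnitudes` is hypothetical, and the result is only an approximation because it ignores phase. Mixing the waveforms before the FFT would be the exact approach.

```python
import numpy


def mix_log_magnitudes(log_a, log_b):
    """Combine two log-magnitude spectra by adding their magnitudes.

    Hypothetical helper: computes log(exp(log_a) + exp(log_b)) element-wise.
    Phase is ignored, so this only approximates the spectrum of a real mix.
    """
    return numpy.logaddexp(log_a, log_b)


# Tiny made-up example: magnitudes [1, 2] and [3, 4]
log_a = numpy.log(numpy.array([1.0, 2.0]))
log_b = numpy.log(numpy.array([3.0, 4.0]))

mixed = mix_log_magnitudes(log_a, log_b)
linear = numpy.exp(mixed)       # back to magnitudes: approximately [4, 6]
plain_sum = numpy.exp(log_a + log_b)  # what adding logs gives: approximately [3, 8]
print(linear, plain_sum)
```

In the test code this would be used as `mix_log_magnitudes(speech_spectrum[:, :300], noise_spectrum[:, :300])` in place of the plain sum.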

Friskit
