I don’t use FFT often, so interpreting its results still confounds me every time. I’ll try to organize my understanding here…
INFORMATION ABOUT THE DATA FILE TO PROCESS
data = array of samples
data_pts = total number of samples in the data file (one sample per line)
data_srate = sampling rate (samples per second)
data_stime = total time covered, in seconds (data_pts / data_srate)
THE numpy.fft.fft() FUNCTION AND THE RESULTS
fft_result = fft(data)
fft_freqs = fftfreq( len(fft_result) )
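(fft_result is complex: every value has a real and an imaginary part. The magnitude, abs(fft_result), combines both. The real part alone can miss a peak, depending on the phase of the signal.)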
len(fft_result) == len(data)
but this is mostly meaningless
fft_result[n] corresponds to fft_freqs[n]
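To make the pairing of fft_result[n] with fft_freqs[n] concrete, here is a minimal sketch. The helper name peak_frequency and the synthetic 3 Hz cosine are my own assumptions for illustration, not part of the data file discussed here. It passes d=1/srate to fftfreq so the bins come out in Hz.

```python
import numpy as np

def peak_frequency(data, srate):
    """Return the positive frequency (Hz) whose FFT magnitude is largest."""
    data = np.asarray(data, dtype=float)
    fft_result = np.fft.fft(data)
    # d is the time between samples, so the bins are expressed in Hz
    fft_freqs = np.fft.fftfreq(len(data), d=1.0 / srate)
    half = len(data) // 2
    # search the positive half only, skipping the DC bin at index 0
    idx = np.argmax(np.abs(fft_result[1:half])) + 1
    return fft_freqs[idx]

# Hypothetical test signal: a 3 Hz cosine, 64 samples at 32 samples/second
srate = 32.0
t = np.arange(64) / srate
signal = np.cos(2 * np.pi * 3.0 * t)
print(peak_frequency(signal, srate))
```

The index of the largest magnitude in fft_result is used directly as an index into fft_freqs, which is all the correspondence amounts to.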
PRECISION
The interval between adjacent values of fft_freqs, once scaled to Hz by multiplying by data_srate, equals the inverse of data_stime.
This interval has nothing to do with the sampling rate, which is what confused me most. I had expected that 1024 samples per second would give finer frequency resolution than 2 samples per second.
In other words, if the sampled time period is 10 seconds long, the frequency ‘bins’ are spaced 0.1 Hz apart whether you have 2 or 100 samples per second. The number of 0.1 Hz bins depends on the number of samples. In this case, 2 samples per second will produce 20 bins ranging from -1.0 to 0.9; 100 samples per second will give 1000 bins, from -50.0 to 49.9.
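The 10-second example can be checked numerically. This is a small sketch; bin_info is a hypothetical helper name, and it relies only on numpy.fft.fftfreq.

```python
import numpy as np

def bin_info(duration_s, srate):
    """Return (bin count, bin spacing, lowest bin, highest bin) in Hz."""
    n = int(round(duration_s * srate))        # total number of samples
    freqs = np.fft.fftfreq(n, d=1.0 / srate)  # bin centres in Hz
    spacing = freqs[1] - freqs[0]
    return n, spacing, freqs.min(), freqs.max()

# Same 10-second window, two very different sampling rates
for srate in (2.0, 100.0):
    print(srate, bin_info(10.0, srate))
```

Both rates report the same spacing; only the number of bins and the range they cover change.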
dfreq = 1/data_stime
range = (-dfreq*data_stime*data_srate/2 , dfreq*data_stime*data_srate/2 - dfreq)
      = (-data_srate/2 , data_srate/2 - dfreq )   (for an even number of samples)
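As a sanity check on the range formula, the sketch below compares it with what fftfreq actually returns. The function names expected_range and actual_range are assumptions of mine. The formula matches when the sample count is even; with an odd count, fftfreq returns a range that is symmetric around zero instead.

```python
import numpy as np

def expected_range(data_pts, data_srate):
    """Range predicted by the formula: (-srate/2, srate/2 - dfreq)."""
    data_stime = data_pts / data_srate
    dfreq = 1.0 / data_stime
    return (-data_srate / 2.0, data_srate / 2.0 - dfreq)

def actual_range(data_pts, data_srate):
    """Range of the bins that fftfreq really produces, in Hz."""
    freqs = np.fft.fftfreq(data_pts, d=1.0 / data_srate)
    return (freqs.min(), freqs.max())

print(expected_range(20, 2.0), actual_range(20, 2.0))  # even count
print(expected_range(21, 2.0), actual_range(21, 2.0))  # odd count
```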
IN SHORT
That seems more complicated than it needs to be, but it was the thought process that helped me arrive at a better solution. After doing the math and comparisons I find that there is no need to do all these calculations. The sampling rate only needs to be applied to the freq bins, similar to dividing the x-axis array by the rate when plotting the data. Squaring the FFT values helps accentuate the peaks.
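Two quick checks of those claims, as a sketch on a made-up signal (the two cosines at 4 Hz and 10 Hz are an assumption for illustration only). First, multiplying fftfreq(samples) by the rate gives the same bins as passing d=1/srate. Second, squaring the magnitudes widens the gap between a strong peak and a weaker one.

```python
import numpy as np

srate = 64.0
samples = 128
t = np.arange(samples) / srate
# Hypothetical signal: a strong 4 Hz tone plus a 10 Hz tone at half amplitude
data = np.cos(2 * np.pi * 4.0 * t) + 0.5 * np.cos(2 * np.pi * 10.0 * t)

scaled = np.fft.fftfreq(samples) * srate           # scale the bins afterwards
direct = np.fft.fftfreq(samples, d=1.0 / srate)    # let fftfreq do the scaling
print(np.allclose(scaled, direct))

magnitude = np.abs(np.fft.fft(data))
power = magnitude ** 2

strong = np.argmin(np.abs(direct - 4.0))   # index of the 4 Hz bin
weak = np.argmin(np.abs(direct - 10.0))    # index of the 10 Hz bin
print(magnitude[strong] / magnitude[weak])  # ratio of the two peaks
print(power[strong] / power[weak])          # same ratio after squaring
```

The peak ratio is itself squared by the squaring step, which is why the dominant peak stands out more in the plot.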
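from numpy import arange
from numpy.fft import fft, fftfreq
import matplotlib.pyplot as plt

data = []      # array to hold data read from file
srate = 2.     # sample rate (samples per second)
samples = 0    # counter for the number of lines in file (# of data samples)
# filename must hold the path to the data file (one sample per line)
with open(filename, 'r') as rawfile:
    for i, line in enumerate(rawfile):
        data.append(float(line))
        samples = i + 1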
x = arange(samples)/srate
# PLOT THE ORIGINAL DATA READ FROM FILE
plt.plot(x, data)
plt.show()
# APPLY FFT
FFT = fft(data)
freqs = fftfreq(samples) * srate
print('bin range=', [freqs.min(), freqs.max()])
print('bin interval=', srate/samples)
# PLOT THE POSITIVE HALF OF FREQ VALUES
# abs(FFT) is the magnitude of the complex values; FFT.real alone can hide a peak depending on phase
plt.plot(freqs[:samples//2], (abs(FFT)**2)[:samples//2], 'g*')
plt.show()
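As an aside, NumPy also provides rfft and rfftfreq, which return only the non-negative half for real-valued input, so no slicing is needed. This is a different technique from the script above, sketched here on a made-up 4 Hz cosine (an assumption, not the data file). One difference: for an even sample count, rfft also includes the Nyquist bin at srate/2, which the [:samples//2] slice leaves out.

```python
import numpy as np

srate = 64.0
samples = 128
t = np.arange(samples) / srate
data = np.cos(2 * np.pi * 4.0 * t)   # hypothetical test signal

spectrum = np.fft.rfft(data)                      # non-negative frequencies only
freqs = np.fft.rfftfreq(samples, d=1.0 / srate)   # matching bins in Hz
power = np.abs(spectrum) ** 2

print(len(freqs), freqs[0], freqs[-1])
print(freqs[np.argmax(power)])   # frequency of the strongest bin
```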