I have the output of numpy.fft.fft, calculated from 15000 samples at a rate of 500Hz, giving bins of width $\frac{1}{30}$Hz. Rather than 15000 frequency bins, I'd rather have, say, 50 or 100. I've thought of a couple of methods for merging the data into new bins (shown below), either summing or averaging the values, but I'm not sure what kind of difference these methods would make.
Alternatively, I guess I could take the inverse FFT and regenerate the spectrum with a smaller window (rough sketch of what I mean after the snippet below)?
Here's a code snippet to generate similar data:
import numpy as np

n_samples = 15000
sampling_freq = 500  # Hz
sampling_rate = 1 / sampling_freq  # sample spacing in seconds (the d argument to fftfreq)
dummy_data = np.arange(n_samples)
nyquist = 0.5 * (1 / sampling_rate)  # 250 Hz
# Frequency resolution is sampling_freq / n_samples = 1/30 Hz
fft_freqs = np.fft.fftfreq(n=n_samples, d=sampling_rate)
fft = np.fft.fft(dummy_data)
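To expand on the second option, here's roughly what I had in mind, continuing from the snippet above. The names and the segment length of 100 are just placeholders, and I'm not sure whether averaging the complex spectra (rather than magnitudes or power) is even the right choice:
# Rough sketch of the "inverse FFT, then re-transform with a shorter window" idea.
# Shorter segments give coarser frequency resolution directly
# (100 samples -> 5Hz bins at 500Hz).
time_data = np.fft.ifft(fft)
segment_len = 100  # placeholder value
n_segments = n_samples // segment_len
segments = time_data[:n_segments * segment_len].reshape(n_segments, segment_len)
# Average the per-segment spectra (no windowing or overlap here)
coarse_fft = np.fft.fft(segments, axis=1).mean(axis=0)
coarse_freqs = np.fft.fftfreq(n=segment_len, d=sampling_rate)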
And here's an example of how I'm generating new bins, for summing and averaging.
# Number of bins on each side (positive / negative)
# Total bins is 2*num_bins + 1, to ensure a bin centered on 0Hz
num_bins = 10
min_freq = -1 * nyquist
max_freq = nyquist
bin_size = max_freq / num_bins
half_bin_size = bin_size / 2
# Use indexing to build arrays sorted by frequency
# (np.fft.fftfreq returns [0, positive freqs..., negative freqs...])
pos_mask = np.where(np.logical_and(fft_freqs >= 0, fft_freqs <= max_freq))
neg_mask = np.where(np.logical_and(fft_freqs >= min_freq, fft_freqs < 0))
neg_freqs = fft_freqs[neg_mask]
pos_freqs = fft_freqs[pos_mask]
neg_ffts = fft[neg_mask]
pos_ffts = fft[pos_mask]
sorted_ffts = np.concatenate((neg_ffts, pos_ffts))
sorted_freqs = np.concatenate((neg_freqs, pos_freqs))
# New bin borders: num_bins bins on each side plus one straddling 0Hz
bin_border = np.concatenate([
    np.linspace(min_freq, -1 * half_bin_size, num_bins + 1, endpoint=True),
    np.linspace(half_bin_size, max_freq, num_bins + 1, endpoint=True)
])
new_bin_centers = (bin_border[1:] + bin_border[:-1]) / 2
print(len(new_bin_centers))
# Finds where sorted freqs change from one bin to the next
bin_border_idxs = np.searchsorted(sorted_freqs, bin_border)
# Number of elements in each new bin
bin_lens = np.diff(bin_border_idxs)
# Generate the new bins (np.where marks empty bins as NaN)
# Sum
new_bin_vals_sum = np.where(bin_lens == 0, np.nan, np.add.reduceat(sorted_ffts, bin_border_idxs[:-1]))
# Average
new_bin_vals_avg = np.where(bin_lens == 0, np.nan, np.add.reduceat(sorted_ffts, bin_border_idxs[:-1]) / bin_lens)
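For what it's worth, this is how I've been sanity-checking the two versions - just confirming the shapes line up with the new bin centers and looking at how the sum and average differ per bin (by a factor of bin_lens, as far as I can tell):
# Both outputs should have one value per new bin center (2*num_bins + 1)
assert len(new_bin_vals_sum) == len(new_bin_centers)
assert len(new_bin_vals_avg) == len(new_bin_centers)
# For non-empty bins the summed value is bin_lens times the averaged value;
# empty bins stay NaN in both versions
ratio = np.abs(new_bin_vals_sum) / np.abs(new_bin_vals_avg)
print(np.c_[new_bin_centers, bin_lens, ratio])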
To clarify - I'm not trying to expand the frequency range by changing the bin size, or anything like that. I also only have the FFT output, not the raw data. I want to get a better understanding of how best to manipulate the data, what impact that may have, and whether there are any other considerations I've missed.