Core
- rythm_forge.core.amplitude_to_dB(A, ref=1.0, amin=1e-10, top_db=80.0)
Convert an amplitude spectrogram to decibel (dB) units.
- Parameters:
A – np.ndarray Input amplitude spectrogram.
ref – float or callable Reference value. If scalar, amplitude is scaled relative to ref. If callable, the reference value is computed as ref(A).
amin – float Minimum threshold for A and ref.
top_db – float Clip the output so that no value falls more than top_db dB below the peak.
- Returns:
np.ndarray The dB-scaled spectrogram.
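The conversion described above can be sketched in plain NumPy. This is a minimal illustration, assuming the common convention 20 * log10(max(amin, |A|) / max(amin, ref)) followed by clipping at top_db below the peak. The name amplitude_to_db_sketch is hypothetical, and this page does not confirm that rythm_forge uses exactly this formula.

```python
import numpy as np


def amplitude_to_db_sketch(A, ref=1.0, amin=1e-10, top_db=80.0):
    """Hypothetical stand-in, assuming the 20*log10 amplitude convention."""
    magnitude = np.abs(np.asarray(A))
    # A callable ref is evaluated on the input, as the parameter list describes.
    ref_value = ref(magnitude) if callable(ref) else abs(ref)
    # amin floors both the input and the reference to avoid log10(0).
    db = 20.0 * np.log10(np.maximum(amin, magnitude))
    db -= 20.0 * np.log10(np.maximum(amin, ref_value))
    if top_db is not None:
        # Nothing may fall more than top_db below the loudest value.
        db = np.maximum(db, db.max() - top_db)
    return db


spectrogram = np.array([[1.0, 0.1], [0.01, 1e-8]])
db = amplitude_to_db_sketch(spectrogram)
db_relative = amplitude_to_db_sketch(spectrogram, ref=np.max)
print(db)
```

In this sketch the 1e-8 entry would sit at -160 dB before clipping; the top_db step raises it to 80 dB below the peak.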
- rythm_forge.core.fft(samples: ndarray) -> ndarray
Compute the Fast Fourier Transform (FFT) of the input samples.
- Parameters:
samples – A 1D numpy array of input audio samples.
- Returns:
A 1D numpy array containing the frequency bins of the FFT.
- Raises:
RythmForgeValueError – If the input samples array is not 1D.
- rythm_forge.core.hz_to_mel(array: ndarray)
Convert frequencies in Hz to mels.
- Parameters:
array – np.ndarray Values in Hz to be converted to mels.
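This page does not say which mel formula the library uses. As an illustration only, the sketch below assumes the HTK-style formula mel = 2595 * log10(1 + f / 700) and its inverse; the function names are hypothetical, and rythm_forge may use a different variant (for example a Slaney-style scale), in which case the numbers will differ.

```python
import numpy as np


def hz_to_mel_htk(hz):
    """HTK-style Hz -> mel conversion (an assumption, not the confirmed formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


frequencies = np.array([0.0, 440.0, 1000.0, 8000.0])
mels = hz_to_mel_htk(frequencies)
round_trip = mel_to_hz_htk(mels)
print(mels)
```

Under this formula 1000 Hz maps to roughly 1000 mel, and converting back recovers the original frequencies up to floating-point error.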
- rythm_forge.core.ifft(frequency_bins: ndarray) -> ndarray
Compute the Inverse Fast Fourier Transform (IFFT) of the input frequency bins.
- Parameters:
frequency_bins – A 1D numpy array of frequency bins.
- Returns:
A 1D numpy array containing the reconstructed audio samples.
- Raises:
RythmForgeValueError – If the input frequency bins array is not 1D.
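The forward and inverse transforms are meant to round-trip. The sketch below shows that relationship with NumPy's own np.fft.fft and np.fft.ifft rather than the rythm_forge functions, so it illustrates the concept and not this library's implementation. The helper require_1d is hypothetical and raises a plain ValueError as a stand-in for RythmForgeValueError.

```python
import numpy as np


def require_1d(array, name):
    """Hypothetical check mirroring the documented 1D requirement."""
    if array.ndim != 1:
        raise ValueError(f"{name} must be 1D, got {array.ndim}D")


# A sine wave with exactly 5 cycles across 64 samples.
n = 64
samples = np.sin(2 * np.pi * 5 * np.arange(n) / n)

require_1d(samples, "samples")
frequency_bins = np.fft.fft(samples)

require_1d(frequency_bins, "frequency_bins")
reconstructed = np.fft.ifft(frequency_bins)

# The energy concentrates in bin 5 (and its mirror image, bin 59).
peak_bin = int(np.argmax(np.abs(frequency_bins[: n // 2])))
print(peak_bin, np.allclose(reconstructed.real, samples))
```

The inverse transform returns complex values; for a real input signal the imaginary part is only floating-point noise, so the real part is the reconstructed signal.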
- rythm_forge.core.istft(stft_matrix: ndarray, n_fft=2048, hop_size=512, window_length=None, center=True)
Compute the Inverse Short-Time Fourier Transform (ISTFT) of the input STFT matrix.
- Parameters:
stft_matrix – np.ndarray A 2D or 3D numpy array containing the STFT of an audio signal. The shape should be (channels, frequency_bins, frames) if 3D, otherwise (frequency_bins, frames).
n_fft – int, optional The number of frequency bins. Default is 2048. Must be a power of 2.
hop_size – int, optional The hop size (stride) between successive frames. Default is 512.
window_length – int, optional The length of the window function applied to each frame. If None, it defaults to n_fft. Must be less than or equal to n_fft.
center – bool, optional If True, the signal is padded such that the t-th frame is centered at time t*hop_size. Default is True.
- Returns:
np.ndarray A 1D or 2D numpy array containing the reconstructed audio signal. The shape is (samples, channels) if the input was 3D, otherwise (samples,).
- Raises:
RythmForgeValueError – If window_length is greater than n_fft.
- rythm_forge.core.magnitude(complex_matrix: ndarray)
Convert a matrix of complex values to a matrix of their element-wise magnitudes, similar to np.abs(array).
- Parameters:
complex_matrix – np.ndarray Array of complex values, most often the output of stft.
- Returns:
np.ndarray
- rythm_forge.core.mel_filter_bank(sr: int, n_fft: int, n_mel: int) -> ndarray
Create a Mel filter bank. This produces a linear transformation matrix that projects FFT bins onto Mel-frequency bins.
- Parameters:
sr – int > 0 [scalar] Sampling rate of the incoming signal.
n_fft – int > 0 [scalar] Number of FFT components.
n_mel – int > 0 [scalar] Number of Mel bands to generate.
- Returns:
M – np.ndarray [shape=(n_mel, 1 + n_fft/2)] Mel transform matrix.
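To make the shape and meaning of the returned matrix concrete, here is a minimal sketch of a triangular Mel filter bank. It assumes HTK-style mel conversion, filters spaced evenly on the mel scale between 0 Hz and sr/2, and no area normalization; none of these choices are confirmed by this page, and mel_filter_bank_sketch is a hypothetical name.

```python
import numpy as np


def _hz_to_mel(hz):
    # HTK-style formula, assumed for illustration.
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def _mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank_sketch(sr, n_fft, n_mel):
    """Build an (n_mel, 1 + n_fft // 2) matrix of triangular filters."""
    n_bins = 1 + n_fft // 2
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mel triangles need n_mel + 2 edge frequencies, evenly spaced in mels.
    mel_edges = np.linspace(_hz_to_mel(0.0), _hz_to_mel(sr / 2.0), n_mel + 2)
    hz_edges = _mel_to_hz(mel_edges)

    weights = np.zeros((n_mel, n_bins))
    for i in range(n_mel):
        lower, center, upper = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lower) / (center - lower)
        falling = (upper - fft_freqs) / (upper - center)
        # The triangle is the lower of the two slopes, floored at zero.
        weights[i] = np.maximum(0.0, np.minimum(rising, falling))
    return weights


M = mel_filter_bank_sketch(sr=22050, n_fft=2048, n_mel=40)
print(M.shape)
```

Each row of the matrix is one Mel band, so multiplying it by a column of FFT-bin magnitudes yields one value per band.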
- rythm_forge.core.mel_to_hz(array: ndarray)
Convert mels to frequencies in Hz.
- Parameters:
array – np.ndarray Values in mels to be converted to Hz.
- rythm_forge.core.melspectrogram(stft_matrix: ndarray, n_fft=2048, sr=44100, n_mels=128) -> ndarray
Convert an STFT matrix to a mel spectrogram.
This function transforms a Short-Time Fourier Transform (STFT) matrix into a mel spectrogram, where the frequency axis is mapped to the mel scale, which is a perceptually motivated scale of pitches.
- Parameters:
stft_matrix – np.ndarray The input STFT matrix of shape (…, n_freqs, n_times), representing the magnitude of the STFT of the audio signal.
n_fft – int, optional, default=2048 The number of FFT components, corresponding to the number of frequency bins in the STFT. This value determines the resolution of the frequency axis.
sr – int, optional, default=44100 The sample rate of the audio signal. This is used to compute the mel filter bank.
n_mels – int, optional, default=128 The number of mel bands to generate. This determines the resolution of the mel scale.
- Returns:
np.ndarray The mel spectrogram of shape (…, n_mels, n_times), where the frequency bins are replaced by mel bands.
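The shape change from (…, n_freqs, n_times) to (…, n_mels, n_times) follows from a matrix product with a filter bank of shape (n_mels, n_freqs). The sketch below shows only that shape logic with np.matmul and a random placeholder in place of a real Mel filter bank. Whether the library squares the magnitudes before projecting is not stated here, so that step is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

n_fft, n_mels, n_times = 2048, 128, 10
n_freqs = 1 + n_fft // 2  # assumed one-sided spectrum size

# Placeholder for a real Mel filter bank: any (n_mels, n_freqs) matrix works
# for demonstrating the shapes.
filter_bank = rng.random((n_mels, n_freqs))

# A two-channel magnitude STFT with leading channel axis.
stft_magnitude = rng.random((2, n_freqs, n_times))

# matmul broadcasts the 2D filter bank over the leading axes.
mel_spectrogram = np.matmul(filter_bank, stft_magnitude)
print(mel_spectrogram.shape)
```

Because the product only touches the last two axes, the same call works for a single-channel (n_freqs, n_times) input.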
- rythm_forge.core.power_to_dB(S, ref=1.0, amin=1e-10, top_db=80.0)
Convert a power spectrogram to decibel (dB) units.
- Parameters:
S – np.ndarray Input power spectrogram.
ref – float or callable Reference value. If scalar, S is scaled relative to ref. If callable, the reference value is computed as ref(S).
amin – float Minimum threshold for S and ref.
top_db – float Clip the output so that no value falls more than top_db dB below the peak.
- Returns:
np.ndarray The dB-scaled spectrogram.
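For power values the usual convention is 10 * log10 rather than 20 * log10. The sketch below assumes that convention together with the amin floor and top_db clipping described above. The name power_to_db_sketch is hypothetical, and the formula is an assumption rather than the library's confirmed implementation.

```python
import numpy as np


def power_to_db_sketch(S, ref=1.0, amin=1e-10, top_db=80.0):
    """Hypothetical stand-in, assuming the 10*log10 power convention."""
    power = np.asarray(S, dtype=float)
    ref_value = ref(power) if callable(ref) else abs(ref)
    db = 10.0 * np.log10(np.maximum(amin, power))
    db -= 10.0 * np.log10(np.maximum(amin, ref_value))
    if top_db is not None:
        db = np.maximum(db, db.max() - top_db)
    return db


power = np.array([4.0, 1.0, 0.01, 1e-20])
db_absolute = power_to_db_sketch(power)              # relative to ref=1.0
db_relative = power_to_db_sketch(power, ref=np.max)  # relative to the peak
print(db_absolute)
print(db_relative)
```

Passing ref=np.max puts the loudest value at 0 dB, so with top_db=80.0 every output lies between -80 and 0.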
- rythm_forge.core.resample(y: ndarray, sr: int, new_sr=8000) -> tuple[ndarray, int]
Resample a time series from sr to new_sr.
- Parameters:
y – np.ndarray A 1D or 2D numpy array of input audio samples. If 2D, each row is a separate channel.
sr – int Original sampling rate at which y has been acquired.
new_sr – int Target sampling rate
- Returns:
tuple[np.ndarray, int] y_hat: np.ndarray, y resampled from sr to new_sr. new_sr: int, the sampling rate used in resampling.
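The resampling method used by the library is not described on this page. As a rough illustration of the input and output contract only, the sketch below resamples by linear interpolation with np.interp. This is a deliberately simple stand-in: it applies no anti-aliasing filter, and resample_linear_sketch is a hypothetical name.

```python
import numpy as np


def resample_linear_sketch(y, sr, new_sr=8000):
    """Linear-interpolation resampler (illustrative; no anti-aliasing)."""
    y = np.asarray(y, dtype=float)
    n_in = y.shape[-1]
    n_out = int(round(n_in * new_sr / sr))
    old_times = np.arange(n_in) / sr
    new_times = np.arange(n_out) / new_sr
    if y.ndim == 1:
        y_hat = np.interp(new_times, old_times, y)
    else:
        # Rows are channels, so each row is resampled independently.
        y_hat = np.stack([np.interp(new_times, old_times, ch) for ch in y])
    return y_hat, new_sr


# One second of a 100 Hz tone at 44.1 kHz, as mono and as two channels.
mono = np.sin(2 * np.pi * 100 * np.arange(44100) / 44100)
mono_8k, rate = resample_linear_sketch(mono, sr=44100, new_sr=8000)
stereo_8k, _ = resample_linear_sketch(np.stack([mono, -mono]), sr=44100)
print(mono_8k.shape, stereo_8k.shape, rate)
```

One second of audio stays one second long, so the output holds new_sr samples per channel.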
- rythm_forge.core.stft(samples: ndarray, n_fft=2048, hop_size=512, window_length=None, center=True)
Compute the Short-Time Fourier Transform (STFT) of the input samples.
- Parameters:
samples – np.ndarray A 1D or 2D numpy array of input audio samples. If 1D, it is assumed to be a single channel.
n_fft – int, optional The number of frequency bins. Default is 2048. Must be a power of 2.
hop_size – int, optional The hop size (stride) between successive frames. Default is 512.
window_length – int, optional The length of the window function applied to each frame. If None, it defaults to n_fft. Must be less than or equal to n_fft.
center – bool, optional If True, the signal is padded such that the t-th frame is centered at time t*hop_size. Default is True.
- Returns:
np.ndarray A 2D or 3D numpy array containing the STFT of the input samples. The shape is (channels, frequency_bins, frames) if the input is 2D, otherwise (frequency_bins, frames).
- Raises:
RythmForgeValueError – If window_length is greater than n_fft.
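How the parameters above determine the output shape can be sketched for the single-channel case. This is an illustration under several assumptions that this page does not confirm: a Hann window, reflect padding of n_fft // 2 samples on each side when center is True, and a one-sided spectrum of 1 + n_fft // 2 bins. The name stft_sketch is hypothetical, and a plain ValueError stands in for RythmForgeValueError.

```python
import numpy as np


def stft_sketch(samples, n_fft=2048, hop_size=512, window_length=None, center=True):
    """Single-channel STFT sketch returning (frequency_bins, frames)."""
    if window_length is None:
        window_length = n_fft
    if window_length > n_fft:
        raise ValueError("window_length must be <= n_fft")

    # Hann window (assumed), zero-padded symmetrically up to n_fft.
    window = np.hanning(window_length)
    left = (n_fft - window_length) // 2
    window = np.pad(window, (left, n_fft - window_length - left))

    samples = np.asarray(samples, dtype=float)
    if center:
        # Padding makes frame t centered on sample t * hop_size.
        samples = np.pad(samples, n_fft // 2, mode="reflect")

    n_frames = 1 + (len(samples) - n_fft) // hop_size
    frames = np.stack(
        [samples[t * hop_size : t * hop_size + n_fft] * window for t in range(n_frames)],
        axis=1,
    )
    # One-sided FFT of every frame along the time-within-frame axis.
    return np.fft.rfft(frames, axis=0)


signal = np.sin(2 * np.pi * 440 * np.arange(8192) / 44100)
spectrum = stft_sketch(signal)
print(spectrum.shape)
```

With 8192 input samples and the defaults, centering pads the signal to 10240 samples, which gives 1 + (10240 - 2048) // 512 = 17 frames of 1025 bins each.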