Core

rythm_forge.core.amplitude_to_dB(A, ref=1.0, amin=1e-10, top_db=80.0)

Convert an amplitude spectrogram to decibel (dB) units.

Parameters:

A – np.ndarray Input amplitude spectrogram.
ref – float or callable Reference value. If scalar, amplitude is scaled relative to ref. If callable, the reference value is computed as ref(A).
amin – float Minimum threshold for A and ref.
top_db – float Threshold the output at top_db below the peak.

Returns:

np.ndarray The dB-scaled spectrogram.
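As a rough illustration of how the parameters above interact, the following pure-Python sketch applies the conventional amplitude-to-dB formula, 20·log10(A / ref), to a flat list. The function name `amplitude_to_db_sketch` is hypothetical, and the exact formula and clipping order are assumptions based on the parameter descriptions, not the library's implementation.

```python
import math

def amplitude_to_db_sketch(amplitudes, ref=1.0, amin=1e-10, top_db=80.0):
    """Illustrative dB conversion for a flat list of amplitudes (assumed formula)."""
    ref_value = ref(amplitudes) if callable(ref) else ref
    # Floor both the input and the reference at amin to avoid log10(0).
    ref_db = 20.0 * math.log10(max(amin, abs(ref_value)))
    db = [20.0 * math.log10(max(amin, abs(a))) - ref_db for a in amplitudes]
    if top_db is not None:
        # Clip everything that lies more than top_db below the peak.
        floor = max(db) - top_db
        db = [max(value, floor) for value in db]
    return db

# A callable ref (here max) scales the result relative to the loudest value.
levels = amplitude_to_db_sketch([1.0, 0.1, 0.0], ref=max)
```

In this sketch the silent sample would sit at -200 dB because of amin, and top_db=80.0 then raises it to 80 dB below the peak.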

rythm_forge.core.fft(samples: ndarray) → ndarray

Compute the Fast Fourier Transform (FFT) of the input samples.

Parameters:

samples – np.ndarray A 1D numpy array of input audio samples.

Returns:

np.ndarray A 1D numpy array containing the frequency bins of the FFT.

Raises:

RythmForgeValueError – If the input samples array is not 1D.
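To make the transform concrete, here is a minimal recursive radix-2 Cooley-Tukey FFT in pure Python. It is a sketch of the general technique, not the library's code: the name `fft_sketch` is hypothetical, and the power-of-two length restriction is an assumption of this sketch that the reference does not state for fft.

```python
import cmath
import math

def fft_sketch(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of 2 (sketch only)."""
    n = len(samples)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a non-zero power of 2")
    if n == 1:
        return [complex(samples[0])]
    even = fft_sketch(samples[0::2])  # FFT of even-indexed samples
    odd = fft_sketch(samples[1::2])   # FFT of odd-indexed samples
    bins = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        bins[k] = even[k] + twiddle
        bins[k + n // 2] = even[k] - twiddle
    return bins

# One full cosine cycle over 8 samples concentrates its energy in bins 1 and 7.
signal = [math.cos(2.0 * math.pi * t / 8) for t in range(8)]
spectrum = fft_sketch(signal)
```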

rythm_forge.core.hz_to_mel(array: ndarray)

Convert frequencies in Hz to mels.

Parameters:

array – np.ndarray Values in Hz to be converted to mels.

rythm_forge.core.ifft(frequency_bins: ndarray) → ndarray

Compute the Inverse Fast Fourier Transform (IFFT) of the input frequency bins.

Parameters:

frequency_bins – np.ndarray A 1D numpy array of frequency bins.

Returns:

np.ndarray A 1D numpy array containing the reconstructed audio samples.

Raises:

RythmForgeValueError – If the input frequency bins array is not 1D.
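The relationship between the forward and inverse transforms can be checked with a naive O(n²) discrete Fourier transform pair. The helpers `dft_sketch` and `idft_sketch` below are hypothetical names that show only the mathematical definition. They also assume the 1/n scaling sits on the inverse, which is the common convention but is not confirmed by this reference.

```python
import cmath

def dft_sketch(samples):
    """Naive forward DFT straight from the definition."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft_sketch(frequency_bins):
    """Naive inverse DFT: opposite exponent sign plus a 1/n scale factor."""
    n = len(frequency_bins)
    return [
        sum(frequency_bins[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

samples = [0.0, 1.0, 0.5, -0.25, 0.0, 2.0]
recovered = idft_sketch(dft_sketch(samples))
```

The recovered values are complex numbers whose imaginary parts are zero up to rounding error, so the real parts reproduce the input.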

rythm_forge.core.istft(stft_matrix: ndarray, n_fft=2048, hop_size=512, window_length=None, center=True)

Compute the Inverse Short-Time Fourier Transform (ISTFT) of the input STFT matrix.

Parameters:

stft_matrix – np.ndarray A 2D or 3D numpy array containing the STFT of an audio signal. The shape should be (channels, frequency_bins, frames) if 3D, otherwise (frequency_bins, frames).
n_fft – int, optional The number of frequency bins. Default is 2048. Must be a power of 2.
hop_size – int, optional The hop size (stride) between successive frames. Default is 512.
window_length – int, optional The length of the window function applied to each frame. If None, it defaults to n_fft. Must be less than or equal to n_fft.
center – bool, optional If True, the signal is padded such that the t-th frame is centered at time t*hop_size. Default is True.

Returns:

np.ndarray A 1D or 2D numpy array containing the reconstructed audio signal. The shape is (samples, channels) if the input was 3D, otherwise (samples,).

Raises:

RythmForgeValueError – If window_length is greater than n_fft.
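An inverse STFT is typically built from two steps: each frame is inverse-transformed, then the frames are overlap-added at multiples of hop_size and normalised by the summed squared window. The sketch below covers only the second step in pure Python. The Hann window and the names `hann` and `overlap_add` are assumptions for illustration, since the reference does not say which window or normalisation the library uses.

```python
import math

def hann(length):
    """Periodic Hann window (assumed window type)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / length) for i in range(length)]

def overlap_add(windowed_frames, hop_size):
    """Rebuild a signal from analysis-windowed time-domain frames."""
    frame_len = len(windowed_frames[0])
    window = hann(frame_len)
    out_len = frame_len + hop_size * (len(windowed_frames) - 1)
    signal = [0.0] * out_len
    weight = [0.0] * out_len
    for index, frame in enumerate(windowed_frames):
        start = index * hop_size
        for i, value in enumerate(frame):
            signal[start + i] += value * window[i]   # synthesis window
            weight[start + i] += window[i] ** 2      # running normaliser
    # Samples that no window covers with non-zero weight are left at 0.
    return [s / w if w > 1e-12 else 0.0 for s, w in zip(signal, weight)]

original = [math.sin(0.3 * t) for t in range(16)]
window = hann(8)
frames = [[original[start + i] * window[i] for i in range(8)] for start in range(0, 9, 2)]
rebuilt = overlap_add(frames, hop_size=2)
```

The very first sample is covered only by a zero window coefficient. This is one reason centred padding (center=True) is useful, because it moves such edge samples into the padding.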

rythm_forge.core.magnitude(complex_matrix: ndarray)

Convert a matrix of complex values to a matrix of element-wise magnitudes, similar to np.abs(array).

Parameters:

complex_matrix – np.ndarray Array of complex values, most often the output of stft.

Returns:

np.ndarray The matrix of element-wise magnitudes.
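For reference, the magnitude of a complex value a + bi is sqrt(a² + b²). A pure-Python sketch on a nested list, with the hypothetical name `magnitude_sketch`, looks like this:

```python
import math

def magnitude_sketch(complex_matrix):
    """Element-wise magnitude of a nested list of complex numbers."""
    return [[math.hypot(z.real, z.imag) for z in row] for row in complex_matrix]

mags = magnitude_sketch([[3 + 4j, 1j], [-2 + 0j, 0j]])
```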

rythm_forge.core.mel_filter_bank(sr: int, n_fft: int, n_mel: int) → ndarray

Create a Mel filter bank. This produces a linear transformation matrix that projects FFT bins onto Mel-frequency bins.

Parameters:

sr – int > 0 [scalar] Sampling rate of the incoming signal.
n_fft – int > 0 [scalar] Number of FFT components.
n_mel – int > 0 [scalar] Number of Mel bands to generate.

Returns:

M – np.ndarray [shape=(n_mel, 1 + n_fft/2)] Mel transform matrix.
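The following pure-Python sketch shows one common way to construct such a matrix: triangular filters whose edges are equally spaced on the mel scale between 0 Hz and the Nyquist frequency. It assumes the HTK mel formula and applies no area normalisation. The library may use a different mel variant or normalisation, and all function names here are hypothetical.

```python
import math

def hz_to_mel_htk(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz_htk(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank_sketch(sr, n_fft, n_mel):
    """Return n_mel rows of triangular weights over 1 + n_fft // 2 FFT bins."""
    n_bins = 1 + n_fft // 2
    bin_freqs = [k * sr / n_fft for k in range(n_bins)]
    # n_mel + 2 edge frequencies, equally spaced in mel from 0 Hz to Nyquist.
    max_mel = hz_to_mel_htk(sr / 2.0)
    edges = [mel_to_hz_htk(i * max_mel / (n_mel + 1)) for i in range(n_mel + 2)]
    bank = []
    for m in range(n_mel):
        lower, centre, upper = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for freq in bin_freqs:
            rising = (freq - lower) / (centre - lower)
            falling = (upper - freq) / (upper - centre)
            # The triangle is the lower of the two slopes, clipped at zero.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

bank = mel_filter_bank_sketch(sr=8000, n_fft=64, n_mel=4)
```

The result has 4 rows and 33 columns, matching the documented shape (n_mel, 1 + n_fft/2).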

rythm_forge.core.mel_to_hz(array: ndarray)

Convert mels to frequencies in Hz.

Parameters:

array – np.ndarray Values in mels to be converted to Hz.
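The Hz-to-mel and mel-to-Hz conversions are inverses of each other. The sketch below uses the widely cited HTK formula, mel = 2595·log10(1 + f/700), purely as an assumption. The reference does not say which mel variant the library implements, and the function names are hypothetical.

```python
import math

def hz_to_mel_htk(hz):
    """HTK-style Hz to mel conversion (assumed variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

frequencies = [0.0, 440.0, 1000.0, 8000.0]
mels = [hz_to_mel_htk(f) for f in frequencies]
round_trip = [mel_to_hz_htk(m) for m in mels]
```

Under this formula 1000 Hz maps to approximately 1000 mels, and the mapping grows more slowly than linearly above that point.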

rythm_forge.core.melspectrogram(stft_matrix: ndarray, n_fft=2048, sr=44100, n_mels=128) → ndarray

Convert an STFT matrix to a mel spectrogram.

This function transforms a Short-Time Fourier Transform (STFT) matrix into a mel spectrogram, where the frequency axis is mapped to the mel scale, which is a perceptually motivated scale of pitches.

Parameters:

stft_matrix – np.ndarray The input STFT matrix of shape (…, n_freqs, n_times), representing the magnitude of the STFT of the audio signal.
n_fft – int, optional, default=2048 The number of FFT components, corresponding to the number of frequency bins in the STFT. This value determines the resolution of the frequency axis.
sr – int, optional, default=44100 The sample rate of the audio signal. This is used to compute the mel filter bank.
n_mels – int, optional, default=128 The number of mel bands to generate. This determines the resolution of the mel scale.

Returns:

np.ndarray The mel spectrogram of shape (…, n_mels, n_times), where the frequency bins are replaced by mel bands.
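Conceptually, the mapping from (n_freqs, n_times) to (n_mels, n_times) is a matrix product between a mel filter bank and the spectrogram. The toy example below demonstrates that projection with hand-written weights. The helper `project_onto_mel` and the toy values are hypothetical, and whether the library applies the filters to magnitudes or to power is not specified here.

```python
def project_onto_mel(filter_bank, spectrogram):
    """Multiply an (n_mels, n_freqs) bank by an (n_freqs, n_times) spectrogram."""
    n_times = len(spectrogram[0])
    return [
        [sum(w * spectrogram[f][t] for f, w in enumerate(row)) for t in range(n_times)]
        for row in filter_bank
    ]

# Two mel bands over three frequency bins, applied to two time frames.
toy_bank = [[1.0, 0.5, 0.0],
            [0.0, 0.5, 1.0]]
toy_spec = [[1.0, 2.0],
            [2.0, 0.0],
            [3.0, 1.0]]
mel_spec = project_onto_mel(toy_bank, toy_spec)
```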

rythm_forge.core.power_to_dB(S, ref=1.0, amin=1e-10, top_db=80.0)

Convert a power spectrogram to decibel (dB) units.

Parameters:

S – np.ndarray Input power spectrogram.
ref – float or callable Reference value. If scalar, the power is scaled relative to ref. If callable, the reference value is computed as ref(S).
amin – float Minimum threshold for S and ref.
top_db – float Threshold the output at top_db below the peak.

Returns:

np.ndarray The dB-scaled spectrogram.
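Power is the square of amplitude, so the power form of the dB formula uses a factor of 10 where the amplitude form uses 20. The scalar sketch below assumes the conventional definition 10·log10(S / ref) and omits the top_db clipping. The name `power_to_db_sketch` is hypothetical and this is not the library's implementation.

```python
import math

def power_to_db_sketch(power, ref=1.0, amin=1e-10):
    """Scalar power-to-dB conversion with an amin floor (assumed formula)."""
    return 10.0 * math.log10(max(amin, power)) - 10.0 * math.log10(max(amin, ref))

amplitude = 0.5
from_power = power_to_db_sketch(amplitude ** 2)   # 10 * log10(A^2)
from_amplitude = 20.0 * math.log10(amplitude)     # 20 * log10(A)
```

Both expressions give roughly -6.02 dB, which is why squaring a magnitude spectrogram before a power-to-dB conversion matches an amplitude-to-dB conversion of the magnitudes.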

rythm_forge.core.resample(y: ndarray, sr: int, new_sr=8000) → tuple[ndarray, int]

Resample a time series from sr to new_sr.

Parameters:

y – np.ndarray A 1D or 2D numpy array of input audio samples, with each row being a different channel.
sr – int Original sampling rate at which y has been acquired.
new_sr – int Target sampling rate

Returns:

y_hat – np.ndarray The signal y resampled from sr to new_sr.
new_sr – int The sampling rate used in resampling.
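To illustrate the (y_hat, new_sr) return shape and how the output length scales with new_sr / sr, here is a deliberately simple linear-interpolation resampler for a single channel. This is not the library's algorithm, which the reference does not describe. The name `resample_linear` is hypothetical, and the sketch applies no anti-aliasing filter, so it is unsuitable for production downsampling.

```python
def resample_linear(y, sr, new_sr=8000):
    """Resample a 1D list by linear interpolation (illustration only)."""
    n_out = round(len(y) * new_sr / sr)
    y_hat = []
    for i in range(n_out):
        position = i * sr / new_sr            # fractional index into y
        lo = min(int(position), len(y) - 1)
        hi = min(lo + 1, len(y) - 1)          # clamp at the final sample
        frac = position - lo
        y_hat.append((1.0 - frac) * y[lo] + frac * y[hi])
    return y_hat, new_sr

down, down_sr = resample_linear([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], sr=4, new_sr=2)
up, up_sr = resample_linear([0.0, 1.0, 2.0, 3.0], sr=2, new_sr=4)
```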

rythm_forge.core.stft(samples: ndarray, n_fft=2048, hop_size=512, window_length=None, center=True)

Compute the Short-Time Fourier Transform (STFT) of the input samples.

Parameters:

samples – np.ndarray A 1D or 2D numpy array of input audio samples. If 1D, it is assumed to be a single channel.
n_fft – int, optional The number of frequency bins. Default is 2048. Must be a power of 2.
hop_size – int, optional The hop size (stride) between successive frames. Default is 512.
window_length – int, optional The length of the window function applied to each frame. If None, it defaults to n_fft. Must be less than or equal to n_fft.
center – bool, optional If True, the signal is padded such that the t-th frame is centered at time t*hop_size. Default is True.

Returns:

np.ndarray A 2D or 3D numpy array containing the STFT of the input samples. The shape is (channels, frequency_bins, frames) if the input is 2D, otherwise (frequency_bins, frames).

Raises:

RythmForgeValueError – If window_length is greater than n_fft.
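The interaction of n_fft, hop_size and center is easiest to see in the framing step that precedes the per-frame FFT. The sketch below splits a mono signal into overlapping frames. It assumes zero padding of n_fft // 2 samples on each side when center is True (reflect padding is another common choice) and the usual frame-count formula. Neither is confirmed by the reference, and `frame_signal` is a hypothetical name.

```python
def frame_signal(samples, n_fft=2048, hop_size=512, center=True):
    """Split a 1D list into overlapping frames of length n_fft."""
    samples = list(samples)
    if center:
        pad = n_fft // 2
        # With this padding, frame t is centred on original sample t * hop_size.
        samples = [0.0] * pad + samples + [0.0] * pad
    n_frames = 1 + (len(samples) - n_fft) // hop_size
    return [samples[i * hop_size : i * hop_size + n_fft] for i in range(n_frames)]

frames = frame_signal([1.0] * 16, n_fft=8, hop_size=4)
unpadded = frame_signal([1.0] * 16, n_fft=8, hop_size=4, center=False)
```

With 16 input samples, n_fft=8 and hop_size=4, the centred version yields 5 frames and the uncentred version yields 3. The first and last centred frames are half padding.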