Audio
normalize_loudness(wav)
Normalize the loudness of an audio waveform.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
wav | Tensor | The input waveform. | required |
Returns:
Type | Description |
---|---|
Tensor | The normalized waveform. |
Examples:
>>> wav = torch.tensor([1.0, 2.0, 3.0])
>>> normalize_loudness(wav)
tensor([0.3333, 0.6667, 1.0000])
Source code in training/preprocess/audio.py
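The example output above is consistent with peak normalization, that is, dividing by the largest absolute sample value. The sketch below illustrates that idea only. The name `peak_normalize` and the silence guard are assumptions made for illustration, not the contents of `training/preprocess/audio.py`.

```python
import torch


def peak_normalize(wav: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: scale so the largest absolute sample is 1.0."""
    peak = wav.abs().max()
    # Assumption: return silent (all-zero) input unchanged to avoid dividing by zero.
    if peak == 0:
        return wav
    return wav / peak


wav = torch.tensor([1.0, -2.0, 4.0])
out = peak_normalize(wav)
print(out)
```

Using the absolute value means a waveform whose largest excursion is negative is still scaled into the range [-1, 1].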
preprocess_audio(audio, sr_actual, sr)
Preprocesses audio by converting stereo to mono, resampling if necessary, and returning the audio tensor and sample rate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio | Tensor | The audio tensor to preprocess. | required |
sr_actual | int | The actual sample rate of the audio. | required |
sr | Union[int, None] | The target sample rate to resample the audio to, if necessary. | required |
Returns:
Type | Description |
---|---|
Tuple[Tensor, int] | The preprocessed audio tensor and its sample rate. |
Source code in training/preprocess/audio.py
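The described flow (downmix to mono, resample only when needed, return the tensor with its rate) can be sketched as follows. This is a minimal illustration, not the actual implementation: `preprocess_audio_sketch` is a hypothetical name, treating `sr=None` as "keep the native rate" is an assumption, and linear interpolation via `torch.nn.functional.interpolate` stands in for whatever resampler the real code uses.

```python
from typing import Optional, Tuple

import torch
import torch.nn.functional as F


def preprocess_audio_sketch(
    audio: torch.Tensor, sr_actual: int, sr: Optional[int] = None
) -> Tuple[torch.Tensor, int]:
    """Hypothetical sketch of the mono + resample pipeline."""
    # Downmix (channels, samples) to (1, samples) when there is more than one channel.
    if audio.shape[0] > 1:
        audio = audio.mean(dim=0, keepdim=True)
    # Assumption: sr=None (or an equal rate) means no resampling is needed.
    if sr is None or sr == sr_actual:
        return audio, sr_actual
    # Stand-in resampler: linear interpolation to the new length.
    # It applies no anti-aliasing filter, so it is for illustration only.
    new_len = round(audio.shape[-1] * sr / sr_actual)
    resampled = F.interpolate(
        audio.unsqueeze(0), size=new_len, mode="linear", align_corners=False
    ).squeeze(0)
    return resampled, sr


stereo = torch.randn(2, 44100)  # one second of stereo audio at 44.1 kHz
mono, rate = preprocess_audio_sketch(stereo, 44100, 22050)
print(mono.shape, rate)
```

Downmixing before resampling halves the amount of data the resampler has to process for stereo input.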
resample(wav, orig_sr, target_sr)
Resamples an audio waveform from the original sampling rate to the target sampling rate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
wav | ndarray | The audio waveform to be resampled. | required |
orig_sr | int | The original sampling rate of the audio waveform. | required |
target_sr | int | The target sampling rate to resample the audio waveform to. | required |
Returns:
Type | Description |
---|---|
ndarray | The resampled audio waveform. |
Source code in training/preprocess/audio.py
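To make the relationship between the two rates concrete, here is a minimal sketch that resamples a 1-D waveform with `np.interp`. Linear interpolation is a deliberately simple stand-in, not the method used by `resample`, and the name `resample_linear` is hypothetical. A production resampler would normally apply a band-limiting filter to avoid aliasing.

```python
import numpy as np


def resample_linear(wav: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Hypothetical sketch: resample a 1-D waveform by linear interpolation."""
    if orig_sr == target_sr:
        return wav
    # The duration in seconds is preserved; only the number of samples changes.
    n_out = int(round(len(wav) * target_sr / orig_sr))
    t_in = np.arange(len(wav)) / orig_sr   # timestamps of the input samples
    t_out = np.arange(n_out) / target_sr   # timestamps of the output samples
    return np.interp(t_out, t_in, wav)


sr_in, sr_out = 48000, 16000
t = np.arange(sr_in) / sr_in
tone = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone
out = resample_linear(tone, sr_in, sr_out)
print(len(tone), len(out))
```

The output length scales by `target_sr / orig_sr`, so one second at 48 kHz becomes 16000 samples at 16 kHz.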
safe_load(path, sr)
Load an audio file from disk and return its content as a numpy array.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path to the audio file. | required |
sr | int or None | The target sampling rate. If None, the original sampling rate is used. | required |
Returns:
Type | Description |
---|---|
Tuple[ndarray, int] | A tuple containing the audio content as a numpy array and the actual sampling rate. |
Source code in training/preprocess/audio.py
stereo_to_mono(audio)
Converts a stereo audio tensor to mono by taking the mean across channels.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio | Tensor | Input audio tensor of shape (channels, samples). | required |
Returns:
Type | Description |
---|---|
Tensor | Mono audio tensor of shape (1, samples). |
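The channel mean and the documented shapes can be illustrated directly in PyTorch. This is a sketch of the operation as described, not a copy of the function body; the tensor values are made up for the example.

```python
import torch

# Two channels, three samples each: shape (channels, samples) = (2, 3).
stereo = torch.tensor([[1.0, 2.0, 3.0],
                       [3.0, 4.0, 5.0]])

# Average across the channel dimension; keepdim=True keeps the result 2-D.
mono = stereo.mean(dim=0, keepdim=True)
print(mono.shape)
```

Passing `keepdim=True` is what yields shape `(1, samples)` rather than a flat `(samples,)` vector.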