Phoneme Level Prosody Encoder
PhonemeLevelProsodyEncoder
Bases: Module
Phoneme Level Prosody Encoder Module
This class encodes phoneme-level prosody in the speech synthesis pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
preprocess_config | PreprocessingConfig | Configuration for preprocessing. | required |
model_config | AcousticModelConfigType | Acoustic model configuration. | required |
Returns:
Type | Description |
---|---|
Tensor | The encoded tensor after applying masked fill. |
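As a minimal illustration of the masked fill step, the sketch below zeroes the padded phoneme positions of a stand-in output tensor. It assumes the mask is `True` at padded positions and that the fill value is `0.0`; both are assumptions for illustration, not details confirmed by this page.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
N, seq_len, bottleneck_size = 2, 4, 3

# Stand-in for the encoder output: every value is 1.0.
out = torch.ones(N, seq_len, bottleneck_size)

# Assumed convention: True marks padded phoneme positions.
# Sequence 0 has 4 real phonemes; sequence 1 has 2 real phonemes and 2 padding slots.
src_mask = torch.tensor([
    [False, False, False, False],
    [False, False, True, True],
])

# unsqueeze(-1) turns [N, seq_len] into [N, seq_len, 1], which broadcasts
# over the feature dimension; padded positions are set to 0.0.
masked = out.masked_fill(src_mask.unsqueeze(-1), 0.0)

print(masked[1, :, 0])  # tensor([1., 1., 0., 0.])
```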
Source code in models/tts/delightful_tts/reference_encoder/phoneme_level_prosody_encoder.py
forward(x, src_mask, mels, mel_lens, encoding)
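The forward pass of the PhonemeLevelProsodyEncoder. The mel-spectrogram is passed through the reference encoder, the phoneme encodings x attend to the resulting frame-level embeddings through an attention mechanism, and the attention output is projected through a bottleneck to one prosody vector per phoneme.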
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor of shape [N, seq_len, encoder_embedding_dim]. | required |
src_mask | Tensor | The mask tensor for the phoneme sequence x, marking the padded positions that are masked out of the output. | required |
mels | Tensor | The mel-spectrogram with shape [N, Ty/r, n_mels*r], where r=1. | required |
mel_lens | Tensor | The length (number of valid frames) of each sequence in mels. | required |
encoding | Tensor | The relative positional encoding tensor. | required |
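The mel_lens argument is usually turned into a padding mask over mel frames so that attention ignores padded frames. The sketch below shows one common way to do this; `lengths_to_padding_mask` is a hypothetical helper, and both the `True`-at-padding convention and the [N, 1, 1, T] broadcast layout are assumptions rather than details taken from the source.

```python
from typing import Optional

import torch


def lengths_to_padding_mask(lengths: torch.Tensor, max_len: Optional[int] = None) -> torch.Tensor:
    """Hypothetical helper: [N] lengths -> [N, max_len] bool mask, True at padded positions."""
    if max_len is None:
        max_len = int(lengths.max().item())
    # positions: [1, max_len]; lengths.unsqueeze(1): [N, 1] -> comparison broadcasts to [N, max_len]
    positions = torch.arange(max_len, device=lengths.device).unsqueeze(0)
    return positions >= lengths.unsqueeze(1)


mel_lens = torch.tensor([5, 3])
mel_mask = lengths_to_padding_mask(mel_lens)
print(mel_mask.tolist())
# [[False, False, False, False, False], [False, False, False, True, True]]

# Assumed layout: [N, 1, 1, T] broadcasts over heads and query positions
# of an attention-score tensor shaped [N, heads, seq_len, T].
attn_mask = mel_mask.view(mel_mask.shape[0], 1, 1, -1)
print(attn_mask.shape)  # torch.Size([2, 1, 1, 5])
```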
Returns:
Type | Description |
---|---|
Tensor | Output tensor of shape [N, seq_len, bottleneck_size]. |
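The data flow of forward can be sketched end to end as follows. This is a simplified, hypothetical stand-in, not the module's actual implementation: the class name `ProsodyEncoderSketch`, a single GRU in place of the reference encoder, and standard `torch.nn.MultiheadAttention` in place of the relative-position attention (so `encoding` is not used) are all assumptions. It only illustrates the shapes: the phoneme encodings x act as queries over mel-frame embeddings, and the result is projected to bottleneck_size and masked.

```python
import torch
from torch import nn


class ProsodyEncoderSketch(nn.Module):
    """Hypothetical, simplified stand-in for PhonemeLevelProsodyEncoder."""

    def __init__(self, n_mels: int, d_model: int, n_heads: int, bottleneck_size: int):
        super().__init__()
        # Assumption: a unidirectional GRU stands in for the reference encoder.
        self.ref_encoder = nn.GRU(n_mels, d_model, batch_first=True)
        # Assumption: standard attention replaces relative-position attention.
        self.attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.bottleneck = nn.Linear(d_model, bottleneck_size)

    def forward(self, x, src_mask, mels, mel_lens):
        # mels: [N, T, n_mels] -> frame-level embeddings: [N, T, d_model]
        frames, _ = self.ref_encoder(mels)
        # Padding mask over mel frames, True at padded frames: [N, T]
        T = mels.shape[1]
        mel_mask = torch.arange(T, device=mels.device).unsqueeze(0) >= mel_lens.unsqueeze(1)
        # Phonemes (queries) attend to mel frames (keys/values): [N, seq_len, d_model]
        attended, _ = self.attention(x, frames, frames, key_padding_mask=mel_mask)
        # Project each phoneme's vector to the bottleneck: [N, seq_len, bottleneck_size]
        out = self.bottleneck(attended)
        # Zero out padded phoneme positions (src_mask assumed True at padding).
        return out.masked_fill(src_mask.unsqueeze(-1), 0.0)


torch.manual_seed(0)
model = ProsodyEncoderSketch(n_mels=80, d_model=16, n_heads=2, bottleneck_size=4)
x = torch.randn(2, 7, 16)                       # [N, seq_len, encoder_embedding_dim]
src_mask = torch.zeros(2, 7, dtype=torch.bool)
src_mask[1, 5:] = True                          # last two phonemes of sequence 1 are padding
mels = torch.randn(2, 50, 80)                   # [N, T, n_mels], r = 1
mel_lens = torch.tensor([50, 30])
y = model(x, src_mask, mels, mel_lens)
print(y.shape)  # torch.Size([2, 7, 4])
```

The GRU is unidirectional, so trailing padding frames cannot change the embeddings of valid frames; the padded frames are then excluded from attention through key_padding_mask.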