Phoneme Prosody Predictor
PhonemeProsodyPredictor
Bases: Module
Defines the Phoneme Prosody Predictor.
In linguistics, prosody (/ˈprɒsədi, ˈprɒzədi/) is the study of elements of speech that are not individual phonetic segments (vowels and consonants) but which are properties of syllables and larger units of speech, including linguistic functions such as intonation, stress, and rhythm. Such elements are known as suprasegmentals.
Source: Wikipedia, "Prosody (linguistics)"
This prosody predictor is non-parallel and is inspired by the work of Du et al., 2021. It consists of multiple convolution transpose, Leaky ReLU activation, LayerNorm, and dropout layers, followed by a linear transformation that produces the final output.
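The layer stack described above can be sketched in PyTorch as follows. This is a minimal sketch, not the module's source: `ProsodyPredictorSketch`, the number of blocks, the kernel size, the dropout rate, and the 0.3 slope are hypothetical placeholders. "Convolution transpose" is read here as a `Conv1d` applied after transposing the input from (batch, time, channels) to (batch, channels, time). That input layout, and the convention that `True` in `mask` marks padding, are also assumptions.

```python
import torch
from torch import nn


class ProsodyPredictorSketch(nn.Module):
    """Hypothetical sketch: [Conv1d -> LeakyReLU -> LayerNorm -> Dropout] x n_layers, then Linear."""

    def __init__(
        self,
        d_model: int,
        bottleneck_size: int,
        kernel_size: int = 3,
        n_layers: int = 2,
        dropout: float = 0.1,
        leaky_relu_slope: float = 0.3,  # placeholder; the real LEAKY_RELU_SLOPE value is not shown
    ) -> None:
        super().__init__()
        self.blocks = nn.ModuleList(
            [
                nn.ModuleDict(
                    {
                        # Odd kernel + this padding keeps the time dimension unchanged.
                        "conv": nn.Conv1d(
                            d_model, d_model, kernel_size, padding=(kernel_size - 1) // 2
                        ),
                        "act": nn.LeakyReLU(leaky_relu_slope),
                        "norm": nn.LayerNorm(d_model),
                        "drop": nn.Dropout(dropout),
                    }
                )
                for _ in range(n_layers)
            ]
        )
        # Final linear projection to the bottleneck size.
        self.bottleneck = nn.Linear(d_model, bottleneck_size)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Assumed shapes: x is (batch, time, d_model); mask is (batch, time), True at padding.
        pad = mask.unsqueeze(-1)  # (batch, time, 1) broadcasts over channels
        for block in self.blocks:
            # Conv1d expects (batch, channels, time), so transpose in and back out.
            x = block["conv"](x.transpose(1, 2)).transpose(1, 2)
            x = block["act"](x)
            x = block["norm"](x)  # LayerNorm over the channel dimension
            x = block["drop"](x)
            x = x.masked_fill(pad, 0.0)  # zero padded frames after every block
        return self.bottleneck(x)
```

With an odd `kernel_size` and `padding=(kernel_size - 1) // 2`, each convolution preserves the sequence length, so the output has shape (batch, time, bottleneck_size). Because padded frames are zeroed before the final `Linear`, their outputs reduce to the layer's bias.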
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_config | AcousticModelConfigType | Configuration object with model parameters. | required |
phoneme_level | bool | Whether to use the phoneme-level bottleneck size. | required |
leaky_relu_slope | float | The negative slope of the LeakyReLU activation function. | LEAKY_RELU_SLOPE |
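The `phoneme_level` flag chooses which bottleneck size, and therefore which output width of the final linear layer, the predictor uses. A minimal sketch of that selection follows. `HypotheticalPredictorConfig`, its two fields, and `select_bottleneck_size` are hypothetical names for illustration only; the real `AcousticModelConfigType` may store these values differently.

```python
from dataclasses import dataclass


@dataclass
class HypotheticalPredictorConfig:
    # Hypothetical field names; the real AcousticModelConfigType may be organised differently.
    phoneme_bottleneck_size: int
    other_bottleneck_size: int


def select_bottleneck_size(cfg: HypotheticalPredictorConfig, phoneme_level: bool) -> int:
    """Pick the output width of the final Linear layer from the phoneme_level flag."""
    return cfg.phoneme_bottleneck_size if phoneme_level else cfg.other_bottleneck_size
```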
Source code in models/tts/delightful_tts/acoustic_model/phoneme_prosody_predictor.py
forward(x, mask)
Forward pass of the prosody predictor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | A 3-dimensional input tensor. | required |
mask | Tensor | A 2-dimensional mask tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | A 3-dimensional output tensor. |
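The forward contract (a 3-dimensional `x`, a 2-dimensional `mask`, a 3-dimensional output) matches the usual padding-mask pattern for variable-length sequences. The sketch below assumes `x` is laid out as (batch, time, channels) and that `True` in `mask` marks padded frames. Both helper functions are hypothetical and not part of this module.

```python
import torch


def lengths_to_padding_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    """Return a (batch, time) bool mask that is True at padded positions (assumed convention)."""
    steps = torch.arange(max_len, device=lengths.device)  # (time,)
    return steps.unsqueeze(0) >= lengths.unsqueeze(1)  # (batch, time)


def apply_padding_mask(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Zero out padded frames of a 3-D (batch, time, channels) tensor using a 2-D mask."""
    if x.dim() != 3 or mask.dim() != 2:
        raise ValueError("expected x of shape (B, T, C) and mask of shape (B, T)")
    # unsqueeze(-1) lets the (B, T) mask broadcast across the channel dimension.
    return x.masked_fill(mask.unsqueeze(-1), 0.0)
```

For example, `lengths_to_padding_mask(torch.tensor([3, 1]), 3)` marks the last two frames of the second sequence as padding, and `apply_padding_mask` then zeroes those frames across all channels.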