Skip to content

Conformer

Conformer

Bases: Module

Conformer class represents the Conformer model which is a sequence-to-sequence model used in some modern automated speech recognition systems. It is composed of several ConformerBlocks.

Parameters:

Name Type Description Default
dim int

The number of expected features in the input.

required
n_layers int

The number of ConformerBlocks in the Conformer model.

required
n_heads int

The number of heads in the multiheaded self-attention mechanism in each ConformerBlock.

required
embedding_dim int

The dimension of the embeddings.

required
p_dropout float

The dropout probability to be used in each ConformerBlock.

required
kernel_size_conv_mod int

The size of the convolving kernel in the convolution module of each ConformerBlock.

required
with_ff bool

If True, each ConformerBlock uses FeedForward layer inside it.

required
Source code in models/tts/delightful_tts/attention/conformer.py
class Conformer(Module):
    r"""`Conformer` class represents the `Conformer` model which is a sequence-to-sequence model
    used in some modern automated speech recognition systems. It is composed of several `ConformerBlocks`.

    Args:
        dim (int): The number of expected features in the input.
        n_layers (int): The number of `ConformerBlocks` in the Conformer model.
        n_heads (int): The number of heads in the multiheaded self-attention mechanism in each `ConformerBlock`.
        embedding_dim (int): The dimension of the embeddings.
        p_dropout (float): The dropout probability to be used in each `ConformerBlock`.
        kernel_size_conv_mod (int): The size of the convolving kernel in the convolution module of each `ConformerBlock`.
        with_ff (bool): If True, each `ConformerBlock` uses FeedForward layer inside it.
    """

    def __init__(
        self,
        dim: int,
        n_layers: int,
        n_heads: int,
        embedding_dim: int,
        p_dropout: float,
        kernel_size_conv_mod: int,
        with_ff: bool,
    ):
        super().__init__()
        self.layer_stack = nn.ModuleList(
            [
                ConformerBlock(
                    dim,
                    n_heads,
                    kernel_size_conv_mod=kernel_size_conv_mod,
                    dropout=p_dropout,
                    embedding_dim=embedding_dim,
                    with_ff=with_ff,
                )
                for _ in range(n_layers)
            ],
        )

    def forward(
        self,
        x: torch.Tensor,
        mask: torch.Tensor,
        embeddings: torch.Tensor,
        encoding: torch.Tensor,
    ) -> torch.Tensor:
        r"""Forward Pass of the Conformer block.

        Args:
            x (Tensor): Input tensor of shape (batch_size, seq_len, num_features).
            mask (Tensor): The mask tensor.
            embeddings (Tensor): Embeddings tensor.
            encoding (Tensor): The positional encoding tensor.

        Returns:
            Tensor: The output tensor of shape (batch_size, seq_len, num_features).
        """
        attn_mask = mask.view((mask.shape[0], 1, 1, mask.shape[1]))
        attn_mask.to(x.device)
        for enc_layer in self.layer_stack:
            x = enc_layer(
                x,
                mask=mask,
                slf_attn_mask=attn_mask,
                embeddings=embeddings,
                encoding=encoding,
            )
        return x

forward(x, mask, embeddings, encoding)

Forward Pass of the Conformer block.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (batch_size, seq_len, num_features).

required
mask Tensor

The mask tensor.

required
embeddings Tensor

Embeddings tensor.

required
encoding Tensor

The positional encoding tensor.

required

Returns:

Name Type Description
Tensor Tensor

The output tensor of shape (batch_size, seq_len, num_features).

Source code in models/tts/delightful_tts/attention/conformer.py
def forward(
    self,
    x: torch.Tensor,
    mask: torch.Tensor,
    embeddings: torch.Tensor,
    encoding: torch.Tensor,
) -> torch.Tensor:
    r"""Forward Pass of the Conformer block.

    Args:
        x (Tensor): Input tensor of shape (batch_size, seq_len, num_features).
        mask (Tensor): The mask tensor.
        embeddings (Tensor): Embeddings tensor.
        encoding (Tensor): The positional encoding tensor.

    Returns:
        Tensor: The output tensor of shape (batch_size, seq_len, num_features).
    """
    attn_mask = mask.view((mask.shape[0], 1, 1, mask.shape[1]))
    attn_mask.to(x.device)
    for enc_layer in self.layer_stack:
        x = enc_layer(
            x,
            mask=mask,
            slf_attn_mask=attn_mask,
            embeddings=embeddings,
            encoding=encoding,
        )
    return x