Skip to content

Layers

ConvNorm

Bases: Module

1D Convolution with optional batch normalization.

Parameters:

Name Type Description Default
in_channels int

Number of input channels.

required
out_channels int

Number of output channels.

required
kernel_size int

Size of the convolving kernel. Defaults to 1.

1
stride int

Stride of the convolution. Defaults to 1.

1
padding int

Zero-padding added to both sides of the input. Defaults to None.

None
dilation int

Spacing between kernel elements. Defaults to 1.

1
bias bool

If True, adds a learnable bias to the output. Defaults to True.

True
Source code in models/enhancer/gaussian_diffusion/layers.py
class ConvNorm(Module):
    r"""1D Convolution with optional batch normalization.

    Args:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        kernel_size (int, optional): Size of the convolving kernel. Defaults to 1.
        stride (int, optional): Stride of the convolution. Defaults to 1.
        padding (int, optional): Zero-padding added to both sides of the input. Defaults to None.
        dilation (int, optional): Spacing between kernel elements. Defaults to 1.
        bias (bool, optional): If True, adds a learnable bias to the output. Defaults to True.
    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int = 1,
        stride: int = 1,
        padding: Optional[int] = None,
        dilation: int = 1,
        bias: bool = True,
    ):
        super().__init__()

        if padding is None:
            assert kernel_size % 2 == 1
            padding = int(dilation * (kernel_size - 1) / 2)

        self.conv = nn.Conv1d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            bias=bias,
        )
        nn.init.kaiming_normal_(self.conv.weight)

    def forward(self, signal: Tensor) -> Tensor:
        r"""Forward pass through the convolutional layer.

        Args:
            signal (torch.Tensor): Input signal tensor.

        Returns:
            torch.Tensor: Output tensor after convolution.
        """
        conv_signal = self.conv(signal)

        return conv_signal

forward(signal)

Forward pass through the convolutional layer.

Parameters:

Name Type Description Default
signal Tensor

Input signal tensor.

required

Returns:

Type Description
Tensor

torch.Tensor: Output tensor after convolution.

Source code in models/enhancer/gaussian_diffusion/layers.py
def forward(self, signal: Tensor) -> Tensor:
    r"""Forward pass through the convolutional layer.

    Args:
        signal (torch.Tensor): Input signal tensor.

    Returns:
        torch.Tensor: Output tensor after convolution.
    """
    conv_signal = self.conv(signal)

    return conv_signal

DiffusionEmbedding

Bases: Module

Diffusion Step Embedding.

This module generates diffusion step embeddings for the given input.

Parameters:

Name Type Description Default
d_denoiser int

Dimension of the denoiser.

required

Attributes:

Name Type Description
dim int

Dimension of the diffusion step embedding.

Source code in models/enhancer/gaussian_diffusion/layers.py
class DiffusionEmbedding(Module):
    r"""Diffusion Step Embedding.

    This module generates diffusion step embeddings for the given input.

    Args:
        d_denoiser (int): Dimension of the denoiser.

    Attributes:
        dim (int): Dimension of the diffusion step embedding.
    """

    def __init__(self, d_denoiser: int):
        super().__init__()
        self.dim = d_denoiser

    def forward(self, x: Tensor) -> Tensor:
        r"""Forward pass through the DiffusionEmbedding module.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: Diffusion step embeddings.
        """
        device = x.device
        half_dim = self.dim // 2

        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)

        emb = x[:, None] * emb[None, :]
        emb = torch.cat((emb.sin(), emb.cos()), dim=-1)

        return emb

forward(x)

Forward pass through the DiffusionEmbedding module.

Parameters:

Name Type Description Default
x Tensor

Input tensor.

required

Returns:

Type Description
Tensor

torch.Tensor: Diffusion step embeddings.

Source code in models/enhancer/gaussian_diffusion/layers.py
def forward(self, x: Tensor) -> Tensor:
    r"""Forward pass through the DiffusionEmbedding module.

    Args:
        x (torch.Tensor): Input tensor.

    Returns:
        torch.Tensor: Diffusion step embeddings.
    """
    device = x.device
    half_dim = self.dim // 2

    emb = math.log(10000) / (half_dim - 1)
    emb = torch.exp(torch.arange(half_dim, device=device) * -emb)

    emb = x[:, None] * emb[None, :]
    emb = torch.cat((emb.sin(), emb.cos()), dim=-1)

    return emb

LinearNorm

Bases: Module

LinearNorm Projection.

This module performs a linear projection with optional bias.

Parameters:

Name Type Description Default
in_features int

Number of input features.

required
out_features int

Number of output features.

required
bias bool

If True, adds a learnable bias to the output. Default is False.

False

Attributes:

Name Type Description
linear Linear

Linear transformation module.

Source code in models/enhancer/gaussian_diffusion/layers.py
class LinearNorm(Module):
    r"""LinearNorm Projection.

    This module performs a linear projection with optional bias.

    Args:
        in_features (int): Number of input features.
        out_features (int): Number of output features.
        bias (bool, optional): If True, adds a learnable bias to the output. Default is False.

    Attributes:
        linear (torch.nn.Linear): Linear transformation module.

    """

    def __init__(
        self,
        in_features: int,
        out_features: int,
        bias: bool = False,
    ):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias)

        nn.init.xavier_uniform_(self.linear.weight)
        if bias:
            nn.init.constant_(self.linear.bias, 0.0)

    def forward(self, x: Tensor) -> Tensor:
        r"""Forward pass through the LinearNorm module.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: Output tensor after linear projection.
        """
        x = self.linear(x)
        return x

forward(x)

Forward pass through the LinearNorm module.

Parameters:

Name Type Description Default
x Tensor

Input tensor.

required

Returns:

Type Description
Tensor

torch.Tensor: Output tensor after linear projection.

Source code in models/enhancer/gaussian_diffusion/layers.py
def forward(self, x: Tensor) -> Tensor:
    r"""Forward pass through the LinearNorm module.

    Args:
        x (torch.Tensor): Input tensor.

    Returns:
        torch.Tensor: Output tensor after linear projection.
    """
    x = self.linear(x)
    return x

Mish

Bases: Module

Applies the Mish activation function.

Mish is a smooth, non-monotonic function that attempts to mitigate the problems of dying ReLU units in deep neural networks.

Source code in models/enhancer/gaussian_diffusion/layers.py
class Mish(Module):
    r"""Applies the Mish activation function.

    Mish is a smooth, non-monotonic function that attempts to mitigate the
    problems of dying ReLU units in deep neural networks.
    """

    def forward(self, x: Tensor) -> Tensor:
        r"""Forward pass of the Mish activation function.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: Output tensor after applying Mish activation.
        """
        return x * torch.tanh(F.softplus(x))

forward(x)

Forward pass of the Mish activation function.

Parameters:

Name Type Description Default
x Tensor

Input tensor.

required

Returns:

Type Description
Tensor

torch.Tensor: Output tensor after applying Mish activation.

Source code in models/enhancer/gaussian_diffusion/layers.py
def forward(self, x: Tensor) -> Tensor:
    r"""Forward pass of the Mish activation function.

    Args:
        x (torch.Tensor): Input tensor.

    Returns:
        torch.Tensor: Output tensor after applying Mish activation.
    """
    return x * torch.tanh(F.softplus(x))

ResidualBlock

Bases: Module

Residual Block.

This module defines a residual block used in a neural network architecture. It consists of several convolutional and linear projections followed by nonlinear activations.

Parameters:

Name Type Description Default
d_encoder int

Dimension of the encoder output.

required
residual_channels int

Number of channels in the residual block.

required
dropout float

Dropout probability.

required
d_spk_prj int

Dimension of the speaker projection.

required
multi_speaker bool

Flag indicating if the model is trained with multiple speakers. Defaults to True.

True

Attributes:

Name Type Description
multi_speaker bool

Flag indicating if the model is trained with multiple speakers.

conv_layer ConvNorm

Convolutional layer in the residual block.

diffusion_projection LinearNorm

Linear projection for the diffusion step.

speaker_projection LinearNorm

Linear projection for the speaker embedding.

conditioner_projection ConvNorm

Convolutional projection for the conditioner.

output_projection ConvNorm

Convolutional projection for the output.

Source code in models/enhancer/gaussian_diffusion/layers.py
class ResidualBlock(Module):
    r"""Residual Block.

    This module defines a residual block used in a neural network architecture. It consists of
    several convolutional and linear projections followed by nonlinear activations.

    Args:
        d_encoder (int): Dimension of the encoder output.
        residual_channels (int): Number of channels in the residual block.
        dropout (float): Dropout probability.
        d_spk_prj (int): Dimension of the speaker projection.
        multi_speaker (bool, optional): Flag indicating if the model is trained with multiple speakers. Defaults to True.

    Attributes:
        multi_speaker (bool): Flag indicating if the model is trained with multiple speakers.
        conv_layer (ConvNorm): Convolutional layer in the residual block.
        diffusion_projection (LinearNorm): Linear projection for the diffusion step.
        speaker_projection (LinearNorm): Linear projection for the speaker embedding.
        conditioner_projection (ConvNorm): Convolutional projection for the conditioner.
        output_projection (ConvNorm): Convolutional projection for the output.
    """

    def __init__(
        self,
        d_encoder: int,
        residual_channels: int,
        dropout: float,
        d_spk_prj: int,
        multi_speaker: bool = True,
    ):
        super().__init__()
        self.multi_speaker = multi_speaker
        self.conv_layer = ConvNorm(
            residual_channels,
            2 * residual_channels,
            kernel_size=3,
            stride=1,
            padding=int((3 - 1) / 2),
            dilation=1,
        )
        self.diffusion_projection = LinearNorm(residual_channels, residual_channels)
        if multi_speaker:
            self.speaker_projection = LinearNorm(d_spk_prj, residual_channels)
        self.conditioner_projection = ConvNorm(
            d_encoder, residual_channels, kernel_size=1,
        )
        self.output_projection = ConvNorm(
            residual_channels, 2 * residual_channels, kernel_size=1,
        )

    def forward(
        self,
        x: Tensor,
        conditioner: Tensor,
        diffusion_step: Tensor,
        speaker_emb: Tensor,
        mask: Optional[Tensor] = None,
    ):
        r"""Forward pass through the ResidualBlock module.

        Args:
            x (torch.Tensor): Input tensor.
            conditioner (torch.Tensor): Conditioner tensor.
            diffusion_step (torch.Tensor): Diffusion step tensor.
            speaker_emb (torch.Tensor): Speaker embedding tensor.
            mask (torch.Tensor, optional): Mask tensor. Defaults to None.

        Returns:
            Tuple[torch.Tensor, torch.Tensor]: Tuple containing the output tensor and skip tensor.
        """
        diffusion_step = self.diffusion_projection(diffusion_step).unsqueeze(-1)
        conditioner = self.conditioner_projection(conditioner)
        # conditioner = self.conditioner_projection(conditioner.transpose(1, 2))
        if self.multi_speaker:
            # speaker_emb = self.speaker_projection(speaker_emb).unsqueeze(1).expand(
            #     -1, conditioner.shape[-1], -1,
            # ).transpose(1, 2)
            speaker_emb = self.speaker_projection(speaker_emb).expand(
                -1, conditioner.shape[-1], -1,
            ).transpose(1, 2)

        residual = y = x + diffusion_step
        y = self.conv_layer(
            (y + conditioner + speaker_emb) if self.multi_speaker else (y + conditioner),
        )
        gate, filter = torch.chunk(y, 2, dim=1)
        y = torch.sigmoid(gate) * torch.tanh(filter)

        y = self.output_projection(y)
        x, skip = torch.chunk(y, 2, dim=1)

        return (x + residual) / math.sqrt(2.0), skip

forward(x, conditioner, diffusion_step, speaker_emb, mask=None)

Forward pass through the ResidualBlock module.

Parameters:

Name Type Description Default
x Tensor

Input tensor.

required
conditioner Tensor

Conditioner tensor.

required
diffusion_step Tensor

Diffusion step tensor.

required
speaker_emb Tensor

Speaker embedding tensor.

required
mask Tensor

Mask tensor. Defaults to None.

None

Returns:

Type Description

Tuple[torch.Tensor, torch.Tensor]: Tuple containing the output tensor and skip tensor.

Source code in models/enhancer/gaussian_diffusion/layers.py
def forward(
    self,
    x: Tensor,
    conditioner: Tensor,
    diffusion_step: Tensor,
    speaker_emb: Tensor,
    mask: Optional[Tensor] = None,
):
    r"""Forward pass through the ResidualBlock module.

    Args:
        x (torch.Tensor): Input tensor.
        conditioner (torch.Tensor): Conditioner tensor.
        diffusion_step (torch.Tensor): Diffusion step tensor.
        speaker_emb (torch.Tensor): Speaker embedding tensor.
        mask (torch.Tensor, optional): Mask tensor. Defaults to None.

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: Tuple containing the output tensor and skip tensor.
    """
    diffusion_step = self.diffusion_projection(diffusion_step).unsqueeze(-1)
    conditioner = self.conditioner_projection(conditioner)
    # conditioner = self.conditioner_projection(conditioner.transpose(1, 2))
    if self.multi_speaker:
        # speaker_emb = self.speaker_projection(speaker_emb).unsqueeze(1).expand(
        #     -1, conditioner.shape[-1], -1,
        # ).transpose(1, 2)
        speaker_emb = self.speaker_projection(speaker_emb).expand(
            -1, conditioner.shape[-1], -1,
        ).transpose(1, 2)

    residual = y = x + diffusion_step
    y = self.conv_layer(
        (y + conditioner + speaker_emb) if self.multi_speaker else (y + conditioner),
    )
    gate, filter = torch.chunk(y, 2, dim=1)
    y = torch.sigmoid(gate) * torch.tanh(filter)

    y = self.output_projection(y)
    x, skip = torch.chunk(y, 2, dim=1)

    return (x + residual) / math.sqrt(2.0), skip