Conformer Multi-Headed Self Attention
ConformerMultiHeadedSelfAttention
Bases: Module
Conformer employs multi-headed self-attention (MHSA) while integrating an important technique from Transformer-XL:
the relative sinusoidal positional encoding scheme. Relative positional encoding allows the self-attention
module to generalize better to inputs of different lengths, and the resulting encoder is more robust to variance in
utterance length. Conformer uses pre-norm residual units with dropout, which helps in training
and regularizing deeper models.
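To make the relative sinusoidal scheme concrete, here is a minimal standard-library sketch that builds one sinusoidal vector per query-key offset. The function names `sinusoidal_encoding` and `relative_offsets`, the interleaved sine/cosine layout, and the sizes used are assumptions for illustration; the source does not state how this module lays out or indexes its encoding.

```python
import math


def sinusoidal_encoding(positions, d_model):
    """Return one d_model-dimensional sinusoidal vector per position.

    Even feature indices hold a sine and odd ones a cosine; the wavelength
    grows geometrically with the feature index, starting at 2*pi.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in positions:
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


def relative_offsets(time):
    """Offsets between a query and a key, from time-1 down to -(time-1)."""
    return list(range(time - 1, -time, -1))


# Hypothetical sizes chosen only for illustration.
offsets = relative_offsets(4)
encoding = sinusoidal_encoding(offsets, d_model=8)
```

Because the table depends only on the offset between two frames and not on their absolute positions, the same rows can be reused for utterances of any length, which is one way to read the robustness claim above.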
Parameters:
Name | Type | Description | Default |
---|---|---|---|
d_model | int | The dimension of the model. | required |
num_heads | int | The number of attention heads. | required |
dropout_p | float | The dropout probability. | required |
Inputs: inputs, mask
- inputs (batch, time, dim): Tensor containing the input vectors
- mask (batch, 1, time2) or (batch, time1, time2): Tensor marking the positions to be masked
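The two accepted mask shapes can be illustrated with a small standard-library sketch that uses nested lists in place of tensors. It assumes that a true entry marks a position to hide from attention and that a `(batch, 1, time2)` mask is repeated across all `time1` query rows; the helpers `expand_mask` and `masked_softmax` are hypothetical and not part of this module. A tensor implementation would more typically fill masked scores with a large negative value before the softmax.

```python
import math


def expand_mask(mask, time1):
    """Expand a (batch, 1, time2) mask to (batch, time1, time2).

    A mask that already has time1 rows per batch item is copied unchanged.
    """
    expanded = []
    for item in mask:
        if len(item) == 1:
            expanded.append([list(item[0]) for _ in range(time1)])
        elif len(item) == time1:
            expanded.append([list(row) for row in item])
        else:
            raise ValueError("mask must have 1 or time1 rows per batch item")
    return expanded


def masked_softmax(scores, mask):
    """Softmax over the last axis of (batch, time1, time2) scores.

    Assumption: True in `mask` marks a position to hide from attention.
    """
    out = []
    for score_rows, mask_rows in zip(scores, mask):
        rows = []
        for score_row, mask_row in zip(score_rows, mask_rows):
            kept = [s for s, m in zip(score_row, mask_row) if not m]
            if not kept:
                raise ValueError("every position in a row is masked")
            peak = max(kept)  # subtract the max for numerical stability
            exps = [0.0 if m else math.exp(s - peak)
                    for s, m in zip(score_row, mask_row)]
            total = sum(exps)
            rows.append([e / total for e in exps])
        out.append(rows)
    return out


# One utterance, two query frames, three key frames; the last key is padding.
padding_mask = [[[False, False, True]]]          # (batch, 1, time2)
full_mask = expand_mask(padding_mask, time1=2)   # (batch, time1, time2)
scores = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
weights = masked_softmax(scores, full_mask)
```

With uniform scores, each query frame splits its attention evenly over the two unmasked keys and gives the padded key zero weight.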
Returns:
Type | Description |
---|---|
(batch, time, dim) | Tensor produced by the relative multi-headed self-attention module. |