Monotonic Alignments Shrink
b_mas(b_attn_map, in_lens, out_lens, width=1)
Applies the Monotonic Alignments Shrink (MAS) operation in parallel to each item of a batched attention map. It uses the `mas_width1` function internally to perform the MAS operation on each batch element.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`b_attn_map` | `ndarray` | The batched attention map; a 3D array where the first dimension is the batch size, the second corresponds to the source length, and the third corresponds to the target length. | required |
`in_lens` | `ndarray` | Lengths of the sequences in the input batch. | required |
`out_lens` | `ndarray` | Lengths of the sequences in the output batch. | required |
`width` | `int` | The width for the MAS operation. Defaults to 1. | `1` |
Raises:

Type | Description |
---|---|
`AssertionError` | If `width` is not equal to 1; this function currently supports only a width of 1. |
Returns:

Type | Description |
---|---|
`ndarray` | The batched attention map after applying the MAS operation; it has the same dimensions as the input attention map. |
Source code in models/tts/delightful_tts/acoustic_model/mas.py
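For orientation, here is a minimal sketch of how a batched MAS wrapper of this kind can be built on top of `mas_width1` (documented below). The function name `b_mas_sketch`, the import path, and the assumption that rows are bounded by `out_lens` and columns by `in_lens` are illustrative; the actual implementation in `mas.py` may differ.

```python
import numpy as np

# Sketch only: the import path follows the "Source code in" note above, and the
# slicing convention (rows = out_lens, columns = in_lens) is an assumption.
from models.tts.delightful_tts.acoustic_model.mas import mas_width1


def b_mas_sketch(b_attn_map, in_lens, out_lens, width=1):
    """Apply the width-1 MAS operation to each item of a batched attention map."""
    assert width == 1, "only width=1 is currently supported"
    attn_out = np.zeros_like(b_attn_map)
    for b in range(b_attn_map.shape[0]):
        # Run the DP only on the valid (unpadded) region of each batch item.
        valid = b_attn_map[b, : out_lens[b], : in_lens[b]]
        attn_out[b, : out_lens[b], : in_lens[b]] = mas_width1(valid)
    return attn_out
```

Slicing to the per-item lengths matters because batched attention maps are zero-padded; running the DP over the padding would let the alignment wander into frames and tokens that do not exist.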
mas_width1(attn_map)
Applies a Monotonic Alignments Shrink (MAS) operation with a hard-coded width of 1 to an attention map. Essentially, it produces an optimal alignment based on the previous attention distribution.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`attn_map` | `ndarray` | The original attention map, a 2D numpy array where rows correspond to mel bins and columns to text bins. | required |
Returns:

Name | Type | Description |
---|---|---|
`opt` | `ndarray` | The optimal attention map after applying the MAS operation. |
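Conceptually, the width-1 MAS operation is a Viterbi-style dynamic program over the soft attention map: each mel frame may either stay on the current text bin or advance by exactly one, and the highest-scoring monotonic path is then backtracked into a hard 0/1 alignment. The sketch below is illustrative and mirrors common open-source width-1 MAS implementations; the name `mas_width1_sketch`, the epsilon added before the logarithm, and the assumption that `attn_map` holds non-negative soft attention weights are not taken from `mas.py` itself.

```python
import numpy as np


def mas_width1_sketch(attn_map: np.ndarray) -> np.ndarray:
    """Viterbi-style width-1 monotonic alignment over a (mel_bins, text_bins) map."""
    # Work in log space; the epsilon guards against log(0) and is an assumption.
    log_attn = np.log(attn_map + 1e-8)
    log_attn[0, 1:] = -np.inf  # the path must start at the first text bin

    log_p = np.zeros_like(log_attn)
    log_p[0, :] = log_attn[0, :]
    prev_ind = np.zeros_like(log_attn, dtype=np.int64)

    # Forward pass: at each mel frame the path either stays on the same text bin
    # or advances by exactly one (the width=1 constraint).
    for i in range(1, log_attn.shape[0]):
        for j in range(log_attn.shape[1]):
            prev_log, prev_j = log_p[i - 1, j], j
            if j > 0 and log_p[i - 1, j - 1] >= prev_log:
                prev_log, prev_j = log_p[i - 1, j - 1], j - 1
            log_p[i, j] = log_attn[i, j] + prev_log
            prev_ind[i, j] = prev_j

    # Backtrack from the last text bin at the last mel frame to recover the
    # hard 0/1 alignment.
    opt = np.zeros_like(attn_map)
    curr_j = log_attn.shape[1] - 1
    for i in range(log_attn.shape[0] - 1, -1, -1):
        opt[i, curr_j] = 1.0
        curr_j = prev_ind[i, curr_j]
    return opt
```

Because the path must end on the last text bin and can advance by at most one bin per mel frame, a feasible alignment requires at least as many mel frames as text bins.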