R/seq2seq.R
attention_luong_monotonic.Rd
Monotonic attention mechanism with Luong-style energy function.
attention_luong_monotonic(
  object,
  units,
  memory = NULL,
  memory_sequence_length = NULL,
  scale = FALSE,
  sigmoid_noise = 0,
  sigmoid_noise_seed = NULL,
  score_bias_init = 0,
  mode = "parallel",
  dtype = NULL,
  name = "LuongMonotonicAttention",
  ...
)
Argument | Description
---|---
object | Model or layer object
units | The depth of the query mechanism.
memory | The memory to query; usually the output of an RNN encoder. This tensor should be shaped `[batch_size, max_time, ...]`.
memory_sequence_length | (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
scale | Boolean. Whether to scale the energy term.
sigmoid_noise | Standard deviation of pre-sigmoid noise. See the docstring for `_monotonic_probability_fn` for more information.
sigmoid_noise_seed | (optional) Random seed for pre-sigmoid noise.
score_bias_init | Initial value for the score bias scalar. It is recommended to initialize this to a negative value when the length of the memory is large.
mode | How to compute the attention distribution. Must be one of `"recursive"`, `"parallel"`, or `"hard"`. See the docstring for `tfa.seq2seq.monotonic_attention` for more information.
dtype | The data type for the query and memory layers of the attention mechanism.
name | Name to use when creating ops.
... | A list that contains other common arguments for layer creation.
None
This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory, it cannot attend to any prior points at subsequent output timesteps. It achieves this by using `_monotonic_probability_fn` instead of softmax to construct its attention distributions. Otherwise, it is equivalent to LuongAttention. This approach is proposed in [Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck, "Online and Linear-Time Attention by Enforcing Monotonic Alignments." ICML 2017.](https://arxiv.org/abs/1704.00784)
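Below is a minimal usage sketch, not taken from the package documentation. It assumes the `keras` and `tfaddons` packages are loaded, that `encoder_outputs` and `encoder_lengths` stand in for the output of an upstream RNN encoder, and that the mechanism can be created standalone (without piping a model into `object`), as with other keras-style wrappers.

```r
# Hypothetical example; names and shapes are assumptions for illustration only.
library(keras)
library(tfaddons)

batch_size <- 4L
max_time   <- 16L
units      <- 64L

# Stand-in for encoder outputs shaped [batch_size, max_time, ...]
encoder_outputs <- k_random_uniform(c(batch_size, max_time, units))
# All sequences assumed to span the full max_time here
encoder_lengths <- rep(max_time, batch_size)

attention_mechanism <- attention_luong_monotonic(
  units = units,
  memory = encoder_outputs,
  memory_sequence_length = encoder_lengths,
  sigmoid_noise = 1.0,    # pre-sigmoid noise encourages near-discrete attention during training
  score_bias_init = -4,   # negative bias is recommended when the memory is long
  mode = "parallel"
)
```

The resulting mechanism would then typically be composed with a decoder cell (for example via an attention wrapper) so that each decoder step queries the encoder memory under the monotonic constraint described above.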