Implements Luong-style (multiplicative) attention scoring.
attention_luong( object, units, memory = NULL, memory_sequence_length = NULL, scale = FALSE, probability_fn = "softmax", dtype = NULL, name = "LuongAttention", ... )
Argument | Description |
---|---|
object | Model or layer object |
units | The depth of the attention mechanism. |
memory | The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...]. |
memory_sequence_length | (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths. |
scale | Boolean. Whether to scale the energy term. |
probability_fn | (optional) String naming the function that converts the attention score to probabilities. The default is softmax, which is tf.nn.softmax. The other option is hardmax, which is hardmax() within this module. Any other value results in a validation error. |
dtype | The data type for the memory layer of the attention mechanism. |
name | Name to use when creating ops. |
... | A list that contains other common arguments for layer creation. |
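
The masking described for `memory_sequence_length` can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: `mask_memory` is a hypothetical helper, and the memory is assumed to be a nested list shaped [batch_size, max_time, depth] rather than a tensor.

```python
def mask_memory(memory, sequence_lengths):
    """Zero out every time step at or beyond each entry's sequence length.

    memory: nested list shaped [batch_size][max_time][depth] (assumed layout).
    sequence_lengths: one valid length per batch entry.
    """
    masked = []
    for sequence, length in zip(memory, sequence_lengths):
        masked.append([
            list(step) if t < length else [0.0] * len(step)
            for t, step in enumerate(sequence)
        ])
    return masked


# Two batch entries, three time steps each, depth 2.
memory = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [9.0, 1.0], [2.0, 3.0]],
]

# The first entry has only two valid steps; the second uses all three.
masked = mask_memory(memory, [2, 3])
```

Here the third time step of the first batch entry becomes all zeros, while the second entry is left unchanged.
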
None
This attention has two forms. The first is standard Luong attention, as described in: Minh-Thang Luong, Hieu Pham, Christopher D. Manning. Effective Approaches to Attention-based Neural Machine Translation. EMNLP 2015. The second is the scaled form inspired partly by the normalized form of Bahdanau attention. To enable the second form, construct the object with parameter `scale=TRUE`.
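
As a rough illustration of the two forms, the sketch below computes multiplicative scores in plain Python. It is a simplification under stated assumptions, not the library's code: the keys are assumed to be already projected to the query's depth, the scale factor `g` is passed in as a fixed number (in the scaled form it would be a learned scalar), and the names `luong_scores`, `softmax`, and `hardmax` are local helpers defined here for illustration.

```python
import math


def luong_scores(query, keys, g=None):
    """Dot-product score of one query vector against each key vector.

    g: optional scalar multiplier; None gives the standard (unscaled) form.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if g is not None:
        scores = [g * s for s in scores]
    return scores


def softmax(scores):
    """Convert scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hardmax(scores):
    """Put all probability mass on the first highest-scoring position."""
    best = scores.index(max(scores))
    return [1.0 if i == best else 0.0 for i in range(len(scores))]


query = [1.0, 0.0]
keys = [[1.0, 2.0], [3.0, 0.0], [0.0, 5.0]]

unscaled = luong_scores(query, keys)       # standard form
scaled = luong_scores(query, keys, g=0.5)  # scaled form, g assumed fixed
weights = softmax(unscaled)
one_hot = hardmax(unscaled)
```

The `softmax` and `hardmax` helpers mirror the two `probability_fn` options: the first spreads weight across all memory positions, the second selects a single position.
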