Implements Luong-style (multiplicative) attention scoring.

attention_luong(
  object,
  units,
  memory = NULL,
  memory_sequence_length = NULL,
  scale = FALSE,
  probability_fn = "softmax",
  dtype = NULL,
  name = "LuongAttention",
  ...
)

Arguments

object

Model or layer object

units

The depth of the attention mechanism.

memory

The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].

memory_sequence_length

(optional): Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.

scale

Boolean. Whether to scale the energy term.

probability_fn

(optional) String, the name of the function used to convert the attention score to probabilities. The default is "softmax", which corresponds to tf.nn.softmax. The other option is "hardmax", which uses hardmax() from this module. Any other value results in a validation error.

dtype

The data type for the memory layer of the attention mechanism.

name

Name to use when creating ops.

...

Additional arguments that are common to layer creation.

Value

None

Details

This attention mechanism has two forms. The first is standard Luong attention, as described in: Minh-Thang Luong, Hieu Pham, Christopher D. Manning, "Effective Approaches to Attention-based Neural Machine Translation", EMNLP 2015. The second is the scaled form, inspired in part by the normalized form of Bahdanau attention. To enable the second form, construct the object with `scale = TRUE`.
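
The following is a minimal usage sketch, not taken from the original documentation. It assumes the tensorflow R package is available alongside the package providing attention_luong() (e.g. tfaddons); the tensor shapes, dummy tensors, and argument values are illustrative assumptions only.

library(tensorflow)
library(tfaddons)

batch_size <- 4L
max_time   <- 10L
depth      <- 16L

# Illustrative encoder output with shape [batch_size, max_time, depth]
encoder_outputs <- tf$random$normal(shape = c(batch_size, max_time, depth))

# Illustrative true lengths for each batch entry; positions past these are masked
encoder_lengths <- tf$constant(c(10L, 8L, 6L, 4L))

# Scaled (second-form) Luong attention over the encoder outputs;
# here the mechanism is created standalone, without a model `object`
attention_mechanism <- attention_luong(
  units = depth,
  memory = encoder_outputs,
  memory_sequence_length = encoder_lengths,
  scale = TRUE
)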