Layer-wise Adaptive Moments (LAMB)
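LAMB applies an Adam-style update scaled per layer by a trust ratio. As background, a sketch of the update rule from You et al. (2019), where `beta_1`, `beta_2`, `epsilon`, and `weight_decay_rate` correspond to $\beta_1$, $\beta_2$, $\epsilon$, and $\lambda$:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

$$r_t = \frac{m_t / (1 - \beta_1^t)}{\sqrt{v_t / (1 - \beta_2^t)} + \epsilon}$$

$$w_{t+1}^{(i)} = w_t^{(i)} - \eta \, \frac{\phi(\lVert w_t^{(i)} \rVert)}{\lVert r_t^{(i)} + \lambda w_t^{(i)} \rVert} \, \big( r_t^{(i)} + \lambda w_t^{(i)} \big)$$

where $i$ indexes layers, $\eta$ is `learning_rate`, and $\phi$ is a scaling function (the identity in the paper's experiments).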
optimizer_lamb(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = 1e-06,
  weight_decay_rate = 0,
  exclude_from_weight_decay = NULL,
  exclude_from_layer_adaptation = NULL,
  name = "LAMB",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)
Argument | Description
---|---
learning_rate | A `Tensor`, a floating point value, or a schedule that is a `tf$keras$optimizers$schedules$LearningRateSchedule`. The learning rate.
beta_1 | A `float` value or a constant `float` tensor. The exponential decay rate for the 1st moment estimates.
beta_2 | A `float` value or a constant `float` tensor. The exponential decay rate for the 2nd moment estimates.
epsilon | A small constant for numerical stability.
weight_decay_rate | Weight decay rate.
exclude_from_weight_decay | List of regex patterns of variables excluded from weight decay. Variables whose names contain a substring matching the pattern will be excluded (see the sketch after this table).
exclude_from_layer_adaptation | List of regex patterns of variables excluded from layer adaptation. Variables whose names contain a substring matching the pattern will be excluded.
name | Optional name for the operations created when applying gradients. Defaults to "LAMB".
clipnorm | Clip gradients by norm.
clipvalue | Clip gradients by value.
decay | Included for backward compatibility, to allow time inverse decay of the learning rate.
lr | Included for backward compatibility; use `learning_rate` instead.
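For example, a minimal sketch of enabling weight decay while exempting some variables (the patterns "bias" and "layer_norm" are illustrative assumptions; match them to your model's actual variable names):

opt <- optimizer_lamb(
  learning_rate = 0.001,
  weight_decay_rate = 0.01,
  # Hypothetical patterns: exempt bias and normalization variables from decay
  exclude_from_weight_decay = list("bias", "layer_norm")
)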
Optimizer for use with `keras::compile()`
if (FALSE) {
  keras_model_sequential() %>%
    layer_dense(32, input_shape = c(784)) %>%
    compile(
      optimizer = optimizer_lamb(),
      loss = "binary_crossentropy",
      metrics = "accuracy"
    )
}
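Since `learning_rate` also accepts a `tf$keras$optimizers$schedules$LearningRateSchedule`, a schedule can be passed in place of a fixed value. A sketch, assuming the tensorflow R package is attached so that `tf` is available:

library(tensorflow)

# PolynomialDecay is one of the built-in tf.keras learning-rate schedules
schedule <- tf$keras$optimizers$schedules$PolynomialDecay(
  initial_learning_rate = 0.01,
  decay_steps = 10000L,
  end_learning_rate = 0.001
)
opt <- optimizer_lamb(learning_rate = schedule)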