Layer-wise Adaptive Moments (LAMB)
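LAMB applies an Adam-style update scaled per layer by a trust ratio. As background, a sketch of the update rule from You et al. (2019), where `beta_1`, `beta_2`, `epsilon`, and `weight_decay_rate` correspond to $\beta_1$, $\beta_2$, $\epsilon$, and $\lambda$:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

$$r_t = \frac{m_t / (1 - \beta_1^t)}{\sqrt{v_t / (1 - \beta_2^t)} + \epsilon}$$

$$w_{t+1}^{(i)} = w_t^{(i)} - \eta \, \frac{\phi(\lVert w_t^{(i)} \rVert)}{\lVert r_t^{(i)} + \lambda w_t^{(i)} \rVert} \, \big( r_t^{(i)} + \lambda w_t^{(i)} \big)$$

where $i$ indexes layers, $\eta$ is `learning_rate`, and $\phi$ is a scaling function (the identity in the paper's experiments).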
optimizer_lamb(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = 1e-06,
  weight_decay_rate = 0,
  exclude_from_weight_decay = NULL,
  exclude_from_layer_adaptation = NULL,
  name = "LAMB",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)
Argument | Description
---|---
learning_rate | A `Tensor`, a floating point value, or a schedule that is a `tf$keras$optimizers$schedules$LearningRateSchedule`. The learning rate.
beta_1 | A `float` value or a constant `float` tensor. The exponential decay rate for the 1st moment estimates.
beta_2 | A `float` value or a constant `float` tensor. The exponential decay rate for the 2nd moment estimates.
epsilon | A small constant for numerical stability.
weight_decay_rate | Weight decay rate.
exclude_from_weight_decay | List of regex patterns of variables excluded from weight decay. Variables whose names contain a substring matching the pattern will be excluded (see the sketch after this table).
exclude_from_layer_adaptation | List of regex patterns of variables excluded from layer adaptation. Variables whose names contain a substring matching the pattern will be excluded.
name | Optional name for the operations created when applying gradients. Defaults to "LAMB".
clipnorm | Clip gradients by norm.
clipvalue | Clip gradients by value.
decay | Included for backward compatibility, to allow time inverse decay of the learning rate.
lr | Included for backward compatibility; use `learning_rate` instead.
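For example, a minimal sketch of enabling weight decay while exempting some variables (the patterns "bias" and "layer_norm" are illustrative assumptions; match them to your model's actual variable names):

opt <- optimizer_lamb(
  learning_rate = 0.001,
  weight_decay_rate = 0.01,
  # Hypothetical patterns: exempt bias and normalization variables from decay
  exclude_from_weight_decay = list("bias", "layer_norm")
)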
Optimizer for use with `keras::compile()`
if (FALSE) {
  keras_model_sequential() %>%
    layer_dense(32, input_shape = c(784)) %>%
    compile(
      optimizer = optimizer_lamb(),
      loss = "binary_crossentropy",
      metrics = "accuracy"
    )
}
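Since `learning_rate` also accepts a `tf$keras$optimizers$schedules$LearningRateSchedule`, a schedule can be passed in place of a fixed value. A sketch, assuming the tensorflow R package is attached so that `tf` is available:

library(tensorflow)

# PolynomialDecay is one of the built-in tf.keras learning-rate schedules
schedule <- tf$keras$optimizers$schedules$PolynomialDecay(
  initial_learning_rate = 0.01,
  decay_steps = 10000L,
  end_learning_rate = 0.001
)
opt <- optimizer_lamb(learning_rate = schedule)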