Optimizer that implements the Momentum algorithm with decoupled weight decay

This is an implementation of the SGDW optimizer described in "Decoupled Weight Decay Regularization" by Loshchilov & Hutter (https://arxiv.org/abs/1711.05101; pdf: https://arxiv.org/pdf/1711.05101.pdf). It computes the update step of tf.keras.optimizers.SGD and additionally decays the variable. Note that this is different from adding L2 regularization on the variables to the loss. Decoupling the weight decay from the other hyperparameters (in particular the learning rate) simplifies hyperparameter search. For further information, see the documentation of the SGD optimizer.
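
As a rough illustration of that difference, consider a single plain gradient step (momentum omitted) with learning rate \alpha and loss gradient \nabla f(\theta_t). The symbols \lambda and \lambda' are notation introduced here, and the exact scaling applied by the underlying implementation is an assumption, so this is a sketch rather than the definitive update rule:

```latex
% L2 regularization: the penalty gradient is scaled by the learning rate
\theta_{t+1} = \theta_t - \alpha \left( \nabla f(\theta_t) + \lambda' \theta_t \right)

% Decoupled weight decay: the decay acts on the variable directly
\theta_{t+1} = \theta_t - \alpha \nabla f(\theta_t) - \lambda \theta_t
```

In the first form the shrinkage per step is \alpha \lambda', so changing the learning rate also changes the regularization strength. In the second form \lambda (the weight_decay argument) can be tuned separately. With momentum, the L2 term additionally enters the momentum buffer, while the decoupled term does not.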

optimizer_decay_sgdw(
  weight_decay,
  learning_rate = 0.001,
  momentum = 0,
  nesterov = FALSE,
  name = "SGDW",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

weight_decay	Weight decay rate. Required; it has no default value.
learning_rate	float hyperparameter >= 0. Learning rate.
momentum	float hyperparameter >= 0 that accelerates SGD in the relevant direction and dampens oscillations.
nesterov	boolean. Whether to apply Nesterov momentum.
name	Optional name prefix for the operations created when applying gradients. Defaults to 'SGDW'.
clipnorm	If set, gradients are clipped by norm.
clipvalue	If set, gradients are clipped by value.
decay	Included for backward compatibility; allows inverse time decay of the learning rate.
lr	Included for backward compatibility; use learning_rate instead.

Value

Optimizer for use with `keras::compile()`

Examples


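if (FALSE) {

library(tensorflow)  # provides `tf`

# Integer step counter that drives the schedule
step = tf$Variable(0L, trainable = FALSE)
# Multiplier: 1 for the first 10000 steps, 0.1 until step 15000, then 0.01
schedule = tf$optimizers$schedules$PiecewiseConstantDecay(
  list(10000L, 15000L), list(1e-0, 1e-1, 1e-2))
# lr and wd can each be a tensor or a zero-argument function
lr = 1e-1 * schedule(step)
wd = function() 1e-4 * schedule(step)

optimizer = optimizer_decay_sgdw(weight_decay = wd, learning_rate = lr, momentum = 0.9)

}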