R/weight_decay_optimizers.R
optimizer_decay_sgdw.Rd
This is an implementation of the SGDW optimizer described in "Decoupled Weight Decay Regularization" by Loshchilov & Hutter (https://arxiv.org/abs/1711.05101) ([pdf](https://arxiv.org/pdf/1711.05101.pdf)). It computes the update step of `tf.keras.optimizers.SGD` and additionally decays the variable. This is different from adding L2 regularization on the variables to the loss: an L2 penalty acts through the gradient (and, with momentum, is accumulated in the velocity), whereas the decoupled decay shrinks the weights directly. Decoupling the weight decay from other hyperparameters (in particular the learning rate) simplifies hyperparameter search. For further information, see the documentation of the SGD optimizer.
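The difference from L2 regularization can be illustrated with a small base-R sketch. The helpers `sgdw_step()`, `sgd_l2_step()` and `run_steps()` are hypothetical and are not part of tfaddons. They assume Keras-style momentum (`velocity <- momentum * velocity - lr * grad`) and assume the decay is applied to the weights as `theta - weight_decay * theta`, without being multiplied by the learning rate. On the one-dimensional loss `0.5 * (theta - 3)^2`, plain SGD makes the two schemes coincide when `weight_decay = lr * l2`, while momentum makes them settle at different points.

```r
# Hypothetical helper: one SGD step with decoupled weight decay (a sketch,
# not the tfaddons implementation). Assumes the decay is applied directly
# to the weights and is NOT scaled by the learning rate.
sgdw_step <- function(theta, grad, velocity, lr, momentum = 0,
                      weight_decay = 0, nesterov = FALSE) {
  # Keras-style momentum: velocity accumulates -lr * grad
  velocity <- momentum * velocity - lr * grad
  step <- if (nesterov) momentum * velocity - lr * grad else velocity
  # Decoupled decay shrinks theta outside the gradient/momentum path
  theta <- theta - weight_decay * theta + step
  list(theta = theta, velocity = velocity)
}

# Hypothetical helper: plain SGD with an L2 penalty (l2 / 2) * sum(theta^2)
# in the loss, so its gradient l2 * theta is added to grad (and therefore
# flows through the velocity when momentum > 0).
sgd_l2_step <- function(theta, grad, velocity, lr, momentum = 0,
                        l2 = 0, nesterov = FALSE) {
  sgdw_step(theta, grad + l2 * theta, velocity, lr, momentum,
            weight_decay = 0, nesterov = nesterov)
}

# Minimise 0.5 * (theta - 3)^2, whose gradient is theta - 3.
run_steps <- function(step_fn, n_steps, ...) {
  theta <- 0
  velocity <- 0
  for (i in seq_len(n_steps)) {
    s <- step_fn(theta, grad = theta - 3, velocity = velocity, ...)
    theta <- s$theta
    velocity <- s$velocity
  }
  theta
}

lr <- 0.1
l2 <- 0.5

# Without momentum, decoupled decay with weight_decay = lr * l2 matches L2.
plain_w  <- run_steps(sgdw_step, 200, lr = lr, weight_decay = lr * l2)
plain_l2 <- run_steps(sgd_l2_step, 200, lr = lr, l2 = l2)

# With momentum, the two schemes converge to different points.
mom_w  <- run_steps(sgdw_step, 200, lr = lr, momentum = 0.9,
                    weight_decay = lr * l2)
mom_l2 <- run_steps(sgd_l2_step, 200, lr = lr, momentum = 0.9, l2 = l2)

c(plain_w = plain_w, plain_l2 = plain_l2, mom_w = mom_w, mom_l2 = mom_l2)
```

Both plain runs approach 2, while the momentum runs end near 2.86 (decoupled) and 2 (L2): the L2 gradient is amplified by the velocity, whereas the decoupled decay is not.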
`optimizer_decay_sgdw(weight_decay, learning_rate = 0.001, momentum = 0, nesterov = FALSE, name = "SGDW", clipnorm = NULL, clipvalue = NULL, decay = NULL, lr = NULL)`
Argument | Description |
---|---|
weight_decay | Weight decay rate. |
learning_rate | Float hyperparameter >= 0. Learning rate. |
momentum | Float hyperparameter >= 0 that accelerates SGD in the relevant direction and dampens oscillations. |
nesterov | Boolean. Whether to apply Nesterov momentum. |
name | Optional name prefix for the operations created when applying gradients. Defaults to "SGDW". |
clipnorm | If set, the gradient of each weight is clipped so that its norm is no higher than this value. |
clipvalue | If set, the gradient of each weight is clipped element-wise so that no entry exceeds this value in absolute value. |
decay | Included for backward compatibility; enables inverse time decay of the learning rate. |
lr | Included for backward compatibility; use `learning_rate` instead. |
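The gradient-clipping and legacy `decay` arguments can be sketched in base R as follows. The helpers `clip_by_norm()`, `clip_by_value()` and `inverse_time_lr()` are hypothetical; they mirror the behaviour described in the table (and Keras' legacy formula `lr / (1 + decay * iterations)`, as we understand it) and are not functions exported by tfaddons or keras.

```r
# Hypothetical helper for clipnorm: rescale one weight's gradient so that
# its L2 norm is at most max_norm (direction is preserved).
clip_by_norm <- function(grad, max_norm) {
  norm <- sqrt(sum(grad^2))
  if (norm > max_norm) grad * (max_norm / norm) else grad
}

# Hypothetical helper for clipvalue: clamp every gradient entry into
# [-max_value, max_value] independently.
clip_by_value <- function(grad, max_value) {
  pmin(pmax(grad, -max_value), max_value)
}

# Hypothetical helper for decay: inverse time decay of the learning rate,
# where iteration counts the optimizer steps taken so far.
inverse_time_lr <- function(lr, decay, iteration) {
  lr / (1 + decay * iteration)
}

g <- c(3, -4)                        # gradient with L2 norm 5
clip_by_norm(g, 1)                   # c(0.6, -0.8)
clip_by_value(g, 2)                  # c(2, -2)
inverse_time_lr(0.001, 0.01, 100)    # 5e-04
```

Note that `clip_by_norm()` keeps the gradient's direction while `clip_by_value()` can change it. If the decoupled decay is indeed not scaled by the learning rate (an assumption carried over from the sketch of the update rule), a decaying learning rate does not by itself reduce the effect of `weight_decay`.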
Returns an optimizer for use with `keras::compile()`.