NovoGrad

optimizer_novograd(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = 1e-07,
  weight_decay = 0,
  grad_averaging = FALSE,
  amsgrad = FALSE,
  name = "NovoGrad",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

learning_rate

The learning rate: a `Tensor`, a floating-point value, or a schedule that is a `tf$keras$optimizers$schedules$LearningRateSchedule`.

beta_1

A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.

beta_2

A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.

epsilon

A small constant for numerical stability.

weight_decay

A floating-point value. Weight decay applied to each parameter.

grad_averaging

Boolean. Whether to use Adam-style exponential moving averaging for the first-order moments.

amsgrad

Boolean. Whether to apply the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond".

name

Optional name for the operations created when applying gradients. Defaults to "NovoGrad".

clipnorm

If set, the gradient of each weight is clipped so that its norm is no higher than this value.

clipvalue

If set, the gradient of each weight is clipped element-wise to the range [-clipvalue, clipvalue].

decay

Included for backward compatibility to allow inverse time decay of the learning rate.

lr

Included for backward compatibility; use `learning_rate` instead.
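To show how `beta_1`, `beta_2`, `epsilon`, `weight_decay`, `grad_averaging`, and `amsgrad` interact, here is a minimal base-R sketch of one NovoGrad step for a single layer. It assumes the layer-wise update used by the TensorFlow Addons NovoGrad optimizer that this function wraps, as we understand it: the second moment is one scalar per layer built from the squared gradient norm, the gradient is normalized by it, and weight decay is added after normalization. The helper `novograd_step()` and its `state` list are hypothetical illustrations, not part of tfaddons. Gradient clipping, `decay`, and learning-rate schedules are left out.

```r
# Hypothetical helper (not part of tfaddons): one NovoGrad step for a single
# layer with weights `w` and gradient `g`. `state` holds the first moment `m`
# (same length as w), the scalar layer-wise second moment `v`, the AMSGrad
# running maximum `vhat`, and the step counter `iterations`.
novograd_step <- function(w, g, state,
                          learning_rate = 0.001, beta_1 = 0.9,
                          beta_2 = 0.999, epsilon = 1e-07,
                          weight_decay = 0, grad_averaging = FALSE,
                          amsgrad = FALSE) {
  first_step <- state$iterations == 0

  # Second moment: a single scalar per layer, from the squared L2 norm of g.
  # On the first step it is initialized directly to that norm.
  g_2 <- sum(g^2)
  v <- if (first_step) g_2 else beta_2 * state$v + (1 - beta_2) * g_2

  # AMSGrad variant: normalize by the running maximum of v instead of v.
  vhat <- max(state$vhat, v)
  denom <- sqrt(if (amsgrad) vhat else v) + epsilon
  g_hat <- g / denom

  # Weight decay is added after normalization, so it is not rescaled by v.
  if (weight_decay > 0) g_hat <- g_hat + weight_decay * w

  # Adam-style averaging scales the new term by (1 - beta_1) after step one.
  if (grad_averaging && !first_step) g_hat <- (1 - beta_1) * g_hat

  # Momentum on the normalized gradient, then the parameter update.
  m <- beta_1 * state$m + g_hat
  w <- w - learning_rate * m

  list(w = w,
       state = list(m = m, v = v, vhat = vhat,
                    iterations = state$iterations + 1))
}

# Usage: a two-weight layer starting from an empty optimizer state.
state <- list(m = c(0, 0), v = 0, vhat = 0, iterations = 0)
res <- novograd_step(w = c(1, 2), g = c(3, 4), state = state)
res$w  # approximately c(0.9994, 1.9992): each weight moves by learning_rate * g / sqrt(sum(g^2))
```

Because `v` is one scalar per layer rather than one value per weight, this formulation keeps much less second-moment state than Adam does.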

Value

An optimizer for use with `keras::compile()`.

Examples

if (FALSE) {
library(keras)
library(tfaddons)

keras_model_sequential() %>%
  layer_dense(32, input_shape = c(784)) %>%
  compile(
    optimizer = optimizer_novograd(),
    loss = 'binary_crossentropy',
    metrics = 'accuracy'
  )
}