Rectified Adam (a.k.a. RAdam)
optimizer_radam(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = 1e-07,
  weight_decay = 0,
  amsgrad = FALSE,
  sma_threshold = 5,
  total_steps = 0,
  warmup_proportion = 0.1,
  min_lr = 0,
  name = "RectifiedAdam",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)
| learning_rate | A `Tensor`, a floating point value, or a schedule that is a `tf$keras$optimizers$schedules$LearningRateSchedule`. The learning rate. |
|---|---|
| beta_1 | A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates. |
| beta_2 | A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates. |
| epsilon | A small constant for numerical stability. |
| weight_decay | A floating point value. Weight decay applied to each parameter. |
| amsgrad | Boolean. Whether to apply the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond". |
| sma_threshold | A float value. The threshold for the simple moving average (SMA); the variance-rectified update is applied only once the SMA exceeds this value. |
| total_steps | An integer. Total number of training steps. Enable warmup by setting a positive value. |
| warmup_proportion | A floating point value. The proportion of total_steps spent increasing (warming up) the learning rate. |
| min_lr | A floating point value. Minimum learning rate after warmup. |
| name | Optional name for the operations created when applying gradients. Defaults to "RectifiedAdam". |
| clipnorm | Clip gradients by norm: gradients are clipped when their L2 norm exceeds this value. |
| clipvalue | Clip gradients by value: gradients are clipped element-wise to this value. |
| decay | Included for backward compatibility to allow time-based inverse decay of the learning rate. |
| lr | Included for backward compatibility; use learning_rate instead. |
An optimizer for use with `keras::compile()`.
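A minimal usage sketch, assuming `optimizer_radam()` is provided by the tfaddons package and used with the keras R interface; the model definition here is illustrative only. Setting a positive `total_steps` enables warmup, after which the learning rate decays towards `min_lr`.

```r
library(keras)
library(tfaddons)  # assumed source of optimizer_radam()

# Toy regression model (illustrative)
model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(10)) %>%
  layer_dense(units = 1)

# With total_steps > 0, the learning rate warms up over the first
# total_steps * warmup_proportion steps (1,000 here), then decays
# towards min_lr.
model %>% compile(
  loss = "mse",
  optimizer = optimizer_radam(
    learning_rate = 0.001,
    total_steps = 10000,
    warmup_proportion = 0.1,
    min_lr = 1e-5
  ),
  metrics = "mae"
)
```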