Group normalization layer
layer_group_normalization(
  object,
  groups = 2,
  axis = -1,
  epsilon = 0.001,
  center = TRUE,
  scale = TRUE,
  beta_initializer = "zeros",
  gamma_initializer = "ones",
  beta_regularizer = NULL,
  gamma_regularizer = NULL,
  beta_constraint = NULL,
  gamma_constraint = NULL,
  ...
)
| Argument | Description |
|---|---|
| object | Model or layer object |
| groups | Integer, the number of groups for Group Normalization. Can be in the range [1, N] where N is the input dimension. The input dimension must be divisible by the number of groups. |
| axis | Integer, the axis that should be normalized. |
| epsilon | Small float added to variance to avoid dividing by zero. |
| center | If TRUE, add an offset of beta to the normalized tensor. If FALSE, beta is ignored. |
| scale | If TRUE, multiply by gamma. If FALSE, gamma is not used. |
| beta_initializer | Initializer for the beta weight. |
| gamma_initializer | Initializer for the gamma weight. |
| beta_regularizer | Optional regularizer for the beta weight. |
| gamma_regularizer | Optional regularizer for the gamma weight. |
| beta_constraint | Optional constraint for the beta weight. |
| gamma_constraint | Optional constraint for the gamma weight. |
| ... | Additional arguments passed on to the layer. |
A tensor
Group Normalization divides the channels into groups and computes the mean and variance within each group for normalization. Empirically, its accuracy is more stable than batch normalization across a wide range of small batch sizes, provided the learning rate is adjusted linearly with batch size.

Relation to Layer Normalization: if the number of groups is set to 1, this operation becomes identical to Layer Normalization.

Relation to Instance Normalization: if the number of groups is set to the input dimension (i.e. the number of groups equals the number of channels), this operation becomes identical to Instance Normalization.
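A minimal usage sketch (not part of the original reference): it assumes the keras R package provides layer_group_normalization() in your installation, and shows the layer normalizing the channel axis of a convolutional feature map; the model shapes and surrounding layers are illustrative only.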
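```r
library(keras)

# Sketch: normalize 32 convolution channels in 8 groups of 4 channels each.
# The channel count (32) must be divisible by `groups`.
inputs <- layer_input(shape = c(28, 28, 3))

outputs <- inputs %>%
  layer_conv_2d(filters = 32, kernel_size = 3, padding = "same") %>%
  # axis = -1 normalizes over the channel (features) axis
  layer_group_normalization(groups = 8, axis = -1) %>%
  layer_activation("relu") %>%
  layer_global_average_pooling_2d() %>%
  layer_dense(units = 10, activation = "softmax")

model <- keras_model(inputs, outputs)

# groups = 1 would behave like Layer Normalization;
# groups = 32 (one group per channel) like Instance Normalization.
```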