Therefore, we consider the activations of a neural network to be normalized if both their mean and their variance across samples lie within predefined intervals. If the mean and variance of x are already within these intervals, then the mean and variance of y remain within these intervals as well, i.e., the normalization is transitive across layers. Within these intervals, the mean and variance both converge to ...
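Since the excerpt does not name the mapping or the activation function, the following is a minimal numerical sketch of this transitivity claim under illustrative assumptions: the activation is taken to be SELU with its commonly cited constants (lambda ≈ 1.0507, alpha ≈ 1.6733), and each layer's weights are drawn i.i.d. from N(0, 1/n). Under these assumptions, once the input's mean and variance are near (0, 1), they should stay near (0, 1) across layers.

```python
import numpy as np

# Assumed SELU constants; the excerpt itself does not specify the activation.
LAMBDA, ALPHA = 1.0507, 1.6733


def selu(x):
    """SELU nonlinearity: lambda * x for x > 0, lambda * alpha * (e^x - 1) otherwise."""
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))


def layer_stats(n=500, depth=32, samples=4000, seed=0):
    """Yield (mean, variance) of the activations after each of `depth` layers.

    Weights are drawn N(0, 1/n), a common initialization choice assumed here;
    the input already has mean 0 and variance 1, i.e., it starts inside the
    target intervals.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=(samples, n))
    for _ in range(depth):
        w = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, n))
        x = selu(x @ w)
        yield x.mean(), x.var()


for i, (mu, var) in enumerate(layer_stats(), start=1):
    if i % 8 == 0:
        print(f"layer {i:2d}: mean={mu:+.3f}, var={var:.3f}")
```

In this sketch the per-layer mean and variance remain close to (0, 1): once the statistics enter the intervals they do not leave them, which is the transitivity property described above. This is an empirical illustration under the stated assumptions, not the paper's own derivation.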