In this work, we investigate the mechanism underlying loss spikes observed during
neural network training. When training enters a region with a lower-loss-as-sharper
structure, it becomes unstable: once the loss landscape is too sharp, the loss increases
exponentially, producing the rapid ascent of the loss spike. Training stabilizes again
when it reaches a flat region. From a frequency perspective, we attribute the rapid
descent of the loss primarily to low-frequency components. We observe a deviation in
the first eigendirection, which is reasonably explained by the frequency principle:
low-frequency information is captured rapidly, leading to the rapid descent. Inspired
by our analysis of loss spikes, we revisit the link between the maximum eigenvalue of the
loss Hessian ($\lambda_{\rm max}$), flatness, and generalization. We suggest that $\lambda_{\rm max}$ is a good
measure of sharpness but not a good indicator of generalization. Furthermore, we experimentally
observe that loss spikes can facilitate condensation, i.e., the input weights of different neurons
evolving towards the same direction. Our experiments also show a correlation (a similar trend)
between $\lambda_{\rm max}$ and condensation. This observation may provide valuable insights for further
theoretical research on the relationship between loss spikes, $\lambda_{\rm max}$, and generalization.
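For concreteness, $\lambda_{\rm max}$ denotes the largest eigenvalue of the loss Hessian, characterized by the Rayleigh quotient
\[
\lambda_{\rm max} = \max_{\|v\|=1} v^\top \nabla^2_\theta L(\theta)\, v ,
\]
and condensation can be quantified, for instance, by the average pairwise alignment of the input weight vectors $w_i$ of the $n$ neurons in a layer,
\[
C = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{|w_i^\top w_j|}{\|w_i\|\,\|w_j\|},
\]
which approaches $1$ as the input weights evolve towards the same (or opposite) direction; this cosine-similarity form is given only as one illustrative way to measure condensation.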