Problem:
computing the gradient is expensive in forward mode. Also, it takes a lot of space to save
this vector at many time steps.
The beauty of the variational approach is that the gradient can be
calculated with a single back pass of the adjoint model, rather than many
forward passes of the regular forward model. |