Problem: computing the gradient in forward mode is expensive, because each input component needs its own run of the forward model.  Saving this vector at many time steps also takes a lot of storage.  The beauty of the variational approach is that the gradient can be calculated with a single backward pass of the adjoint model, rather than many passes of the forward model.
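To make the cost comparison concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration, not taken from the notes: a toy linear model x_{k+1} = M x_k, a quadratic misfit J(x_0) = 1/2 sum_k |x_k - y_k|^2 against hypothetical observations y_k, and the function names. For this toy case the adjoint of one model step is multiplication by the transpose of M.

```python
# Toy setup (all values and names are illustrative assumptions).
M = [[0.9, 0.2, 0.0],
     [-0.1, 0.8, 0.1],
     [0.0, 0.3, 0.7]]
N_STEPS = 5

def matvec(A, x):
    """Return A @ x for a list-of-lists matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matvec_T(A, x):
    """Return A^T @ x: the adjoint of the linear model step."""
    return [sum(A[i][j] * x[i] for i in range(len(A)))
            for j in range(len(A[0]))]

def forward(x0):
    """Run the forward model and return the trajectory [x_0, ..., x_N]."""
    traj = [list(x0)]
    for _ in range(N_STEPS):
        traj.append(matvec(M, traj[-1]))
    return traj

def cost(x0, obs):
    """J(x0) = 0.5 * sum over time steps of |x_k - y_k|^2."""
    return 0.5 * sum((xi - yi) ** 2
                     for x, y in zip(forward(x0), obs)
                     for xi, yi in zip(x, y))

def grad_adjoint(x0, obs):
    """Gradient from one forward run plus one backward (adjoint) sweep."""
    traj = forward(x0)  # the whole trajectory is stored for the sweep
    lam = [0.0] * len(x0)
    for x, y in zip(reversed(traj), reversed(obs)):
        # lam_k = M^T lam_{k+1} + (x_k - y_k)
        lam = [l + (xi - yi)
               for l, xi, yi in zip(matvec_T(M, lam), x, y)]
    return lam

def grad_forward(x0, obs, eps=1e-6):
    """Central differences: two forward runs per component of x0."""
    g = []
    for i in range(len(x0)):
        xp = list(x0); xp[i] += eps
        xm = list(x0); xm[i] -= eps
        g.append((cost(xp, obs) - cost(xm, obs)) / (2 * eps))
    return g

if __name__ == "__main__":
    obs = forward([1.0, 0.0, -1.0])   # synthetic "truth" observations
    x0 = [0.5, 0.5, 0.5]              # first guess
    print("adjoint:", grad_adjoint(x0, obs))
    print("forward:", grad_forward(x0, obs))
```

In this sketch the forward-difference version needs 2n model runs for an n-component initial state, while the adjoint version needs one forward run and one backward sweep regardless of n. The price is that the adjoint sweep reads the stored trajectory, which is where the storage concern shows up.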