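Problem: computing the gradient in forward mode is expensive, because each forward pass gives the derivative along only one input direction, so a control vector with n components needs n passes. Storing this vector at many time steps also takes a lot of memory. The beauty of the variational approach is that the full gradient can be computed with a single backward pass of the adjoint model, rather than many forward passes (one per component of the control vector) of the regular forward model.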
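To make the cost difference concrete, here is a minimal sketch under stated assumptions. The toy model `x_{k+1} = x_k + dt * (A x_k - x_k**3)`, the least-squares misfit `J(x0) = 0.5 * sum_k ||x_k - y_k||^2`, and all function names are hypothetical illustrations, not taken from these notes. `grad_forward_mode` runs one tangent-linear pass per component of `x0` (n passes), while `grad_adjoint` gets the same gradient from one backward sweep of the adjoint (transposed Jacobian), reading the stored forward trajectory in reverse.

```python
import numpy as np

# Hypothetical toy model (an assumption for illustration only):
#   x_{k+1} = x_k + dt * (A @ x_k - x_k**3)
# Hypothetical cost: J(x0) = 0.5 * sum_{k=0..K} ||x_k - y_k||^2

def step(x, A, dt):
    """One step of the nonlinear forward model."""
    return x + dt * (A @ x - x**3)

def tangent_linear(x, A, dt):
    """Jacobian of step() evaluated at state x."""
    return np.eye(len(x)) + dt * (A - np.diag(3 * x**2))

def run_forward(x0, A, dt, K):
    """Return the trajectory x_0..x_K as a (K+1, n) array."""
    traj = [x0]
    for _ in range(K):
        traj.append(step(traj[-1], A, dt))
    return np.array(traj)

def cost(x0, A, dt, obs):
    traj = run_forward(x0, A, dt, len(obs) - 1)
    return 0.5 * np.sum((traj - obs) ** 2)

def grad_forward_mode(x0, A, dt, obs):
    """Forward mode: one tangent-linear pass per component of x0 (n passes)."""
    K = len(obs) - 1
    traj = run_forward(x0, A, dt, K)
    n = len(x0)
    g = np.zeros(n)
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = 1.0                      # perturb only the i-th component
        dJ = (traj[0] - obs[0]) @ dx
        for k in range(K):
            dx = tangent_linear(traj[k], A, dt) @ dx
            dJ += (traj[k + 1] - obs[k + 1]) @ dx
        g[i] = dJ
    return g

def grad_adjoint(x0, A, dt, obs):
    """Adjoint: a single backward pass using the stored forward trajectory."""
    K = len(obs) - 1
    traj = run_forward(x0, A, dt, K)     # forward states kept in memory
    lam = traj[K] - obs[K]               # adjoint variable at final time
    for k in range(K - 1, -1, -1):
        lam = (traj[k] - obs[k]) + tangent_linear(traj[k], A, dt).T @ lam
    return lam                           # dJ/dx0

rng = np.random.default_rng(0)
n, K, dt = 5, 20, 0.05
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))
x0 = 0.5 * rng.standard_normal(n)
obs = rng.standard_normal((K + 1, n))

g_fwd = grad_forward_mode(x0, A, dt, obs)
g_adj = grad_adjoint(x0, A, dt, obs)
print(np.allclose(g_fwd, g_adj))  # True
```

Both functions return the same gradient, but the forward-mode loop repeats the whole time integration n times, whereas the adjoint sweep runs once regardless of n. The price of the adjoint version is that it needs the forward states `traj[k]` during the backward sweep.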