Classical momentum accelerates gradient descent by accumulating an exponentially decaying moving average of past gradients and stepping along that velocity:
$$ \Delta \theta_{t+1} = \mu \Delta \theta_{t} - \alpha \nabla f(\theta_t) $$
$$ \theta_{t+1} = \theta_{t} + \Delta \theta_{t+1} $$
with momentum coefficient $\mu \in [0, 1)$ and learning rate $\alpha > 0$.
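As a sanity check, the classical update can be simulated in a few lines of plain Python. The quadratic objective and the hyperparameter values below are illustrative assumptions, not taken from the text:

```python
def grad_f(theta):
    # gradient of the toy objective f(theta) = 0.5 * theta**2
    return theta

mu, alpha = 0.9, 0.1   # momentum coefficient and learning rate (example values)
theta, velocity = 5.0, 0.0

for _ in range(200):
    velocity = mu * velocity - alpha * grad_f(theta)  # decaying sum of past gradients
    theta = theta + velocity                          # step along the velocity

print(theta)  # theta approaches the minimum at 0
```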
Nesterov momentum [Sut13] differs only in where the gradient is evaluated: at the lookahead point $\theta_t + \mu \Delta \theta_t$, i.e. after the momentum step has already been applied:
$$ \Delta \theta_{t+1} = \mu \Delta \theta_{t} - \alpha \nabla f(\theta_t + \mu \Delta \theta_t) $$
$$ \theta_{t+1} = \theta_{t} + \Delta \theta_{t+1} $$
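The lookahead gradient is a one-line change to the sketch above (same assumed toy objective and illustrative hyperparameters):

```python
def grad_f(theta):
    # gradient of the toy objective f(theta) = 0.5 * theta**2
    return theta

mu, alpha = 0.9, 0.1   # momentum coefficient and learning rate (example values)
theta, velocity = 5.0, 0.0

for _ in range(200):
    lookahead = theta + mu * velocity                     # provisional momentum step
    velocity = mu * velocity - alpha * grad_f(lookahead)  # gradient at the lookahead point
    theta = theta + velocity

print(theta)  # theta approaches the minimum at 0
```

Because the gradient already "sees" where the momentum step is headed, the velocity can be corrected before the step is taken.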
In Theano, both variants can share a single update function: for Nesterov momentum, the cost expression is cloned with the parameters replaced by the lookahead point, so the gradient is taken at the shifted parameters.
```python
import numpy as np
import theano
import theano.tensor as T

def get_updates(cost, params, learning_rate, momentum, nesterov, consider_constant=None):
    # momentum and Nesterov momentum updates
    updates = []
    for param in params:
        # velocity (Delta theta): shared variable of the same shape as param, initialized to zero
        param_update = theano.shared(param.get_value() * np.cast[theano.config.floatX](0.))
        if nesterov:
            # Nesterov momentum: evaluate the gradient at the lookahead point
            # theta + mu * Delta theta by substituting it for param in the cost graph
            eval_cost = theano.clone(cost, replace={param: param + momentum * param_update})
        else:
            # classical momentum: evaluate the gradient at the current parameters
            eval_cost = cost
        grad = T.grad(eval_cost, param, consider_constant=consider_constant)
        new_update = momentum * param_update - learning_rate * grad
        updates.append((param_update, new_update))
        updates.append((param, param + new_update))
    return updates
```
Since the velocity update can partially correct an overly large momentum step before it is applied, Nesterov momentum tends to be more stable than classical momentum at large values of $\mu$ [Sut13].
Literature:
- [Sut13] I. Sutskever, J. Martens, G. Dahl, G. Hinton. On the importance of initialization and momentum in deep learning. ICML 2013.