- Optimizers: SGD, RMSProp, Adam, Adagrad, Adamax
- Performance evaluation
- Source code listing

**Optimizers**

As stated above, optimizers adjust a model's weights during training to reduce the loss and thereby improve accuracy. One of the best-known optimization algorithms is gradient descent.

The gradient descent algorithm finds the coefficients that minimize a function by moving iteratively in the direction of the negative slope. The size of the step taken in each iteration is called the learning rate. Batch gradient descent computes the gradients over the entire training dataset before making a single update.
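The idea can be sketched in a few lines of plain Python. This is a minimal, illustrative example (the function `f(x) = (x - 3)^2` and the step count are arbitrary choices), not how any framework implements it:

```python
# Minimal gradient descent sketch: minimize f(x) = (x - 3)^2.
# The derivative is f'(x) = 2 * (x - 3); each iteration steps
# against the slope, scaled by the learning rate.
def gradient_descent(x0, learning_rate=0.1, steps=100):
    x = x0
    for _ in range(steps):
        grad = 2 * (x - 3)            # slope at the current point
        x = x - learning_rate * grad  # move in the negative-gradient direction
    return x

x_min = gradient_descent(x0=0.0)  # converges toward the minimum at x = 3
```

A learning rate that is too large would make the iterates overshoot and diverge; too small and convergence becomes very slow.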

In neural networks, the output of a forward pass is compared to the expected values and an error is calculated. Based on this error, the weights are updated and the network repeats the process; this procedure is called backpropagation. Several types of optimizers are available for training neural networks. We'll look at some of the most commonly used optimizers provided by the Keras API.

**SGD**

SGD (stochastic gradient descent) updates the parameters after each training example, rather than computing gradients over the entire dataset in every epoch as batch gradient descent does. We can use SGD with its default values or set the parameters explicitly.
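A per-example update can be sketched as follows. This is an illustrative NumPy toy (fitting `y = 2x` with a single weight), not the Keras implementation; in Keras you would simply pass `keras.optimizers.SGD(...)` when compiling the model:

```python
import numpy as np

# Illustrative SGD sketch: fit y = 2x by updating the weight
# after every single training example instead of the whole batch.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 2.0 * X

w = 0.0
learning_rate = 0.1
for epoch in range(20):
    for x_i, y_i in zip(X, y):
        grad = 2 * (w * x_i - y_i) * x_i  # gradient of (w*x - y)^2 w.r.t. w
        w -= learning_rate * grad         # update per example, not per epoch
```

The frequent, noisy updates are what distinguish SGD from batch gradient descent; the noise can even help escape shallow local minima.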

**RMSProp**

RMSProp (Root Mean Squared Propagation) is a gradient-based optimizer similar to Adagrad. It uses an exponential moving average of the squared gradients to adapt the learning rate for each parameter.
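The update rule can be sketched like this (a hedged illustration; the names `rho` and `eps` and the default values are assumptions, not the Keras source):

```python
import numpy as np

# Illustrative RMSProp update: keep an exponential moving average (EMA)
# of squared gradients and scale each step by its square root, so
# directions with consistently large gradients get smaller steps.
def rmsprop_step(w, grad, avg_sq, learning_rate=0.01, rho=0.9, eps=1e-7):
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2        # EMA of squared gradients
    w = w - learning_rate * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

# Toy run on f(w) = w^2, whose gradient is 2w.
w, avg_sq = 2.0, 0.0
for _ in range(1000):
    w, avg_sq = rmsprop_step(w, 2 * w, avg_sq)
```

Because the accumulator is a moving average rather than a running sum, the effective learning rate does not shrink irreversibly the way it does in Adagrad.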

**Adam**

Adam (Adaptive Moment Estimation) is a gradient descent-based optimizer that combines the advantages of RMSProp and Adagrad. The method computes an adaptive learning rate for each parameter and applies bias correction.
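The two moment estimates and the bias correction can be sketched as follows (an illustrative NumPy version under the usual textbook formulation, not the Keras source code):

```python
import numpy as np

# Illustrative Adam update: EMAs of the gradient (1st moment) and of the
# squared gradient (2nd moment), with bias correction to compensate for
# the zero-initialized accumulators in early steps.
def adam_step(w, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, eps=1e-7):
    m = beta1 * m + (1 - beta1) * grad        # 1st moment EMA
    v = beta2 * v + (1 - beta2) * grad ** 2   # 2nd moment EMA
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy run on f(w) = w^2, whose gradient is 2w.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    w, m, v = adam_step(w, 2 * w, m, v, t, learning_rate=0.05)
```

Note that at `t = 1` the bias correction makes the very first step equal to the full learning rate regardless of the gradient's scale.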

**Adagrad**

Adagrad adapts the learning rate for each parameter, making progressively smaller updates as that parameter's squared gradients accumulate. It works well with sparse gradients, because rarely updated parameters keep a comparatively large learning rate.
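A hedged sketch of the accumulation (illustrative names and defaults, not the Keras implementation):

```python
import numpy as np

# Illustrative Adagrad update: accumulate the running SUM (not an average)
# of squared gradients, so frequently updated parameters get ever smaller
# steps while rarely updated ones keep larger steps.
def adagrad_step(w, grad, accum, learning_rate=0.1, eps=1e-7):
    accum = accum + grad ** 2                 # monotonically growing accumulator
    w = w - learning_rate * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy run on f(w) = w^2, whose gradient is 2w.
w, accum = 1.0, 0.0
for _ in range(100):
    w, accum = adagrad_step(w, 2 * w, accum)
```

Because the accumulator only grows, the effective learning rate decays toward zero over long runs, which is the weakness RMSProp's moving average was designed to fix.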

**Adamax**

Adamax is a variant of Adam that replaces the L² norm in the second-moment update with the infinity (L∞) norm.
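The change shows up in a single line of the update rule. This is an illustrative sketch of the textbook formulation (names and defaults are assumptions, not the Keras source):

```python
import numpy as np

# Illustrative Adamax update: like Adam, but the squared-gradient EMA is
# replaced by an infinity-norm accumulator u = max(beta2 * u, |grad|),
# so no bias correction is needed for the second moment.
def adamax_step(w, grad, m, u, t, learning_rate=0.002,
                beta1=0.9, beta2=0.999, eps=1e-7):
    m = beta1 * m + (1 - beta1) * grad        # 1st moment EMA (as in Adam)
    u = np.maximum(beta2 * u, np.abs(grad))   # infinity-norm accumulator
    m_hat = m / (1 - beta1 ** t)              # bias-correct the 1st moment only
    w = w - learning_rate * m_hat / (u + eps)
    return w, m, u

# Toy run on f(w) = w^2, whose gradient is 2w.
w, m, u = 1.0, 0.0, 0.0
for t in range(1, 201):
    w, m, u = adamax_step(w, 2 * w, m, u, t, learning_rate=0.05)
```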

**Hyperparameters of optimizers**

Some of the key hyperparameters of optimizers are momentum and the learning rate.

**The momentum** method keeps weight updates moving more consistently in the same direction by letting each update incorporate a fraction of the previous ones; this allows a larger effective learning rate. **The learning rate** defines the size of each update step, i.e., how far the weights move along the negative gradient.
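The momentum update can be sketched in two lines (an illustrative plain-Python version; the variable names and default values are assumptions):

```python
# Illustrative momentum update: the velocity blends the previous update
# with the current gradient, so steps in a consistent direction
# accumulate speed while oscillating directions are damped.
def momentum_step(w, grad, velocity, learning_rate=0.01, momentum=0.9):
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

# Toy run on f(w) = w^2, whose gradient is 2w.
w, velocity = 1.0, 0.0
for _ in range(100):
    w, velocity = momentum_step(w, 2 * w, velocity)
```

With `momentum=0.0` this reduces to plain SGD; values near 0.9 are common defaults.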