Before getting started, let us revisit some basics about neural networks. A neural network, when presented with a set of input-output pairs, learns a function that maps inputs to outputs. The learned function may or may not be a good approximation of the true underlying function. Sounds weird, right? Let's decompose it so we can build some intuition.

In my previous blog post, we discussed the problem of finding a missing value. To solve it, we found the pattern between the data points, i.e. Y = 1 * X. Using this pattern, we predicted the output Y to be 6 given X is 6. The task was easy because the relationship was easy to spot. But what if the relationship were complex? That is where neural networks come in. The idea behind solving the problem is to find the correct weight that maps X to Y; in the above problem, the weight is 1. So if there were an algorithm that could find the weights that correctly map X to Y, we could solve the problem. The algorithm we are going to discuss here is Gradient Descent using Backpropagation. Before moving on to the algorithm, we need to understand a few topics in calculus. Don't worry, I will present them with plenty of intuition.
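To make the idea of "finding the right weight" concrete, here is a minimal sketch (my own addition, not part of the original problem) that simply tries a few candidate weights and keeps the one whose predictions best match Y. The data points and the candidate list are illustrative assumptions.

```python
# Minimal sketch: finding the single weight that maps X to Y.
# The data points and candidate weights below are illustrative assumptions.
X = [1, 2, 3, 4, 5]
Y = [1, 2, 3, 4, 5]  # Y = 1 * X, so the "true" weight is 1

def total_error(weight):
    """Sum of squared differences between predictions (weight * x) and Y."""
    return sum((weight * x - y) ** 2 for x, y in zip(X, Y))

# Try a handful of candidate weights and keep the one with the smallest error.
candidates = [0.0, 0.5, 1.0, 1.5, 2.0]
best_weight = min(candidates, key=total_error)

print(best_weight)      # 1.0 -- the weight that maps X to Y
print(best_weight * 6)  # 6.0 -- the predicted Y for X = 6
```

Of course, trying every candidate by hand does not scale; gradient descent is the algorithm that finds such a weight automatically, and that is exactly what the calculus below prepares us for.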
Basics Of Calculus
From calculus, we are going to discuss the topic of differentiation. What does the derivative of a function tell us? How are we going to apply it in gradient descent using backpropagation? To answer these questions, we first need to understand what the derivative of a function represents.
Let us consider a simple function, y = 2 * x. For every value of x, y is defined as 2 times x. For example, if x were 2, then y would be two times x, which is 4. The derivative of the function tells us how much the output y will change if we increase x by a small amount. To understand this statement, we can call on our friend Mr. Graph.
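Before we turn to the graph, here is a quick numerical check of the "small increase in x" idea (a sketch of my own, not from the original post): nudge x by a tiny amount h and see how much y moves.

```python
# Numerical check of the derivative of y = 2 * x.
def f(x):
    return 2 * x

h = 0.0001                      # a small nudge to x
x = 3.0                         # any point works, since the function is a straight line
slope = (f(x + h) - f(x)) / h   # change in output divided by change in input

print(slope)  # ~2.0 -- increasing x by a small amount increases y by twice that amount
```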
Let us plot the above function on a graph.

The above function is linear, since connecting the points gives a straight line. If we differentiate the function y = 2 * x, we get 2. Why? Let's find out. In the above graph, the value of y for x = 1 is 2 (y = 2 * x), and the value of y for x = 2 is 4 (y = 2 * x). The derivative of a function represents how much the output changes with respect to the input. Let's understand this visually. Consider the graph given below.

We can observe from the above graph that for every unit (1) increase in x, the value of y increases by 2. In other words, the derivative of the function is nothing but the slope of the line (the blue line). You can plug in different values of x and y and verify this yourself. As practice, try finding the derivative of the function y = 3 * x in the same way (a small sketch is given below). So how does this concept help us find the optimal weights for our network? If you look closely, the value we get after calculating the derivative is nothing but the true weight (2) that maps x to y (y = 2 * x). With this detail, we will move towards gradient descent using backpropagation.
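Here is a small sketch for that practice exercise (my own addition, using the same rise-over-run idea from the graph): compute how much y changes per unit change in x for y = 3 * x, and compare it with y = 2 * x.

```python
# Slope as rise over run, computed over a unit step in x.
def slope_per_unit(f, x):
    return f(x + 1) - f(x)   # how much y changes when x increases by 1

y_2x = lambda x: 2 * x
y_3x = lambda x: 3 * x

print(slope_per_unit(y_2x, 1))  # 2 -- matches the weight in y = 2 * x
print(slope_per_unit(y_3x, 1))  # 3 -- the derivative of y = 3 * x, i.e. its weight
```

In both cases the slope we recover is exactly the weight that maps x to y, which is the observation that carries us into gradient descent using backpropagation.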