**KNOWLEDGE SUMMARY**

*   Gradient descent is a first-order iterative optimization algorithm. To find a local minimum of a differentiable function, it repeatedly takes steps proportional to the negative of the gradient of the function at the current point.
*   Linear regression models a linear relationship between one or more independent variables and a dependent variable, and uses that relationship to predict new outcomes. It is a statistical method widely used in data science and machine learning for predictive analysis.

**Gradient Descent in Linear Regression**\
The aim of gradient descent here is to find the model parameters that minimize the cost on the training data, with the expectation that they also generalize well to test data.\
**Algorithm** \
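t ← 0 \
max_iterations ← 1000 \
$w_0, b_0$ ← initialize randomly \
while t < max_iterations do \
    $w_{t+1} \leftarrow w_t - \eta \, \nabla_w J(w_t, b_t)$ \
    $b_{t+1} \leftarrow b_t - \eta \, \nabla_b J(w_t, b_t)$ \
    t ← t + 1 \
end \
Here $\eta$ is the learning rate (written $\alpha$ in the formulas below), and $\nabla_w J$, $\nabla_b J$ are the partial derivatives of the cost $J$ with respect to $w$ and $b$.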
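The loop above can be sketched in Python as follows. This is a minimal sketch, assuming a mean-squared-error cost for a one-variable linear model. The function name `gradient_descent_wb`, the zero initialization (the pseudocode initializes randomly), and the toy data are assumptions made for illustration.

```python
import numpy as np


def gradient_descent_wb(x, y, eta=0.01, max_iterations=1000):
    """Hypothetical helper: fit y = w * x + b by gradient descent on the MSE cost."""
    w, b = 0.0, 0.0  # assumption: start from zero instead of random values
    m = len(x)
    for _ in range(max_iterations):
        error = (w * x + b) - y          # prediction error at the current (w, b)
        grad_w = np.sum(error * x) / m   # partial derivative of the cost w.r.t. w
        grad_b = np.sum(error) / m       # partial derivative of the cost w.r.t. b
        # Both gradients are computed first, then both parameters are updated.
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b


x = np.arange(11, dtype=float)
y = 2.0 * x
w, b = gradient_descent_wb(x, y)
print(w, b)  # w should be close to 2 and b close to 0
```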



*   Cost Function: \
We will fit the linear regression parameters $\theta$ to our dataset using gradient descent.\
The objective of linear regression is to minimize the cost function\
$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta \left( x^{(i)} \right) - y^{(i)} \right)^2
$$
where the hypothesis $h_\theta$ is given by the linear model
$$
h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1
$$
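As an illustration, the cost can be evaluated with NumPy as in the sketch below. The function name `compute_cost`, the design matrix `X` (with a leading column of ones so that $\theta_0$ acts as the intercept), and the four-point dataset are assumptions chosen for the example.

```python
import numpy as np


def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum((h_theta(x_i) - y_i)^2), with h_theta(x) = theta^T x."""
    m = len(y)
    residuals = X @ theta - y
    return np.sum(residuals ** 2) / (2 * m)


x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x
X = np.column_stack([np.ones_like(x), x])  # column of ones multiplies theta_0

print(compute_cost(X, y, np.array([0.0, 2.0])))  # perfect fit: 0.0
print(compute_cost(X, y, np.array([0.0, 0.0])))  # (0 + 4 + 16 + 36) / 8 = 7.0
```

Note that the squares are taken before summing, and that the whole sum is divided by $2m$.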

*   Gradient Descent: \
Recall that the parameters of your model are the $\theta_j$ values. These are the values you adjust to minimize the cost $J(\theta)$. One way to do this is the batch gradient descent algorithm, in which each iteration performs the update
$$
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$
(simultaneously update $\theta_j$ for all $j$), where $\alpha > 0$ is the learning rate. \
\
With a suitably small learning rate, each step of gradient descent brings the parameters $\theta$ closer to the optimal values that achieve the lowest cost $J(\theta)$.
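The update rule can be written in vectorized form, which makes the "simultaneous update" explicit: the whole gradient is computed from the current parameters before any of them changes. The following is a sketch under assumptions: the function name `gradient_step`, the design matrix `X` with a leading column of ones, and the toy data are illustrative.

```python
import numpy as np


def gradient_step(X, y, theta, alpha):
    """One batch update: theta_j := theta_j - alpha * (1/m) * sum(error_i * x_ij)."""
    m = len(y)
    error = X @ theta - y            # h_theta(x_i) - y_i for every example
    gradient = (X.T @ error) / m     # one entry per theta_j, all from the old theta
    return theta - alpha * gradient  # every theta_j changes at once


x = np.arange(11, dtype=float)
y = 2.0 * x
X = np.column_stack([np.ones_like(x), x])

theta = gradient_step(X, y, np.zeros(2), alpha=0.01)
print(theta)  # approximately [0.1, 0.7] after a single step

for _ in range(999):
    theta = gradient_step(X, y, theta, alpha=0.01)
print(theta)  # theta_0 should be near 0 and theta_1 near 2
```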
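*   Convergence: \
Here $\mathbf{w}$ denotes the parameter vector (written $\theta$ above). The gradient $\nabla J(\mathbf{w})$ points in the direction of steepest ascent of $J$ at $\mathbf{w}$. Consequently, $-\nabla J(\mathbf{w})$ points in the direction of steepest descent. \
\
Taking the step $\mathbf{s} = -\alpha \nabla J(\mathbf{w})$ with a sufficiently small $\alpha > 0$ is guaranteed to decrease the function whenever $\nabla J(\mathbf{w}) \neq \mathbf{0}$. By a first-order Taylor expansion: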

$$
J(\mathbf{w} + (-\alpha \nabla J(\mathbf{w}))) \approx J(\mathbf{w}) - \alpha \nabla J(\mathbf{w})^T \nabla J(\mathbf{w}) < J(\mathbf{w})
$$

So the iterations of steepest descent are: \
$$
\mathbf{w}^{(i+1)} \leftarrow \mathbf{w}^{(i)} - \alpha \nabla J(\mathbf{w}^{(i)}).
$$
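The role of "sufficiently small" can be checked numerically. The sketch below compares the actual cost after one step with the first-order estimate $J(\mathbf{w}) - \alpha \nabla J(\mathbf{w})^T \nabla J(\mathbf{w})$ for several step sizes. The helper names `cost` and `grad`, the squared-error cost, and the toy data are assumptions made for illustration.

```python
import numpy as np


def cost(w, X, y):
    """J(w) = 1/(2m) * ||X w - y||^2."""
    return np.sum((X @ w - y) ** 2) / (2 * len(y))


def grad(w, X, y):
    """Gradient of J with respect to w."""
    return X.T @ (X @ w - y) / len(y)


x = np.arange(11, dtype=float)
y = 2.0 * x
X = np.column_stack([np.ones_like(x), x])
w = np.zeros(2)
g = grad(w, X, y)

for alpha in (0.001, 0.01, 0.1):
    actual = cost(w - alpha * g, X, y)
    estimate = cost(w, X, y) - alpha * (g @ g)  # first-order Taylor estimate
    print(alpha, cost(w, X, y), actual, estimate)
```

On this data the cost drops for `alpha = 0.001` and `alpha = 0.01`, while `alpha = 0.1` overshoots and the cost increases, because the first-order estimate is only accurate for small steps.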


In [None]:
# Example:
# Implementation of Gradient Descent in Linear Regression
import numpy as np
import matplotlib.pyplot as plt


class Linear_Regression:
	def __init__(self, X, Y):
		self.X = X
		self.Y = Y
		self.b = [0, 0]

	def update_coeffs(self, learning_rate):
		Y_pred = self.predict()
		Y = self.Y
		m = len(Y)
		self.b[0] = self.b[0] - (learning_rate * ((1/m) *
												np.sum(Y_pred - Y)))

		self.b[1] = self.b[1] - (learning_rate * ((1/m) *
												np.sum((Y_pred - Y) * self.X)))

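	def predict(self, X=None):
		# Use the training inputs when no X is given. Comparing against None
		# avoids a mutable default argument and the ambiguous truth value
		# that `not X` raises for NumPy arrays.
		if X is None:
			X = self.X
		b = self.b
		# Vectorized line equation: y = b0 + b1 * x
		return b[0] + b[1] * np.asarray(X, dtype=float)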

	def get_current_accuracy(self, Y_pred):
		p, e = Y_pred, self.Y
		n = len(Y_pred)
		return 1-sum(
			[
				abs(p[i]-e[i])/e[i]
				for i in range(n)
				if e[i] != 0]
		)/n

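	def compute_cost(self, Y_pred):
		# J = 1/(2m) * sum of squared errors: square each error before summing,
		# and divide by 2*m (the original `1 / 2*m` evaluated to m/2).
		m = len(self.Y)
		J = (1 / (2 * m)) * np.sum((Y_pred - self.Y) ** 2)
		return J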

	def plot_best_fit(self, Y_pred, fig):
		f = plt.figure(fig)
		plt.scatter(self.X, self.Y, color='b')
		plt.plot(self.X, Y_pred, color='g')
		f.show()


def main():
	X = np.array([i for i in range(11)])
	Y = np.array([2*i for i in range(11)])

	regressor = Linear_Regression(X, Y)

	iterations = 0
	steps = 100
	learning_rate = 0.01
	costs = []

	# original best-fit line
	Y_pred = regressor.predict()
	regressor.plot_best_fit(Y_pred, 'Initial Best Fit Line')

	while 1:
		Y_pred = regressor.predict()
		cost = regressor.compute_cost(Y_pred)
		costs.append(cost)
		regressor.update_coeffs(learning_rate)

		iterations += 1
		if iterations % steps == 0:
			print(iterations, "epochs elapsed")
			print("Current accuracy is :",
				regressor.get_current_accuracy(Y_pred))

			stop = input("Do you want to stop (y/*)??")
			if stop == "y":
				break

	# final best-fit line, recomputed with the coefficients after the last update
	Y_pred = regressor.predict()
	regressor.plot_best_fit(Y_pred, 'Final Best Fit Line')

	# plot to verify cost function decreases
	h = plt.figure('Verification')
	plt.plot(range(iterations), costs, color='b')
	h.show()

	# if user wants to predict using the regressor:
	regressor.predict([i for i in range(10)])


if __name__ == '__main__':
	main()


100 epochs elapsed
Current accuracy is : 0.9836456109008862
