Understanding the Least Squares Solution

We often encounter the least squares method in courses like 'Introduction to Statistics,' usually taught in late high school or at the entry level in college. Most of us remember that the least squares method allows us to find the best-fit line through a set of data points that exhibit a roughly linear trend in a 2D XY coordinate system.

To grasp the essence of the least squares solution, it's crucial to understand that this method provides an approximation of the underlying pattern. In general, no exact solution exists: the fitted line typically cannot pass through every data point, but it comes as close to them as possible in the squared-error sense.

To further comprehend how the best-fit line is approximated, one needs to understand the concepts of the objective function and optimization. Simply put, the objective function is the quantity we seek to minimize or maximize by choosing its parameters, and the process of finding those parameters is commonly referred to as optimization.

The cost, error, or empirical risk function quantifies the residual error between the model's predictions (i.e., the term A*X) and the actual observations (i.e., the term B). In the context of linear regression, a direct approach exists to minimize this error through matrix inversion.
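As a concrete illustration, here is a minimal sketch in Python with NumPy of the sum-of-squared-residuals cost described above. The matrix A, vector B, and the guess for X are made-up toy values, not data from this article.

```python
import numpy as np

def sum_of_squared_residuals(A, X, B):
    """Cost function: squared length of the residual vector B - A @ X."""
    residual = B - A @ X
    return float(residual.T @ residual)

# Toy over-constrained system: 4 observations, 2 unknowns (intercept and slope).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
B = np.array([1.1, 1.9, 3.2, 3.8])

# Evaluate the cost at an arbitrary guess for X.
X_guess = np.array([1.0, 1.0])
print(sum_of_squared_residuals(A, X_guess, B))
```

The least squares solution is simply the X that makes this cost as small as possible.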

Least squares solution to the linear system AX = B:

X' = (AᵀA)⁻¹AᵀB,

where X' is the approximate solution (not the exact solution, which does not exist for an over-constrained system, i.e., a tall matrix A).
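Below is a minimal sketch of this closed-form solution, again using NumPy and the same toy A and B as above (any tall, full-column-rank A would work). The comparison against np.linalg.lstsq is only a sanity check, not part of the formula itself.

```python
import numpy as np

# Toy over-constrained system (tall matrix A: more rows than columns).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
B = np.array([1.1, 1.9, 3.2, 3.8])

# Normal equations: X' = (A^T A)^{-1} A^T B.
X_hat = np.linalg.inv(A.T @ A) @ A.T @ B

# In practice, np.linalg.lstsq (or solving A^T A X = A^T B directly) is
# numerically preferable to forming the explicit inverse; both agree here.
X_lstsq, *_ = np.linalg.lstsq(A, B, rcond=None)
print(X_hat, X_lstsq)
```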

In applying the least squares solution to a linear regression model, the sum of squared errors becomes the objective function that we aim to minimize. The minimizer of this objective, known as the least squares solution, can be found either directly by inverting the matrix (as in the formula above) or through iterative approaches like gradient descent. In both cases, the core idea remains the same: find a solution that minimizes the error function, fulfilling our objective.
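As a rough sketch of the iterative alternative, plain batch gradient descent can minimize the same cost. The gradient of ||AX - B||² with respect to X is 2Aᵀ(AX - B); the learning rate and iteration count below are arbitrary choices for this toy data, not tuned values from the article.

```python
import numpy as np

# Same toy over-constrained system as above.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
B = np.array([1.1, 1.9, 3.2, 3.8])

# Gradient descent on the least squares cost ||A X - B||^2.
X = np.zeros(A.shape[1])     # start from an arbitrary initial guess
learning_rate = 0.01         # step size chosen by hand for this toy problem
for _ in range(5000):
    gradient = 2 * A.T @ (A @ X - B)   # gradient of the squared-error cost
    X -= learning_rate * gradient

print(X)  # approaches the same X' as the direct normal-equations solve
```

For small, well-conditioned problems the direct inversion is simplest; iterative methods like this become attractive when A is very large or when the model is later extended beyond plain linear regression.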