Polynomial Regression with Gradient Descent

This notebook performs polynomial regression with gradient descent to approximate the sine function

$$y(x) = \sin(\pi x)$$

using a cubic polynomial model:

$$\hat{y}(x) = a + bx + cx^2 + dx^3$$

where $a, b, c, d \in \mathbb{R}$ are the parameters to be learned. The goal is to minimize the mean squared error between the model's predictions and the true sine function. The model is trained using gradient descent, which iteratively updates the parameters in the direction of the negative gradient of the loss function.

This kind of work is typically done in Python with libraries like TensorFlow or PyTorch, but here we implement it in JavaScript as an educational exercise.


1. Dataset Construction

In the code:

const x = Array.from({ length: 1000 }, (_, i) => (i / 999) * 2 - 1);
const y_true = x.map(value => Math.sin(value * Math.PI));

We define the input data $x_i \in [-1, 1]$, evenly spaced over 1000 points. The target output is generated by:

$$y_i^{\text{true}} = \sin(\pi x_i)$$

This is the function we wish to approximate.


2. Model Definition: Cubic Polynomial

We define a model function $f(x)$ as a cubic polynomial:

$$\hat{y}_i = f(x_i) = a + bx_i + cx_i^2 + dx_i^3$$

where $a, b, c, d \in \mathbb{R}$ are the parameters of the model, to be learned via gradient descent.
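
Written out as a small helper in the same style as the rest of the notebook (a minimal sketch; the actual code below inlines this expression directly inside a map call):

// cubic polynomial model: f(x) = a + b*x + c*x^2 + d*x^3
const f = (xi, a, b, c, d) => a + b * xi + c * xi ** 2 + d * xi ** 3;

// example: evaluate the model at x = 0.5 with illustrative coefficients
f(0.5, 0, 3, 0, -3); // 0 + 3 * 0.5 + 0 - 3 * 0.125 = 1.125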


3. Loss Function: Mean Squared Error

We define the loss function $L$ as the mean squared error between the model's prediction $\hat{y}_i$ and the true output $y_i^{\text{true}}$:

$$L(a, b, c, d) = \frac{1}{n} \sum_{i=1}^n \left( \hat{y}_i - y_i^{\text{true}} \right)^2$$

Expanding the prediction:

$$L(a, b, c, d) = \frac{1}{n} \sum_{i=1}^n \left( a + bx_i + cx_i^2 + dx_i^3 - \sin(\pi x_i) \right)^2$$

This measures the average squared deviation between the polynomial model and the sine curve over all inputs.
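
Although the training loop below only ever uses the gradient of the loss, the loss itself could be computed in the same style (a minimal sketch, not part of the original code):

// mean squared error between the predictions and the true sine values
const meanSquaredError = (yPred, yTrue) =>
  yPred.reduce((acc, yp, i) => acc + (yp - yTrue[i]) ** 2, 0) / yPred.length;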


4. Gradient Descent: Minimizing the Loss

Recap of the Chain Rule

Let:

$$y = f(g(x))$$

Then the derivative of $y$ with respect to $x$ is:

$$\frac{dy}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$$

That is, differentiate the outer function $f$ evaluated at $g(x)$, then multiply by the derivative of the inner function $g$.

To minimize the loss, we compute the partial derivatives of $L$ with respect to each parameter.

Let the error term be:

$$e_i = \hat{y}_i - y_i^{\text{true}} = a + bx_i + cx_i^2 + dx_i^3 - \sin(\pi x_i)$$
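
Applying the chain rule to a single squared-error term, with the outer function $u \mapsto u^2$ and the inner function $e_i$, gives (shown here for $b$; the other parameters work the same way):

$$\frac{\partial}{\partial b} e_i^2 = 2\, e_i \cdot \frac{\partial e_i}{\partial b} = 2\, e_i\, x_i$$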

Then, using the chain rule, the gradients are:

$$\frac{\partial L}{\partial a} = \frac{2}{n} \sum_{i=1}^n e_i$$

$$\frac{\partial L}{\partial b} = \frac{2}{n} \sum_{i=1}^n e_i \cdot x_i$$

$$\frac{\partial L}{\partial c} = \frac{2}{n} \sum_{i=1}^n e_i \cdot x_i^2$$

$$\frac{\partial L}{\partial d} = \frac{2}{n} \sum_{i=1}^n e_i \cdot x_i^3$$

These are implemented in the code as follows. Note that the constant factor of 2 is dropped in the implementation; this only rescales all gradients uniformly, which is equivalent to halving the learning rate:

const a_grad = error.reduce((acc, e) => acc + e, 0) / n;
const b_grad = error.reduce((acc, e, i) => acc + e * x[i], 0) / n;
const c_grad = error.reduce((acc, e, i) => acc + e * x[i] ** 2, 0) / n;
const d_grad = error.reduce((acc, e, i) => acc + e * x[i] ** 3, 0) / n;

We then update the parameters using gradient descent:

$$\theta \leftarrow \theta - \eta \cdot \frac{\partial L}{\partial \theta}$$

for $\theta \in \{a, b, c, d\}$, where $\eta = 0.1$ is the learning rate.
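
In the code, this corresponds to the following update lines inside the training loop (taken from the full listing at the end):

a -= lr * a_grad;
b -= lr * b_grad;
c -= lr * c_grad;
d -= lr * d_grad;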

This update is repeated for 2000 iterations:

for (let t = 0; t < 2000; t++) { ... }

5. Final Prediction and Plotting

After training, we compute the final model output:

const y_pred = x.map(xi => a + b * xi + c * xi ** 2 + d * xi ** 3);

This gives us:

$$\hat{y}_i = a + bx_i + cx_i^2 + dx_i^3$$

Finally, both $y_i^{\text{true}} = \sin(\pi x_i)$ and $\hat{y}_i$ are plotted:

Plot.line(plotData, { x: 'x', y: 'y', stroke: 'line' })
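
The variable plotData is not defined in the excerpt above; a minimal sketch of how it might be assembled for Observable Plot (assuming the library is available as Plot) is:

// one flat array of rows; the `line` field distinguishes the two curves
const plotData = [
  ...x.map((xi, i) => ({ x: xi, y: y_true[i], line: 'true' })),
  ...x.map((xi, i) => ({ x: xi, y: y_pred[i], line: 'predicted' })),
];

Plot.plot({
  marks: [Plot.line(plotData, { x: 'x', y: 'y', stroke: 'line' })],
});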

The resulting graph shows the predicted curve overlaid on the true sine curve, visually demonstrating how well the cubic model fits the sine function using gradient-based optimization.

[Figure: plot of predicted vs. true values]

Entire code:

// 1000 evenly spaced inputs in [-1, 1]
const x = Array.from({ length: 1000 }, (_, i) => (i / 999) * 2 - 1);
const lr = 0.1; // learning rate
const y_true = x.map(value => Math.sin(value * Math.PI)); // target: sin(πx)

// random initialization of the four coefficients
let a = Math.random();
let b = Math.random();
let c = Math.random();
let d = Math.random();

// gradient descent: 2000 iterations
for (let t = 0; t < 2000; t++) {
  const y_pred = x.map(xi => a + b * xi + c * xi ** 2 + d * xi ** 3); // predictions
  const error = y_pred.map((yp, i) => yp - y_true[i]); // e_i = ŷ_i − y_i^true

  const n = x.length;
  // gradients of the MSE with respect to each coefficient (factor of 2 absorbed into lr)
  const a_grad = error.reduce((acc, e) => acc + e, 0) / n;
  const b_grad = error.reduce((acc, e, i) => acc + e * x[i], 0) / n;
  const c_grad = error.reduce((acc, e, i) => acc + e * x[i] ** 2, 0) / n;
  const d_grad = error.reduce((acc, e, i) => acc + e * x[i] ** 3, 0) / n;

  // gradient-descent update for each coefficient
  a -= lr * a_grad;
  b -= lr * b_grad;
  c -= lr * c_grad;
  d -= lr * d_grad;
}

// final model prediction over the same inputs
const y_pred = x.map(xi => a + b * xi + c * xi ** 2 + d * xi ** 3);
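
As a quick sanity check (not part of the original notebook), the learned coefficients and the final loss can be printed:

// final mean squared error and learned coefficients
const finalLoss = y_pred.reduce((acc, yp, i) => acc + (yp - y_true[i]) ** 2, 0) / x.length;
console.log({ a, b, c, d, loss: finalLoss });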