Polynomial Regression with Gradient Descent

This notebook performs polynomial regression with gradient descent to approximate the sine function

$$y(x) = \sin(\pi x)$$

using a cubic polynomial model:

$$\hat{y}(x) = a + bx + cx^2 + dx^3$$

where $a, b, c, d \in \mathbb{R}$ are the parameters to be learned. The goal is to minimize the mean squared error between the model's predictions and the true sine function. The model is trained using gradient descent, which iteratively updates the parameters in the direction of the negative gradient of the loss function.

This kind of work is typically done in Python with libraries like TensorFlow or PyTorch, but here we implement it in JavaScript as an educational exercise.


1. Dataset Construction

In the code:

const x = Array.from({ length: 1000 }, (_, i) => (i / 999) * 2 - 1);
const y_true = x.map(value => Math.sin(value * Math.PI));

We define the input data $x_i \in [-1, 1]$, evenly spaced over 1000 points. The target output is generated by:

$$y_i^{\text{true}} = \sin(\pi x_i)$$

This is the function we wish to approximate.


2. Model Definition: Cubic Polynomial

We define a model function $f(x)$ as a cubic polynomial:

$$\hat{y}_i = f(x_i) = a + bx_i + cx_i^2 + dx_i^3$$

where $a, b, c, d \in \mathbb{R}$ are the parameters of the model, to be learned via gradient descent.
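
Written out as a small helper in the same style as the rest of the notebook (a minimal sketch; the actual code below inlines this expression directly inside a map call):

// cubic polynomial model: f(x) = a + b*x + c*x^2 + d*x^3
const f = (xi, a, b, c, d) => a + b * xi + c * xi ** 2 + d * xi ** 3;

// example: evaluate the model at x = 0.5 with illustrative coefficients
f(0.5, 0, 3, 0, -3); // 0 + 3 * 0.5 + 0 - 3 * 0.125 = 1.125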


3. Loss Function: Mean Squared Error

We define the loss function $L$ as the mean squared error between the model's prediction $\hat{y}_i$ and the true output $y_i^{\text{true}}$:

$$L(a, b, c, d) = \frac{1}{n} \sum_{i=1}^n \left( \hat{y}_i - y_i^{\text{true}} \right)^2$$

Expanding the prediction:

$$L(a, b, c, d) = \frac{1}{n} \sum_{i=1}^n \left( a + bx_i + cx_i^2 + dx_i^3 - \sin(\pi x_i) \right)^2$$

This measures the average squared deviation between the polynomial model and the sine curve over all inputs.
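
Although the training loop below only ever uses the gradient of the loss, the loss itself could be computed in the same style (a minimal sketch, not part of the original code):

// mean squared error between the predictions and the true sine values
const meanSquaredError = (yPred, yTrue) =>
  yPred.reduce((acc, yp, i) => acc + (yp - yTrue[i]) ** 2, 0) / yPred.length;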


4. Gradient Descent: Minimizing the Loss

Recap of the Chain Rule

Let:

$$y = f(g(x))$$

Then the derivative of $y$ with respect to $x$ is:

$$\frac{dy}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$$

That is, differentiate the outer function $f$ evaluated at $g(x)$, then multiply by the derivative of the inner function $g$.

To minimize the loss, we compute the partial derivatives of $L$ with respect to each parameter.

Let the error term be:

$$e_i = \hat{y}_i - y_i^{\text{true}} = a + bx_i + cx_i^2 + dx_i^3 - \sin(\pi x_i)$$
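
Applying the chain rule to a single squared-error term, with the outer function $u \mapsto u^2$ and the inner function $e_i$, gives (shown here for $b$; the other parameters work the same way):

$$\frac{\partial}{\partial b} e_i^2 = 2\, e_i \cdot \frac{\partial e_i}{\partial b} = 2\, e_i\, x_i$$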

Then, using the chain rule, the gradients are:

$$\frac{\partial L}{\partial a} = \frac{2}{n} \sum_{i=1}^n e_i$$

$$\frac{\partial L}{\partial b} = \frac{2}{n} \sum_{i=1}^n e_i \cdot x_i$$

$$\frac{\partial L}{\partial c} = \frac{2}{n} \sum_{i=1}^n e_i \cdot x_i^2$$

$$\frac{\partial L}{\partial d} = \frac{2}{n} \sum_{i=1}^n e_i \cdot x_i^3$$

These are implemented in the code as follows. Note that the constant factor of 2 is dropped in the implementation; this only rescales all gradients uniformly, which is equivalent to halving the learning rate:

const a_grad = error.reduce((acc, e) => acc + e, 0) / n;
const b_grad = error.reduce((acc, e, i) => acc + e * x[i], 0) / n;
const c_grad = error.reduce((acc, e, i) => acc + e * x[i] ** 2, 0) / n;
const d_grad = error.reduce((acc, e, i) => acc + e * x[i] ** 3, 0) / n;

We then update the parameters using gradient descent:

$$\theta \leftarrow \theta - \eta \cdot \frac{\partial L}{\partial \theta}$$

for $\theta \in \{a, b, c, d\}$, where $\eta = 0.1$ is the learning rate.
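
In the code, this corresponds to the following update lines inside the training loop (taken from the full listing at the end):

a -= lr * a_grad;
b -= lr * b_grad;
c -= lr * c_grad;
d -= lr * d_grad;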

This update is repeated for 2000 iterations:

for (let t = 0; t < 2000; t++) { ... }

5. Final Prediction and Plotting

After training, we compute the final model output:

const y_pred = x.map(xi => a + b * xi + c * xi ** 2 + d * xi ** 3);

This gives us:

$$\hat{y}_i = a + bx_i + cx_i^2 + dx_i^3$$

Finally, both $y_i^{\text{true}} = \sin(\pi x_i)$ and $\hat{y}_i$ are plotted:

Plot.line(plotData, { x: 'x', y: 'y', stroke: 'line' })
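
The variable plotData is not defined in the excerpt above; a minimal sketch of how it might be assembled for Observable Plot (assuming the library is available as Plot) is:

// one flat array of rows; the `line` field distinguishes the two curves
const plotData = [
  ...x.map((xi, i) => ({ x: xi, y: y_true[i], line: 'true' })),
  ...x.map((xi, i) => ({ x: xi, y: y_pred[i], line: 'predicted' })),
];

Plot.plot({
  marks: [Plot.line(plotData, { x: 'x', y: 'y', stroke: 'line' })],
});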

The resulting graph shows the predicted curve overlaid on the true sine curve, visually demonstrating how well the cubic model fits the sine function using gradient-based optimization.

[Figure: plot of predicted vs. true values]

Entire code:

// 1000 evenly spaced inputs in [-1, 1]
const x = Array.from({ length: 1000 }, (_, i) => (i / 999) * 2 - 1);
const lr = 0.1; // learning rate
const y_true = x.map(value => Math.sin(value * Math.PI)); // target: sin(πx)

// random initialization of the four coefficients
let a = Math.random();
let b = Math.random();
let c = Math.random();
let d = Math.random();

// gradient descent: 2000 iterations
for (let t = 0; t < 2000; t++) {
  const y_pred = x.map(xi => a + b * xi + c * xi ** 2 + d * xi ** 3); // predictions
  const error = y_pred.map((yp, i) => yp - y_true[i]); // e_i = ŷ_i − y_i^true

  const n = x.length;
  // gradients of the MSE with respect to each coefficient (factor of 2 absorbed into lr)
  const a_grad = error.reduce((acc, e) => acc + e, 0) / n;
  const b_grad = error.reduce((acc, e, i) => acc + e * x[i], 0) / n;
  const c_grad = error.reduce((acc, e, i) => acc + e * x[i] ** 2, 0) / n;
  const d_grad = error.reduce((acc, e, i) => acc + e * x[i] ** 3, 0) / n;

  // gradient-descent update for each coefficient
  a -= lr * a_grad;
  b -= lr * b_grad;
  c -= lr * c_grad;
  d -= lr * d_grad;
}

// final model prediction over the same inputs
const y_pred = x.map(xi => a + b * xi + c * xi ** 2 + d * xi ** 3);
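
As a quick sanity check (not part of the original notebook), the learned coefficients and the final loss can be printed:

// final mean squared error and learned coefficients
const finalLoss = y_pred.reduce((acc, yp, i) => acc + (yp - y_true[i]) ** 2, 0) / x.length;
console.log({ a, b, c, d, loss: finalLoss });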