Unraveling Univariate Regression: Understanding Least Squares and Inference

Subin Alex
Aug 18, 2024


Introduction

Univariate regression is one of the fundamental building blocks in machine learning. It’s all about predicting a single output y from an input x. However, to truly grasp its magic, we need to understand the mathematical machinery that drives it, and more importantly, how different concepts depend on each other to give us a clear picture.

In this article, I will guide you through the process of univariate regression, breaking down key concepts like least squares loss, inference, and the role of normal distributions, step by step.
1. What Is Univariate Regression?

At its core, univariate regression aims to predict a single value, y, from an input, x. However, instead of directly trying to guess y, we use a model represented by f(x, φ), where φ is a set of parameters that the model adjusts to make its predictions as close as possible to the actual values.

Here’s how it works:

Goal: Predict the value of y using input x by training a model.
Model: The model f(x, φ) predicts what we call the mean (μ) of a normal distribution over possible values for y.

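To make this concrete, here is a minimal sketch of such a model, assuming the simplest possible form, a straight line with two parameters φ = (φ₀, φ₁). The linear form and the numbers are purely illustrative; f can be any function of x:

# Minimal univariate model: phi = (phi0, phi1) and f(x, phi) = phi0 + phi1 * x.
# The linear form is an illustrative assumption; f could be any function of x.
def f(x, phi):
    phi0, phi1 = phi
    return phi0 + phi1 * x

print(f(2.0, (1.0, 0.5)))  # predicted mean of y at x = 2.0 -> 1.0 + 0.5 * 2.0 = 2.0
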
But why a normal distribution? Let’s dig deeper.
2. The Role of Normal Distribution

A key idea in univariate regression is assuming that the output y is random and follows a normal distribution. A normal distribution is characterized by two things:

The mean (μ): This is the value around which the observations are expected to cluster.
The variance (σ²): This tells us how spread out the values are around the mean.

In univariate regression, the model f(x, φ) is responsible for predicting the mean (μ) of this distribution. The variance σ² is often assumed to be constant and does not depend on x.

This gives us a probabilistic framework where instead of predicting a single output, the model predicts a distribution of possible values for y, with μ = f(x, φ).
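A short sketch of this probabilistic view (the linear model, the parameter values, and σ are assumptions chosen just for illustration): for each x, the model defines a normal distribution over y, and observed values are draws from it.

import numpy as np

rng = np.random.default_rng(0)

def f(x, phi):                       # the model predicts the mean mu of y given x
    return phi[0] + phi[1] * x       # assumed linear form, for illustration only

phi_true = (1.0, 0.5)                # hypothetical "true" parameters
sigma = 0.3                          # constant standard deviation, independent of x

x = np.linspace(0.0, 5.0, 20)
mu = f(x, phi_true)                  # mean of the distribution over y at each x
y = rng.normal(loc=mu, scale=sigma)  # observed y: one draw from N(mu, sigma^2) per x
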
3. Least Squares Loss Function

Now that we have our model predicting the mean (μ) of the distribution, we need a way to measure how well the model is doing. This is where the loss function comes into play.

The starting point is the negative log-likelihood, which measures how “likely” the actual observed values yᵢ are given the model’s predictions, and turns that into a loss we can minimize. Mathematically, the negative log-likelihood is:

L(φ) = −Σ log(Pr(yᵢ | f(xᵢ, φ), σ²))

However, we can simplify this expression step by step. For a normal distribution, Pr(yᵢ | μ, σ²) = (1 / √(2πσ²)) · exp(−(yᵢ − μ)² / (2σ²)), with μ = f(xᵢ, φ). Applying the logarithm therefore splits each term into a constant, −½ log(2πσ²), which does not depend on φ and can be discarded, and a term involving the prediction error:

L(φ) = Σ (yᵢ − f(xᵢ, φ))² / (2σ²)

Here’s the magic: since the scaling factor 1 / (2σ²) does not depend on φ, it does not change which parameters minimize the loss, so we can drop it as well, leaving us with the least squares loss function:

L(φ) = Σ (yᵢ − f(xᵢ, φ))²

This is the formula most people know from linear regression. It measures the difference between the actual value yᵢ and the predicted value f(xᵢ, φ), squares it so that positive and negative errors both count and cannot cancel, and sums over all training examples.
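As a small numerical sketch (the synthetic data, the linear model, and σ = 0.3 are assumptions for illustration): the code below evaluates the least squares loss over a grid of candidate slopes and checks that rescaling it by the dropped factor 1 / (2σ²) leaves the best parameter unchanged.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 20)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)   # assumed synthetic data

def f(x, phi):
    return phi[0] + phi[1] * x

def least_squares_loss(phi):
    return np.sum((y - f(x, phi)) ** 2)      # L(phi) = sum_i (y_i - f(x_i, phi))^2

# Grid search over the slope, keeping the intercept fixed at 1.0 for simplicity.
slopes = np.linspace(0.0, 1.0, 101)
loss = np.array([least_squares_loss((1.0, s)) for s in slopes])
scaled = loss / (2 * 0.3 ** 2)               # the version that keeps 1 / (2 * sigma^2)

# Both versions are minimized at the same slope, which is why the factor can be dropped.
best = slopes[np.argmin(loss)]
assert best == slopes[np.argmin(scaled)]
print(best)                                  # close to the assumed true slope of 0.5
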
4. The Inference Process: Finding the Best Prediction

Once we have the least squares loss function, the next step is inference. Inference is the process of making a prediction based on the model after it has been trained. But instead of directly predicting y, the model gives us a distribution over possible values of y.

In this case, the most likely value of y is the mean (μ) of the normal distribution. To formalize this, we take the value of y that maximizes the predicted probability, where φ̂ denotes the parameters found during training:

ŷ = argmax Pr(y | f(x, φ̂))

The symbol argmax means “find the value of y that maximizes the probability.” Since we are working with a normal distribution, we know that the maximum occurs at the mean (μ). Therefore, the best prediction ŷ is simply the model’s predicted mean:

ŷ = f(x, φ̂)

This tells us that the most likely estimate for y, after the model has been trained, is exactly the mean predicted by the model.
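A quick check of this claim (the fitted parameters φ̂ and σ below are hypothetical values, just for illustration): scanning the normal density over candidate values of y, the maximum sits exactly at the predicted mean.

import numpy as np
from scipy.stats import norm

def f(x, phi):
    return phi[0] + phi[1] * x

phi_hat = (1.0, 0.5)                         # hypothetical trained parameters
sigma = 0.3
x_new = 2.0

mu = f(x_new, phi_hat)                       # predicted mean of the distribution over y
y_grid = np.linspace(mu - 2.0, mu + 2.0, 2001)
density = norm.pdf(y_grid, loc=mu, scale=sigma)

y_hat = y_grid[np.argmax(density)]           # argmax over y of Pr(y | f(x, phi_hat))
print(y_hat, mu)                             # both 2.0: the most likely y is the mean
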
5. Connecting the Dots: How Everything Fits Together

Let’s summarize the sequence of concepts and their dependencies:

Univariate regression aims to predict y from x, but it predicts the mean (μ) of a normal distribution over possible values of y.
The normal distribution is characterized by a mean μ and a variance σ². The model predicts the mean, while σ² is often assumed constant.
To train the model, we use the least squares loss function, which measures the difference between the predicted mean f(x, φ) and the actual values y.
When we make predictions, or perform inference, we take the most likely estimate, which is the mean of the predicted normal distribution.

By breaking down the process into these steps, we can see how the normal distribution, least squares loss function, and inference are all interconnected. The model predicts the mean, and our job is to adjust the parameters φ to minimize the errors between the predicted means and the actual values.
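To see the whole pipeline in one place, here is a compact end-to-end sketch (the synthetic data and the linear form of f are assumptions for illustration): generate data from the probabilistic model, fit φ by minimizing the least squares loss, and predict with the fitted mean.

import numpy as np

rng = np.random.default_rng(0)

# 1. Synthetic data: y is drawn from a normal whose mean is linear in x (assumed setup).
x = np.linspace(0.0, 5.0, 50)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# 2. Training: minimize the least squares loss. For a linear model this has a closed-form
#    solution, which np.polyfit with deg=1 computes (a degree-1 polynomial phi0 + phi1 * x).
phi1_hat, phi0_hat = np.polyfit(x, y, deg=1)

# 3. Inference: the best prediction for a new x is the fitted mean f(x, phi_hat).
x_new = 2.0
y_hat = phi0_hat + phi1_hat * x_new
print(phi0_hat, phi1_hat, y_hat)             # roughly 1.0, 0.5, and 2.0
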

Conclusion

Univariate regression may sound simple at first, but its power lies in its ability to predict not just single values but probability distributions over possible outcomes. By understanding how each concept depends on the others — normal distributions, least squares, and inference — you can gain a much clearer picture of how these equations work and why they’re so effective.

The next time you see the familiar formula Σ (yᵢ − f(xᵢ, φ))², remember that it’s not just a formula for errors, but a result of probabilistic assumptions about how data behaves. And at its heart is the idea that our model is predicting the mean of a distribution, not just an outcome.
