The Journey of Regularization in ML Using Logistic Regression · Part 3 of 6

1. Introduction

When we train a machine learning model, our goal is not just to perform well on the training data, but also to perform well on new, unseen data.

A good model should generalize well.

However, two common problems occur during training:

Underfitting
Overfitting

Understanding these concepts is fundamental in Machine Learning because they determine whether a model can learn meaningful patterns from data.

2. Generalization

A model learns from training data and then makes predictions on new data. If it performs well both on the training data and on data it has never seen, it generalizes well.
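One way to make this concrete is to hold out part of the data and compare the error on the two splits. The sketch below is only an illustration: the synthetic straight-line data, the 150/50 split, and the names `mse` and `generalization_gap` are assumptions, not part of the original material.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Synthetic data (assumption): a noisy straight line, y = 2x + 1 + noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=200)

# Hold out the last 50 points to play the role of new, unseen data.
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

# Fit a straight line using the training split only.
coeffs = np.polyfit(x_train, y_train, deg=1)

train_error = mse(y_train, np.polyval(coeffs, x_train))
test_error = mse(y_test, np.polyval(coeffs, x_test))
generalization_gap = test_error - train_error

print(f"train MSE: {train_error:.3f}")
print(f"test MSE:  {test_error:.3f}")
print(f"gap:       {generalization_gap:.3f}")
```

Because the model family matches the data here, both errors should sit close to the noise level and the gap should be small, which is what good generalization looks like in numbers.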

3. Underfitting

Underfitting occurs when a model is too simple to capture the underlying structure of the data. The model fails to learn the relationship between input variables and output variables.

Underfitting models show the following characteristics:

High training error
High testing error
Poor predictions
Inability to capture patterns

Example

Suppose we have nonlinear data. If we apply a simple linear model, it may not capture the true relationship.

For example:

Actual relationship: $y = x^2$

But the model tries to learn: $y = ax + b$

[Figure: Underfitting]

The model cannot represent the curved relationship.
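The example above can be reproduced in a few lines. This is a minimal sketch, assuming noise-free data on the interval [-3, 3] and an ordinary least-squares line fitted with `np.polyfit`; the grid sizes are arbitrary choices.

```python
import numpy as np

# Actual relationship: y = x^2 (noise-free, for clarity).
x_train = np.linspace(-3.0, 3.0, 61)
y_train = x_train ** 2

# Fresh points from the same range, not used for fitting.
x_test = np.linspace(-2.95, 2.95, 60)
y_test = x_test ** 2

# The model tries to learn y = a*x + b.
a, b = np.polyfit(x_train, y_train, deg=1)

train_mse = float(np.mean((a * x_train + b - y_train) ** 2))
test_mse = float(np.mean((a * x_test + b - y_test) ** 2))

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

Because the data are symmetric around zero, the best straight line is essentially flat (slope near 0, intercept near the mean of x²), so the error stays high on both the training and the test points.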

Causes of Underfitting

Underfitting usually happens because:

Model is too simple
Not enough training time
Important features are missing
Excessive regularization

How to Fix Underfitting

Possible solutions:

Use a more complex model
Add more features
Reduce regularization
Train longer
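As an illustration of "add more features", the sketch below fits the same y = x² data twice with least squares: once with the features (1, x) and once with (1, x, x²). The helper `fit_and_score` is a hypothetical name introduced here, and the data are the noise-free example from the previous section.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 61)
y = x ** 2  # the nonlinear relationship from the example

def fit_and_score(features, targets):
    """Least-squares fit on the given feature columns.

    Returns the fitted weights and the training MSE.
    """
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    residual = features @ weights - targets
    return weights, float(np.mean(residual ** 2))

ones = np.ones_like(x)
linear_features = np.column_stack([ones, x])             # too simple
quadratic_features = np.column_stack([ones, x, x ** 2])  # adds the missing feature

_, mse_linear = fit_and_score(linear_features, y)
w_quad, mse_quadratic = fit_and_score(quadratic_features, y)

print(f"MSE with (1, x):      {mse_linear:.3f}")
print(f"MSE with (1, x, x^2): {mse_quadratic:.2e}")
```

Once the x² column is available, the model can represent the curve and the training error drops to numerical zero, with the fitted weight on x² close to 1.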

4. Overfitting

Overfitting occurs when a model learns the training data too well, including noise and random fluctuations. Instead of learning general patterns, it memorizes the training dataset.

Overfitting models show the following characteristics:

Very low training error
High testing error
Poor generalization

[Figure: Overfitting]

The model can perfectly match the training data. However, predictions on new data become unstable.
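A small experiment shows this behaviour. The sketch assumes ten noisy samples of y = x² and a degree-9 polynomial, which has enough parameters to pass through every training point; the noise level and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy training points drawn from y = x^2 (assumed setup).
x_train = np.linspace(-1.0, 1.0, 10)
y_train = x_train ** 2 + rng.normal(0.0, 0.3, size=x_train.size)

# Dense, noise-free test grid to measure how well the true curve is recovered.
x_test = np.linspace(-1.0, 1.0, 200)
y_test = x_test ** 2

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Degree 9 with 10 points: the polynomial interpolates the noise exactly.
flexible = np.polyfit(x_train, y_train, deg=9)

train_mse = mse(flexible, x_train, y_train)
test_mse = mse(flexible, x_test, y_test)

print(f"train MSE: {train_mse:.2e}")
print(f"test MSE:  {test_mse:.3f}")
```

The training error is numerically zero because the model has memorized the noisy points, while the error against the true curve is many orders of magnitude larger.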

Causes of Overfitting

Overfitting occurs when:

Model is too complex
Dataset is small
Too many features
Training too long
No regularization
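The last cause, missing regularization, can be illustrated with an L2 (ridge) penalty. The source does not say which regularizer it has in mind, so the L2 penalty, the closed-form solver `ridge_fit`, and the choice of penalty strengths below are all assumptions made for this sketch. For simplicity the intercept is penalized as well.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares: solve (X^T X + lam*I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 10)
y = x ** 2 + rng.normal(0.0, 0.3, size=x.size)

# Degree-9 polynomial features: columns 1, x, x^2, ..., x^9.
X = np.vander(x, N=10, increasing=True)

lambdas = [1e-3, 1e-1, 10.0]
weight_norms = []
train_errors = []
for lam in lambdas:
    w = ridge_fit(X, y, lam)
    weight_norms.append(float(np.linalg.norm(w)))
    train_errors.append(float(np.mean((X @ w - y) ** 2)))

for lam, norm, err in zip(lambdas, weight_norms, train_errors):
    print(f"lambda={lam:g}  ||w||={norm:.3f}  train MSE={err:.4f}")
```

As the penalty grows, the weights shrink and the training error rises: a small penalty leaves the model free to chase noise, while a very large one pushes it toward underfitting, which is why excessive regularization appears among the causes of underfitting.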

5. Good Model Fit

The ideal situation lies between underfitting and overfitting. A good model:

Captures the real pattern
Ignores noise
Performs well on new data
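One common way to look for this middle ground is to sweep model complexity and compare training error with error on held-out validation data. The sketch below is a hedged illustration: the polynomial degrees 1 to 8, the synthetic y = x² data, and the helper `make_data` are assumptions, not a procedure taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Draw n noisy samples of y = x^2 on [-1, 1] (assumed data-generating process)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = x ** 2 + rng.normal(0.0, 0.1, size=n)
    return x, y

x_train, y_train = make_data(30)
x_val, y_val = make_data(30)

degrees = list(range(1, 9))
train_errors, val_errors = [], []
for d in degrees:
    coeffs = np.polyfit(x_train, y_train, deg=d)
    train_errors.append(float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)))
    val_errors.append(float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)))

# Pick the complexity with the lowest validation error, not the lowest training error.
best_degree = degrees[int(np.argmin(val_errors))]

for d, tr, va in zip(degrees, train_errors, val_errors):
    print(f"degree {d}: train MSE={tr:.4f}  val MSE={va:.4f}")
print("selected degree:", best_degree)
```

Training error can only go down as the degree increases, so it cannot be used to choose the model. Validation error is high for the straight line (underfitting) and is the quantity to watch when deciding where to stop adding complexity.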

Intuitive Explanation

Think of a student preparing for an exam.

Two scenarios:

Good learning

The student understands concepts and can solve new problems.

Overfitting

The student memorises exact answers from past papers. When the exam changes slightly, the student fails. Similarly, an overfitted model memorises training data instead of learning patterns.

[Figure: Bias, Variance, Good fit]
