Logistic Regression and Gradient Descent Optimization

The Journey of Regularization in ML Using Logistic Regression · Part 2 of 6


1. Training Logistic Regression

In logistic regression, the goal is to learn the optimal weights and bias so that the predicted probabilities are as close as possible to the true labels.

This is achieved by minimizing the Binary Cross-Entropy loss function using an optimization algorithm such as gradient descent.

Training involves three main steps:

Compute predicted probabilities using the sigmoid function.
Compute the loss using the cross-entropy formula.
Update the model parameters (weights and bias) to reduce the loss.
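
The first two steps can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function names `sigmoid` and `binary_cross_entropy`, the clipping constant `eps`, and the scores in `z` are assumptions made here for the example.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Average binary cross-entropy; eps keeps log() away from 0 (an assumption)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Step 1: predicted probabilities from hypothetical scores z
z = np.array([2.0, -1.0])
y = np.array([1.0, 0.0])
y_hat = sigmoid(z)

# Step 2: loss between the probabilities and the true labels
loss = binary_cross_entropy(y, y_hat)
print(y_hat, loss)
```

The third step, the parameter update, needs the gradients of this loss, which the following sections derive.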

2. Gradient Descent Optimization

Gradient descent is an iterative optimization algorithm used to minimize the loss function. At each iteration, the parameters are updated in the direction that reduces the loss.

Parameter Update Rules

The formulas below are the gradient descent update rules, the "engine" that lets a machine learning model learn by iteratively adjusting its weights and bias.

Weights:

$w_j = w_j - \alpha \frac{\partial L}{\partial w_j}$

Bias:

$b = b - \alpha \frac{\partial L}{\partial b}$

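Key point: each gradient measures how much the loss changes when its parameter changes, so stepping against the gradient (scaled by the learning rate) moves the parameter toward lower loss.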

Symbol	Meaning
$w_j$	weight of feature $j$
$b$	bias
$L$	loss function
$\alpha$	learning rate
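
To see the update rule in isolation, here is a small sketch on a toy problem. The quadratic loss $L(w) = (w - 3)^2$, the helper name `gradient_descent_1d`, and the starting point are all assumptions chosen for illustration; they are not part of the logistic regression model.

```python
def gradient_descent_1d(grad, w0, alpha, n_steps):
    """Repeatedly apply w = w - alpha * dL/dw and return the visited values."""
    w = w0
    path = [w]
    for _ in range(n_steps):
        w = w - alpha * grad(w)
        path.append(w)
    return path

# Toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
path = gradient_descent_1d(lambda w: 2.0 * (w - 3.0), w0=0.0, alpha=0.1, n_steps=50)
print(path[0], path[1], path[-1])
```

Each step moves `w` against the sign of the gradient, so the values drift from the starting point toward the minimizer.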

3. Gradient of Logistic Regression

For logistic regression, the gradients are:

Weight Gradient

$\frac{\partial L}{\partial w_j} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) x_{ij}$

Bias Gradient

$\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$

Symbol	Meaning
$y_i$	true label of sample $i$
$\hat{y}_i$	predicted probability for sample $i$
$x_{ij}$	value of feature $j$ for sample $i$
$n$	number of samples
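
The two gradient formulas translate almost directly into vectorized NumPy. The sketch below assumes `X` is an `(n, d)` matrix and `w` a length-`d` vector; the function name `logistic_gradients` and the two-sample input are hypothetical.

```python
import numpy as np

def logistic_gradients(X, y, w, b):
    """Return (dL/dw, dL/db) of the average cross-entropy loss.

    X: (n, d) feature matrix, y: (n,) labels in {0, 1},
    w: (d,) weights, b: scalar bias.
    """
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    error = y_hat - y                            # (y_hat_i - y_i)
    grad_w = X.T @ error / len(y)                # one entry per feature j
    grad_b = error.mean()
    return grad_w, grad_b

# Hypothetical two-sample, one-feature input, evaluated at w = 0, b = 0
X = np.array([[1.0], [2.0]])
y = np.array([1.0, 0.0])
grad_w, grad_b = logistic_gradients(X, y, w=np.zeros(1), b=0.0)
print(grad_w, grad_b)
```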

4. Applying Gradient Computation to Our Example

Sample	True Label ($y$)	Predicted Probability ($\hat{y}$)
1	1	0.9
2	0	0.2

Step 1: Compute Prediction Error

The prediction error (or residual) is the difference between the model's predicted probability and the true label:

$\text{error} = \hat{y} - y$

Sample	$y$	$\hat{y}$	Error ($\hat{y} - y$)
1	1	0.9	-0.1
2	0	0.2	0.2

Step 2: Compute Bias Gradient

$\begin{aligned}\frac{\partial L}{\partial b} &= \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\\&= \frac{-0.1 + 0.2}{2}\\&= \frac{0.1}{2}\\&= 0.05\end{aligned}$

So

$\frac{\partial L}{\partial b} = 0.05$
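
The same arithmetic can be checked in a few lines of NumPy, using only the labels and predicted probabilities from the two-sample table:

```python
import numpy as np

y = np.array([1.0, 0.0])        # true labels from the table
y_hat = np.array([0.9, 0.2])    # predicted probabilities from the table

error = y_hat - y               # per-sample prediction error
grad_b = error.mean()           # (1/n) * sum of errors

print(error)
print(grad_b)
```

Floating-point rounding may show up in the last printed digits, but the result agrees with the hand calculation.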

5. Parameter Update Example

Assume

Initial bias b = 0

Learning rate $\alpha = 0.1$

Update rule

$b_{new} = b - \alpha \frac{\partial L}{\partial b}$

Substitute values

$\begin{aligned} b_{new} &= 0 - 0.1(0.05)\\ &= -0.005 \end{aligned}$

The bias gradient is positive, meaning the model over-predicts on average, so the bias decreases slightly to reduce the loss.

6. Summary: Iterative Learning Process

The goal of training is to find the parameters $w$ and $b$ that minimize the average log loss:

$L = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$

Training logistic regression involves repeating the following steps:

$\begin{aligned}&\text{1. Compute linear combination:}\quad z_i = w^T x_i + b\\&\text{2. Apply sigmoid function:} \quad \hat{y}_i = \sigma(z_i) \\ &\text{3. Compute loss:} \quad L = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)] \\ &\text{4. Compute gradients:} \quad \frac{\partial L}{\partial w}, \quad \frac{\partial L}{\partial b} \\ &\text{5. Update parameters:} \quad w = w - \alpha \frac{\partial L}{\partial w}, \quad b = b - \alpha \frac{\partial L}{\partial b} \end{aligned}$
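
The five steps map onto a single function call per iteration. The sketch below is one possible arrangement, assuming batch updates over all samples; the name `train_step`, the clipping constant, and the small two-feature dataset are hypothetical.

```python
import numpy as np

def train_step(X, y, w, b, alpha):
    """Run one gradient descent iteration.

    Returns the updated (w, b) and the loss measured before the update.
    """
    z = X @ w + b                                   # 1. linear combination
    y_hat = 1.0 / (1.0 + np.exp(-z))                # 2. sigmoid
    p = np.clip(y_hat, 1e-12, 1 - 1e-12)            # guard against log(0)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # 3. loss
    error = y_hat - y
    grad_w = X.T @ error / len(y)                   # 4. gradients
    grad_b = error.mean()
    return w - alpha * grad_w, b - alpha * grad_b, loss       # 5. update

# Hypothetical data: four samples, two features
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = np.zeros(2), 0.0
for _ in range(3):
    w, b, loss = train_step(X, y, w, b, alpha=0.1)
    print(round(float(loss), 4))
```

Starting from zero parameters, every prediction is 0.5, so the first printed loss is $\log 2$; later values should be smaller.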

The mathematics behind your logistic regression program, step by step

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Sample dataset
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])

# Model
model = LogisticRegression()
model.fit(X, y)

# Predictions for curve
X_test = np.linspace(0, 6, 100).reshape(-1, 1)
y_prob = model.predict_proba(X_test)[:, 1]

# Decision boundary (where probability = 0.5)
decision_boundary = X_test[np.argmin(np.abs(y_prob - 0.5))][0]

# Plot
plt.figure(figsize=(8,5))
plt.scatter(X, y, color='blue', label='Data points')
plt.plot(X_test, y_prob, color='red', label='Sigmoid curve')
plt.axvline(x=decision_boundary, color='green', linestyle='--', label=f'Decision Boundary ≈ {decision_boundary:.2f}')
plt.xlabel("Feature")
plt.ylabel("Probability / Class")
plt.title("Logistic Regression Curve with Decision Boundary")
plt.legend()
plt.show()
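
The plot above locates the decision boundary by searching the grid `X_test` for the probability closest to 0.5. As an alternative sketch, the boundary can be computed directly from the fitted parameters, since $\sigma(wx + b) = 0.5$ exactly when $wx + b = 0$. Note that `LogisticRegression()` applies L2 regularization by default, so its fitted parameters will generally differ from those of an unregularized fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])

model = LogisticRegression()
model.fit(X, y)

w = model.coef_[0, 0]      # learned weight of the single feature
b = model.intercept_[0]    # learned bias

# sigmoid(w * x + b) = 0.5 exactly when w * x + b = 0
decision_boundary = -b / w
print(f"w = {w:.3f}, b = {b:.3f}, boundary = {decision_boundary:.3f}")
```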

The Dataset:

$X = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{bmatrix}, \quad y = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}$

Forward pass, using $w = 2.197$ and $b = -6.593$:

X	z = w*X + b	ŷ = sigmoid(z) = 1 / (1 + e^(-z))
1	2.197*1 - 6.593 = -4.396	σ(-4.396) ≈ 0.012
2	2.197*2 - 6.593 = -2.199	σ(-2.199) ≈ 0.100
3	2.197*3 - 6.593 = -0.002	σ(-0.002) ≈ 0.500
4	2.197*4 - 6.593 = 2.195	σ(2.195) ≈ 0.900
5	2.197*5 - 6.593 = 4.392	σ(4.392) ≈ 0.988

Prediction errors:

X	y	ŷ	ŷ - y
1	0	0.012	0.012
2	0	0.100	0.100
3	0	0.500	0.500
4	1	0.900	-0.100
5	1	0.988	-0.012

Per-sample terms of the weight gradient:

X	ŷ - y	(ŷ - y)*X
1	0.012	0.012
2	0.100	0.200
3	0.500	1.500
4	-0.100	-0.400
5	-0.012	-0.060
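
The three tables can be reproduced, and the calculation carried through to the gradients, with a short NumPy sketch. It takes $w = 2.197$ and $b = -6.593$ from the tables as given; the variable names are choices made here, and the printed values may differ from the tables in the last digit because of rounding.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
w, b = 2.197, -6.593                # parameter values used in the tables

z = w * X + b                       # linear scores
y_hat = 1.0 / (1.0 + np.exp(-z))    # sigmoid
error = y_hat - y                   # prediction errors
weighted = error * X                # per-sample terms of the weight gradient

# One row per sample: X, z, y_hat, error, error * X
for row in zip(X, z, y_hat, error, weighted):
    print("  ".join(f"{v:7.3f}" for v in row))

grad_w = weighted.mean()            # (1/n) * sum (y_hat - y) * x
grad_b = error.mean()               # (1/n) * sum (y_hat - y)
print(f"grad_w = {grad_w:.4f}, grad_b = {grad_b:.4f}")
```

Both gradients come out positive (roughly 0.25 for the weight and 0.10 for the bias), so a gradient descent step from these parameters would lower both $w$ and $b$.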

Complete Python Code to Understand the Topic
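
What follows is a minimal from-scratch sketch of such a program, assuming batch gradient descent without regularization on the five-point dataset used above. The function names, the learning rate `alpha=0.1`, and the iteration count `n_iters=20000` are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def fit_logistic_regression(X, y, alpha=0.1, n_iters=20000):
    """Batch gradient descent for logistic regression (no regularization)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    losses = []
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)           # forward pass
        losses.append(log_loss(y, y_hat))    # loss before this update
        error = y_hat - y
        w -= alpha * (X.T @ error) / n       # weight update
        b -= alpha * error.mean()            # bias update
    return w, b, losses

# Dataset from the article
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

w, b, losses = fit_logistic_regression(X, y)
y_prob = sigmoid(X @ w + b)
y_pred = (y_prob >= 0.5).astype(int)

print(f"w = {w[0]:.3f}, b = {b:.3f}")
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print("decision boundary:", -b / w[0])
print("predictions:", y_pred)
```

Because this dataset is linearly separable and the sketch uses no regularization, the weight keeps growing as training continues, so the exact values of `w` and `b` depend on `n_iters`.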
