Logistic Regression and Loss Function (Binary Cross-Entropy)

The Journey of Regularization in ML Using Logistic Regression · Part 1 of 6

Logistic Regression is a fundamental classification algorithm in Machine Learning. It predicts the probability that a sample belongs to a particular class, making it suitable for binary classification problems.

1. Introduction

Unlike linear regression, which predicts continuous values, logistic regression predicts a probability between 0 and 1 that a sample belongs to the positive class. Comparing that probability with a threshold yields a class label, which is what makes the model suitable for binary classification problems.

Applications

Email spam detection
Disease diagnosis (e.g., diabetes prediction)
Student pass/fail prediction
Customer churn prediction

2. Linear vs Logistic Regression

Feature	Linear Regression	Logistic Regression
Output	Any real number	Probability (0–1)
Problem Type	Regression	Classification
Loss Function	MSE	Binary Cross-Entropy
Activation	None	Sigmoid

3. Sigmoid Function in Logistic Regression

The sigmoid function is the mathematical function used in logistic regression to convert any real-valued number into a probability between 0 and 1 (see Figure 1).

Figure 1. Sigmoid function

Converts the linear combination of features into a probability.
Output ranges between 0 (class 0) and 1 (class 1).

The sigmoid function is defined as:

 $\sigma(z) = \frac{1}{1 + e^{-z}}$

Equivalently, writing out $z$ in full and denoting the output by $a$:

$a = \frac{1}{1 + e^{-(w_1 x_1 + \dots + w_n x_n + b)}}$

where:

 $z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$

or

$z = \sum_{i=1}^{n} w_i x_i + b$

or in vector form

$z = \mathbf{w}^\top \mathbf{x} + b$

Components:

 $x_i$  = input features
 $w_i$  = weights
 $b$  = bias
 $z$  = linear combination
 $e$  = Euler’s number ($e \approx 2.718$), the base of the natural logarithm
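As an illustration of the definition above, here is a minimal Python sketch of the sigmoid function. The function name `sigmoid` and the sample inputs are illustrative choices, and branching on the sign of $z$ is one common way to avoid overflow in `math.exp`, not something the formula itself requires.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-5.0, 0.0, 5.0):
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
# sigmoid(-5.0) = 0.0067
# sigmoid(+0.0) = 0.5000
# sigmoid(+5.0) = 0.9933
```

Note that $z = 0$ maps to exactly 0.5, and that $\sigma(z) + \sigma(-z) = 1$.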

4. Logistic Regression Model

The logistic regression model first computes a linear combination of features, then applies the sigmoid function.

Linear Model:

$z = \mathbf{w}^\top \mathbf{x} + b$

Probability Model:

$P(y=1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}$

Figure 2. Logistic Function

Exponent: Because $e^{-z}$ is always positive, the denominator $1 + e^{-z}$ is always greater than 1. No matter how large or small the value of $z$ becomes, the output therefore remains a valid probability between 0 and 1.
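The two-step model (linear combination, then sigmoid) can be sketched in a few lines of Python. The helper names `linear_combination` and `predict_proba`, and the weights, bias, and sample values, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def linear_combination(w, x, b):
    """Compute z = w_1*x_1 + ... + w_n*x_n + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_proba(w, x, b):
    """Return P(y=1 | x) = sigmoid(z) for one sample."""
    z = linear_combination(w, x, b)
    # Sign-based branches keep math.exp from overflowing for extreme z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical parameters and one two-feature sample (illustrative values only).
w = [0.8, -0.4]
b = -0.2
x = [2.0, 1.0]

z = linear_combination(w, x, b)  # 0.8*2.0 + (-0.4)*1.0 - 0.2, approximately 1.0
p = predict_proba(w, x, b)       # sigmoid(1.0), approximately 0.731
print(f"z = {z:.3f}, P(y=1|x) = {p:.3f}")

# Even extreme inputs stay inside [0, 1].
print(predict_proba([1.0], [1000.0], 0.0), predict_proba([1.0], [-1000.0], 0.0))
```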

Decision Rule

The predicted class is determined using a threshold of 0.5.

$\hat{y}=\begin{cases}1 & \text{if } P(y=1 \mid x) \geq 0.5 \\ 0 & \text{if } P(y=1 \mid x) < 0.5\end{cases}$
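The decision rule translates directly into code. This is a minimal sketch; the function name `predict_class` and the example probabilities are illustrative, and the threshold is exposed as a parameter only for convenience (the rule above fixes it at 0.5).

```python
def predict_class(p: float, threshold: float = 0.5) -> int:
    """Return 1 when the predicted probability reaches the threshold, else 0."""
    return 1 if p >= threshold else 0

# Illustrative probabilities, including the boundary case p = 0.5.
probabilities = [0.9, 0.5, 0.49, 0.2]
labels = [predict_class(p) for p in probabilities]
print(labels)  # [1, 1, 0, 0]
```

A probability of exactly 0.5 is assigned to class 1, matching the $\geq$ in the rule.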

Decision Boundary

The decision boundary occurs when the predicted probability equals 0.5.

Since the sigmoid function is: $P(y=1 \mid x) = \frac{1}{1 + e^{-z}}$

Substitute $P(y=1 \mid x) = 0.5$ into the sigmoid:

$0.5 = \frac{1}{1 + e^{-z}}$

Step	Result
1. Reciprocal	$1 + e^{-z} = \frac{1}{0.5}$
2. Simplify	$1 + e^{-z} = 2$
3. Isolate $e^{-z}$	$e^{-z} = 1$
4. Natural Log	$-z = \ln(1) = 0$
5. Final	$z = 0$

This implies: $z = 0$

Therefore the decision boundary condition is: $\mathbf{w}^\top \mathbf{x} + b = 0$

For a single feature this reduces to:

$w_1x_1 + b = 0$

$x_1 = -\frac{b}{w_1}$

For:

1 feature → point, e.g., $w_1x_1 + b = 0$
2 features → line, e.g., $w_1x_1 + w_2x_2 + b = 0$, which rearranges to $x_2 = -\frac{w_1}{w_2}x_1 - \frac{b}{w_2}$ (for $w_2 \neq 0$)
3 features → plane
n features → hyperplane
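The one-feature and two-feature cases above can be checked numerically. The following is a sketch under assumed parameter values; the function names and the weights and biases are hypothetical, and both helpers assume the weight in the denominator is nonzero.

```python
def boundary_point(w1: float, b: float) -> float:
    """One feature: solve w1*x1 + b = 0 for x1 (requires w1 != 0)."""
    return -b / w1

def boundary_line(w1: float, w2: float, b: float):
    """Two features: return (slope, intercept) of x2 = slope*x1 + intercept (requires w2 != 0)."""
    return -w1 / w2, -b / w2

# Hypothetical parameters, for illustration only.
x1_star = boundary_point(w1=2.0, b=-7.0)
print(x1_star)  # 3.5

slope, intercept = boundary_line(w1=1.0, w2=2.0, b=-4.0)
print(slope, intercept)  # -0.5 2.0

# Any point on the line should give z = 0, i.e. a predicted probability of 0.5.
x1 = 1.0
x2 = slope * x1 + intercept
z = 1.0 * x1 + 2.0 * x2 - 4.0
print(z)  # 0.0
```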

5. Loss Function (Binary Cross-Entropy)

The Binary Cross-Entropy Loss (also known as Log Loss) is the standard cost function used to train logistic regression.

$L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$

Measures prediction error
Smaller loss → better model
Works well with probabilities instead of hard labels
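A minimal Python sketch of the averaged loss formula follows. The function name `binary_cross_entropy` is illustrative, and the `eps` clipping is an added assumption (not part of the formula) that keeps `math.log` away from $\log(0)$ when a predicted probability is exactly 0 or 1.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average binary cross-entropy (natural log) over all samples."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip p into [eps, 1 - eps] so log(0) can never occur (assumption, see lead-in).
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Perfect predictions give a loss very close to zero instead of raising an error.
print(binary_cross_entropy([1, 0], [1.0, 0.0]))

# Uninformative predictions (p = 0.5 everywhere) give ln(2), about 0.693.
print(round(binary_cross_entropy([1, 0], [0.5, 0.5]), 3))  # 0.693
```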

Example Table:

Sample	Actual Label (y)	Predicted P(y = 1)	Loss
1	1	0.9	0.105
2	0	0.2	0.223

For a single sample:

$L = -(y \log(p) + (1 - y) \log(1 - p))$

where:

 $y$  = true label (0 or 1)
 $p$  = predicted probability $P(y=1)$
 $\log$  = natural logarithm

Actual Label	Predicted Probability
$y=1$	$p = 0.9$

Calculation for Sample 1

Step 1: Substitute values

$L = -\left(1 \cdot \log(0.9) + (1-1) \cdot \log(1-0.9)\right)$

Step 2: Simplify

$L = -(\log(0.9) + 0)$

Step 3: Compute log

$\log(0.9) \approx -0.105$

Step 4: Final loss

$L = -(-0.105) = 0.105$
Loss = 0.105

Calculation for Sample 2

Step 1: Substitute values

$L = -\left(0 \cdot \log(0.2) + (1-0) \cdot \log(1-0.2)\right)$

Step 2: Simplify

$L = -(\log(0.8))$

Step 3: Compute log

$\log(0.8) \approx -0.223$

Step 4: Final loss

$L = -(-0.223) = 0.223$

Loss = 0.223

Sample	True Label (y)	Predicted (p)	Loss (L)
1	1	0.9	0.105
2	0	0.2	0.223

Average Loss for the Dataset

If we average the losses from both samples:

$L_{avg} = \frac{0.105 + 0.223}{2}$

$L_{avg} = \frac{0.328}{2}$

$L_{avg} = 0.164$
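The hand calculation for the two samples can be reproduced with a short script. This is a sketch using the single-sample formula and the natural logarithm; the function name `sample_loss` is illustrative, and the inputs are the two samples from the example table.

```python
import math

def sample_loss(y: int, p: float) -> float:
    """Binary cross-entropy for one sample; p must lie strictly between 0 and 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# (true label, predicted probability of class 1) for the two samples in the table.
samples = [(1, 0.9), (0, 0.2)]
losses = [sample_loss(y, p) for y, p in samples]
average = sum(losses) / len(losses)

for i, loss in enumerate(losses, start=1):
    print(f"Sample {i}: loss = {loss:.3f}")
print(f"Average loss = {average:.3f}")
# Sample 1: loss = 0.105
# Sample 2: loss = 0.223
# Average loss = 0.164
```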

Interpretation

Log Loss penalizes the model based on how far the predicted probability is from the true label.

Example Analysis:

Sample 1:

$p=0.9$ for $y=1$

This is a strong and correct prediction, producing a small loss (0.105).

Sample 2:

$p=0.2$ for $y=0$

This is also a correct prediction, but with slightly less confidence (the model assigns probability $1 - p = 0.8$ to class 0), producing a larger loss (0.223).

Key Note: During training, the goal is to minimize the average loss across all samples, pushing predicted probabilities closer to the true labels.
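To illustrate how the penalty grows as a prediction moves away from the true label, the sketch below evaluates the single-sample loss for $y=1$ at several predicted probabilities. The probabilities other than 0.9 are illustrative values, not taken from the examples above.

```python
import math

def sample_loss(y: int, p: float) -> float:
    """Binary cross-entropy for one sample, using the natural logarithm."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# True label is 1; the predicted probability of class 1 gets progressively worse.
probabilities = [0.9, 0.5, 0.1, 0.01]
losses = [sample_loss(1, p) for p in probabilities]

for p, loss in zip(probabilities, losses):
    print(f"y=1, p={p:.2f}: loss = {loss:.3f}")
# y=1, p=0.90: loss = 0.105
# y=1, p=0.50: loss = 0.693
# y=1, p=0.10: loss = 2.303
# y=1, p=0.01: loss = 4.605
```

The loss grows without bound as $p$ approaches 0 for a sample whose true label is 1, so confident wrong predictions are penalized far more heavily than hesitant ones.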

$X = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{bmatrix}, \quad y = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}$
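As a rough end-to-end sketch, the five-sample dataset above can be pushed through the full pipeline: linear combination, sigmoid, decision rule, and average loss. The parameters `w = 2.0` and `b = -7.0` are hypothetical values picked by hand so that the decision boundary $x_1 = -b/w_1 = 3.5$ falls between the two classes; they are not learned and are not taken from the article.

```python
import math

# Dataset from the text: one feature per sample and a binary label.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 0, 1, 1]

# Hypothetical, hand-picked parameters (assumption): boundary at -b/w = 3.5.
w, b = 2.0, -7.0

def sigmoid(z: float) -> float:
    """Plain sigmoid; adequate here because |z| stays small for this data."""
    return 1.0 / (1.0 + math.exp(-z))

probs = [sigmoid(w * x + b) for x in X]        # P(y=1 | x) for each sample
preds = [1 if p >= 0.5 else 0 for p in probs]  # decision rule with threshold 0.5
loss = -sum(
    yi * math.log(p) + (1 - yi) * math.log(1 - p) for yi, p in zip(y, probs)
) / len(y)                                     # average binary cross-entropy

print([round(p, 3) for p in probs])
print(preds)
print(f"Average loss = {loss:.3f}")
```

With these assumed parameters every sample is classified correctly, yet the loss is not zero, because the predicted probabilities for the samples nearest the boundary are still some distance from their true labels.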


By Dr. Adnan Amin · March 7, 2026 · 2,082 views
