1. Introduction

Logistic Regression is a classification algorithm used to predict the probability that a sample belongs to a class. Unlike linear regression, which predicts continuous values, logistic regression predicts probabilities between 0 and 1, making it suitable for binary classification problems.

Applications

  • Email spam detection
  • Disease diagnosis (e.g., diabetes prediction)
  • Student pass/fail prediction
  • Customer churn prediction

2. Linear vs Logistic Regression

Feature Linear Regression Logistic Regression
Output Any real number Probability (0–1)
Problem Type Regression Classification
Loss Function MSE Binary Cross-Entropy
Activation None Sigmoid

3. Sigmoid Function in Logistic Regression

The sigmoid function is the mathematical function used in logistic regression to convert any real-valued number into a probability between 0 and 1 (see Figure 1).

Figure 1. Sigmoid function

  • Converts the linear combination of features into a probability.
  • Output ranges between 0 (class 0) and 1 (class 1).

The sigmoid function is defined as:



Alternatively,



where:

or

or in vector form

Components:

 = input features
= weights
= bias
= linear combination
= Euler’s constant

4. Logistic Regression Model

The logistic regression model first computes a linear combination of features, then applies the sigmoid function.

Linear Model:

Probability Model

Figure 2. Logistic Function

Exponent: The exponent   ensures that no matter how large or small the value of z becomes, the output remains a valid probability between 0 and 1.

Decision Rule

The predicted class is determined using a threshold of 0.5.

Decision Boundary

The decision boundary occurs when the predicted probability equals 0.5.

Since, Sigmoid function is:

Substitute in sigmoid:  

Step Rendered Result
1. Reciprocal
2. Simplify
3. Isolate
4. Natural Log
5. Final

this implies: 

Therefore the decision boundary condition is:

Solving for x:

For:

  • 1 feature → point, e.g.,   for x:
  • 2 features → line  e.g.,  ,  for x2:  
  • 3 features → plane
  • n features → hyperplane

 

5. Loss Function (Binary Cross-Entropy)

The Binary Cross-Entropy Loss (also known as Log Loss) is the standard cost function used to train logistic regression.

  • Measures prediction error
  • Smaller loss → better model
  • Works well with probabilities instead of hard labels

Example Table:

Sample Actual Label (y) Predicted P ( y = 1) Loss
1 1 0.9 0.105
2 0 0.2 0.223

 

For a single sample:

 

Where

  • y= true label
  • p = predicted probability P(y=1)
Actual Label Predicted Label
y=1 p = 0.9

Calculation for Sample 1

Step 1: Substitute values

Step 2: Simplify

Step 3: Compute log

Step 4: Final loss


Loss = 0.105

Calculation for Sample 2

Step 1: Substitute values

Step 2: Simplify

Step 3: Compute log

Step 4: Final loss

Loss = 0.223

Sample True Label (y) Predicted (p) Loss (L)
1 1 0.9 0.105
2 0 0.2 0.223

4. Average Loss for the Dataset

If we average the losses from both samples:

5. Interpretation

Log Loss penalizes the model based on how far the predicted probability is from the true label.

Example Analysis:

Sample 1:

for

This is a strong and correct prediction, producing a small loss (0.105).

Sample 2:

for

This is also a correct prediction, but with slightly less confidence, producing a larger loss (0.223).

Key Note: During training, the goal is to minimize the average loss across all samples, pushing predicted probabilities closer to the true labels..

 

 
  1. Python Code to Understand One Feature Logistic Regression
  2. Python Code for Two Featurs Logistic Regression