Logistic regression is a statistical method used to predict a binary outcome, such as "yes" or "no," based on prior observations in a dataset. It models the probability of a certain class or event existing based on independent variables.
A logistic regression model predicts a dependent variable by analyzing the relationship between it and one or more independent variables.
Logistic regression can be interpreted in terms of geometry, probability, and loss function:
The sigmoid function is a mathematical function that can take any real value and map it to a value between 0 and 1, forming an "S"-shaped curve.
The sigmoid function, also called the logistic function, is defined as:
σ(z) = 1 / (1 + exp(-z))
In logistic regression, a naive optimization problem would be to maximize the sum of the signed distances of the points from the separating hyperplane. However, this approach is very sensitive to outliers: a single point far from the hyperplane can dominate the sum. To mitigate this, we want small signed distances to stay roughly as they are, while large signed distances are scaled down.
To achieve this, we apply the sigmoid function, which squashes the entire real line of signed distances into the limited range (0, 1). This process of compressing values into a fixed range is called squashing.
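As a rough illustration (a minimal Python sketch with made-up signed distances), the sigmoid maps both moderate and extreme signed distances into (0, 1), so a single outlier can no longer dominate the sum:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical signed distances; the last one is an extreme outlier.
signed_distances = np.array([0.5, 1.0, -0.5, 2.0, 100.0])

print(signed_distances.sum())           # dominated by the outlier (103.0)
print(sigmoid(signed_distances).sum())  # each term squashed into (0, 1)
```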
In any classification problem, our goal is to maximize the number of correctly classified points and minimize the number of misclassified points.
For a correctly classified point: y_i W^T x_i > 0
For a misclassified point: y_i W^T x_i < 0
Thus, the optimization problem is to find the W that maximizes the sum of y_i W^T x_i:
W* = argmax ∑ y_i W^T x_i
Logistic regression starts from this same objective:
W* = argmax ∑ y_i W^T x_i
After applying the sigmoid function, the objective becomes:
W* = argmax ∑ 1 / (1 + exp(-y_i W^T x_i))
Applying a monotonically increasing function such as the logarithm does not change the argmax, so the objective becomes:
W* = argmax ∑ log( 1 / (1 + exp(-y_i W^T x_i)) )
Since log(1/a) = -log(a), maximizing this is the same as minimizing its negation:
⇒ W* = argmin ∑ log(1 + exp(-y_i W^T x_i))
Let Z_i = y_i W^T x_i. Then:
W* = argmin ∑ log(1 + exp(-Z_i)), where the sum runs over i = 1, …, n.
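As a quick sketch (assuming labels y_i in {-1, +1} and a hypothetical toy weight vector), this logistic loss can be computed directly:

```python
import numpy as np

def logistic_loss(w, X, y):
    """Sum of log(1 + exp(-Z_i)) with Z_i = y_i * w^T x_i."""
    Z = y * (X @ w)
    return np.sum(np.log1p(np.exp(-Z)))

# Hypothetical toy data with labels in {-1, +1}.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5]])
y = np.array([1, 1, -1])
w = np.array([0.5, 0.5])

print(logistic_loss(w, X, y))
```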
Each term log(1 + exp(-Z_i)) approaches its minimum value of 0 as Z_i → +∞.
So if the chosen W correctly classifies every training point and the weights are scaled up so that every Z_i → +∞, that W drives the training loss to its minimum and looks like the best W for the training data.
However, this leads to overfitting, as it does not guarantee good performance on test data.
The training data may contain outliers that the model has fitted perfectly.
To prevent overfitting, we introduce regularization, modifying the equation as follows:
W* = argmin ∑ log(1 + exp(-y_i W^T x_i)) + λ W^T W
Where λ is a hyperparameter that controls the strength of regularization; it is typically chosen using cross-validation.
The full optimization problem is therefore:
W* = argmin ∑ log(1 + exp(-y_i W^T x_i)) + λ W^T W
So the optimal W (W*) is the weight vector, a d-dimensional vector with one component per feature.
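A hedged sketch of choosing the regularization strength by cross-validation with scikit-learn; note that sklearn's LogisticRegression uses C = 1/λ, so a smaller C means stronger regularization, and the data here is synthetic, purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# C = 1 / lambda: small C -> strong regularization.
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)

print("best C (i.e. 1/lambda):", search.best_params_["C"])
```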
Geometric intuition:
The weight vector W is normal to a hyperplane that separates data points into different classes.
For logistic regression, a query point x_q is assigned to the positive class if W^T x_q > 0 and to the negative class if W^T x_q < 0; σ(W^T x_q) gives the probability of the positive class.
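A minimal sketch of this geometric decision rule, assuming a learned weight vector w and labels in {-1, +1}:

```python
import numpy as np
from scipy.special import expit  # the sigmoid function

def predict(w, x_q):
    """Classify a query point by which side of the hyperplane it falls on."""
    score = w @ x_q              # signed distance (up to ||w||)
    prob_positive = expit(score) # P(y = +1 | x_q)
    label = 1 if score > 0 else -1
    return label, prob_positive

# Hypothetical weight vector and query point.
w = np.array([1.0, -2.0])
print(predict(w, np.array([3.0, 1.0])))  # falls on the positive side
```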
Interpretation of Weight Vectors:
In logistic regression (LR), feature importance can be read off the weight vector under the assumption that the features are independent: the larger |w_j| is, the more feature j influences the prediction.
However, if the features are collinear, the weight vector cannot be interpreted as feature importance.
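A small sketch (synthetic data, standardized features, and assuming no collinearity) that ranks features by the absolute value of their learned weights:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data; feature names are hypothetical.
X, y = make_classification(n_samples=500, n_features=5, n_informative=3, random_state=0)
X = StandardScaler().fit_transform(X)   # comparable feature scales are required

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Larger |w_j| -> feature j moves the log-odds more (assuming independence).
importance = np.abs(clf.coef_[0])
for j in np.argsort(importance)[::-1]:
    print(f"feature_{j}: weight = {clf.coef_[0][j]:+.3f}")
```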
Definition:
Multicollinearity occurs when one feature can be (approximately) expressed as a linear combination of other features, i.e., when features are highly correlated with each other.
Impact of Multicollinearity:
When features are multicollinear, the learned weights become unstable: small changes in the training data can change their magnitudes and even their signs, so they no longer reflect feature importance.
How to detect multicollinearity?
A multicollinear feature can be identified by adding a small amount of noise (perturbation) to the feature values and retraining the model: if the learned weights change significantly, the features are collinear.
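A rough sketch of this perturbation check (synthetic data, arbitrary noise scale): train once, add small Gaussian noise to the features, retrain, and compare the weights.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=500)  # make feature 4 collinear with feature 0

w_original = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
X_noisy = X + 0.01 * rng.normal(size=X.shape)    # small perturbation
w_perturbed = LogisticRegression(max_iter=1000).fit(X_noisy, y).coef_[0]

# Large changes in the weights suggest multicollinearity.
print(np.abs(w_perturbed - w_original))
```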
Conclusion:
Performing a multicollinearity check is essential before interpreting the weight vector as feature importance.
Solving the optimization problem using Stochastic Gradient Descent:
The logistic loss is convex and differentiable, so it can be minimized with stochastic gradient descent: repeatedly pick a training point, compute the gradient of its loss term, and move W a small step in the opposite direction, as sketched below.
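A minimal sketch of SGD for the regularized logistic loss above; labels are assumed to be in {-1, +1}, and the learning rate, λ, and epoch count are arbitrary illustrative values:

```python
import numpy as np

def sgd_logistic(X, y, lam=0.01, lr=0.1, epochs=100, seed=0):
    """Minimize sum_i log(1 + exp(-y_i w^T x_i)) + lam * w^T w by SGD.
    Assumes y_i in {-1, +1} and X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            z = y[i] * (X[i] @ w)
            # d/dw log(1 + exp(-z)) = -y_i x_i * sigmoid(-z) = -y_i x_i / (1 + exp(z))
            grad = -y[i] * X[i] / (1.0 + np.exp(z)) + 2 * lam * w
            w -= lr * grad
    return w

# Hypothetical toy data with labels in {-1, +1}.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
print(sgd_logistic(X, y))
```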
To check for multicollinearity, you can: compute the pairwise correlation matrix of the features, compute the Variance Inflation Factor (VIF) for each feature, or apply the perturbation test described above.
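A short sketch of the first two checks, using a hypothetical feature matrix in which one column is nearly a copy of another:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical feature matrix: x2 is almost a copy of x0, so it is collinear.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=200)
df = pd.DataFrame(X, columns=["x0", "x1", "x2"])

# 1) Pairwise correlation matrix: values near +/-1 indicate collinear pairs.
print(df.corr())

# 2) Variance Inflation Factor: values above roughly 5-10 indicate multicollinearity.
for i, col in enumerate(df.columns):
    print(col, variance_inflation_factor(df.values, i))
```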
Building a better model without losing information:
Instead of simply dropping correlated features (which discards information), correlated features can be combined, for example with dimensionality-reduction techniques such as PCA, or the model can be regularized.
To assess the goodness of fit of a linear regression model, you can use several statistical measures, such as R², adjusted R², the residual standard error (or RMSE), and residual plots.
More details: ResearchGate Post
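A small sketch of computing these goodness-of-fit measures on synthetic data (the data and coefficients are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

# Hypothetical data: y depends linearly on X plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

n, d = X.shape
r2 = r2_score(y, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - d - 1)  # adjusted R^2 penalizes extra features
rmse = np.sqrt(mean_squared_error(y, y_pred))

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}, RMSE = {rmse:.3f}")
```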
Yes, logistic regression can be performed in Microsoft Excel using tools like the Solver add-in (fitting the weights by maximizing the log-likelihood) or third-party statistics add-ins such as XLSTAT.
Tutorial: YouTube Video
Classification is used when the target variable is categorical, while regression is used for continuous variables.
More details: Quora Discussion
Despite being used for classification tasks, logistic regression is still a regression-based approach: it regresses a continuous quantity, the log-odds log(p / (1 - p)) = W^T x, on the features, and then thresholds the resulting probability to obtain a class label.
More details: Stats StackExchange
To reduce the test-time complexity of a logistic regression model, we can use L1 regularization, which drives many weights to exactly zero; at test time only the non-zero weights need to be stored and multiplied, reducing both space and time from O(d) to O(number of non-zero weights).
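A brief sketch of this effect with scikit-learn (synthetic data with many uninformative features; the regularization strength C is an arbitrary illustrative value):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical dataset with many uninformative features.
X, y = make_classification(n_samples=500, n_features=50, n_informative=5, random_state=0)

# L1 regularization drives most weights to exactly zero (sparse W).
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

n_nonzero = np.count_nonzero(clf.coef_)
print(f"non-zero weights: {n_nonzero} out of {clf.coef_.size}")
# At test time, only the non-zero weights participate in W^T x.
```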
The sigmoid function is used in logistic regression because it squashes any real value into (0, 1), so the output can be interpreted as a probability; it is smooth and differentiable, which enables gradient-based optimization; and it dampens the effect of outliers with very large signed distances.
The sigmoid function is mathematically represented as:
σ(z) = 1 / (1 + exp(-z))