Logistic Regression

Explain Logistic Regression

Logistic regression is a statistical method used to predict a binary outcome, such as "yes" or "no," based on prior observations in a dataset. It models the probability of a certain class or event existing based on independent variables.

A logistic regression model predicts a dependent variable by analyzing the relationship between one or more independent variables.

Logistic regression can be interpreted in terms of geometry, probability, and the loss function.

What is Sigmoid Function & Squashing?

The sigmoid function is a mathematical function that can take any real value and map it to a value between 0 and 1, forming an "S"-shaped curve.

The sigmoid function, also called the logistic function, is defined as:
σ(z) = 1 / (1 + exp(-z))
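The sigmoid can be sketched in a few lines of Python (the function name is our own choice):

```python
import math

def sigmoid(z):
    """Map any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large positive inputs approach 1, large negative inputs approach 0,
# and z = 0 maps to exactly 0.5 -- the "S"-shaped curve.
print(sigmoid(0))      # 0.5
print(sigmoid(100))    # very close to 1
print(sigmoid(-100))   # very close to 0
```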

In logistic regression, our optimization problem is to maximize the sum of signed distances. However, this sum is sensitive to outliers: a single point very far from the hyperplane can dominate it. To mitigate this, we want small signed distances kept roughly as they are, while large ones are scaled down.

To achieve this, we apply the sigmoid function, which compresses the unbounded range of signed distances into the limited range (0, 1). This process of compressing values into a fixed range is called squashing.

Explain about Optimization Problem in Logistic Regression.

In any classification problem, our goal is to maximize the number of correctly classified points and minimize the number of misclassified points.

For a correctly classified point, the following holds: yi WT xi > 0

For a misclassified point, the following holds: yi WT xi < 0

Thus, our optimization problem is to find W that maximizes the sum of yi WT xi.

W* = argmax (∑ yi WT xi)
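The sign condition above is easy to check numerically; a toy sketch (data and names are our own, with the third point deliberately mislabeled):

```python
import numpy as np

# Toy data: labels in {-1, +1} and an assumed weight vector W.
X = np.array([[2.0, 1.0], [-1.0, -2.0], [1.5, -0.5]])
y = np.array([1, -1, -1])
W = np.array([1.0, 1.0])

margins = y * (X @ W)    # y_i * W^T x_i for each point
correct = margins > 0    # True where the point is correctly classified
print(margins)           # the third margin is negative: misclassified
print(correct)
```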

Mathematical Formulation of Objective Function

For LR, the optimization problem is:

W* = argmax ( ∑ ( yi WT xi ) )

After applying the sigmoid function, the equation transforms into:

W* = argmax ∑ ( 1 / ( 1 + exp( - yi WT xi ) ) )

Since the logarithm is a monotonically increasing function, applying it does not change the argmax, and the problem becomes:

W* = argmax ∑ log( 1 / ( 1 + exp( - yi WT xi ) ) )

              ⇒ W* = argmin ∑ log( 1 + exp( - yi WT xi ) )

Let Zi = yi WT xi, then:

W* = argmin ∑ log( 1 + exp( - Zi ) ) for i = 1, …, n
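This loss is straightforward to compute; a small numpy sketch (function name ours), using np.logaddexp for numerical stability:

```python
import numpy as np

def logistic_loss(W, X, y):
    """Sum over i of log(1 + exp(-y_i * W^T x_i)), with y_i in {-1, +1}."""
    z = y * (X @ W)
    # np.logaddexp(0, -z) computes log(1 + exp(-z)) without overflow.
    return np.sum(np.logaddexp(0, -z))

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1, -1])
# A W producing larger (more confident) margins gives a smaller loss.
print(logistic_loss(np.array([1.0, -1.0]), X, y))
print(logistic_loss(np.array([0.1, -0.1]), X, y))
```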

The sum is minimized as each Zi → +∞, since each term log( 1 + exp( - Zi ) ) then approaches 0.

If the selected W correctly classifies all training points and drives every Zi toward +∞, then W attains the minimum possible loss on the training data.

However, this leads to overfitting, as it does not guarantee good performance on test data.

The training data may contain outliers that the model has fitted perfectly.

To prevent overfitting, we introduce regularization, modifying the equation as follows:

W* = argmin ∑ log( 1 + exp( - yi WT xi ) ) + λ WT W

Where λ is a hyperparameter controlling the strength of regularization; its value is chosen using cross-validation.
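A minimal sketch of choosing λ with a held-out validation split, written in plain numpy (all function names, the data, and the λ grid are our own assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_lr(X, y, lam, lr=0.1, epochs=500):
    """Gradient descent on mean logistic loss + lam * ||W||^2; y in {-1, +1}."""
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        z = y * (X @ W)
        grad = -(X.T @ (y * sigmoid(-z))) / len(y) + 2 * lam * W
        W -= lr * grad
    return W

def val_loss(W, X, y):
    """Mean log-loss on a validation split."""
    return np.mean(np.logaddexp(0, -y * (X @ W)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200) > 0, 1, -1)

# Hold out the last 50 points for validation and grid-search lambda.
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]
lambdas = [0.0001, 0.001, 0.01, 0.1, 1.0]
best = min(lambdas, key=lambda lam: val_loss(fit_lr(X_tr, y_tr, lam), X_va, y_va))
print("best lambda:", best)
```

In practice k-fold cross-validation would average the validation loss over several splits rather than using a single hold-out set.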

Explain Importance of Weight Vector in Logistic Regression

Optimization problem:

W* = argmin ∑ log( 1 + exp( - yi WT xi ) )

So the optimal W (W*) is the weight vector, a d-dimensional vector.

Geometric intuition:

The weight vector W is normal to a hyperplane that separates data points into different classes.

For Logistic Regression: P( y = +1 | x ) = σ( WT x ), so each weight directly influences the predicted probability.

Interpretation of Weight Vectors: if wj > 0, increasing feature xj increases WT x and hence (since the sigmoid is monotonically increasing) the probability of the positive class; if wj < 0, increasing xj decreases it. The larger |wj|, the more important feature j is, provided the features are standardized and mutually independent.

Multi-Collinearity of Features

In Logistic Regression (LR), feature importance is read from the weight vector under the assumption that the features are mutually independent.

However, if the features are collinear, we cannot interpret feature importance from the weight vector.

Definition: multicollinearity exists when one feature can be expressed (approximately) as a linear combination of the other features, i.e., the features are highly linearly correlated.

Impact of Multi-Collinearity: the optimal weights are no longer unique, because weight can shift arbitrarily between correlated features without changing the predictions. The individual weights therefore become unstable and cannot be read as feature importance.

How to detect Multi-Collinearity?

A multi-collinear feature can be identified by adding small noise (perturbation) to the feature values: retrain the model on the perturbed data and compare the new weight vector with the original one. If the weights change significantly, the features are multicollinear; if they stay approximately the same, they are not.
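The perturbation test can be sketched as follows (data, amounts of noise, and all names are our own assumptions; a large change in the weights between the two fits flags multicollinearity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_lr(X, y, lr=0.1, epochs=2000):
    """Plain gradient-descent logistic regression; y in {-1, +1}."""
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        z = y * (X @ W)
        W += lr * (X.T @ (y * sigmoid(-z))) / len(y)
    return W

rng = np.random.default_rng(0)
n = 400
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 is nearly a copy of x1
X = np.column_stack([x1, x2])
y = np.where(x1 + rng.normal(scale=0.3, size=n) > 0, 1, -1)

W_before = fit_lr(X, y)
# Perturb the features slightly and refit, then compare the weight vectors.
W_after = fit_lr(X + rng.normal(scale=0.05, size=X.shape), y)
print(W_before, W_after)
```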

Conclusion:

Performing a multi-collinearity test is mandatory to ensure reliable feature importance analysis.

Find the Train & Run-Time Space and Time Complexity of Logistic Regression

Solving the optimization problem using Stochastic Gradient Descent:

- Train time: O(n · d) per epoch, since each of the n points costs O(d) per update.
- Train space: O(n · d) to hold the training data, plus O(d) for W.
- Run (test) time: O(d) to compute WT x.
- Run (test) space: O(d), since only the weight vector must be stored.
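A minimal SGD sketch (data and all names are our own) that makes the costs visible: each update touches one d-dimensional row, and prediction needs only W:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, lr=0.1, epochs=20, seed=0):
    """SGD on the logistic loss; y in {-1, +1}.

    Each update touches one d-dimensional row, so an epoch costs
    O(n * d) time; the model itself is just W, i.e. O(d) space."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            z = y[i] * (X[i] @ W)
            W += lr * sigmoid(-z) * y[i] * X[i]   # O(d) per update
    return W

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = np.where(X @ np.array([1.0, -2.0, 0.5, 0.0]) > 0, 1, -1)
W = sgd_logistic(X, y)
# Prediction only needs W^T x: O(d) time and O(d) space.
acc = np.mean(np.where(X @ W > 0, 1, -1) == y)
print(acc)
```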

After analyzing the model, your manager has informed you that your regression model is suffering from multicollinearity. How would you check whether the claim is true? Without losing any information, can you still build a better model?

To check for multicollinearity, you can:

- Inspect the pairwise correlation matrix of the features for values close to ±1.
- Compute the Variance Inflation Factor (VIF) for each feature; values above roughly 5–10 are a common warning sign.
- Perturb the features with small noise, retrain, and check whether the weights change drastically.

Building a better model without losing information:

- Apply PCA to transform the correlated features into orthogonal principal components and train on all of the components; this removes the collinearity while retaining the variance in the data.
- Alternatively, use L2 regularization (ridge), which stabilizes the weights without dropping any feature.

What are the basic assumptions to be made for linear regression?

- Linearity: the target is a linear function of the features.
- Independence: the errors (residuals) are independent of one another.
- Homoscedasticity: the residuals have constant variance.
- Normality: the residuals are normally distributed.
- No (or little) multicollinearity among the features.

What is the difference between stochastic gradient descent (SGD) and gradient descent (GD)?

GD computes the gradient over the entire training set and makes one update per epoch, so each step is expensive but exact. SGD estimates the gradient from a single point (or a small mini-batch) and updates after every sample, making many cheap but noisy updates.

When would you use GD over SGD, and vice-versa?

Use GD when the dataset is small enough that full-batch gradients are cheap and a smooth, deterministic convergence path is desired. Use SGD when the dataset is large, since each update costs only O(d); the noise in the updates can even help escape shallow local minima.
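A minimal numpy sketch contrasting the two update rules, using a small linear-regression problem for simplicity (all data and names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w_true = np.array([1.0, -1.0, 2.0])
y = X @ w_true + rng.normal(scale=0.1, size=1000)

w_gd, w_sgd = np.zeros(3), np.zeros(3)
lr_gd, lr_sgd = 0.1, 0.01
for epoch in range(50):
    # GD: one update per epoch, using the gradient over the full dataset.
    w_gd -= lr_gd * (-2 * X.T @ (y - X @ w_gd) / len(y))
    # SGD: one cheap, noisy update per sample.
    for i in rng.permutation(len(y)):
        w_sgd -= lr_sgd * (-2 * (y[i] - X[i] @ w_sgd) * X[i])
# Both end up close to w_true; SGD gets there with noisier steps.
print(np.round(w_gd, 2), np.round(w_sgd, 2))
```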

How do you decide whether your linear regression model fits the data?

To assess the goodness of fit of a linear regression model, you can use several statistical methods:

- R² and adjusted R²: the proportion of variance in the target explained by the model.
- Residual plots: the residuals should look like random noise with no visible pattern.
- RMSE / MAE on held-out data.
- The F-test for the overall significance of the model.
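For instance, R² can be computed directly from its definition (function name ours):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# Perfect predictions give R^2 = 1; always predicting the mean gives R^2 = 0.
y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                      # 1.0
print(r_squared(y, np.full(4, y.mean())))   # 0.0
```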

More details: ResearchGate Post

Is it possible to perform logistic regression with Microsoft Excel?

Yes, logistic regression can be performed in Microsoft Excel using tools like the Solver add-in (set up the log-likelihood in a cell and let Solver maximize it over the coefficient cells) or third-party statistical add-ins such as XLSTAT.

Tutorial: YouTube Video

When will you use classification over regression?

Classification is used when the target variable is categorical, while regression is used for continuous variables.

More details: Quora Discussion

Why isn't Logistic Regression called Logistic Classification?

Despite being used for classification tasks, logistic regression is still a regression-based approach: it regresses a continuous quantity, the probability of the positive class (equivalently the log-odds, which is linear in the features), and only becomes a classifier once that probability is thresholded (e.g., at 0.5).

More details: Stats StackExchange

How to Decrease the Test Time Complexity of a Logistic Regression Model?

To reduce the test time complexity of a logistic regression model, we can use L1 regularization instead of (or in addition to) L2. L1 drives many weights exactly to zero, producing a sparse W; at test time only the non-zero weights participate in WT x, so prediction costs O(s) for s non-zero weights rather than O(d).
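A pure-numpy sketch of L1-regularized logistic regression via proximal gradient descent (ISTA); after training, only the non-zero weights are needed at prediction time. All data, names, and hyperparameters here are our own assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrink each weight toward 0 by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_l1_lr(X, y, lam=0.05, lr=0.1, epochs=2000):
    """ISTA on mean logistic loss + lam * ||W||_1; y in {-1, +1}."""
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        z = y * (X @ W)
        grad = -(X.T @ (y * sigmoid(-z))) / len(y)
        W = soft_threshold(W - lr * grad, lr * lam)
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
# Only the first two features actually carry signal.
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)
W = fit_l1_lr(X, y)
kept = np.flatnonzero(W)   # the only features needed at test time
print(kept, W[kept])
```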

What is the Need for Sigmoid Function in Logistic Regression?

The sigmoid function is used in logistic regression because:

- It squashes any real-valued score WT x into (0, 1), so the output can be interpreted as a probability.
- It is monotonic and differentiable everywhere, which makes gradient-based optimization straightforward.
- It tapers off for large |z|, which limits the influence of outliers on the loss.

The sigmoid function is mathematically represented as:

σ(z) = 1 / (1 + exp(-z))
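As a usage sketch, the sigmoid output can be read directly as P(y = +1 | x) and thresholded to classify (the weights here are assumed, not trained):

```python
import math

def predict_proba(W, x):
    """P(y = +1 | x) = sigmoid(W^T x): a valid probability in (0, 1)."""
    z = sum(wi * xi for wi, xi in zip(W, x))
    return 1.0 / (1.0 + math.exp(-z))

W = [0.8, -0.4]   # assumed, pre-trained weights
x = [1.0, 2.0]
p = predict_proba(W, x)           # here W^T x = 0, so p = 0.5
label = 1 if p >= 0.5 else -1
print(p, label)
```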