Linear Regression
Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between the variables.
- Dependent variable: The variable we want to predict (continuous).
- Independent variables: The variables used to make the prediction.
- Model: A linear equation that best fits the data points.
Example: Predicting house prices based on factors like size, location, and number of bedrooms.
Logistic Regression
Logistic regression is a statistical method used to predict the probability of a categorical dependent variable based on one or more independent variables. It is used for classification problems.
- Dependent variable: A categorical variable (e.g., yes/no, true/false).
- Independent variables: The variables used to make the prediction.
- Model: A logistic function that maps input values to probabilities.
Example: Predicting whether an email is spam or not spam.
Key Differences
Feature | Linear Regression | Logistic Regression |
---|---|---|
Output variable | Continuous | Categorical (binary or multi-class) |
Activation function | None | Sigmoid or softmax |
Loss function | Mean squared error | Cross-entropy |
Use cases | Predicting numerical values | Classifying data points |
When to Use Which
- Linear regression: When the target variable is continuous (e.g., predicting house prices).
- Logistic regression: When the target variable is categorical (e.g., spam detection, cancer prediction).
Note: While the name “regression” might be misleading for logistic regression, it’s essentially a classification technique that uses a logistic function to model the probability of belonging to a particular class.
When should I use linear regression?
Linear regression is suitable when the dependent variable is continuous and there’s a linear relationship between the independent and dependent variables. Examples include predicting house prices, sales revenue, or temperature.
When should I use logistic regression?
Logistic regression is suitable when the dependent variable is categorical, such as predicting whether an email is spam or not, whether a customer will churn or not, or whether a patient has a disease or not.
What is the role of the activation function in logistic regression?
The activation function, typically the sigmoid function, maps the linear output of the model to a probability between 0 and 1. This makes it suitable for classification problems.
How do I evaluate the performance of a linear regression model?
Common evaluation metrics for linear regression include:
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
Mean Absolute Error (MAE)
R-squared
How do I evaluate the performance of a logistic regression model?
Common evaluation metrics for logistic regression include:
Accuracy
Precision
Recall
F1-score
ROC curve (Receiver Operating Characteristic curve)
AUC (Area Under the Curve)
Can I use linear regression for classification problems?
While it’s possible to use linear regression for classification by setting a threshold, logistic regression is generally preferred as it directly models the probability of belonging to a particular class.