Machine Learning: Question Set – 09
Give a brief note about logistic regression
Logistic regression is the appropriate regression analysis when the dependent variable is categorical or binary. Like all regression analyses, it is a technique for predictive analysis: a statistical method for describing data and explaining the relationship between one binary dependent variable and one or more independent variables.
It is also used to forecast the probability of a categorical dependent variable.
In the following instances, logistic regression can be used:
- To determine whether a student will pass (True) or fail (False)
- To determine whether a citizen is a senior citizen (1) or not (0)
- To determine whether a person has a disease (Yes) or not (No)
- To determine whether a given team will win the match (True) or not (False)
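The pass/fail example above can be sketched in a few lines with scikit-learn. The study-hours data below is invented purely for illustration:

```python
# Minimal sketch of binary logistic regression with scikit-learn;
# the study-hours feature and pass/fail labels are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature: hours studied; label: pass (1) / fail (0)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Predicted class and class probabilities for a new student
print(model.predict([[4.5]]))        # predicted class (0 or 1)
print(model.predict_proba([[4.5]]))  # [P(fail), P(pass)]
```

Note that `predict_proba` returns the forecast probability mentioned above, while `predict` thresholds it into a hard class label.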
Briefly explain various types of logistic regression
Logistic regression is classified into three types:
Binary Logistic Regression: There are just two possible outcomes. For instance, forecasting whether it will rain (1) or not (0).
Multinomial Logistic Regression: The output consists of three or more unordered categories. For example, predicting a regional language (Kannada, Telugu, Marathi, etc.)
Ordinal Logistic Regression: The output consists of three or more ordered categories. For example, rating an Android app from 1 to 5 stars.
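A quick sketch of the multinomial case with scikit-learn, on synthetic one-dimensional data with three well-separated, unordered classes (the data is invented for illustration):

```python
# Multinomial logistic regression on three unordered classes;
# the clusters below are synthetic, chosen to be well separated.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [0.5], [1.0],
              [5.0], [5.5], [6.0],
              [10.0], [10.5], [11.0]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])  # three unordered categories

clf = LogisticRegression()
clf.fit(X, y)

# Each query point should fall in its nearby class region
print(clf.predict([[0.2], [5.2], [10.2]]))
```

Plain `LogisticRegression` handles the binary and multinomial cases; the ordinal case needs a dedicated model (e.g. a proportional-odds model) that respects the ordering of the categories.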
What is the relationship between the correlation coefficient and the coefficient of determination in a univariate linear least squares regression?
In a univariate linear least squares regression, the coefficient of determination is the square of the correlation coefficient: R² = r².
R², the coefficient of determination, measures the proportion of variability in the dependent variable that is explained by the independent variable.
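This identity is easy to verify numerically. A minimal sketch with NumPy, on made-up data:

```python
# Verify that R^2 equals the squared Pearson correlation r
# for a univariate least-squares fit; the data is illustrative.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Pearson correlation coefficient r
r = np.corrcoef(x, y)[0, 1]

# Least-squares line and coefficient of determination R^2
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
ss_res = np.sum((y - y_hat) ** 2)   # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot

print(abs(r**2 - r_squared))  # ~0: R^2 and r^2 agree
```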
State the assumptions of Linear Regression
Following are the assumptions of linear regression:
- There is a linear relationship between the dependent variable and the regressors, indicating that the model you are fitting matches the data.
- The errors (residuals) are normally distributed and independent of one another.
- There is no multicollinearity between the explanatory variables.
- The data is homoscedastic. This signifies that the variance around the regression line is the same for all values of the predictor variables.
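A rough sketch of checking two of these assumptions on synthetic data: with an ordinary least-squares fit, the residuals should be centred at zero and show no remaining trend against the predictor (the data below is generated for illustration):

```python
# Quick residual diagnostics for a least-squares line fit;
# the data is synthetic: a linear signal plus Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=x.size)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Both should be ~0 for an OLS fit with an intercept
print(residuals.mean())                  # residuals centred at zero
print(np.corrcoef(x, residuals)[0, 1])   # no linear trend left in residuals
```

In practice one would also plot the residuals against the fitted values to look for funnel shapes (heteroscedasticity) or curvature (non-linearity).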
What is regression analysis?
It is a type of predictive modeling technique that examines the relationship between a dependent (target) variable and one or more independent (predictor) variables. This method is used for forecasting, time series modeling, and determining causal relationships between variables. The association between rash driving and the number of road accidents caused by a driver, for example, is best explored using regression.
It is a useful tool for data modeling and analysis. In this step, we fit a curve or line to the data points so that the disparities in the distances of the data points from the curve or line are minimized.
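The line-fitting step described above can be sketched directly with NumPy's least-squares solver; the four data points are invented for illustration:

```python
# Fit a line by least squares, minimizing the squared vertical
# distances of the data points from the line; data is illustrative.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])

# Design matrix [x, 1] so the model is y = slope*x + intercept
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(slope, intercept)
```

`lstsq` returns the coefficients that minimize the sum of squared residuals, which is exactly the "minimize the disparities in the distances" criterion described above.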
Why do we need regression analysis?
The relationship between two or more variables is estimated using regression analysis. Let me illustrate this with an example:
Assume you wish to forecast a company’s sales growth based on current economic conditions. You have recent company data indicating that sales grow roughly two and a half times as fast as the economy. Using this relationship, we can forecast the company’s future sales from current and historical data.
There are numerous advantages to employing regression analysis:
- It indicates whether significant relationships exist between the dependent variable and the independent variables.
- It indicates the magnitude of the influence of multiple independent variables on a dependent variable.
Regression analysis also allows us to examine the effects of variables assessed on different scales, such as the effect of price adjustments and the quantity of promotional activities. These advantages aid market researchers, data analysts, and data scientists in identifying and evaluating the optimal set of variables to utilize when developing predictive models.
When should classification be preferred over regression?
Classification generates discrete values and assigns data points to distinct categories, whereas regression generates continuous results that let you discern gradations between individual points. Prefer classification over regression when you want the results to reflect the membership of data points in explicit categories.
Example: if you wanted to know whether a name is male or female, rather than just how strongly it correlates with male and female names.
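The name example can be made concrete with a toy classifier. Everything below (the tiny name list and the last-letter rule) is invented for illustration; a real system would learn such features from labeled data:

```python
# Toy classification sketch: assign a discrete label (F/M) to a name
# rather than a continuous correlation score. Names and the crude
# last-letter rule are hypothetical, for illustration only.
names = [("anna", "F"), ("maria", "F"), ("julia", "F"),
         ("john", "M"), ("peter", "M"), ("mark", "M")]

def predict_gender(name: str) -> str:
    # Crude hand-written rule: in this toy list, names ending
    # in a vowel happen to be female
    return "F" if name[-1] in "aeiou" else "M"

correct = sum(predict_gender(n) == label for n, label in names)
print(f"{correct}/{len(names)} toy names classified correctly")
```

The point is the output type: the classifier emits a category ("F" or "M"), not a continuous score, which is what makes this a classification rather than a regression problem.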
Additional Reading: Least Square Fit (LS)