# Machine Learning: Question Set – 02

#### Explain the concept of Linear Regression.

Linear regression is a widely used statistical technique that models a linear relationship between a dependent variable and one or more independent variables. The fitted line describes how the dependent variable changes, on average, as an independent variable changes.

The dependent variable must be continuous (numeric), for example height or weight. A categorical outcome with two or more levels, for example blood type, is not suited to linear regression and is better handled by a classification method such as logistic regression.

The independent variable is the quantity we use to predict the dependent variable. For example, in an analysis of heights and weights, we could use height as the independent variable and weight as the dependent variable. In an analysis of blood types and eye colors, eye color would be the independent variable and blood type the dependent variable, although, because blood type is categorical, that analysis is a classification problem rather than a linear regression.
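As a minimal sketch of the height and weight example, the snippet below fits a straight line by ordinary least squares using the closed-form formulas for slope and intercept. The data values and the helper name `fit_line` are made up for illustration and are not from any real dataset.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: height in cm (independent), weight in kg (dependent).
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 58.0, 66.0, 74.0, 80.0]

slope, intercept = fit_line(heights, weights)
predicted = slope * 175.0 + intercept  # predicted weight for a 175 cm person

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predicted weight at 175 cm: {predicted:.1f} kg")
```

The slope is read as the average change in weight per additional centimeter of height in this toy data.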

#### Explain the concept of Logistic Regression.

Logistic regression is a form of regression analysis in which the dependent variable is categorical, most commonly binary. Rather than fitting a straight line to the outcome itself, it passes a linear combination of the independent variables through the logistic (sigmoid) function to model the probability of each class.

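Logistic regression is closely related to the probit model; the two differ in the link function (the logit versus the normal cumulative distribution function). Its coefficients can be positive, negative, or zero. Each coefficient is the change in the log-odds of the outcome for a one-unit increase in the corresponding variable, so exponentiating a coefficient gives an odds ratio, which makes the model interpretable in terms of probability.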

One downside is that logistic regression is sensitive to multicollinearity: when the covariates (independent variables) are strongly correlated with each other, the coefficient estimates become unstable and harder to interpret, and this can have a negative impact on the model’s accuracy.
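To make the mechanics concrete, here is a small sketch of how a logistic regression model with one independent variable turns a linear score into a probability. The intercept and coefficient are hypothetical values chosen for illustration, not the result of fitting any data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, intercept, coef):
    """Probability of the positive class for a single feature value x."""
    return sigmoid(intercept + coef * x)


# Hypothetical, hand-picked parameters (assumptions, not fitted values).
intercept, coef = -4.0, 0.8

p_low = predict_proba(2.0, intercept, coef)   # linear score is negative
p_mid = predict_proba(5.0, intercept, coef)   # linear score is zero
p_high = predict_proba(8.0, intercept, coef)  # linear score is positive

# exp(coef) is the multiplicative change in the odds per one-unit increase in x.
odds_ratio = math.exp(coef)

print(p_low, p_mid, p_high, odds_ratio)
```

A linear score of zero corresponds to a probability of 0.5, which is the usual default decision boundary for a binary classifier.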

#### Explain confusion matrix with its application.

A confusion matrix is a simple two-dimensional table that summarizes the performance of a classification model by cross-tabulating the actual classes against the predicted classes. It shows how accurate the model’s predictions are and lets you compare them with those of other models.

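A confusion matrix can be used to find the accuracy of a predictive model by counting how many observations the model classified correctly and how many it classified incorrectly. For a binary classifier, the four cells are true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).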

The confusion matrix also gives an overview of how different models have performed in their predictions, which gives you insight into which one you should choose for your application.
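The counting described above can be sketched in a few lines for a binary classifier. The label lists and the helper name `confusion_counts` are illustrative assumptions; 1 is taken to be the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN, and TN for a binary classification task."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical labels for eight observations.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)

print(counts)
print(f"accuracy = {accuracy:.2f}")
```

Accuracy is the sum of the two "correct" cells divided by the total number of observations; the two "incorrect" cells separate the two kinds of mistake.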

#### What do you understand by Type – I (False Positive) error?

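A Type I error, or false positive, occurs when a test or model reports a positive result for a case that is actually negative. False positives are quite common in medicine and healthcare. Making a test more sensitive, for example by lowering its decision threshold, usually increases the chance of a false positive.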

A false positive can arise for several reasons. First, positive and negative cases can be hard to tell apart when their measurements overlap. Second, confirming a condition may require multiple tests, and each test can have its own errors. Either can lead to a false positive result.
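The link between sensitivity and false positives can be illustrated with a toy classifier that flags a case as positive when its score reaches a threshold. The scores, labels, and the helper name `evaluate_threshold` are hypothetical and chosen only to show the trend.

```python
def evaluate_threshold(scores, labels, threshold):
    """Return (sensitivity, false_positive_count) at a given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(labels)
    sensitivity = tp / positives if positives else 0.0
    return sensitivity, fp


# Hypothetical model scores and true labels (1 = condition present).
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

results = {t: evaluate_threshold(scores, labels, t) for t in (0.7, 0.5, 0.15)}

for threshold, (sensitivity, fp) in results.items():
    print(f"threshold={threshold}: sensitivity={sensitivity:.2f}, false positives={fp}")
```

In this toy data, lowering the threshold catches more of the true cases but also flags more negative cases as positive.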

#### What do you understand by Type – II (False Negative) error?

False negatives are an important issue that receives comparatively little public attention.

A Type II error, or false negative, occurs when a test reports a negative result even though the condition is actually present. In hypothesis-testing terms, it is the failure to reject a null hypothesis that is false.

The more false negatives a test produces, the more cases go undetected and untreated, which makes long-term health issues more likely for the individuals affected. The most common causes of false negatives are errors in lab tests, misreadings, and misinterpretations.
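One common way to quantify this kind of error is the false negative rate (miss rate), the share of actual positive cases that the test reports as negative. The sketch below assumes binary labels with 1 meaning the condition is present; the data and the function name are made up for illustration.

```python
def false_negative_rate(actual, predicted):
    """FN / (FN + TP): fraction of actual positives that were missed."""
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    total_positives = fn + tp
    return fn / total_positives if total_positives else 0.0


# Hypothetical test results for eight patients.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 1, 0]

miss_rate = false_negative_rate(actual, predicted)
print(f"false negative rate = {miss_rate:.2f}")
```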

#### What is Bayes’ Theorem?

Bayes’ theorem is a formula for the probability of an event given observed evidence. It combines prior knowledge, that is, what we believed about the event before seeing the evidence, with the likelihood of that evidence: P(A | B) = P(B | A) × P(A) / P(B).

The theorem can be used in many different situations. For example, it can be used to calculate how likely it is that a patient has cancer after an examination, or to estimate the probability of a fire in a building based on the amount of smoke and heat coming out of it.
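The medical example can be worked through numerically. The sketch below computes the probability that a patient has a condition given a positive test result; the prevalence, sensitivity, and false positive rate are hypothetical numbers chosen for illustration, not real clinical figures.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # P(positive) = P(pos | condition) * P(condition)
    #             + P(pos | no condition) * P(no condition)
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


# Hypothetical inputs: 1% prevalence, 90% sensitivity, 5% false positive rate.
posterior = posterior_given_positive(
    prior=0.01, sensitivity=0.90, false_positive_rate=0.05
)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With a rare condition, the posterior stays well below the test's sensitivity, because most positive results come from the much larger group of people without the condition.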

#### Define the term: Recall

Recall is a measure of how many of the examples that actually are true we classify as true.

Positive: of all the actual true samples, those we classify as true are the true positives (TP).

Negative: the actual true samples we fail to classify as true are the false negatives (FN).

Recall = TP / (TP + FN). For example, with TP = 1 and FN = 0, Recall = 1 / (1 + 0) = 1.0
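The formula translates directly into a small helper. This is a minimal sketch; the function name and the zero-denominator convention (returning 0.0 when there are no actual positives) are assumptions made for illustration.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN); returns 0.0 if there are no actual positives."""
    actual_positives = tp + fn
    return tp / actual_positives if actual_positives else 0.0


# Same counts as the worked figure above: TP = 1, FN = 0.
recall_value = recall(tp=1, fn=0)
print(recall_value)
```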

#### Define the term: Precision

In a way, precision is a measure of how much you can trust a positive classification. When precision is high, most of the samples the classifier labels as true really are true, even if it misses some. When precision is low, many of the samples labeled as true are in fact false.

Precision is the number of samples we classify as true that are actually true, divided by all the samples we classified as true, including those classified as true incorrectly.

Precision = TP / (TP + FP). For example, with TP = 1 and FP = 2, Precision = 1 / (1 + 2) ≈ 0.33
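As with recall, the precision formula can be sketched as a short helper. The function name and the convention of returning 0.0 when nothing was predicted positive are illustrative assumptions.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 if nothing was predicted positive."""
    predicted_positives = tp + fp
    return tp / predicted_positives if predicted_positives else 0.0


# Same counts as the worked figure above: TP = 1, FP = 2.
precision_value = precision(tp=1, fp=2)
print(f"{precision_value:.2f}")
```

Combined with the recall figure for the same classifier, this shows a model that finds every actual positive while still being wrong about two out of every three positive predictions.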