# Machine Learning: Question Set – 10

#### In Machine Learning, what is Bayes’ theorem?

Bayes’ theorem gives the probability of an event given prior knowledge related to that event. In a testing context, it can be read as the true positive rate of a condition (weighted by how common the condition is) divided by the sum of that same quantity and the population’s false positive rate (weighted by how common the absence of the condition is).

Bayesian optimization and Bayesian belief networks are two of the most important applications of Bayes’ theorem in Machine Learning. The theorem is also the cornerstone of the branch of Machine Learning that includes the Naïve Bayes classifier.

P(A | B) = (P(B | A) * P(A)) / (P(B))

Where P(B | A) is the probability of B given that A has occurred, P(A) is the prior probability of A, and P(B) is the overall probability of B.

#### How would you deal with an unbalanced dataset?

An imbalanced dataset occurs when, for example, a classification task has 90 percent of its data in one class. This causes problems: a 90% accuracy can be misleading if the model has little predictive power on the other class. Here are a few strategies for getting over the hump:

- Collect more data to balance the dataset’s imbalances.
- Resample the dataset to eliminate imbalances.
- Try a different algorithm on your dataset altogether.

What matters here is that you have a deep understanding of the harm that an unbalanced dataset can do, as well as how to balance it.
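The resampling strategy listed above can be sketched with random oversampling, which duplicates minority-class rows until every class is as large as the biggest one. This is a minimal illustration using only the standard library; the function name `random_oversample` and the toy data are assumptions made for this example, not part of any particular library.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate rows of smaller classes (hypothetical helper) until all classes match the largest."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: nine examples of class 0 and a single example of class 1.
rows = [[i] for i in range(10)]
labels = [0] * 9 + [1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling is normally applied to the training split only, so that the evaluation data still reflects the real class distribution.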

#### Give an example of how ensemble approaches could be useful.

Ensemble approaches improve predictive performance by combining several learning algorithms. They often reduce overfitting and make the model more robust (less likely to be influenced by small changes in the training data).

You may give some instances of ensemble methods, ranging from bagging to boosting to a “bucket of models” method, and show how they can improve predictive power.
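One simple way to see the benefit is majority voting. The sketch below uses three made-up sets of predictions (`model_a`, `model_b`, `model_c` are hypothetical, hand-written outputs, not trained models) that each make mistakes on different examples, so the combined vote is more accurate than any single one.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists by taking the most common label per example."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 0, 1, 1, 0, 1]

# Hypothetical predictions; each model errs on different examples.
model_a = [1, 0, 1, 0, 0, 0]  # wrong on examples 3 and 5
model_b = [0, 0, 1, 1, 0, 1]  # wrong on example 0
model_c = [1, 1, 1, 1, 1, 1]  # wrong on examples 1 and 4

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(name, round(accuracy(y_true, preds), 3))
```

The gain depends on the models making different errors; three models that fail on the same examples would vote for the same wrong answer.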

#### What is the distinction between a Bayesian estimate and a Maximum Likelihood estimate?

In a Bayesian estimate, we bring some prior knowledge about the data or problem (the prior). Several parameter values may explain the data, so instead of committing to one we can consider many of them, such as 5 gammas and 5 lambdas. The result is a set of models, one for each pair of parameters, all under the same prior. To predict a new example, it is enough to compute the weighted sum of these models’ predictions.

Maximum likelihood does not take a prior into account, so it is analogous to being a Bayesian who uses a flat prior.
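The link between the two can be illustrated with a coin-flip example. The sketch below assumes a Beta(a, b) prior on the probability of heads, which is a choice made for this illustration and not something specified above; it compares the maximum likelihood estimate with the maximum a posteriori (MAP) estimate under a flat prior and under an informative one.

```python
def mle(heads, flips):
    """Maximum likelihood estimate of P(heads): the observed frequency."""
    return heads / flips


def map_estimate(heads, flips, a, b):
    """Mode of the Beta(a + heads, b + tails) posterior, assuming a Beta(a, b) prior."""
    return (heads + a - 1) / (flips + a + b - 2)


heads, flips = 7, 10

print(mle(heads, flips))                 # observed frequency
print(map_estimate(heads, flips, 1, 1))  # flat prior Beta(1, 1): same as the MLE
print(map_estimate(heads, flips, 5, 5))  # prior centred on 0.5 pulls the estimate toward 0.5
```

With the flat Beta(1, 1) prior the prior terms cancel and the MAP estimate coincides with the MLE, which is the equivalence described above.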

#### What exactly is Bayes’ Theorem? What role does it play in machine learning?

The Bayes’ Theorem calculates the posterior probability of an event given prior knowledge.

In a testing context, it can be expressed as the true positive rate of a condition (weighted by how common the condition is) divided by the sum of that same quantity and the population’s false positive rate (weighted by how common the absence of the condition is).

Suppose a flu test comes back positive for 60% of the people who have the flu, it also comes back positive (falsely) for 50% of the people who do not have the flu, and 5% of the whole population has the flu. If you test positive, do you really have a 60% chance of having the flu?

No, according to Bayes’ Theorem. Your probability of having the flu is (0.6 * 0.05) / ((0.6 * 0.05) + (0.5 * 0.95)) = 0.0594, or 5.94%. Here 0.6 * 0.05 is the true positive rate weighted by the share of the population with the flu, and 0.5 * 0.95 is the false positive rate weighted by the share without it.
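The arithmetic can be checked with a few lines of Python. This sketch reads 0.6 as the true positive rate, 0.5 as the false positive rate, and 0.05 as the prevalence of the flu; the function and parameter names are illustrative.

```python
def posterior_given_positive(true_positive_rate, false_positive_rate, prevalence):
    """P(condition | positive test) via Bayes' theorem."""
    true_positives = true_positive_rate * prevalence
    false_positives = false_positive_rate * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


p_flu = posterior_given_positive(
    true_positive_rate=0.6,
    false_positive_rate=0.5,
    prevalence=0.05,
)
print(round(p_flu, 4))  # 0.0594
```

The result stays close to the 5% prevalence because a positive result is almost as likely without the flu (50%) as with it (60%), so the test carries little information.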

Bayes’ Theorem is the foundation of a branch of machine learning that most notably includes the Naive Bayes classifier.

#### What makes “Naïve” Bayes so naïve?

Despite its practical applications, particularly in text mining, Naive Bayes is called “Naive” because it makes an assumption that almost never holds in real-world data: the conditional probability is calculated as the pure product of the individual probabilities of the components.

This implies complete independence between features, a condition that is unlikely to be met in real life.
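The "pure product" can be made concrete with a toy spam example. All of the probabilities below are invented for illustration, and `naive_bayes_score` is a hypothetical helper, not a library function.

```python
import math


def naive_bayes_score(prior, feature_likelihoods):
    """Unnormalised class score: prior times the product of per-feature likelihoods."""
    return prior * math.prod(feature_likelihoods)


# Made-up numbers: P(class), then P("free" | class) and P("meeting" | class).
spam_score = naive_bayes_score(0.4, [0.5, 0.1])
ham_score = naive_bayes_score(0.6, [0.05, 0.4])

# Normalise so the two class scores sum to 1.
p_spam = spam_score / (spam_score + ham_score)
print(round(p_spam, 3))  # 0.625
```

Multiplying the per-word probabilities is only valid if the words occur independently of each other within a class, which is the "naive" assumption.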

#### How should outlier values be handled?

An outlier is a data point that differs markedly from most of the other data points in a given set.

Univariate or any other graphical analysis method can be used to identify outlier values. If the number of outlier values is small, they can be evaluated separately; however, if the number of outliers is considerable, the values can be substituted with either the 99th or 1st percentile values.

Not every extreme value is an outlier. The most popular methods for dealing with outlier values are:

- Changing the value to bring it within a range.
- Simply removing the value.
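The first option, bringing values into a range, is sketched below by clipping everything to the 1st and 99th percentiles. The `percentile` helper uses simple linear interpolation between sorted values, which is one of several common conventions and an assumption of this example.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def clip_to_percentiles(values, lower_q=1, upper_q=99):
    """Replace values outside the chosen percentiles with the percentile bounds."""
    low = percentile(values, lower_q)
    high = percentile(values, upper_q)
    return [min(max(v, low), high) for v in values]


# Toy data: the integers 1..100 plus one extreme value.
data = list(range(1, 101)) + [10_000]
clipped = clip_to_percentiles(data)

print(max(data), max(clipped))
```

Clipping keeps every row in the dataset, whereas removing the value shrinks it; which is preferable depends on whether the extreme value is an error or a genuine observation.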

#### What exactly is regularization? How does it help?

Regularization refers to techniques that reduce error by fitting a function to the training set appropriately, so as to avoid overfitting.

During training, there is a strong chance that the model will learn noise, or data points that do not represent any real property of your data. This can result in overfitting.

As a result, we apply regularization in our machine learning models to reduce this type of inaccuracy.
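As a small illustration, consider a one-feature linear model with no intercept and a squared-weight penalty (ridge regression). This particular setup is an assumption chosen because it has a simple closed form; the penalty strength `lam` is a hypothetical hyperparameter.

```python
def ridge_weight(xs, ys, lam):
    """Weight w minimising sum((y - w*x)^2) + lam * w^2 for a single feature."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + lam)


xs = [1, 2, 3]
ys = [2, 4, 6]  # exactly y = 2x

for lam in (0, 1, 14):
    print(lam, ridge_weight(xs, ys, lam))
```

With `lam = 0` the fit recovers the slope of 2 exactly; as `lam` grows, the weight is pulled toward zero, trading some fit on the training data for a simpler model.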

#### Describe the distinction between L1 and L2 regularization.

L2 regularization tends to spread error across all the terms, shrinking every weight a little, whereas L1 is more binary/sparse: many weights are driven to exactly zero, so each variable is effectively either kept or dropped.

Setting a Laplacian prior on the terms corresponds to L1, whereas setting a Gaussian prior relates to L2.
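The difference in sparsity shows up in how each penalty shrinks a weight. The sketch below applies a single shrinkage step for each penalty to made-up weights: soft thresholding for an L1 penalty of strength `t`, and proportional scaling for an L2 penalty of the form `(lam / 2) * w**2`. The function names and penalty strengths are assumptions for this example.

```python
def l1_shrink(weight, t):
    """Soft thresholding: move the weight toward zero by t, stopping at zero."""
    if abs(weight) <= t:
        return 0.0
    return weight - t if weight > 0 else weight + t


def l2_shrink(weight, lam):
    """Proportional shrinkage: scale the weight down, never exactly to zero."""
    return weight / (1 + lam)


weights = [3.0, -0.2, 0.05, -1.5]

l1_weights = [l1_shrink(w, 0.5) for w in weights]
l2_weights = [l2_shrink(w, 1.0) for w in weights]

print(l1_weights)  # small weights become exactly zero
print(l2_weights)  # every weight shrinks but stays nonzero
```

This is why L1 is often used when feature selection is wanted, while L2 keeps all features with smaller weights.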

**Additional Reading**: Ensemble Learning [Wiki]