# Machine Learning: Question Set – 13

#### What is the difference between a Loss Function and a Cost Function? What is the main distinction between them?

A **loss function** is typically defined on a single data point, its prediction, and its label. It assesses the penalty for that one prediction.

A **cost function** is typically more general. It could be the sum of the losses over your training set plus a model complexity penalty (regularization).

The most general name for any function that you optimize during training is **objective function**. In the maximum likelihood approach, for example, the likelihood of the training set is a well-defined objective function, but it is neither a loss nor a cost function (although you could define an equivalent cost function, such as the negative log-likelihood).

A loss function considers only a single data point.

A cost function is used when computing the total error over multiple data points. In practice, the distinction is small, and the two terms are often used interchangeably.

In other words, a loss function captures the difference between the actual and predicted values for a single record, whereas a cost function aggregates that difference across the whole training dataset.
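
To make the distinction concrete, here is a minimal Python sketch for a linear model y = m*x + b. The choice of squared error as the loss, the L2 penalty on the slope, and the names `squared_error_loss`, `cost`, and `lam` are illustrative assumptions, not a fixed convention.

```python
# Minimal sketch: per-example loss vs. dataset-level cost for a linear model y = m*x + b.
# Squared error and the L2 penalty weight `lam` are illustrative assumptions.

def squared_error_loss(y_pred, y_true):
    """Loss: the penalty for a single data point."""
    return (y_pred - y_true) ** 2


def cost(m, b, xs, ys, lam=0.0):
    """Cost: average loss over the whole training set plus an L2 complexity penalty on m."""
    total_loss = sum(squared_error_loss(m * x + b, y) for x, y in zip(xs, ys))
    return total_loss / len(xs) + lam * m ** 2


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]
print(squared_error_loss(2.0 * xs[0] + 0.0, ys[0]))  # loss for the first point only
print(cost(2.0, 0.0, xs, ys))                        # unregularized cost over all points
print(cost(2.0, 0.0, xs, ys, lam=0.1))               # cost with an L2 penalty added
```

The loss function is reused inside the cost function; the cost only adds aggregation and, optionally, a regularization term.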

Two of the most widely used loss functions are mean-squared error and hinge loss.

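The Mean-Squared Error (MSE) measures how close a model's predicted values are to the actual values by averaging the squared differences over all $n$ examples:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

where $\hat{y}_i$ is the predicted value and $y_i$ is the actual value for example $i$.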
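
A minimal plain-Python sketch of MSE follows; the function name `mean_squared_error` and the sample values are hypothetical. Taking the square root of MSE gives a different metric, the root-mean-squared error (RMSE), computed separately here.

```python
import math


def mean_squared_error(y_pred, y_true):
    """Average of the squared differences between predictions and actual values."""
    if len(y_pred) != len(y_true) or not y_pred:
        raise ValueError("inputs must be non-empty and of the same length")
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_pred)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_pred, y_true)
rmse = math.sqrt(mse)  # RMSE: square root of MSE, in the same units as the target
print(mse)   # 0.375
print(rmse)
```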

Hinge loss is used to train maximum-margin classifiers such as support vector machines. For a single example it is

$$\ell(y) = \max(0,\ 1 - t \cdot y)$$

where $t \in \{-1, +1\}$ is the true class label and $y$ is the classifier's raw output score. For a linear model $y = mx + b$, a common cost function is the average of such per-example losses over the whole training set, viewed as a function of the parameters $m$ and $b$.
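
A minimal sketch of hinge loss for one example, assuming the true label `t` is either -1 or +1 and `y` is the classifier's raw (unthresholded) score. The function name `hinge_loss` is hypothetical.

```python
def hinge_loss(t, y):
    """Hinge loss for a true label t in {-1, +1} and a raw classifier score y."""
    if t not in (-1, 1):
        raise ValueError("label t must be -1 or +1")
    return max(0.0, 1.0 - t * y)


# Correct side of the boundary and outside the margin: no penalty.
print(hinge_loss(+1, 2.0))   # 0.0
# Correct side but inside the margin: a small penalty.
print(hinge_loss(+1, 0.5))   # 0.5
# Wrong side: the penalty grows linearly with the score.
print(hinge_loss(-1, 1.5))   # 2.5
```

The penalty is zero only when the prediction is correct with a margin of at least 1, which is what pushes the classifier toward a large margin.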

#### What does Naïve mean in a Naïve Bayes model?

The Naïve Bayes technique is a supervised learning algorithm based on Bayes' theorem. It is called naïve because it assumes that all features are independent of one another given the class label.
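
The independence assumption is what lets the model multiply per-feature probabilities. Below is a minimal categorical Naïve Bayes sketch with Laplace smoothing (`alpha`); the function names and the toy weather data are hypothetical, and log-probabilities are summed instead of multiplying probabilities to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-feature value frequencies within each class."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v]: how often feature j takes value v in class c
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values seen for each feature j
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[c][j][v] += 1
            feature_values[j].add(v)
    return class_counts, feature_counts, feature_values, alpha


def predict(model, row):
    """Return the class maximizing log P(c) + sum_j log P(x_j | c)."""
    class_counts, feature_counts, feature_values, alpha = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, count in class_counts.items():
        score = math.log(count / n)  # log prior
        for j, v in enumerate(row):
            # Naive assumption: each feature contributes independently given the class.
            num = feature_counts[c][j][v] + alpha
            den = count + alpha * len(feature_values[j])
            score += math.log(num / den)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"),
        ("rainy", "mild"), ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("rainy", "cool")))
print(predict(model, ("sunny", "hot")))
```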

#### What is an F1 score, and how would you use it?

Consider the following confusion matrix:

| Actual / Predicted | Predicted Yes | Predicted No |
|---|---|---|
| Actual Yes | True Positive (TP) | False Negative (FN) |
| Actual No | False Positive (FP) | True Negative (TN) |

In binary classification, the F1 score is used to evaluate a model's performance. It is the harmonic mean of precision, $TP/(TP+FP)$, and recall, $TP/(TP+FN)$.

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} = \frac{2TP}{2TP + FP + FN}$$

F1 scores range from 0 to 1, where 0 is the worst and 1 is the best possible score.

The F1 score is commonly used in information retrieval to assess how well a model retrieves relevant results. Because it combines precision and recall, it is more informative than plain accuracy when the classes are imbalanced.
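
A minimal sketch that computes precision, recall, and F1 directly from confusion-matrix counts. The function name and the example counts are hypothetical; returning 0.0 when a denominator is zero is a convention chosen here, since the metrics are undefined in that case.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts (TN is not needed)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Equivalent to the harmonic mean 2PR / (P + R) whenever both are defined.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(p, r, f1)
```

Note that true negatives do not appear in the formula, which is why F1 is not inflated by a large, easy majority class.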

#### What is ensemble learning, and how does it work?

Ensemble learning is a strategy for building more powerful machine learning models by combining multiple models.

Individual models can differ from one another for several reasons, including:

- Various populations
- Various hypotheses
- Various modelling methodologies

Every model makes errors on its training and test data. This error can be decomposed into bias, variance, and irreducible error.

A good model must balance bias against variance; this balance is known as the bias-variance trade-off.

Ensemble learning is one way to manage this trade-off: bagging mainly reduces variance, while boosting mainly reduces bias.

There are many ensemble approaches, but two general strategies for combining several models are:

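- Bagging, a naive method: draw new training sets from the original one by sampling with replacement (bootstrap samples), train a model on each, and combine their predictions by averaging or voting.
- Boosting, a more elegant method: like bagging, it combines many models, but it trains them sequentially and reweights the training examples so that each new model focuses on the examples the previous ones got wrong.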
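
A minimal bagging sketch for regression, assuming a least-squares line as the base learner and averaging as the aggregation step. The function names, `n_models`, and the toy data are hypothetical; for classification you would aggregate by majority vote instead.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b (the base learner)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:  # degenerate bootstrap sample: every x is identical
        return 0.0, mean_y
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return m, mean_y - m * mean_x


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train one base learner per bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate by averaging the individual models' predictions."""
    return sum(m * x + b for m, b in models) / len(models)


xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free toy data
models = bagging_fit(xs, ys)
print(bagging_predict(models, 10.0))
```

A linear model is already low-variance, so bagging changes little here; the benefit shows up with noisy data and high-variance base learners such as deep decision trees, where averaging over bootstrap models smooths out their fluctuations.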

#### What is Clustering and How Does It Work?

Clustering is the process of dividing a collection of objects into several groups. Objects in the same cluster should be similar to one another and dissimilar to those in other clusters.

The following are some examples of clustering methods:

- K-means clustering
- K-medoids clustering
- Hierarchical clustering
- Fuzzy clustering
- Density-based clustering
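
As an illustration of how clustering works, here is a minimal sketch of k-means using Lloyd's algorithm (alternate between assigning points to the nearest centroid and moving centroids to the mean of their points). Initializing with randomly chosen data points is the simplest option and an assumption here; k-means++ initialization is common in practice. The toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm for k-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```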

#### What is the best way to choose K for K-means Clustering?

There are two types of methods available: direct methods and statistical testing methods.

- Direct methods: the elbow method and the silhouette method.
- Statistical testing methods: the gap statistic.

The silhouette method is the most commonly used when selecting the optimal value of k.
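
A minimal sketch of the mean silhouette coefficient, where for each point $a$ is its mean distance to the rest of its own cluster, $b$ is its mean distance to the nearest other cluster, and $s = (b - a)/\max(a, b)$. In practice you would run k-means for several candidate values of k and keep the one with the highest mean silhouette; the two labelings below are hypothetical stand-ins for such runs.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least 2 clusters")
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: s(i) is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the rest of its own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels_k2 = [0, 0, 0, 1, 1, 1]  # hypothetical result for k = 2
labels_k3 = [0, 0, 1, 2, 2, 2]  # hypothetical result for k = 3
print(silhouette_score(points, labels_k2))
print(silhouette_score(points, labels_k3))
```

Here the k = 2 labeling scores higher because it matches the two well-separated groups, so the silhouette method would pick k = 2.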

#### What are the primary distinctions between supervised and unsupervised machine learning?

To train a model, supervised learning requires labeled data. To solve a classification problem (a supervised learning task), for example, you need labeled data to train the model and to categorize the data into your labeled groups.

Unsupervised learning does not require a labeled dataset. This is the primary distinction between supervised and unsupervised learning.

#### State a few applications of Supervised Machine Learning in today's businesses

- **Detection of Email Spam**: The model is trained on historical emails labeled as spam or not spam; this labeled data is fed into the model as input.
- **Detection of Fraud**: By training the algorithm to recognize suspicious patterns, we can flag likely instances of fraud.
- **Diagnosis in Healthcare**: A model can be trained to detect whether or not a person has an illness by supplying it with labeled medical images of the condition.
- **Sentiment analysis**: Algorithms classify documents as positive, neutral, or negative in sentiment.
