# Machine Learning: Question Set – 14

#### How do you prune a decision tree?

Pruning is the process of removing branches with little predictive power from a decision tree. It reduces the model’s complexity and, by curbing overfitting, can improve the tree’s accuracy on unseen data.

Pruning can be done from the bottom up or from the top down, using techniques such as reduced error pruning and cost complexity pruning.
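For reference, cost-complexity pruning (the variant used in CART) scores a subtree $T$ by its error plus a penalty on its size:

$$
R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|
$$

Here $R(T)$ is the total misclassification cost (or impurity) of the leaves of $T$, $|\tilde{T}|$ is the number of leaves, and $\alpha \ge 0$ is a complexity parameter. Larger values of $\alpha$ favor smaller trees, and a good $\alpha$ is typically chosen by cross-validation.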

The simplest version is probably reduced error pruning: starting at the leaves, replace each node with a leaf labeled with its most common class, and keep it pruned if this does not reduce accuracy on a held-out validation set. Despite its simplicity, this heuristic comes very close to a method that would optimize for maximum accuracy.
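Reduced error pruning can be sketched in plain Python as below. This is a minimal illustration, not a library implementation: the `Node` class, the helper names, and the tiny tree and validation set are hypothetical. Each node is assumed to store the majority class of the training samples that reached it, so "pruning" a node simply means dropping its children.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A binary decision-tree node (hypothetical structure for this sketch)."""
    label: int                      # majority training class at this node
    feature: Optional[int] = None   # split feature index (ignored once a leaf)
    threshold: float = 0.0          # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict(node: Node, x: List[float]) -> int:
    # Walk down the tree until a leaf is reached.
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def accuracy(root: Node, X: List[List[float]], y: List[int]) -> float:
    return sum(predict(root, xi) == yi for xi, yi in zip(X, y)) / len(y)


def reduced_error_prune(root: Node, node: Node, X_val, y_val) -> None:
    """Bottom-up reduced error pruning against a held-out validation set."""
    if node.is_leaf():
        return
    # Prune the subtrees first, so we work upward from the leaves.
    reduced_error_prune(root, node.left, X_val, y_val)
    reduced_error_prune(root, node.right, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    left, right = node.left, node.right
    node.left = node.right = None            # tentatively turn node into a leaf
    if accuracy(root, X_val, y_val) < before:
        node.left, node.right = left, right  # accuracy dropped: undo


# Example: the split on feature 1 under the right branch is fitting noise.
tree = Node(label=0, feature=0, threshold=0.5,
            left=Node(label=0),
            right=Node(label=1, feature=1, threshold=0.5,
                       left=Node(label=0), right=Node(label=1)))
X_val = [[0.2, 0.1], [0.8, 0.2], [0.9, 0.9], [0.1, 0.7]]
y_val = [0, 1, 1, 0]

print(accuracy(tree, X_val, y_val))  # 0.75 before pruning
reduced_error_prune(tree, tree, X_val, y_val)
print(accuracy(tree, X_val, y_val))  # 1.0 after pruning
```

In this example the noisy right-hand split is collapsed (validation accuracy rises), while collapsing the root is rejected because it would drop accuracy to 0.5.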

#### What is Cross-Validation, and how does it work?

Cross-validation is a technique for estimating how well a model will generalize by repeatedly splitting the data into a training part and a held-out evaluation part. In k-fold cross-validation, the data is divided into k subsets (folds), and the model is trained on k-1 of them.

The remaining fold is used for testing. This is repeated k times, so that each fold serves as the test set exactly once. Finally, the overall score is calculated by averaging the scores from all k folds.
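The k-fold procedure above can be sketched with the standard library alone. This is an illustrative version under a few assumptions: folds are contiguous and the data is not shuffled (in practice data is usually shuffled or stratified first, as scikit-learn's `KFold` and `cross_val_score` can do), and the threshold "model" and the data are hypothetical toys.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    # The first n % k folds get one extra element so all n indices are used.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def cross_val_score(fit: Callable, score: Callable, X: Sequence, y: Sequence, k: int = 5):
    """Train on k-1 folds, score on the held-out fold, repeat k times, average."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(scores) / len(scores), scores


# Hypothetical toy model: a threshold halfway between the two class means.
def fit_threshold(X, y):
    mean0 = sum(x for x, t in zip(X, y) if t == 0) / y.count(0)
    mean1 = sum(x for x, t in zip(X, y) if t == 1) / y.count(1)
    return (mean0 + mean1) / 2


def threshold_accuracy(threshold, X, y):
    return sum((x > threshold) == bool(t) for x, t in zip(X, y)) / len(y)


X = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85, 0.25, 0.75]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
mean_score, fold_scores = cross_val_score(fit_threshold, threshold_accuracy, X, y, k=5)
print(fold_scores, mean_score)  # [1.0, 1.0, 1.0, 1.0, 1.0] 1.0
```

`fit` and `score` are passed in as plain functions so the same loop works for any model that can be trained on a list of rows.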

#### What is the difference between precision and recall?

Precision and recall are two metrics used to assess the performance of a machine learning classifier. They are usually reported together, because each captures a different kind of error.

Precision answers the question, “Of the items the classifier predicted to be relevant, how many are actually relevant?” In terms of true positives (TP) and false positives (FP), precision = TP / (TP + FP).

Recall, on the other hand, answers the question, “Of all the items that are actually relevant, how many did the classifier find?” With false negatives (FN), recall = TP / (TP + FN).

As the everyday meaning of the word suggests, precision measures how exact the model’s positive predictions are: if your model must return a set of items in order to be useful, precision tells you how many of the returned items are genuinely relevant.
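Both formulas can be computed directly from true and predicted labels. This is a small sketch assuming binary labels where 1 means "relevant"; the example labels are made up. When a denominator is zero the function returns 0.0, which is one common convention rather than the only one.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Label 1 = relevant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when there are no predicted / actual positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 1, 0]
print(precision_recall(y_true, y_pred))  # (0.6, 0.75)
```

Here the classifier flags five items, three of which are relevant (precision 3/5), and finds three of the four relevant items (recall 3/4).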

#### How do you know the Machine Learning Algorithm you should use?

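It depends largely on the data we have and the problem we are solving. For example, when the target variable is discrete (a classification problem), a classifier such as an SVM may be used; when it is continuous (a regression problem), linear regression is a common starting point.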

As a result, there is no one-size-fits-all method for determining which machine learning algorithm to use; the choice is guided by exploratory data analysis (EDA).

EDA is similar to “interviewing” a dataset; as part of that interview, we:

• Sort our variables into categories such as continuous, categorical, and so on.
• Summarize our variables using descriptive statistics.
• Visualize our variables using charts.

Based on these observations, choose the algorithm that best fits the dataset.
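The first two EDA steps (sorting variables into kinds and summarizing them) can be sketched with Python's `statistics` module. The rule used to tell continuous from categorical variables (numeric with more than `max_levels` distinct values) and the cut-off of 5 are arbitrary assumptions for illustration, the dataset is made up, and the charting step is omitted.

```python
import statistics
from collections import Counter
from typing import Dict, List, Union

Value = Union[int, float, str]


def variable_kind(values: List[Value], max_levels: int = 5) -> str:
    """Heuristic (assumption): numeric with > max_levels distinct values -> continuous."""
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    if numeric and len(set(values)) > max_levels:
        return "continuous"
    return "categorical"


def summarize(values: List[Value]) -> Dict[str, object]:
    """Descriptive statistics appropriate to the variable's kind."""
    kind = variable_kind(values)
    if kind == "continuous":
        return {
            "kind": kind,
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),
            "min": min(values),
            "max": max(values),
        }
    # Categorical: frequency of each level, most common first.
    return {"kind": kind, "counts": dict(Counter(values).most_common())}


dataset = {
    "age": [23, 35, 41, 29, 52, 38, 47, 31],
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon", "Paris", "Nice", "Paris"],
}
for name, column in dataset.items():
    print(name, summarize(column))
```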

#### When working on a data set, how do you choose important variables?

There are several methods for selecting key variables from a data set, including the following:

• Identify and remove highly correlated variables before selecting the important ones.
• Select variables based on the p-values of their coefficients in a linear regression.
• Use Lasso regression, whose L1 penalty shrinks the coefficients of less useful variables to exactly zero.
• Apply forward, backward, or stepwise selection.
• Train a random forest and plot its variable importance chart.
• Select the top features based on their information gain.
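The last method in the list can be sketched for categorical features as follows: information gain is the entropy of the labels minus the weighted entropy that remains after splitting on a feature. The function names and the toy columns are hypothetical; numeric features would first need to be binned or split at a threshold.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_features(columns: Dict[str, Sequence[Hashable]], labels, k: int) -> Tuple[List[str], Dict[str, float]]:
    """Rank features by information gain and keep the k best."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    ranked = sorted(gains, key=gains.get, reverse=True)
    return ranked[:k], gains


labels = ["yes", "yes", "no", "no", "yes", "no"]
columns = {
    "outlook": ["sunny", "sunny", "rain", "rain", "sunny", "rain"],  # perfectly predictive
    "noise": ["a", "b", "a", "b", "a", "b"],                        # barely informative
}
best, gains = top_features(columns, labels, k=1)
print(best, gains)
```

`outlook` separates the classes perfectly and gains the full 1 bit, while `noise` gains only about 0.08 bits, so `outlook` is selected.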

#### What precisely is Decision Tree Classification?

A decision tree builds classification (or regression) models in the form of a tree structure: as the tree grows, the dataset is broken down into ever-smaller subsets, producing a structure of branches, decision nodes, and leaf nodes. Decision trees can handle both categorical and numerical data.
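The recursive splitting described above can be sketched as a small classifier that, at each node, picks the (feature, threshold) pair with the lowest weighted Gini impurity. This is an illustrative sketch, not a production algorithm: it handles numerical features only (categorical features would need encoding), uses Gini impurity as one common split criterion, represents the tree as nested dicts, and the data is made up.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def gini(labels: List[str]) -> float:
    """Gini impurity: 0 when all labels in the subset are the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X: List[List[float]], y: List[str]) -> Optional[Tuple[float, int, float]]:
    """Try every (feature, threshold) pair; return (weighted Gini, feature, threshold)."""
    best = None
    n = len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth: int = 0, max_depth: int = 3) -> Dict:
    """Recursively split the data into ever-smaller subsets."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return {"leaf": majority}
    split = best_split(X, y)
    if split is None:  # no split separates the rows (e.g. identical features)
        return {"leaf": majority}
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }


def predict(tree: Dict, x: List[float]) -> str:
    # Follow decision nodes until a leaf is reached.
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


X = [[2.0, 1.0], [3.0, 1.5], [1.0, 3.0], [6.0, 2.0], [7.0, 3.5], [8.0, 1.0]]
y = ["A", "A", "A", "B", "B", "B"]
tree = build_tree(X, y)
print(tree)  # {'feature': 0, 'threshold': 3.0, 'left': {'leaf': 'A'}, 'right': {'leaf': 'B'}}
print(predict(tree, [2.5, 2.0]), predict(tree, [7.5, 1.0]))  # A B
```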

#### What precisely is a Recommendation System?

A recommendation system will be familiar to anyone who has used Spotify or shopped on Amazon: it is an information filtering system that predicts what a user may want to hear or see based on the user’s past choices and preferences.
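One common way to build such a system is user-based collaborative filtering, sketched below. The original text does not name a method, so this choice is an assumption, as are the helper names and the made-up users, songs, and ratings. Each unseen item gets a predicted rating equal to the similarity-weighted average of other users' ratings, where similarity is the cosine of the users' rating vectors.

```python
import math
from typing import Dict, List

# user -> {item: rating}; the users, items and ratings below are made up.
Ratings = Dict[str, Dict[str, float]]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (unrated items count as 0)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings: Ratings, user: str, n: int = 2) -> List[str]:
    """User-based collaborative filtering: score unseen items by a
    similarity-weighted average of other users' ratings."""
    weighted_sum: Dict[str, float] = {}
    weight_total: Dict[str, float] = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in their.items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated yet
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            weight_total[item] = weight_total.get(item, 0.0) + sim
    predicted = {item: weighted_sum[item] / weight_total[item] for item in weighted_sum}
    return sorted(predicted, key=predicted.get, reverse=True)[:n]


ratings: Ratings = {
    "alice": {"song_a": 5, "song_b": 4},
    "bob": {"song_a": 5, "song_b": 5, "song_c": 4},
    "carol": {"song_a": 1, "song_d": 3},
    "dave": {"song_x": 5},
}
print(recommend(ratings, "alice"))  # ['song_c', 'song_d']
```

`dave` shares no songs with `alice`, so his similarity is 0 and `song_x` is never suggested to her.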

#### Is it better to have a high number of false positives or a high number of false negatives? Explain.

It depends on the question and on the domain in which we are trying to solve the problem. In medical testing, a false negative is very dangerous because the report will show no health condition when the person is actually ill. Conversely, in spam detection, a false positive is the more dangerous error because the algorithm may classify a vital email as spam.
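One way to act on this trade-off (an illustrative technique, not something the answer above prescribes) is to assign each error type a cost and pick the decision threshold that minimizes the total cost. In this sketch the model scores, labels, and cost values are all made up, and candidate thresholds are simply the distinct scores the model produced.

```python
from typing import Sequence


def total_cost(threshold: float, scores: Sequence[float], labels: Sequence[int],
               fp_cost: float, fn_cost: float) -> float:
    """Cost of predicting 'positive' whenever score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp * fp_cost + fn * fn_cost


def best_threshold(scores, labels, fp_cost, fn_cost) -> float:
    # Candidate thresholds: every distinct score the model produced.
    candidates = sorted(set(scores))
    return min(candidates, key=lambda t: total_cost(t, scores, labels, fp_cost, fn_cost))


# Made-up model scores (probability of the positive class) and true labels.
scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]

# Medical test: a missed illness (false negative) is far costlier.
print(best_threshold(scores, labels, fp_cost=1, fn_cost=10))  # 0.35
# Spam filter: losing a vital email (false positive) is far costlier.
print(best_threshold(scores, labels, fp_cost=10, fn_cost=1))  # 0.8
```

Making false negatives expensive pushes the threshold down (flag more cases as positive), while making false positives expensive pushes it up.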

#### Differentiate: Machine Learning and Deep Learning