Machine Learning: Question Set – 08
What is a decision tree (DT)?
Trees show up in many real-life analogies, and they have inspired a range of machine learning methods, including classification and regression. In decision analysis, a decision tree can be used to represent decisions and decision-making visually and explicitly. As the name implies, it uses a tree-like model of decisions.
The decision tree is one of the most powerful and widely used tools for classification and prediction. A decision tree is a flowchart-like tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node (terminal node) holds a class label.
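The structure described above (internal nodes test an attribute, branches are test outcomes, leaves hold class labels) can be sketched in a few lines of Python. This is a minimal illustration, not a learning algorithm: the tree is built by hand, and the `Node`/`Leaf` classes, the `predict` helper, and the weather attributes are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node: stores the class label."""
    label: str


@dataclass
class Node:
    """Internal node: tests one attribute; each branch is one outcome of that test."""
    attribute: str
    branches: Dict[str, "Tree"] = field(default_factory=dict)


Tree = Union[Node, Leaf]


def predict(tree: Tree, sample: Dict[str, str]) -> str:
    """Follow the branch matching the sample's value at each test until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[sample[tree.attribute]]
    return tree.label


# Hypothetical hand-built tree for deciding whether to play outside.
weather_tree = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rain": Node("wind", {"strong": Leaf("no"), "weak": Leaf("yes")}),
})

print(predict(weather_tree, {"outlook": "sunny", "humidity": "normal", "wind": "weak"}))  # yes
print(predict(weather_tree, {"outlook": "rain", "humidity": "high", "wind": "strong"}))  # no
```

Reading a prediction is just a walk from the root to a leaf, which is why a single tree is easy to follow by hand.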
State the advantages of decision trees
- They are easy to understand, interpret, and visualize.
- Decision trees do not require data normalization or feature scaling.
- Compared with many other methods, decision trees require relatively little effort for data preparation during pre-processing.
- Variable screening or feature selection is performed implicitly by decision trees.
- They can handle both numerical and categorical data, and they can also handle multi-output problems.
- In addition, missing values in the data do not significantly affect the tree-building process (provided the implementation supports them).
- Nonlinear relationships between parameters do not affect the tree's performance.
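Why no normalization or scaling is needed can be seen in how a split is chosen: a tree only compares a feature against a threshold, so any transformation that preserves the order of values yields the same partition of the samples. The sketch below assumes Gini impurity as the split criterion (one common choice) and uses a hypothetical one-feature data set; `gini` and `best_split` are illustrative helpers, not a library API.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Find the threshold on one feature that minimizes weighted Gini impurity.

    Returns (threshold, indices of samples sent to the left branch).
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # midpoint between consecutive distinct values
        left = [i for i, x in enumerate(xs) if x <= threshold]
        right = [i for i, x in enumerate(xs) if x > threshold]
        score = (len(left) * gini([ys[i] for i in left])
                 + len(right) * gini([ys[i] for i in right])) / len(xs)
        if best is None or score < best[0]:
            best = (score, threshold, left)
    return best[1], best[2]


# Hypothetical one-feature data set.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]

raw_threshold, raw_left = best_split(xs, ys)
scaled_threshold, scaled_left = best_split([x * 1000 for x in xs], ys)
print(raw_threshold, raw_left)        # 6.5 [0, 1, 2]
print(scaled_threshold, scaled_left)  # 6500.0 [0, 1, 2]
```

Only the threshold value changes with the scale; the samples sent to each branch, and therefore the tree's predictions, stay the same.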
State the disadvantages of decision trees
- They provide relatively little information about the relationship between predictors and the response.
- They favor predictors with higher variability or more levels.
- Highly collinear predictors can cause problems.
- For responses with small sample sizes, prediction accuracy may be poor.
- They’re unstable, which means that a slight change in the data can result in a significant change in the structure of the best decision tree.
- They are often not accurate enough: many other predictors perform better on the same data. Replacing a single decision tree with a random forest of decision trees can help, but a random forest is not as easy to interpret as a single tree.
- When the data include categorical variables with different numbers of levels, information gain is biased in favor of attributes with more levels.
- Compared with other algorithms, the calculations can become quite complex, especially when many values are uncertain and/or many outcomes are linked.
- Decision trees are less effective for regression, that is, for predicting continuous values.
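The bias of information gain toward attributes with many levels can be demonstrated directly. In the hypothetical data below, `row_id` has a distinct value for every row, so splitting on it yields pure single-row groups and the maximum possible gain, even though it is useless for predicting new cases. The helper names are illustrative; entropy is measured in bits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the attribute."""
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical data: 'weather' has 2 levels, 'row_id' has a different level for every row.
labels = ["yes", "yes", "no", "no", "yes", "no"]
weather = ["sun", "sun", "rain", "rain", "sun", "sun"]
row_id = ["r1", "r2", "r3", "r4", "r5", "r6"]

print(round(information_gain(weather, labels), 3))  # 0.459
print(round(information_gain(row_id, labels), 3))   # 1.0
```

A greedy learner that ranks attributes purely by information gain would pick `row_id` here.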
How do you evaluate a logistic regression model?
There are several ways to evaluate the findings of a logistic regression study.
- The first is to use a classification matrix (confusion matrix) to examine true and false positives and true and false negatives.
- Concordance measures the model's ability to distinguish between the event occurring and not occurring.
- Lift evaluates the model by comparing its results with random selection.
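The three measures can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the labels and predicted probabilities are hypothetical (they stand in for the output of an already-fitted logistic model), 0.5 is an assumed classification cut-off, concordance counts tied pairs as one half (which makes it equal to the ROC AUC, also called the c-statistic), and lift is computed for the top-scored fraction of cases.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def concordance(y_true, scores):
    """Share of (event, non-event) pairs in which the event has the higher score.

    Tied pairs count as one half.
    """
    events = [s for t, s in zip(y_true, scores) if t == 1]
    non_events = [s for t, s in zip(y_true, scores) if t == 0]
    total = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                total += 1.0
            elif e == n:
                total += 0.5
    return total / (len(events) * len(non_events))


def lift(y_true, scores, fraction):
    """Event rate in the top `fraction` of cases (ranked by score) over the overall event rate.

    A value of 1.0 means the model does no better than random selection.
    """
    ranked = [t for _, t in sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)]
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Hypothetical labels and predicted probabilities from a fitted logistic model.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

print(confusion_matrix(y_true, y_pred))  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
print(concordance(y_true, scores))       # 0.6875
print(lift(y_true, scores, 0.5))         # 1.5
```

Here the top half of the ranked cases contains events at 1.5 times the overall rate, so the model does better than picking cases at random.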
How should outlier values be handled?
Outliers can be identified with univariate analysis or any other graphical analysis method. If there are only a few outliers, they can be assessed individually; if there are many, values above the 99th percentile can be replaced with the 99th percentile value and values below the 1st percentile with the 1st percentile value.
Note, however, that not all extreme values are outliers.
The most popular methods for dealing with outlier values are:
- Changing the value to bring it within a range.
- Simply removing the value.
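Both treatments can be sketched with Python's standard `statistics.quantiles` (available since Python 3.8), using the 1st and 99th percentiles as the range limits. The `method="inclusive"` option interpolates linearly between data points and never extrapolates beyond the observed minimum and maximum. The data set and the helper names below are hypothetical.

```python
from statistics import quantiles


def percentile_bounds(values, lower_pct=1, upper_pct=99):
    """Return the lower/upper percentile values (linear interpolation between data points)."""
    cuts = quantiles(values, n=100, method="inclusive")  # cuts[k - 1] is the k-th percentile
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Option 1: bring each value within the [lower, upper] percentile range."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [min(max(v, low), high) for v in values]


def remove_outliers(values, lower_pct=1, upper_pct=99):
    """Option 2: drop values that fall outside the [lower, upper] percentile range."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [v for v in values if low <= v <= high]


# Hypothetical data: 1..99 plus one suspicious reading of 1000.
data = list(range(1, 100)) + [1000]
capped = cap_outliers(data)
kept = remove_outliers(data)
print(max(capped), min(capped))
print(len(kept))  # 98
```

The removal rule also drops the value 1, a reminder that a fixed percentile cut-off flags the most extreme values whether or not they are genuine errors.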
What are eigenvalues and eigenvectors?
Eigenvectors are used to understand linear transformations. In data analysis, the eigenvectors of a correlation or covariance matrix are usually computed. Eigenvectors are the directions along which a linear transformation acts only by flipping, compressing, or stretching.
An eigenvalue is the strength of the transformation in the direction of its eigenvector, that is, the factor by which vectors along that direction are stretched or compressed.
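As a concrete check of these definitions, the sketch below finds the dominant eigenvector and eigenvalue of a small, hypothetical covariance matrix using power iteration, a simple method chosen here for illustration (in practice a library routine such as NumPy's `numpy.linalg.eig` would typically be used). For the matrix [[2, 1], [1, 2]], multiplying the eigenvector (1, 1)/√2 by the matrix only stretches it by a factor of 3, so 3 is the corresponding eigenvalue.

```python
from math import sqrt


def mat_vec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def normalize(v):
    """Scale a vector to unit length."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(a, iterations=100):
    """Approximate the dominant eigenvalue and its unit eigenvector of a square matrix."""
    v = normalize([1.0] + [0.0] * (len(a) - 1))
    for _ in range(iterations):
        v = normalize(mat_vec(a, v))  # repeated application aligns v with the dominant direction
    av = mat_vec(a, v)
    eigenvalue = sum(x * y for x, y in zip(v, av))  # Rayleigh quotient (v has unit length)
    return eigenvalue, v


cov = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical covariance matrix
lam, v = power_iteration(cov)
print(round(lam, 6), [round(x, 6) for x in v])  # 3.0 [0.707107, 0.707107]
```

Applying `mat_vec(cov, v)` returns a vector pointing in the same direction as `v`, three times as long, which is exactly the eigenvector/eigenvalue relationship described above.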
What is the F1 score? How would you use it?
The F1 score is a measure of a model’s performance. It is the harmonic mean of the model’s precision and recall, with scores close to 1 being the best and scores close to 0 the worst. It is used in classification tasks where true negatives matter less.
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
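The formula can be applied directly to predicted and actual labels. In this minimal sketch (hypothetical data; `f1_score` here is a hand-written helper, not a library import), a cautious model finds only one of four positives: precision is 1.0 and recall is 0.25, so F1 is 0.4, well below the arithmetic mean of 0.625, because the harmonic mean is pulled toward the weaker of the two. True negatives never enter the calculation, which is why F1 suits problems where they matter less.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # True negatives are not counted: they do not appear in the formula.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: the model flags only one of four positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]
print(f1_score(y_true, y_pred))  # 0.4

# Adding 100 extra true negatives leaves F1 unchanged.
print(f1_score(y_true + [0] * 100, y_pred + [0] * 100))  # 0.4
```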