Machine Learning: Question Set – 15
What Are Recommender Systems and How Do They Work?
A Recommender System is a program that predicts a user’s preferences and suggests things that are likely to be of interest to them.
Data for recommender systems comes from explicit user ratings (for example, after watching a movie or listening to a song), from implicit signals such as search engine queries and purchase histories, and from other information about the users and items themselves.
The following figures describe a collaborative-filtering recommender system and a content-based recommender system.
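As a rough illustration of the collaborative filtering idea, the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings. The user names, movie titles, ratings, and the choice of cosine similarity are all assumptions made for the example, not something the text above specifies.

```python
from math import sqrt

# Hypothetical explicit ratings (1-5) per user; a missing key means "not rated".
ratings = {
    "ann": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 4, "Heat": 5, "Up": 2, "Jaws": 5},
    "cat": {"Alien": 1, "Heat": 2, "Up": 5, "Jaws": 1},
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None when nobody has rated the item

# ann's tastes resemble bob's more than cat's, so the estimate leans toward bob's 5.
print(round(predict("ann", "Jaws"), 2))
```

A content-based system would instead compare item attributes (genre, cast, and so on) with a profile of what the user already liked.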
What do you mean by bias in machine learning?
Data bias indicates an inconsistency in the data. The inconsistency can arise for several reasons, which are not mutually exclusive.
For example, to speed up hiring, a tech giant like Amazon built an engine that would take 100 resumes and return the top five candidates to be hired.
The software was changed once the corporation found it wasn’t delivering gender-neutral results.
How Should You Deal With Overfitting and Underfitting?
Overfitting occurs when a model fits the training data too closely and therefore generalizes poorly; in this scenario, we can resample the data and estimate model accuracy using approaches such as k-fold cross-validation.
Underfitting occurs when the model fails to capture the patterns in the data; in this scenario, we must either change the algorithm or feed more data points to the model.
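The k-fold cross-validation mentioned above can be sketched as follows. This is a minimal illustration, assuming contiguous (unshuffled) folds and a one-variable least-squares line as the model; the helper names and toy data are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def cross_val_mse(xs, ys, k=5):
    """Mean held-out squared error across the k folds."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errs = [(ys[i] - (a * xs[i] + b)) ** 2 for i in test]
        scores.append(sum(errs) / len(errs))
    return sum(scores) / k

# Hypothetical toy data: y = 2x + 1 with a small alternating perturbation.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
print(cross_val_mse(xs, ys, k=5))
```

A training error far below the cross-validated error suggests overfitting; when both are high, underfitting is the more likely cause.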
How Should Outlier Values Be Handled?
An outlier is an observation that differs significantly from the rest of the dataset. Common tools for detecting outliers include:
- Box plot
- Scatter plot
To deal with outliers, we usually use one of three simple strategies:
- We can drop them.
- We can flag them as outliers and include the flag as a feature.
- We can transform the feature to reduce the outliers' impact.
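The strategies above (dropping, flagging, and altering the feature) might look like this in practice. The sketch assumes the usual box-plot rule, where anything beyond 1.5 times the interquartile range from the quartiles counts as an outlier, and uses clipping as one possible way to alter the feature; the data values are made up.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Box-plot whiskers: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical measurements with one obvious outlier (95).
data = [10, 12, 11, 13, 12, 11, 10, 95, 12, 13]
low, high = iqr_bounds(data)

# Strategy 1: drop the outliers.
dropped = [x for x in data if low <= x <= high]

# Strategy 2: keep every row, but add an "is outlier" flag as a feature.
is_outlier = [not (low <= x <= high) for x in data]

# Strategy 3: alter the feature, here by clipping values to the whiskers.
clipped = [min(max(x, low), high) for x in data]

print(dropped, is_outlier, clipped, sep="\n")
```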
Which do you value more: model accuracy or model performance?
Some models achieve higher accuracy yet have poor predictive power.
It all comes down to the fact that model accuracy is only one aspect of model performance, and a potentially misleading one at that. For example, if you wanted to detect fraud in a dataset of millions of records in which only a tiny fraction of cases were fraudulent, the most accurate model would almost certainly predict no fraud at all. Such a model would be useless as a predictor: a program designed to detect fraud that claims there is no fraud at all!
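The fraud example can be made concrete with a few lines of arithmetic. The class counts below (10 frauds in 10,000 cases) are assumed purely for illustration.

```python
# Hypothetical labels: 1 = fraud, 0 = legitimate; 10 frauds in 10,000 cases.
y_true = [1] * 10 + [0] * 9990
y_pred = [0] * len(y_true)  # a "model" that always predicts "no fraud"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
frauds_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = frauds_caught / sum(y_true)

print(f"accuracy={accuracy:.3f}, recall={recall:.1f}")  # accuracy=0.999, recall=0.0
```

The model is right 99.9% of the time and still catches none of the fraud, which is why accuracy alone is a poor guide on imbalanced data.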
What is the F1 score? What would you do with it?
The F1 score is a measure of a model's performance. It is the harmonic mean of the model's precision and recall, with scores closer to 1 being the best and those closer to 0 being the worst.
It is used in classification tasks where true negatives are not as important.
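A minimal sketch of the calculation, computing F1 as the harmonic mean of precision and recall for binary labels. The function name and label vectors are hypothetical. Note that true negatives never enter the formula.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(f1_score(y_true, y_pred))  # 3 TP, 1 FP, 1 FN: precision = recall = 0.75
```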
What are the different types of Machine Learning/Training models?
ML algorithms can be categorized primarily by the presence or absence of a target variable.
A. Supervised learning: [Target is present]
The system learns from labeled data. The model is trained on an existing data set before it begins making decisions on new data. When the target variable is continuous, regression methods are used: linear regression, polynomial regression, and quadratic regression.
When the target variable is categorical, classification methods are used: logistic regression, Naïve Bayes, KNN, SVM, Decision Tree, Gradient Boosting, ADA boosting, Bagging, Random forest, and others.
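As a small illustration of learning from labeled data, here is a sketch of KNN, one of the classifiers listed above. The training points, labels, and the choice of k = 3 are assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority label among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical labeled data: (features, label) pairs.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

print(knn_predict(train, (1.1, 0.9)))  # a
print(knn_predict(train, (5.0, 5.2)))  # b
```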
B. Unsupervised learning: [Target is unavailable]
The computer is trained on unlabeled data with no supervision. By forming clusters, it automatically infers patterns and relationships in the data. The model learns from observations and deduced data structures.
Examples include Singular Value Decomposition and Principal Component Analysis.
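To show what "forming clusters" on unlabeled data can look like, the sketch below runs a one-dimensional k-means. K-means is not named in the text above; it is used here as an assumed, common clustering method, and the data and starting centers are made up.

```python
def k_means_1d(values, centers, iters=10):
    """Lloyd's algorithm in one dimension, starting from the given centers."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Unlabeled observations that visibly fall into two groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = k_means_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels are supplied at any point; the grouping emerges from the distances between observations alone.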
C. Reinforcement Learning:
The model learns through trial and error. This type of learning involves an agent that interacts with the environment by taking actions and then discovers the errors or rewards that result from those actions.
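A minimal sketch of trial-and-error learning is an epsilon-greedy agent on a two-armed bandit. The text above does not name this setup; the reward probabilities, step count, and epsilon value are all assumptions chosen for illustration.

```python
import random

def epsilon_greedy(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Agent that mostly exploits its best-known action and sometimes explores."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)  # running mean reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(values)), key=values.__getitem__)  # exploit
        # The environment responds with a reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Hypothetical environment: action 1 pays off far more often than action 0.
values, counts = epsilon_greedy([0.2, 0.8])
print(values, counts)
```

The agent is never told which action is better; it finds out from the rewards its own actions produce.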
What is the Curse of Dimensionality, and how can it be overcome?
This occurs when your dataset has an excessive number of features, making it difficult for your model to learn from the data and extract meaningful patterns. It causes two main problems:
- More features than observations, increasing the risk of overfitting the model
- Too many features, making it difficult to cluster observations. Too many dimensions make every observation in the dataset appear equidistant from all the others, making it impossible to form meaningful clusters.
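The "everything looks equidistant" effect can be observed directly. The sketch below, assuming uniformly random points in the unit cube and a fixed seed, measures how much the distances from one point to all the others vary relative to the smallest distance; the function name and parameters are hypothetical.

```python
import math
import random

def distance_spread(dim, n_points=200, seed=0):
    """(max - min) / min distance from one random point to all the others."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(points[0], p) for p in points[1:]]
    return (max(dists) - min(dists)) / min(dists)

# The spread shrinks as dimensions are added: near and far neighbors
# become hard to tell apart.
for dim in (2, 10, 100, 1000):
    print(dim, round(distance_spread(dim), 2))
```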
Principal Component Analysis (PCA) is the primary technique for resolving this issue.
PCA is an unsupervised machine learning technique that reduces the dimensionality (number of features) of a dataset while retaining as much information as possible. It does so by finding a new set of features, called components, which are uncorrelated linear combinations of the original features. The components are ordered so that the first accounts for the most variability in the data, the second for the second most, and so on.
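To illustrate, the sketch below finds only the first principal component, using power iteration on the covariance matrix. This is a simplification chosen to keep the example dependency-free; library implementations typically compute all components at once via an eigendecomposition or SVD. The data set is hypothetical.

```python
def first_principal_component(rows, iters=200):
    """Top eigenvector of the sample covariance matrix, via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Assumes the start vector is not orthogonal to the top component.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    total = sum(cov[a][a] for a in range(d))
    return v, variance / total, means

# Hypothetical 2-D data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
direction, explained, means = first_principal_component(data)

# Reduce 2 features to 1 by projecting each centered row onto the component.
scores = [sum((x - m) * w for x, m, w in zip(row, means, direction))
          for row in data]
print(direction, explained)
print(scores)
```

Here a single component captures nearly all of the variance, so the second feature can be dropped with little loss of information.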
What exactly is the ROC curve and what does it represent?
The Receiver Operating Characteristic (ROC) curve is a key tool for evaluating diagnostic tests. It plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) across the test's possible cut-off points.
- It demonstrates the tradeoff between sensitivity and specificity (any increase in sensitivity will be accompanied by a decrease in specificity).
- The more accurate the test, the more closely the curve follows the left-hand border and then the top border of the ROC space.
- The closer the curve gets to the ROC space’s 45-degree diagonal, the less accurate the test.
- The likelihood ratio (LR) for a given test value is given by the slope of the tangent line at that cut point.
- The area under the curve (AUC) is a measure of the test's accuracy.
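The curve and its area can be computed by sweeping the cut-off over the observed scores, as in this minimal sketch. The labels and scores are made up, and the area uses the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the cut-off over every score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(s >= cut and t == 1 for s, t in zip(scores, y_true))
        fp = sum(s >= cut and t == 0 for s, t in zip(scores, y_true))
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical test results: 1 = condition present, higher score = more suspicious.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(points)
print(auc(points))  # 8 of the 9 positive/negative pairs are ranked correctly
```

An area of 0.5 corresponds to the 45-degree diagonal (no better than chance), and an area of 1.0 to a curve that hugs the left and top borders.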
Describe the distinction between L1 and L2 regularization.
L2 regularization tends to spread error across all terms, shrinking every weight without eliminating any, whereas L1 is more binary/sparse: it drives many weights to exactly zero, so variables are effectively either kept or dropped. L1 corresponds to placing a Laplace prior on the terms, whereas L2 corresponds to a Gaussian prior.
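One way to see the difference is to compare the shrinkage step each penalty induces on a single weight. The sketch below assumes the proximal-operator form of each penalty (soft-thresholding for L1, proportional shrinkage for L2); the weights and the penalty strength `lam` are hypothetical.

```python
def l1_shrink(w, lam):
    """Soft-thresholding: the proximal step for the penalty lam * |w|."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # small weights are set to exactly zero

def l2_shrink(w, lam):
    """Proximal step for the penalty (lam / 2) * w**2: proportional shrinkage."""
    return w / (1.0 + lam)

weights = [3.0, -0.4, 0.2, -2.0, 0.05]
lam = 0.5

print([l1_shrink(w, lam) for w in weights])  # [2.5, 0.0, 0.0, -1.5, 0.0]
print([round(l2_shrink(w, lam), 3) for w in weights])  # every weight shrinks, none hits zero
```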