## Machine Learning: Question Set – 11

#### Describe the SVM algorithm in detail.

A Support Vector Machine (SVM) is a supervised machine learning model that can do linear and non-linear classification, regression, and even outlier detection.

Assume we’ve been given certain data points, each of which belongs to one of two classes, and our goal is to distinguish between the two groups using a collection of examples.

In SVM, a data point is represented as a p-dimensional vector (a list of p numbers), and we want to know whether we can separate the points with a (p – 1)-dimensional hyperplane. This is called a linear classifier.

Many different hyperplanes could classify the data. We select the hyperplane that maximizes the distance between the two classes.
If such a hyperplane exists, it is referred to as the maximum-margin hyperplane, and the linear classifier it defines is referred to as a maximum-margin classifier.

We have data (x1, y1), …, (xn, yn), where each xi is a vector of p features (xi1, …, xip) and each yi is either 1 or -1.

The hyperplane is the set of points satisfying: w · x – b = 0

Where w is the hyperplane’s normal vector. The offset of the hyperplane from the origin along the normal vector w is determined by the parameter b / ||w||.

So each xi lies on the side of the hyperplane corresponding to its class, 1 or -1. Formally, xi satisfies:

w · xi – b ≥ 1 (if yi = 1)   or   w · xi – b ≤ -1 (if yi = -1)
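The decision rule above can be sketched in a few lines of plain Python. This is a minimal illustration with made-up weight values, not a trained SVM: it only shows how the sign of w · x – b assigns a point to class 1 or -1.

```python
# Illustrative sketch: classify points by which side of the
# hyperplane w . x - b = 0 they fall on (weights are made up).

def decision(w, x, b):
    """Decision value: w . x - b."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b

def classify(w, x, b):
    """Predict class +1 or -1 from the sign of the decision value."""
    return 1 if decision(w, x, b) >= 0 else -1

w, b = [1.0, 1.0], 0.0               # hypothetical hyperplane x1 + x2 = 0
print(classify(w, [2.0, 3.0], b))    # point above the hyperplane -> 1
print(classify(w, [-2.0, -1.0], b))  # point below the hyperplane -> -1
```

In a real SVM, w and b would be found by the training procedure that maximizes the margin; only the prediction step is shown here.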

#### What is Supervised Learning and How Does It Work?

Supervised learning is a machine learning algorithm that uses labelled training data to infer a function. A series of training examples makes up the training data.

As an example, knowing a person’s height and weight can help determine their gender. The most popular supervised learning algorithms are listed below.

• Logistic regression
• Decision Tree
• Neural Network
• Support vector machine
• Random forest classifier
• K-nearest neighbor
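The height/weight example above can be sketched with one of the listed algorithms, k-nearest neighbors. The training data values here are invented purely for illustration:

```python
# Tiny supervised-learning sketch: k-nearest-neighbour classification
# of gender from (height_cm, weight_kg). Data values are made up.
import math
from collections import Counter

train = [((170, 65), "male"), ((180, 80), "male"),
         ((160, 50), "female"), ((155, 48), "female")]

def knn_predict(sample, train, k=3):
    # Sort labelled examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda p: math.dist(p[0], sample))
    # Majority vote among the k closest neighbours.
    labels = [label for _, label in nearest[:k]]
    return Counter(labels).most_common(1)[0][0]

print(knn_predict((158, 52), train))  # -> "female"
```

The labelled pairs are exactly the “series of training examples” mentioned above: the algorithm infers the label of a new point from examples whose labels are already known.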

#### In SVM, what are Support Vectors?

A Support Vector Machine (SVM) is an algorithm that tries to fit a line (or plane or hyperplane) between the distinct classes that maximizes the distance between the line and the classes’ points.

It tries to find a strong separation between the classes in this way. The support vectors are the training points that lie closest to the dividing hyperplane, on the edge of the margin; they alone determine the hyperplane’s position.
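Using the hyperplane notation from the first question, the support vectors are the points where the margin constraint holds with equality, |w · x – b| = 1. A small sketch with hypothetical weights:

```python
# Sketch (weights are made up): support vectors are the training
# points lying exactly on the margin, i.e. |w . x - b| == 1.

def on_margin(w, x, b, tol=1e-9):
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return abs(abs(score) - 1.0) < tol

w, b = [1.0, 0.0], 0.0
points = [[1.0, 2.0], [-1.0, 0.5], [3.0, 1.0]]
support_vectors = [p for p in points if on_margin(w, p, b)]
print(support_vectors)  # -> [[1.0, 2.0], [-1.0, 0.5]]
```

The third point scores 3.0 and lies well outside the margin, so moving it slightly would not change the fitted hyperplane, which is why only the margin points matter.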

#### What are the various SVM kernels?

• Linear kernel: used when the data is linearly separable.
• Polynomial kernel: useful when you have discrete data with no natural notion of smoothness.
• Radial basis (RBF) kernel: creates a decision boundary that can be substantially more effective than the linear kernel at separating two classes.
• Sigmoid kernel: equivalent to a type of neural network activation function (tanh).
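The four kernels can be written out directly. This is an illustrative pure-Python sketch; the parameter values (degree, gamma, alpha, coef0) are assumptions chosen for the example, not fixed choices:

```python
# Illustrative implementations of the four SVM kernels listed above.
# Parameter defaults (degree, gamma, alpha, coef0) are assumptions.
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, coef0=1.0):
    return (dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, alpha=0.01, coef0=0.0):
    return math.tanh(alpha * dot(x, y) + coef0)

x, y = [1.0, 2.0], [2.0, 0.5]
print(linear_kernel(x, y))   # -> 3.0
print(rbf_kernel(x, x))      # identical points -> 1.0
```

Each kernel computes a similarity between two points; the SVM uses these similarities in place of raw dot products, which is what lets it draw non-linear boundaries.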

#### What is the difference between covariance and correlation?

Covariance quantifies how two variables are related to one another and how one changes in response to changes in the other. A positive value indicates a direct relationship: one variable rises or falls as the base variable rises or falls, assuming all other conditions remain constant.

Correlation measures the strength of the linear relationship between two random variables and ranges from -1 to 1.

A value of 1 denotes a perfect positive relationship, -1 denotes a perfect negative relationship, and 0 denotes no linear relationship between the two variables (note that zero correlation does not by itself imply independence).
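Both quantities can be computed from first principles. A short sketch with made-up data (in practice you would typically reach for numpy or the `statistics` module):

```python
# Sample covariance and Pearson correlation from scratch (made-up data).
import math

def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance: average co-deviation from the means, n - 1 divisor.
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    # Correlation is covariance normalized by both standard deviations.
    sx = math.sqrt(covariance(xs, xs))
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]              # perfectly linear in xs
print(round(correlation(xs, ys), 6))   # -> 1.0
```

The normalization is the key difference: covariance is scale-dependent (it changes if you switch units), while correlation is dimensionless and always stays in [-1, 1].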

#### What causes overfitting?

Overfitting happens when a model learns the noise and random fluctuations in the training data rather than the underlying pattern, so it performs well on the training data but poorly on unseen data. Common causes include a model that is too complex for the amount of data, too few training examples, and training for too long without regularization.

#### What distinguishes KNN from k-means?

KNN, or K nearest neighbors, is a supervised method used for classification. A test sample in KNN is defined as the class of the majority of its nearest neighbors. K-means, on the other hand, is an unsupervised technique that is primarily used for clustering.

Only a set of unlabeled points and a number of clusters k are required for k-means clustering. The algorithm then groups the unlabeled data by repeatedly assigning each point to its nearest cluster centroid and recomputing each centroid as the mean of its assigned points.
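The contrast can be made concrete with a minimal k-means loop on unlabeled 1-D points; note there are no labels anywhere, unlike the KNN setting. Data and k are made up for illustration:

```python
# Minimal k-means sketch on 1-D data (values and k chosen for illustration).

def assign(points, centroids):
    """Assign each point the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            for p in points]

def update(points, labels, k):
    """Move each centroid to the mean of its assigned points."""
    return [sum(p for p, l in zip(points, labels) if l == j) /
            max(1, sum(1 for l in labels if l == j)) for j in range(k)]

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centroids = [0.0, 5.0]                  # arbitrary starting guesses
for _ in range(5):                      # iterate assignment + update steps
    labels = assign(points, centroids)
    centroids = update(points, labels, 2)
print([round(c, 3) for c in centroids])  # -> [1.0, 9.5]
```

KNN would instead take a labelled training set and classify one query point by voting among its neighbours; there is no iterative fitting and no centroids.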

#### Give an example of how ensemble approaches could be useful.

Ensemble approaches optimize predictive performance by combining learning algorithms. They often reduce overfitting in models and improve model robustness (unlikely to be influenced by small changes in the training data).
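A minimal sketch of the idea: three weak "classifiers" (hypothetical threshold rules, not trained models) combined by majority vote, so that no single rule's quirk dictates the final prediction.

```python
# Ensemble-by-majority-vote sketch; the three threshold rules are made up.
from collections import Counter

def clf_a(x): return 1 if x > 0.4 else -1
def clf_b(x): return 1 if x > 0.5 else -1
def clf_c(x): return 1 if x > 0.6 else -1

def ensemble_predict(x, classifiers=(clf_a, clf_b, clf_c)):
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

print(ensemble_predict(0.55))  # two of three vote 1 -> 1
print(ensemble_predict(0.45))  # two of three vote -1 -> -1
```

Real ensemble methods such as bagging, boosting, and random forests follow the same principle but also vary the training data or weights each member sees, which is where the robustness to small data changes comes from.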

#### What exactly do you mean by selection bias?

• Selection bias is a statistical error that introduces bias into the sampling portion of an experiment.
• The inaccuracy leads one sampling group to be chosen more frequently than the other groups in the experiment.
• If the selection bias is not identified, it may result in an incorrect conclusion.