# Machine Learning: Question Set – 06

#### What is precision and recall? How they are used in ROC curve?

The percentage of true positives labeled as positive by the model is referred to as recall. Precision expresses the percentage of correct positive predictions. The ROC curve depicts the relationship between model recall and specificity, where specificity is defined as the percentage of true negatives classified as negative by the model.

Recall, accuracy, and the ROC are all measurements used to determine how useful a given categorization model is.

#### What is the difference between kNN and K-Means algorithms?

The kNN, or k-nearest neighbors, algorithm is a classification process in which k is an integer representing the number of surrounding data points that influence the categorization of a given observation. K-means is a clustering algorithm in which k is an integer representing the number of clusters to be produced from the given data. Both perform distinct functions.

#### Differentiate Linear and Logistic regression models

The most basic types of regression that are often employed are linear and logistic regression. The primary distinction between these two is that logistic regression is utilized when the dependent variable is binary. Linear regression, on the other hand, is utilized when the dependent variable is continuous and the type of the regression line is linear.

Parameter | Linear Regression | Logistic Regression |
---|---|---|

Fundamental task | Regression | Classification |

Basic | The data is modelled using a straight line | The probability of some obtained event is represented as a linear function of a combination of predictor variables |

Domain of predicted variable | Continuous | Discrete |

Linear relationship between DV and IV | Required | Not required |

The independent variable | Could be correlated with each other. (Especially in multiple linear regression) | Should not be correlated with each other (no multicollinearity exist). |

Collinearity | There may be collinearity between the independent variables. | In logistic regression, there should not be collinearity between the independent variable. |

Accuracy mesaure | Least square method | Maximum Likelihood estimation |

Problem example: | House price prediction Student performance prediction | Tumor prediction (present or absent) Spam email classification (Spam or not spam) |

- DV: Dependent variable
- IV: Independent variable

#### How categorical data are managed in classification problems?

Categorical features can only have a limited number of possible values, which are usually fixed. For example, if a dataset has information about users, you will normally find attributes such as nation, gender, age group, and so on. If the data you’re dealing with is related to products, you’ll see elements like product kind, manufacturer, seller, and so on.

In your dataset, these are all categorical features. These characteristics are often recorded as text values that indicate various characteristics of the observations. For example, gender can be described as Male (M) or Female (F), and product type can be described as electronics, clothing, food, and so on.

It also offers a number of additional useful datasets to assist identify what causes delays:

- weather: hourly meteorological data for each airport
- planes: information on each plane’s constructor
- airlines: conversion of two letter carrier codes and names
- airports: names and locations of airports

The following are the approaches you will use to manage categorical data:

- Replacing values
- Encoding labels
- One-Hot encoding
- Binary encoding
- Backward difference encoding
- Miscellaneous features

#### Discuss the importance of feature engineering.

The practice of applying domain knowledge of the data to develop features that make machine learning algorithms work is known as feature engineering. When feature engineering is done correctly, it improves the predictive capacity of machine learning algorithms by generating features from raw data to aid in the machine learning process. Feature Engineering is a form of art.

The following are the steps involved in solving any problem in ML:

- Data collection
- Data cleaning
- Feature engineering
- Defining the model
- Model training, testing, and output prediction

The most significant art in machine learning is feature engineering, which makes a huge difference between a good model and a terrible model.

#### What exactly machine learning means? How it is useful to data scientists?

Machine learning is an artificial intelligence (AI) technology that allows computers to automatically learn and improve from experience without being explicitly designed. Machine learning is concerned with the creation of computer programs that can access data and utilize it to learn on their own.

The learning process begins with observations or data, such as examples, direct experience, or teaching, in order to seek for patterns in data and make better decisions in the future based on the examples provided. The basic goal is for computers to learn autonomously without human involvement or aid and then adapt their activities accordingly.

Machine Learning is a blooming topic that is utilized in web searches, ad placement, credit scoring, stock trading, and a variety of other applications. This data science course provides an overview of machine learning and algorithms. We’ll also look at why algorithms are so important in Big Data analysis.

Machine learning and data science can coexist. Consider the notion of machine learning: a machine’s ability to generalize knowledge from data. Machines can learn relatively little in the absence of data. If anything, the increased use of machine learning in many industries will work as a catalyst, pushing data science to become more relevant.

Machine learning is only as good as the data provided to it and the algorithms’ capacity to absorb it. Basic degrees of machine learning will become a standard prerequisite for data scientists in the future.

**Additional Reading:** ROC Curve. Click to read