Machine Learning: Question Set – 01
What is Machine Learning?
When we have a lot of data, it can be difficult to decide which parts of it are relevant and which are not. We need a way of telling them apart, and this is where Machine Learning comes in. Classification, regression, and clustering are among the fundamental tasks in machine learning.
Machine Learning gives systems the ability to learn from data and make decisions based on it. With Machine Learning, we can train models to recognize what is relevant and what is not, so that they can take care of this task for us.
The learning process starts with observations or data, such as examples, direct experience, or instruction, so that the system can look for patterns in the data and make better decisions in the future based on the examples we provide. The fundamental goal is for computers to learn automatically, with minimal human intervention, and to adjust their behavior accordingly.
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
– Tom M. Mitchell
What is Supervised Machine Learning?
Supervised learning algorithms are among the most widely used types of machine learning. They learn patterns from a set of labelled data and use those patterns to make predictions.
In supervised learning, you train your model on a labelled dataset, which means that both the raw input data and the expected results (labels) are available. The data is usually divided into two sets: a training dataset and a test dataset. The training dataset is used to train the model, while the test dataset is used to check how accurately the model predicts results for data it has not seen during training.
In this sense, a supervised model learns from known results much as a student learns from a teacher who already knows the correct answers. Because the correct outputs are available during training, supervised models can often reach high accuracy on the task they were trained for.
As shown below, labelled samples of a few objects (apple, tomato, horse) are given as input. A feature extraction technique computes a compact representation of each image. These features, along with the known labels, are used to train the model: the model establishes a mathematical function that represents the relationship between the features and the corresponding labels. The learned model is later used to predict the output for unknown samples.
A test image is passed through the same feature extraction method that was used during the training phase, and the resulting feature vector is presented to the learned model. Based on what it learned from past examples, the model predicts the output for the test image, which in this case is apple.
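The train-then-predict pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method behind the figure: the feature extractor is hypothetical (each "image" is already reduced to two made-up features, `redness` and `roundness`), the feature values are invented, and a nearest-centroid classifier stands in for the learned mathematical function.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def extract_features(image):
    """Hypothetical feature extractor: returns a compact feature vector for an image.
    Here each image is already a dict of two made-up features."""
    return (image["redness"], image["roundness"])

# Labelled training samples (toy values, invented for illustration).
training_images = [
    ({"redness": 0.60, "roundness": 0.75}, "apple"),
    ({"redness": 0.70, "roundness": 0.80}, "apple"),
    ({"redness": 0.95, "roundness": 0.95}, "tomato"),
    ({"redness": 0.90, "roundness": 1.00}, "tomato"),
    ({"redness": 0.30, "roundness": 0.20}, "horse"),
    ({"redness": 0.40, "roundness": 0.10}, "horse"),
]

def train(samples):
    """Training phase: learn one centroid (mean feature vector) per label."""
    sums, counts = {}, {}
    for image, label in samples:
        features = extract_features(image)
        totals = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(t / counts[label] for t in totals) for label, totals in sums.items()}

def predict(model, image):
    """Prediction phase: same feature extraction, then the label of the closest centroid."""
    features = extract_features(image)
    return min(model, key=lambda label: dist(features, model[label]))

model = train(training_images)
test_image = {"redness": 0.68, "roundness": 0.78}  # an unseen, apple-like sample
print(predict(model, test_image))  # apple
```

A real system would compute features from pixels (or learn them, as a neural network does) and would usually use a more flexible model, but the two phases are the same: train on labelled feature vectors, then apply the identical feature extraction to the test sample before predicting.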
Supervised learning algorithms are trained on labeled data. The algorithm starts with a set of examples for which the desired output is already known and builds a model from them. It then generalizes from these examples to predict new values or classify new examples as accurately as possible.
Some popular supervised learning algorithms are:
- Linear regression
- Polynomial regression
- Logistic regression
- Support vector machine
- Decision tree
- Random forest classifier
- Artificial neural network
Discuss a few applications of supervised learning methods.
Spam Filtering: Detecting spam emails is a very useful application, and the same filtering techniques can help flag emails that carry viruses, malware, or malicious URLs. According to published statistics, approximately 56.87 percent of all emails circulating on the internet were spam in March 2017, a significant decrease from the 71.1 percent spam share in April 2014.
Online fraud detection: Machine learning makes online transactions safer by detecting fraudulent ones. A fraudulent transaction can occur in several ways, such as through fake accounts and IDs or by money being stolen in the middle of a transaction. To detect this, a model such as a feed-forward neural network can be trained on past transactions labelled as genuine or fraudulent and then used to classify new ones.
Sentiment analysis: Sentiment analysis is a natural language processing technique that identifies the opinion or emotion expressed in text, for example whether it is positive, negative, or neutral. Related text-classification models can also sort people's tweets into categories such as question, complaint, suggestion, opinion, or news.
Recommender Systems: Most e-commerce and media platforms use a recommendation system to suggest products and new releases to their customers or users based on their activity. Netflix, Amazon, YouTube, and Flipkart all rely heavily on their recommendation systems to drive engagement and sales.
Speech Recognition: Speech recognition converts spoken words into text or commands. A closely related application, speaker recognition, learns the characteristics of your voice so that the system can recognize you. Well-known real-world examples are virtual assistants such as Google Assistant and Siri, which can be trained to recognize your voice when you say their wake phrase.
Self-driving cars: Self-driving cars are one of the most exciting applications of machine learning. Tesla, one of the best-known companies in this field, is developing self-driving cars and trains models to detect people and objects while driving.
Automatic Language Translation: If we visit a new place and are unfamiliar with the language, machine learning can help by translating text into a language we know. Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural machine translation system that automatically translates text into a familiar language.
The technology underlying automatic translation is sequence-to-sequence learning, which can be combined with image recognition to translate text that appears in images from one language to another.
Biometrics: Because we all use it in our daily lives, this is one of the most well-known applications of supervised learning. Biometrics involves capturing and recognizing biological characteristics of people, such as fingerprints, iris texture, and earlobe size. Today's smartphones can learn our biometric information and then use it to authenticate us, which increases the security of the device. Smartphones such as iPhones and Google Pixels support facial recognition, while OnePlus and Samsung phones support in-display fingerprint recognition.
What is Unsupervised Machine Learning?
Unsupervised learning algorithms learn from unlabeled data in order to discover the patterns within a dataset. They use the raw data to find patterns and relationships without labelled examples or other guidance from humans.
There are many different types of unsupervised learning algorithms, but they have one thing in common: they all work with unlabeled data, without human supervision. This makes it possible for them to uncover information in a dataset that is not explicitly given.
Common families of unsupervised learning methods include clustering, association rule mining, and stream mining. These techniques help users find patterns among data points that may not be explicit or obvious at first glance.
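Clustering, the first family listed above, can be illustrated with k-means, one common clustering algorithm (chosen here as an example; the text does not name a specific one). The sketch below groups unlabeled 2-D points into `k` clusters by alternating two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The points and the naive initialization (the first `k` points) are assumptions for illustration.

```python
from math import dist

def k_means(points, k, max_iterations=100):
    """Minimal k-means: returns (centroids, labels) for a list of 2-D points."""
    # Naive initialization (an assumption): use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it is
        if new_centroids == centroids:  # nothing moved, so the assignments are stable
            break
        centroids = new_centroids
    return centroids, labels

# Unlabeled toy points (invented): two visually obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.1, 7.9), (7.8, 8.2)]
centroids, labels = k_means(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

No labels are given to the algorithm; the two groups emerge from the distances alone. Practical implementations usually use a smarter initialization such as k-means++ and several random restarts, because plain k-means can converge to a poor local solution.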
Define the term: Classification
Classification is a type of supervised learning that is used to predict the class label of each data point, that is, which of a fixed set of categories (such as spam or not spam) it belongs to.
In classification, we build a Machine Learning model that helps us differentiate between categories of data. The model also helps us explore different variables and their relation to the response variable (the class label).
Building the model involves splitting the labelled data into two groups, a training set and a testing set. The model learns from the training examples and their labels, and the testing set is then used to check how well the model classifies examples it has not seen before.
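The procedure above can be sketched with logistic regression, one of the algorithms listed earlier, fitted by plain gradient descent. This is an illustrative sketch, not a production implementation: the one-feature data set is invented, the split (every fourth sample held out for testing) is an assumption chosen to keep the example deterministic, and the learning rate and epoch count are arbitrary but adequate for this data.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(samples, lr=0.5, epochs=2000):
    """Fit p(label = 1 | x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, label in samples:
            error = sigmoid(w * x + b) - label  # gradient of log-loss w.r.t. w*x + b
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def classify(w, b, x):
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Hypothetical labelled data: (feature value, class label).
data = [(-2.0, 0), (-1.75, 0), (-1.5, 0), (-1.25, 0), (-1.0, 0), (-0.75, 0),
        (0.75, 1), (1.0, 1), (1.25, 1), (1.5, 1), (1.75, 1), (2.0, 1)]

# Hold out every fourth sample as the test set; train only on the rest.
test_set = data[3::4]
training_set = [s for i, s in enumerate(data) if i % 4 != 3]

w, b = train_logistic_regression(training_set)
correct = sum(classify(w, b, x) == label for x, label in test_set)
accuracy = correct / len(test_set)
print(f"w={w:.3f}, b={b:.3f}, test accuracy={accuracy:.2f}")
```

The test samples play no part in fitting `w` and `b`, so the accuracy on them estimates how the classifier behaves on data it has not seen.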
Define the term: Regression
Regression is the process of building a model that maps input data to continuous real values. A common example is predicting how much something will cost given its attributes.
It is also a technique for analyzing the relationship between two or more variables in order to assess whether and how strongly they are correlated.
In simple regression analysis, there is one dependent variable and one independent variable. The dependent variable is the one we are trying to predict from the independent variable.
For example, if we want to predict someone's height from their weight, height is the dependent variable and weight is the independent variable.
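The height-from-weight example can be made concrete with simple linear regression fitted by ordinary least squares, the standard fitting method for a straight line. The five weight/height pairs below are invented toy numbers, not measurements; weight is the independent variable `x` and height the dependent variable `y`.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one independent variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return intercept, slope

# Hypothetical toy data: weight (independent variable) and height (dependent variable).
weights_kg = [50, 60, 70, 80, 90]
heights_cm = [160, 165, 172, 178, 185]

intercept, slope = fit_simple_linear_regression(weights_kg, heights_cm)
print(f"height = {intercept:.1f} + {slope:.2f} * weight")  # height = 127.9 + 0.63 * weight
print(f"predicted height at 75 kg: {intercept + slope * 75:.1f} cm")
```

For this toy data the covariance sum is 630 and the variance sum is 1000, so the slope is 0.63 cm per kg and the intercept is 172 - 0.63 * 70 = 127.9 cm.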
What is Cross Validation?
In k-fold cross-validation, the available data is split into k roughly equal groups (folds). The model is trained on k - 1 folds and evaluated on the remaining fold, and this is repeated k times so that each fold serves exactly once as the validation set; the k scores are then averaged. A separate test set is often held out entirely and used only once, at the end, to estimate how the final model performs on unseen data.
Cross-validation gives a more reliable estimate of a machine learning model's accuracy than a single train/test split, because the result does not depend on one particular split of the data.
Cross-validation can also help identify the best values for an algorithm's hyperparameters by comparing the cross-validated scores of different settings. It is a numerical way of estimating how well our model will perform on new data.
Cross-validation does not need any additional data from outside our dataset, since it is based only on the data we have already observed. Because every observation is used for training in some folds and for validation in another, it makes efficient use of all the available data, which is especially valuable when the dataset is small.
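The k-fold procedure can be sketched as follows. The helper names (`k_fold_splits`, `cross_validate`), the two candidate models (a constant "predict the mean" model and a least-squares line), and the noise-free toy data are all hypothetical choices for illustration; comparing their cross-validated errors mirrors how one would compare hyperparameter settings.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle so folds are not ordered blocks
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def cross_validate(xs, ys, k, fit, predict):
    """Mean squared error averaged over k folds; each sample is validated exactly once."""
    fold_errors = []
    for train_idx, val_idx in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        squared_errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in val_idx]
        fold_errors.append(sum(squared_errors) / len(squared_errors))
    return sum(fold_errors) / len(fold_errors)

# Two hypothetical candidate models to compare.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)  # always predicts the training mean

def predict_mean(model, x):
    return model

def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope  # (intercept, slope)

def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x

# Toy, noise-free data following y = 2x + 1 (invented for illustration).
xs = list(range(10))
ys = [2 * x + 1 for x in xs]

for name, fit, predict in [("mean", fit_mean, predict_mean), ("line", fit_line, predict_line)]:
    print(name, cross_validate(xs, ys, k=5, fit=fit, predict=predict))
```

The line model should show a much lower cross-validated error than the constant model, so it would be the one selected; no data beyond `xs` and `ys` is needed to make that choice.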
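How does bias affect the performance of a machine learning model?
Bias is the difference between the average prediction of a model and the correct value, where the average is taken over models trained on different training sets. Ideally, it should be as low as possible. A classification or regression model with high bias is too simple to capture the underlying pattern in the data, so it underfits and performs poorly on both the training data and new data. Low bias is therefore needed for good generalization, although it is not enough on its own, because variance must also be kept low.
Bias has a large effect on predictive performance, but it does not always have to be negative: a simpler model with slightly higher bias can generalize better than a more complex one if its variance is much lower. For example, if a logistic regression classifier that divides people into two groups is too simple for the data, its bias can show up as a systematic tendency to assign observations to the class with the higher overall probability, even when they belong to the other class.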
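The bias described above, together with the variance discussed next, can be stated formally. The notation is introduced here for illustration and is not from the original text: assume the data follow y = f(x) + ε, where the noise ε has mean 0 and variance σ², and let f̂_D be the model trained on a training set D. Taking expectations over training sets gives the bias and variance at an input x, and the expected squared error (for squared loss) splits into three parts:

```latex
\mathrm{Bias}\big[\hat{f}(x)\big] = \mathbb{E}_D\big[\hat{f}_D(x)\big] - f(x)

\mathrm{Var}\big[\hat{f}(x)\big] = \mathbb{E}_D\Big[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\Big]

\mathbb{E}_{D,\varepsilon}\Big[\big(y - \hat{f}_D(x)\big)^2\Big]
  = \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2
```

Because the squared bias and the variance add up, a model can lower its total error by accepting a little more bias in exchange for a larger reduction in variance, which is why some bias is not necessarily harmful. The σ² term is irreducible noise that no model can remove.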
How does variance affect the performance of a machine learning model?
Variance describes how much a model's prediction for the same input changes when the model is trained on different training sets drawn from the same data. It indicates how uncertain the prediction is.
A model with very high variance fits the noise in its particular training set: it performs well on the training data but poorly on new data, which means the model has overfitted. High variance can therefore seriously hurt the performance of both classification and regression models.
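Bias and variance can be estimated empirically by training the same kind of model on many independently drawn training sets and examining its predictions at one input. Everything in this simulation is an assumption chosen for illustration: the true function f(x) = x², the Gaussian noise level, the training-set size, the evaluation point, and the two toy models. A "predict the mean" model ignores the input (high bias, low variance), while a 1-nearest-neighbour model copies the label of the closest training point, so it tracks the training data closely (low bias, high variance).

```python
import random
from statistics import mean, pvariance

def true_function(x):
    return x * x  # hypothetical ground truth f(x)

def sample_training_set(rng, size=20, noise=0.2):
    """Draw one training set: x uniform on [0, 1], y = f(x) plus Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(size)]
    ys = [true_function(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def mean_model(xs, ys, x0):
    """Ignores the input entirely: very simple, so high bias but low variance."""
    return mean(ys)

def nearest_neighbour_model(xs, ys, x0):
    """Copies the label of the closest training point: low bias but high variance."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def bias_and_variance(model, x0=0.9, n_training_sets=500, seed=0):
    """Estimate squared bias and variance of a model's prediction at x0."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_training_sets):
        xs, ys = sample_training_set(rng)
        predictions.append(model(xs, ys, x0))
    bias_squared = (mean(predictions) - true_function(x0)) ** 2
    variance = pvariance(predictions)  # spread of predictions across training sets
    return bias_squared, variance

for name, model in [("mean", mean_model), ("1-NN", nearest_neighbour_model)]:
    b2, var = bias_and_variance(model)
    print(f"{name}: bias^2={b2:.4f} variance={var:.4f}")
```

Both models are evaluated on the same simulated training sets (same seed), so the comparison isolates the models themselves: the 1-nearest-neighbour predictions change noticeably from one training set to the next, which is the high-variance, overfitting behaviour described above.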
Additional Reading: Cross validation