Supervised learning offers many ML algorithms, and it can be really difficult to know which model to apply. How do you choose an algorithm for your problem? One option is to try all of the algorithms, or at least the ones we expect to be accurate, but applying every algorithm takes a lot of time. Therefore, we need some techniques that help us select the best algorithm.
The right algorithm depends on the problem statement, so it is important to know what kind of problem we face. Choosing with the problem in mind can save time as well as money.
In this blog, we will learn the main techniques for selecting the right machine learning algorithm for a particular job. We discuss how to choose a model by examining the properties of the dataset. We will also discuss how the size of the dataset affects the choice of algorithm.
We will consider a dataset from Kaggle, i.e. predicting the success of bank telemarketing.
First of all, we will import the required libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
After that, we will proceed by reading the CSV file.
df = pd.read_csv("bank_dataset.csv")
Pair Plot Method
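A pair plot draws every pair of features against each other and colours the points by target class. Looking at how the classes are spread in these plots helps us understand which ML model to apply.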
A decision tree or Random Forest works on the principle of non-linear classification. We can use these ML algorithms when data points from different classes overlap with each other. In the above example we can see a lot of overlap between points, so an ensemble classifier such as Random Forest is recommended.
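To illustrate why a non-linear model helps here, the sketch below compares a Random Forest with logistic regression on scikit-learn's `make_circles` toy data, where no straight line can separate the two classes. This is a stand-in for the bank dataset, and the parameter values are arbitrary choices, not tuned settings.

```python
from sklearn.datasets import make_circles
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data: one class forms a ring around the other, so the boundary is not a line.
X, y = make_circles(n_samples=600, noise=0.1, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a non-linear ensemble and a linear baseline on the same split.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
linear = LogisticRegression().fit(X_train, y_train)

forest_acc = forest.score(X_test, y_test)
linear_acc = linear.score(X_test, y_test)
print(f"Random Forest accuracy:       {forest_acc:.2f}")
print(f"Logistic regression accuracy: {linear_acc:.2f}")
```

On data like this the linear baseline usually does little better than guessing, while the forest can follow the curved boundary.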
A large number of algorithms assume that classes can be divided by a straight line. Logistic regression or a support vector machine with a linear kernel should be preferred in such cases. When a line can be drawn that divides the target classes, the data points are easy to separate. Linear regression algorithms likewise assume that the data follow a straight-line trend. When these assumptions hold, these algorithms are a good choice.
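The linear case can be sketched in the same way. The snippet below fits logistic regression and a linear-kernel SVM on two well-separated synthetic clusters from `make_blobs`. The cluster centres are assumed values picked so that a straight line divides the classes.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two clusters far apart: a straight line separates them almost perfectly.
X, y = make_blobs(
    n_samples=400, centers=[[-3, -3], [3, 3]], cluster_std=1.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "logistic regression": LogisticRegression(),
    "linear SVM": SVC(kernel="linear"),
}

# Train each linear model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2f}")
```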
Another factor that needs to be considered is the size of the dataset. Training time varies from one algorithm to another and grows with the amount of data. Additionally, check the size of your dataset and the training time each candidate algorithm needs on it.
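One rough way to check this is to time the `fit` call of each candidate on datasets of increasing size. The sketch below does that on synthetic data from `make_classification`. The sample sizes, the two models and the helper name `time_fit` are assumptions for illustration, and the measured times will differ from machine to machine.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def time_fit(model, X, y):
    """Return the wall-clock seconds spent training the model on X, y."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


timings = []  # (n_samples, model name, seconds)
for n_samples in (500, 2_000, 8_000):
    X, y = make_classification(n_samples=n_samples, n_features=20, random_state=0)
    candidates = (
        ("logistic regression", LogisticRegression(max_iter=1000)),
        ("random forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    )
    for name, model in candidates:
        seconds = time_fit(model, X, y)
        timings.append((n_samples, name, seconds))
        print(f"{n_samples:>6} rows  {name:<20} {seconds:.3f} s")
```

To time the models on the bank data instead, pass its own feature matrix and target to `time_fit`.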
Enjoy Selecting Rightly!!