[PER57] - Perceptron Model 1957
Example of use of a Perceptron, with sklearn and the IRIS dataset of 1936!
Objectives :
- Implement a historical linear classifier with a historical dataset!
- Predict the type of Iris from the dimensions of its petals (here, Iris setosa or not).
- Identify its limitations.
The IRIS dataset is probably one of the oldest datasets used in machine learning, dating back to 1936.
What we're going to do :
- Retrieve the dataset via scikit-learn
- Train a perceptron and classify
Step 1 - Import and init
In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import os,sys
import fidle
# Init Fidle environment
run_id, run_dir, datasets_dir = fidle.init('PER57')
FIDLE - Environment initialization
Version : 2.3.0
Run id : PER57
Run dir : ./run/PER57
Datasets dir : /gpfswork/rech/mlh/uja62cb/fidle-project/datasets-fidle
Start time : 03/03/24 21:03:20
Hostname : r3i6n3 (Linux)
Tensorflow log level : Info + Warning + Error (=0)
Update keras cache : False
Update torch cache : False
Save figs : ./run/PER57/figs (True)
numpy : 1.24.4
sklearn : 1.3.2
yaml : 6.0.1
matplotlib : 3.8.2
pandas : 2.1.3
Step 2 - Prepare IRIS Dataset
Retrieve a dataset : http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
About the datasets : https://scikit-learn.org/stable/datasets.html#datasets
Data fields (X) :
- 0 : sepal length in cm
- 1 : sepal width in cm
- 2 : petal length in cm
- 3 : petal width in cm
Class (y) :
- 0 : class (0=Iris-Setosa, 1=Iris-Versicolour, 2=Iris-Virginica)
2.1 - Get dataset
In [2]:
x0,y0 = load_iris(return_X_y=True)
x = x0[:, (2,3)] # We only keep fields 2 and 3
y = y0.copy()
y[ y0==0 ] = 1 # 1 = Iris setosa
y[ y0>=1 ] = 0 # 0 = not iris setosa
df=pd.DataFrame.from_dict({'Length (x1)':x[:,0], 'Width (x2)':x[:,1], 'Setosa {0,1} (y)':y})
display(df)
print(f'x shape : {x.shape}')
print(f'y shape : {y.shape}')
 | Length (x1) | Width (x2) | Setosa {0,1} (y) |
---|---|---|---|
0 | 1.4 | 0.2 | 1 |
1 | 1.4 | 0.2 | 1 |
2 | 1.3 | 0.2 | 1 |
3 | 1.5 | 0.2 | 1 |
4 | 1.4 | 0.2 | 1 |
... | ... | ... | ... |
145 | 5.2 | 2.3 | 0 |
146 | 5.0 | 1.9 | 0 |
147 | 5.2 | 2.0 | 0 |
148 | 5.4 | 2.3 | 0 |
149 | 5.1 | 1.8 | 0 |
150 rows × 3 columns
x shape : (150, 2)
y shape : (150,)
2.2 - Train and test sets
In [3]:
x,y = fidle.utils.shuffle_np_dataset(x, y)
n=int(len(x)*0.8)
x_train = x[:n]
y_train = y[:n]
x_test = x[n:]
y_test = y[n:]
print(f'x_train shape : {x_train.shape}')
print(f'y_train shape : {y_train.shape}')
print(f'x_test shape : {x_test.shape}')
print(f'y_test shape : {y_test.shape}')
Datasets have been shuffled.
x_train shape : (120, 2)
y_train shape : (120,)
x_test shape : (30, 2)
y_test shape : (30,)
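`fidle.utils.shuffle_np_dataset` is a helper from the Fidle course package. Where that package is not available, roughly the same shuffled 80/20 split can be sketched with scikit-learn's `train_test_split`. The `_alt` variable names and the `random_state` value are arbitrary choices for this illustration and do not come from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Rebuild the two-feature, binary-label dataset used above
x0, y0 = load_iris(return_X_y=True)
x_alt = x0[:, [2, 3]]              # petal length, petal width
y_alt = (y0 == 0).astype(int)      # 1 = Iris setosa, 0 = not Iris setosa

# Shuffle, then keep 80% for training and 20% for testing.
# random_state=0 is an arbitrary seed, used only for reproducibility.
x_train_alt, x_test_alt, y_train_alt, y_test_alt = train_test_split(
    x_alt, y_alt, test_size=0.2, shuffle=True, random_state=0
)

print(f'x_train shape : {x_train_alt.shape}')
print(f'y_train shape : {y_train_alt.shape}')
print(f'x_test  shape : {x_test_alt.shape}')
print(f'y_test  shape : {y_test_alt.shape}')
```

The resulting shapes match the ones printed above, but the samples that land in each set will differ from the notebook's run.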
Step 3 - Get a perceptron, and train it
In [4]:
pct = Perceptron(max_iter=100, random_state=82, tol=0.01, verbose=1)
pct.fit(x_train, y_train)
-- Epoch 1
Norm: 1.56, NNZs: 2, Bias: 3.000000, T: 120, Avg. loss: 0.205917
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 1.56, NNZs: 2, Bias: 3.000000, T: 240, Avg. loss: 0.000000
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 1.56, NNZs: 2, Bias: 3.000000, T: 360, Avg. loss: 0.000000
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 1.56, NNZs: 2, Bias: 3.000000, T: 480, Avg. loss: 0.000000
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 1.56, NNZs: 2, Bias: 3.000000, T: 600, Avg. loss: 0.000000
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 1.56, NNZs: 2, Bias: 3.000000, T: 720, Avg. loss: 0.000000
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 1.56, NNZs: 2, Bias: 3.000000, T: 840, Avg. loss: 0.000000
Total training time: 0.00 seconds.
Convergence after 7 epochs took 0.00 seconds
Out[4]:
Perceptron(max_iter=100, random_state=82, tol=0.01, verbose=1)
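For intuition about what training does, here is a minimal NumPy sketch of the classic perceptron learning rule, where the weights are only updated on misclassified samples. It is not scikit-learn's implementation, which relies on a stochastic gradient descent solver with shuffling and a stopping tolerance. The function name `train_perceptron`, the learning rate and the toy data are assumptions made for this illustration.

```python
import numpy as np

def train_perceptron(x, y, epochs=20, lr=1.0):
    """Classic perceptron rule for labels y in {0, 1} (illustrative sketch)."""
    w = np.zeros(x.shape[1])   # one weight per feature
    b = 0.0                    # bias
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(x, y):
            pred = 1 if xi @ w + b > 0 else 0
            delta = yi - pred          # 0 if correct, +1 or -1 if wrong
            if delta != 0:
                w += lr * delta * xi   # move the boundary according to the misclassified sample
                b += lr * delta
                errors += 1
        if errors == 0:                # a full pass without mistakes: stop
            break
    return w, b

# Hypothetical toy data, linearly separable, shaped like (petal length, petal width)
x_toy = np.array([[1.4, 0.2], [1.3, 0.3], [1.7, 0.4],
                  [4.5, 1.5], [5.6, 1.4], [6.1, 2.3]])
y_toy = np.array([1, 1, 1, 0, 0, 0])

w, b = train_perceptron(x_toy, y_toy)
pred_toy = (x_toy @ w + b > 0).astype(int)
print('weights     :', w)
print('bias        :', b)
print('predictions :', pred_toy)
```

Because the toy data is linearly separable, the loop stops as soon as one full pass produces no mistakes, which mirrors the zero average loss reported in the training log above.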
Step 4 - Predictions
In [5]:
y_pred = pct.predict(x_test)
df=pd.DataFrame.from_dict({'Length (x1)':x_test[:,0], 'Width (x2)':x_test[:,1], 'y_test':y_test, 'y_pred':y_pred})
display(df[:15])
 | Length (x1) | Width (x2) | y_test | y_pred |
---|---|---|---|---|
0 | 6.3 | 1.8 | 0 | 0 |
1 | 6.1 | 2.3 | 0 | 0 |
2 | 1.3 | 0.3 | 1 | 1 |
3 | 1.7 | 0.4 | 1 | 1 |
4 | 4.5 | 1.5 | 0 | 0 |
5 | 5.6 | 1.4 | 0 | 0 |
6 | 1.3 | 0.3 | 1 | 1 |
7 | 1.2 | 0.2 | 1 | 1 |
8 | 4.5 | 1.5 | 0 | 0 |
9 | 1.0 | 0.2 | 1 | 1 |
10 | 4.8 | 1.8 | 0 | 0 |
11 | 1.4 | 0.2 | 1 | 1 |
12 | 6.1 | 2.5 | 0 | 0 |
13 | 1.4 | 0.2 | 1 | 1 |
14 | 5.9 | 2.3 | 0 | 0 |
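The table only shows the first 15 test samples. To summarise the whole test set, the accuracy and a confusion matrix can be computed with `sklearn.metrics`. The sketch below is self-contained, so it rebuilds the data with its own split (`train_test_split`, arbitrary seed) rather than the notebook's exact one; inside the notebook, `pct.score(x_test, y_test)` should give the equivalent accuracy directly. The names `model`, `acc` and `cm` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Same features and binary target as in the notebook
x0, y0 = load_iris(return_X_y=True)
x_all = x0[:, [2, 3]]              # petal length, petal width
y_all = (y0 == 0).astype(int)      # 1 = Iris setosa, 0 = not Iris setosa

# Arbitrary seed: this is NOT the notebook's shuffle
xa, xb, ya, yb = train_test_split(x_all, y_all, test_size=0.2, random_state=0)

model = Perceptron(max_iter=100, random_state=82, tol=0.01)
model.fit(xa, ya)
yb_pred = model.predict(xb)

acc = accuracy_score(yb, yb_pred)                  # fraction of correct predictions
cm = confusion_matrix(yb, yb_pred, labels=[0, 1])  # rows: true class, columns: predicted class

print(f'Test accuracy : {acc:.3f}')
print(cm)
```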
Step 5 - Visualisation
In [6]:
def plot_perceptron(x_train, y_train, x_test, y_test):
    # Note: relies on the globals pct (trained model), x (full dataset) and y_pred (test predictions)
    # Decision boundary: w1*x1 + w2*x2 + w0 = 0, i.e. x2 = a*x1 + b
    a = -pct.coef_[0][0] / pct.coef_[0][1]
    b = -pct.intercept_[0] / pct.coef_[0][1]
    # Plot limits: bounding box of the data, plus a 5% margin on each side
    box = [x.min(axis=0)[0], x.max(axis=0)[0], x.min(axis=0)[1], x.max(axis=0)[1]]
    mx = (box[1]-box[0])/20
    my = (box[3]-box[2])/20
    box = [box[0]-mx, box[1]+mx, box[2]-my, box[3]+my]
    fig, axs = plt.subplots(1, 1)
    fig.set_size_inches(10, 6)
    # Training points by true class, test points (lighter colours) by predicted class
    axs.plot(x_train[y_train==1, 0], x_train[y_train==1, 1], "o", color='tomato', label="Iris-Setosa")
    axs.plot(x_train[y_train==0, 0], x_train[y_train==0, 1], "o", color='steelblue', label="Others")
    axs.plot(x_test[y_pred==1, 0], x_test[y_pred==1, 1], "o", color='lightsalmon', label="Iris-Setosa (pred)")
    axs.plot(x_test[y_pred==0, 0], x_test[y_pred==0, 1], "o", color='lightblue', label="Others (pred)")
    # Decision boundary, drawn as a dashed line
    axs.plot([box[0], box[1]], [a*box[0]+b, a*box[1]+b], "k--", linewidth=2)
    axs.set_xlabel("Petal length (cm)", labelpad=15)
    axs.set_ylabel("Petal width (cm)", labelpad=15)
    axs.legend(loc="lower right", fontsize=14)
    axs.set_xlim(box[0], box[1])
    axs.set_ylim(box[2], box[3])
    fidle.scrawler.save_fig('01-perceptron-iris')
    plt.show()

plot_perceptron(x_train, y_train, x_test, y_test)
Saved: ./run/PER57/figs/01-perceptron-iris
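Iris setosa is linearly separable from the two other species in the petal plane, which is why a single straight line works so well here. One way to explore the limitation mentioned in the objectives is to pick a target that is not linearly separable. The sketch below is an addition for illustration, not part of the original notebook: it relabels the data as Iris versicolor versus the others. Versicolor lies between setosa and virginica in this plane, so no single line can isolate it and the training accuracy stays below 100%. The variable names and the choice of `max_iter=1000, tol=None` are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

x0, y0 = load_iris(return_X_y=True)
x_pet = x0[:, [2, 3]]                  # petal length, petal width

# Two binary targets built from the same data
targets = {
    'setosa':     (y0 == 0).astype(int),   # separable target used in this notebook
    'versicolor': (y0 == 1).astype(int),   # hypothetical harder target
}

scores = {}
for name, target in targets.items():
    # tol=None disables early stopping, so all max_iter epochs are run
    clf = Perceptron(max_iter=1000, random_state=82, tol=None)
    clf.fit(x_pet, target)
    scores[name] = clf.score(x_pet, target)   # accuracy on the training data itself
    print(f'{name:10s} training accuracy : {scores[name]:.3f}')
```

Even when evaluated on its own training data, the versicolor model cannot classify every sample correctly: a linear decision boundary is simply not expressive enough for that target.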
In [7]:
fidle.end()
End time : 03/03/24 21:03:21
Duration : 00:00:01 876ms
This notebook ends here :-)
https://fidle.cnrs.fr