[POLR1] - Complexity Syndrome¶
Illustration of the problem of complexity with the polynomial regressionObjectives :¶
- Visualizing and understanding under and overfitting
What we're going to do :¶
We are looking for a polynomial function to approximate the observed series :
$ y = a_n\cdot x^n + \dots + a_i\cdot x^i + \dots + a_1\cdot x + b $
Step 1 - Import and init¶
In [1]:
import numpy as np
import math
import random
import matplotlib
import matplotlib.pyplot as plt
import sys
import fidle
# Init Fidle environment
run_id, run_dir, datasets_dir = fidle.init('POLR1')
FIDLE - Environment initialization
Version : 2.3.0 Run id : POLR1 Run dir : ./run/POLR1 Datasets dir : /gpfswork/rech/mlh/uja62cb/fidle-project/datasets-fidle Start time : 03/03/24 21:03:07 Hostname : r3i6n3 (Linux) Tensorflow log level : Info + Warning + Error (=0) Update keras cache : False Update torch cache : False Save figs : ./run/POLR1/figs (True) numpy : 1.24.4 sklearn : 1.3.2 yaml : 6.0.1 matplotlib : 3.8.2 pandas : 2.1.3
Step 2 - Dataset generation¶
In [2]:
# ---- Parameters
n = 100
xob_min = -5
xob_max = 5
deg = 7
a_min = -2
a_max = 2
noise = 2000
# ---- Train data
# X,Y : data
# X_norm,Y_norm : normalized data
X = np.random.uniform(xob_min,xob_max,(n,1))
# N = np.random.uniform(-noise,noise,(n,1))
N = noise * np.random.normal(0,1,(n,1))
a = np.random.uniform(a_min,a_max, (deg,))
fy = np.poly1d( a )
Y = fy(X) + N
# ---- Data normalization
#
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
Y_norm = (Y - Y.mean(axis=0)) / Y.std(axis=0)
# ---- Data visualization
width = 12
height = 6
nb_viz = min(2000,n)
def vector_infos(name,V):
m=V.mean(axis=0).item()
s=V.std(axis=0).item()
print("{:8} : mean={:+12.4f} std={:+12.4f} min={:+12.4f} max={:+12.4f}".format(name,m,s,V.min(),V.max()))
fidle.utils.display_md('#### Generator :')
print(f"Nomber of points={n} deg={deg} bruit={noise}")
fidle.utils.display_md('#### Datasets :')
print(f"{nb_viz} points visibles sur {n})")
plt.figure(figsize=(width, height))
plt.plot(X[:nb_viz], Y[:nb_viz], '.')
plt.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False, labelleft=False)
plt.xlabel('x axis')
plt.ylabel('y axis')
fidle.scrawler.save_fig("01-dataset")
plt.show()
fidle.utils.display_md('#### Before normalization :')
vector_infos('X',X)
vector_infos('Y',Y)
fidle.utils.display_md('#### After normalization :')
vector_infos('X_norm',X_norm)
vector_infos('Y_norm',Y_norm)
Generator :¶
Nomber of points=100 deg=7 bruit=2000
Datasets :¶
100 points visibles sur 100)
Saved: ./run/POLR1/figs/01-dataset
Before normalization :¶
X : mean= -0.0865 std= +2.7876 min= -4.9692 max= +4.9414 Y : mean= -1356.1042 std= +3646.2421 min= -14674.1734 max= +6106.4886
After normalization :¶
X_norm : mean= +0.0000 std= +1.0000 min= -1.7516 max= +1.8037 Y_norm : mean= -0.0000 std= +1.0000 min= -3.6525 max= +2.0467
In [3]:
def draw_reg(X_norm, Y_norm, x_hat,fy_hat, size, save_as):
plt.figure(figsize=size)
plt.plot(X_norm, Y_norm, '.')
x_hat = np.linspace(X_norm.min(), X_norm.max(), 100)
plt.plot(x_hat, fy_hat(x_hat))
plt.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False, labelleft=False)
plt.xlabel('x axis')
plt.ylabel('y axis')
fidle.scrawler.save_fig(save_as)
plt.show()
In [4]:
reg_deg=1
a_hat = np.polyfit(X_norm.reshape(-1,), Y_norm.reshape(-1,), reg_deg)
fy_hat = np.poly1d( a_hat )
print(f'Nombre de degrés : {reg_deg}')
draw_reg(X_norm[:nb_viz],Y_norm[:nb_viz], X_norm,fy_hat, (width,height), save_as='02-underfitting')
Nombre de degrés : 1
Saved: ./run/POLR1/figs/02-underfitting
3.2 - Good fitting¶
In [5]:
reg_deg=5
a_hat = np.polyfit(X_norm.reshape(-1,), Y_norm.reshape(-1,), reg_deg)
fy_hat = np.poly1d( a_hat )
print(f'Nombre de degrés : {reg_deg}')
draw_reg(X_norm[:nb_viz],Y_norm[:nb_viz], X_norm,fy_hat, (width,height), save_as='03-good_fitting')
Nombre de degrés : 5
Saved: ./run/POLR1/figs/03-good_fitting
3.3 - Overfitting¶
In [6]:
reg_deg=24
a_hat = np.polyfit(X_norm.reshape(-1,), Y_norm.reshape(-1,), reg_deg)
fy_hat = np.poly1d( a_hat )
print(f'Nombre de degrés : {reg_deg}')
draw_reg(X_norm[:nb_viz],Y_norm[:nb_viz], X_norm,fy_hat, (width,height), save_as='04-over_fitting')
Nombre de degrés : 24
Saved: ./run/POLR1/figs/04-over_fitting
In [7]:
fidle.end()
End time : 03/03/24 21:03:09
Duration : 00:00:02 122ms
This notebook ends here :-)
https://fidle.cnrs.fr