
[POLR1] - Complexity Syndrome

An illustration of the model-complexity problem, using polynomial regression.

Objectives:

  • Visualize and understand underfitting and overfitting

What we're going to do:

We look for a polynomial function that approximates the observed series:
$ y = a_n\cdot x^n + \dots + a_i\cdot x^i + \dots + a_1\cdot x + b $
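
To make the formula concrete, here is a minimal sketch that builds and evaluates such a polynomial with `np.poly1d`. The coefficient values are arbitrary and chosen only for illustration; they do not come from this notebook.

```python
import numpy as np

# Illustrative coefficients (assumed values): y = 2*x^2 - 3*x + 1
coeffs = [2.0, -3.0, 1.0]      # highest power first; the last entry is b
p = np.poly1d(coeffs)

print(p.order)                 # degree of the polynomial: 2
print(p(2.0))                  # 2*4 - 3*2 + 1 = 3.0
```

Note that a polynomial of degree n is described by n+1 coefficients, since the constant term b counts as one of them.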

Step 1 - Import and init

In [1]:
import numpy as np
import math
import random
import matplotlib
import matplotlib.pyplot as plt
import sys
import fidle

# Init Fidle environment
run_id, run_dir, datasets_dir = fidle.init('POLR1')


FIDLE - Environment initialization

Version              : 2.3.2
Run id               : POLR1
Run dir              : ./run/POLR1
Datasets dir         : /lustre/fswork/projects/rech/mlh/uja62cb/fidle-project/datasets-fidle
Start time           : 22/12/24 21:20:41
Hostname             : r3i6n3 (Linux)
Tensorflow log level : Info + Warning + Error  (=0)
Update keras cache   : False
Update torch cache   : False
Save figs            : ./run/POLR1/figs (True)
numpy                : 2.1.2
sklearn              : 1.5.2
yaml                 : 6.0.2
matplotlib           : 3.9.2
pandas               : 2.2.3

Step 2 - Dataset generation

In [2]:
# ---- Parameters

n         = 100

xob_min   = -5
xob_max   = 5

deg       =  7
a_min     = -2
a_max     =  2

noise     =  2000

# ---- Train data
#      X,Y              : data
#      X_norm,Y_norm    : normalized data

X = np.random.uniform(xob_min,xob_max,(n,1))
# N = np.random.uniform(-noise,noise,(n,1))
N = noise * np.random.normal(0,1,(n,1))

# A polynomial of degree deg has deg+1 coefficients (constant term included)
a = np.random.uniform(a_min,a_max, (deg+1,))
fy = np.poly1d( a )

Y = fy(X) + N

# ---- Data normalization
#
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
Y_norm = (Y - Y.mean(axis=0)) / Y.std(axis=0)

# ---- Data visualization

width = 12
height = 6
nb_viz = min(2000,n)

def vector_infos(name,V):
    m=V.mean(axis=0).item()
    s=V.std(axis=0).item()
    print("{:8} :      mean={:+12.4f}  std={:+12.4f}    min={:+12.4f}    max={:+12.4f}".format(name,m,s,V.min(),V.max()))


fidle.utils.display_md('#### Generator :')
print(f"Number of points={n}  deg={deg}  noise={noise}")

fidle.utils.display_md('#### Datasets :')
print(f"{nb_viz} points shown out of {n}")
plt.figure(figsize=(width, height))
plt.plot(X[:nb_viz], Y[:nb_viz], '.')
plt.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False, labelleft=False)
plt.xlabel('x axis')
plt.ylabel('y axis')
fidle.scrawler.save_fig("01-dataset")
plt.show()

fidle.utils.display_md('#### Before normalization :')
vector_infos('X',X)
vector_infos('Y',Y)

fidle.utils.display_md('#### After normalization :')         
vector_infos('X_norm',X_norm)
vector_infos('Y_norm',Y_norm)

Generator :

Number of points=100  deg=7  noise=2000

Datasets :

100 points shown out of 100
Saved: ./run/POLR1/figs/01-dataset
[Figure 01-dataset: scatter plot of the generated points]

Before normalization :

X        :      mean=     -0.1770  std=     +3.0177    min=     -4.9328    max=     +4.9357
Y        :      mean=   -497.9644  std=  +1945.6850    min=  -6883.6392    max=  +3428.4071

After normalization :

X_norm   :      mean=     -0.0000  std=     +1.0000    min=     -1.5760    max=     +1.6943
Y_norm   :      mean=     -0.0000  std=     +1.0000    min=     -3.2820    max=     +2.0180
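
The normalization used above is a z-score: subtract the mean, divide by the standard deviation. The sketch below shows the same formula together with its inverse, which is what one would use to bring values back to the original scale. The data is a synthetic stand-in generated with a fixed seed, and `Y_back` is a name introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)              # fixed seed, for repeatability only
Y = rng.normal(-500.0, 2000.0, (100, 1))    # stand-in data, not the notebook's Y

mu, sigma = Y.mean(axis=0), Y.std(axis=0)
Y_norm = (Y - mu) / sigma                   # same z-score formula as above

# Inverse transform: map normalized values back to the original scale
Y_back = Y_norm * sigma + mu

print(f"mean={Y_norm.mean():+.4f}  std={Y_norm.std():.4f}")
```

After normalization the mean is 0 and the standard deviation is 1 up to floating-point rounding, which matches the statistics printed above.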

Step 3 - Polynomial regression with NumPy

3.1 - Underfitting

In [3]:
def draw_reg(X_norm, Y_norm, x_hat, fy_hat, size, save_as):
    """Plot the normalized points and the fitted polynomial fy_hat."""
    plt.figure(figsize=size)
    plt.plot(X_norm, Y_norm, '.')

    # The x_hat argument is ignored: the curve is always drawn on an evenly
    # spaced grid covering the range of X_norm.
    x_hat = np.linspace(X_norm.min(), X_norm.max(), 100)

    plt.plot(x_hat, fy_hat(x_hat))
    plt.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False, labelleft=False)
    plt.xlabel('x axis')
    plt.ylabel('y axis')
    fidle.scrawler.save_fig(save_as)
    plt.show()
In [4]:
reg_deg=1

a_hat   = np.polyfit(X_norm.reshape(-1,), Y_norm.reshape(-1,), reg_deg)
fy_hat  = np.poly1d( a_hat )

print(f'Polynomial degree : {reg_deg}')
draw_reg(X_norm[:nb_viz],Y_norm[:nb_viz], X_norm,fy_hat, (width,height), save_as='02-underfitting')
Polynomial degree : 1
Saved: ./run/POLR1/figs/02-underfitting
[Figure 02-underfitting: degree-1 fit drawn over the normalized data points]

3.2 - Good fitting

In [5]:
reg_deg=5

a_hat   = np.polyfit(X_norm.reshape(-1,), Y_norm.reshape(-1,), reg_deg)
fy_hat  = np.poly1d( a_hat )

print(f'Polynomial degree : {reg_deg}')
draw_reg(X_norm[:nb_viz],Y_norm[:nb_viz], X_norm,fy_hat, (width,height), save_as='03-good_fitting')
Polynomial degree : 5
Saved: ./run/POLR1/figs/03-good_fitting
[Figure 03-good_fitting: degree-5 fit drawn over the normalized data points]

3.3 - Overfitting

In [6]:
reg_deg=24

a_hat   = np.polyfit(X_norm.reshape(-1,), Y_norm.reshape(-1,), reg_deg)
fy_hat  = np.poly1d( a_hat )

print(f'Polynomial degree : {reg_deg}')
draw_reg(X_norm[:nb_viz],Y_norm[:nb_viz], X_norm,fy_hat, (width,height), save_as='04-over_fitting')
Polynomial degree : 24
Saved: ./run/POLR1/figs/04-over_fitting
[Figure 04-over_fitting: degree-24 fit drawn over the normalized data points]
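
The three fits above are compared by eye. One possible way to put numbers on the comparison is to hold out part of the data and measure the error on points the fit never saw. The sketch below does this on synthetic stand-in data, not on the notebook's `X_norm`/`Y_norm`; the signal, the noise level, the 70/30 split and the helper `mse` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)             # fixed seed so the sketch is repeatable

# Hypothetical stand-in data: a degree-5 signal plus Gaussian noise
x = rng.uniform(-1.0, 1.0, 100)
y = 16 * x**5 - 20 * x**3 + 5 * x + rng.normal(0.0, 0.2, x.size)

# Hold out 30 points that the fit never sees
idx = rng.permutation(x.size)
train, test = idx[:70], idx[70:]

def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial `coeffs` on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

train_mse, test_mse = {}, {}
for d in (1, 5, 24):                        # same degrees as in sections 3.1 to 3.3
    # np.polyfit may warn that a high-degree fit is poorly conditioned
    c = np.polyfit(x[train], y[train], d)
    train_mse[d] = mse(c, x[train], y[train])
    test_mse[d] = mse(c, x[test], y[test])
    print(f"degree {d:2d} : train MSE = {train_mse[d]:.4f}   test MSE = {test_mse[d]:.4f}")
```

The training error can only go down as the degree increases, because each higher-degree model contains the lower-degree ones. The held-out error is the more informative figure: overfitting typically shows up as a low training error paired with a noticeably higher held-out error.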
In [7]:
fidle.end()

End time : 22/12/24 21:20:43
Duration : 00:00:02 055ms
This notebook ends here :-)
https://fidle.cnrs.fr

