[PBHPD1] - Regression with a Dense Network (DNN)¶
A simple regression with a Dense Neural Network (DNN) using PyTorch - BHPD dataset
Objectives :¶
- Predict housing prices from a set of house features.
- Understand the principle and the architecture of a regression with a dense neural network
The Boston Housing Dataset consists of the prices of houses in various places in Boston.
Alongside the price, the dataset also provides the following information:
- CRIM: This is the per capita crime rate by town
- ZN: This is the proportion of residential land zoned for lots larger than 25,000 sq.ft
- INDUS: This is the proportion of non-retail business acres per town
- CHAS: This is the Charles River dummy variable (this is equal to 1 if tract bounds river; 0 otherwise)
- NOX: This is the nitric oxides concentration (parts per 10 million)
- RM: This is the average number of rooms per dwelling
- AGE: This is the proportion of owner-occupied units built prior to 1940
- DIS: This is the weighted distances to five Boston employment centers
- RAD: This is the index of accessibility to radial highways
- TAX: This is the full-value property-tax rate per 10,000 dollars
- PTRATIO: This is the pupil-teacher ratio by town
- B: This is calculated as 1000(Bk - 0.63)^2, where Bk is the proportion of people of African American descent by town
- LSTAT: This is the percentage lower status of the population
- MEDV: This is the median value of owner-occupied homes in 1000 dollars
What we're going to do :¶
- Retrieve data
- Prepare the data
- Build a model
- Train the model
- Evaluate the result
Step 1 - Import and init¶
In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt
import sys,os
import pandas as pd
from modules.fidle_pwk_additional import convergence_history_MSELoss
import fidle
# Init Fidle environment
run_id, run_dir, datasets_dir = fidle.init('PBHPD1')
FIDLE - Environment initialization
Version              : 2.3.2
Run id               : PBHPD1
Run dir              : ./run/PBHPD1
Datasets dir         : /lustre/fswork/projects/rech/mlh/uja62cb/fidle-project/datasets-fidle
Start time           : 22/12/24 21:20:56
Hostname             : r3i6n3 (Linux)
Tensorflow log level : Info + Warning + Error (=0)
Update keras cache   : False
Update torch cache   : False
Save figs            : ./run/PBHPD1/figs (True)
numpy                : 2.1.2
sklearn              : 1.5.2
yaml                 : 6.0.2
matplotlib           : 3.9.2
pandas               : 2.2.3
torch                : 2.5.0
Step 2 - Retrieve data¶
Boston housing is a famous historic dataset, which can be obtained here: Boston housing datasets
In [2]:
data = pd.read_csv('./BostonHousing.csv', header=0)
display(data.head(5).style.format("{0:.2f}").set_caption("Few lines of the dataset :"))
print('Missing Data : ',data.isna().sum().sum(), ' Shape is : ', data.shape)
 | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | b | lstat | medv
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0.01 | 18.00 | 2.31 | 0.00 | 0.54 | 6.58 | 65.20 | 4.09 | 1.00 | 296.00 | 15.30 | 396.90 | 4.98 | 24.00 |
1 | 0.03 | 0.00 | 7.07 | 0.00 | 0.47 | 6.42 | 78.90 | 4.97 | 2.00 | 242.00 | 17.80 | 396.90 | 9.14 | 21.60 |
2 | 0.03 | 0.00 | 7.07 | 0.00 | 0.47 | 7.18 | 61.10 | 4.97 | 2.00 | 242.00 | 17.80 | 392.83 | 4.03 | 34.70 |
3 | 0.03 | 0.00 | 2.18 | 0.00 | 0.46 | 7.00 | 45.80 | 6.06 | 3.00 | 222.00 | 18.70 | 394.63 | 2.94 | 33.40 |
4 | 0.07 | 0.00 | 2.18 | 0.00 | 0.46 | 7.15 | 54.20 | 6.06 | 3.00 | 222.00 | 18.70 | 396.90 | 5.33 | 36.20 |
Missing Data : 0 Shape is : (506, 14)
Step 3 - Prepare the data¶
3.1 - Split data¶
In [3]:
# ---- Shuffle and Split => train, test
#
data_train = data.sample(frac=0.7, axis=0)
data_test = data.drop(data_train.index)
# ---- Split => x,y (medv is price)
#
x_train = data_train.drop('medv', axis=1)
y_train = data_train['medv']
x_test = data_test.drop('medv', axis=1)
y_test = data_test['medv']
print('Original data shape was : ',data.shape)
print('x_train : ',x_train.shape, 'y_train : ',y_train.shape)
print('x_test : ',x_test.shape, 'y_test : ',y_test.shape)
Original data shape was :  (506, 14)
x_train :  (354, 13) y_train :  (354,)
x_test :  (152, 13) y_test :  (152,)
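Note that data.sample shuffles the rows at random, so the split (and the metrics reported below) changes from run to run. A minimal variant with a fixed seed, if reproducibility is wanted (a sketch; random_state=42 is an arbitrary choice, not part of this notebook):

# Sketch: reproducible train/test split by fixing the sampling seed
data_train = data.sample(frac=0.7, axis=0, random_state=42)
data_test  = data.drop(data_train.index)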
3.2 - Data normalization¶
Note :
- All input data must be normalized, train and test.
- To do this we will subtract the mean and divide by the standard deviation.
- However, the test data must not be used in any way, not even for computing the normalization.
- The mean and the standard deviation are therefore computed on the training data only (an equivalent scikit-learn formulation is sketched below).
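For reference, the same train-only normalization can be written with scikit-learn's StandardScaler (a sketch only; the notebook does it directly with pandas in the next cell, and StandardScaler uses the biased standard deviation, ddof=0, whereas pandas uses ddof=1, so values differ marginally):

# Sketch: equivalent normalization with scikit-learn, fitted on the train set only
from sklearn.preprocessing import StandardScaler

scaler    = StandardScaler().fit(x_train)   # statistics computed on the training data only
x_train_s = scaler.transform(x_train)       # ...then applied to train
x_test_s  = scaler.transform(x_test)        # ...and to test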
In [4]:
display(x_train.describe().style.format("{0:.2f}").set_caption("Before normalization :"))
mean = x_train.mean()
std = x_train.std()
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std
display(x_train.describe().style.format("{0:.2f}").set_caption("After normalization :"))
display(x_train.head(5).style.format("{0:.2f}").set_caption("Few lines of the dataset :"))
x_train, y_train = np.array(x_train), np.array(y_train)
x_test, y_test = np.array(x_test), np.array(y_test)
 | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | b | lstat
---|---|---|---|---|---|---|---|---|---|---|---|---|---
count | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 |
mean | 3.63 | 11.54 | 11.42 | 0.06 | 0.56 | 6.27 | 69.66 | 3.76 | 9.86 | 415.85 | 18.56 | 352.76 | 12.84 |
std | 7.97 | 24.08 | 6.89 | 0.24 | 0.11 | 0.69 | 28.43 | 2.15 | 8.94 | 170.50 | 2.13 | 96.17 | 7.12 |
min | 0.01 | 0.00 | 0.46 | 0.00 | 0.39 | 3.56 | 2.90 | 1.13 | 1.00 | 187.00 | 13.00 | 0.32 | 1.73 |
25% | 0.08 | 0.00 | 5.22 | 0.00 | 0.45 | 5.89 | 47.20 | 2.06 | 4.00 | 284.00 | 17.40 | 373.30 | 7.18 |
50% | 0.33 | 0.00 | 9.69 | 0.00 | 0.54 | 6.21 | 80.35 | 3.09 | 5.00 | 348.00 | 19.10 | 391.48 | 12.04 |
75% | 3.99 | 12.50 | 18.10 | 0.00 | 0.63 | 6.56 | 94.55 | 5.12 | 24.00 | 666.00 | 20.20 | 396.23 | 17.16 |
max | 73.53 | 100.00 | 27.74 | 1.00 | 0.87 | 8.78 | 100.00 | 12.13 | 24.00 | 711.00 | 22.00 | 396.90 | 36.98 |
 | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | b | lstat
---|---|---|---|---|---|---|---|---|---|---|---|---|---
count | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 | 354.00 |
mean | 0.00 | -0.00 | -0.00 | -0.00 | -0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | 0.00 | -0.00 | -0.00 |
std | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
min | -0.45 | -0.48 | -1.59 | -0.26 | -1.52 | -3.91 | -2.35 | -1.22 | -0.99 | -1.34 | -2.61 | -3.66 | -1.56 |
25% | -0.44 | -0.48 | -0.90 | -0.26 | -0.92 | -0.55 | -0.79 | -0.79 | -0.66 | -0.77 | -0.55 | 0.21 | -0.79 |
50% | -0.41 | -0.48 | -0.25 | -0.26 | -0.17 | -0.09 | 0.38 | -0.31 | -0.54 | -0.40 | 0.25 | 0.40 | -0.11 |
75% | 0.05 | 0.04 | 0.97 | -0.26 | 0.64 | 0.42 | 0.88 | 0.63 | 1.58 | 1.47 | 0.77 | 0.45 | 0.61 |
max | 8.77 | 3.67 | 2.37 | 3.88 | 2.75 | 3.62 | 1.07 | 3.90 | 1.58 | 1.73 | 1.62 | 0.46 | 3.39 |
 | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | b | lstat
---|---|---|---|---|---|---|---|---|---|---|---|---|---
453 | 0.58 | -0.48 | 0.97 | -0.26 | 1.36 | 1.62 | 1.04 | -0.61 | 1.58 | 1.47 | 0.77 | 0.24 | 0.55 |
391 | 0.21 | -0.48 | 0.97 | -0.26 | 1.25 | -0.32 | 0.45 | -0.74 | 1.58 | 1.47 | 0.77 | 0.27 | 0.83 |
161 | -0.27 | -0.48 | 1.19 | -0.26 | 0.42 | 1.76 | 0.74 | -0.83 | -0.54 | -0.08 | -1.82 | 0.23 | -1.56 |
462 | 0.38 | -0.48 | 0.97 | -0.26 | 1.36 | 0.07 | 0.47 | -0.48 | 1.58 | 1.47 | 0.77 | 0.46 | 0.16 |
157 | -0.30 | -0.48 | 1.19 | -0.26 | 0.42 | 0.97 | 0.98 | -0.88 | -0.54 | -0.08 | -1.82 | 0.11 | -1.16 |
Step 4 - Build a model¶
In [5]:
class model_v1(nn.Module):
    """
    Basic fully connected neural network for tabular data
    """
    def __init__(self, num_vars):
        super(model_v1, self).__init__()
        self.num_vars = num_vars
        self.hidden1  = nn.Linear(self.num_vars, 64)
        self.hidden2  = nn.Linear(64, 64)
        self.hidden3  = nn.Linear(64, 1)

    def forward(self, x):
        x = x.view(-1, self.num_vars)   # flatten the observation before the fully-connected layers
        x = self.hidden1(x)
        x = F.relu(x)
        x = self.hidden2(x)
        x = F.relu(x)
        x = self.hidden3(x)
        return x
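A quick way to check the architecture is to push a dummy batch through a freshly built network and look at the output shape (a sketch only; the real model is instantiated in section 5.2):

# Sketch: sanity-check the output shape with a dummy batch
dummy_net   = model_v1(13)             # 13 features, as in the BHPD data
dummy_batch = torch.zeros(4, 13)       # 4 fake observations
print(dummy_net(dummy_batch).shape)    # expected: torch.Size([4, 1])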
Step 5 - Train the model¶
5.1 - Fit function¶
In [6]:
def fit(model, X_train, Y_train, X_test, Y_test, EPOCHS=5, BATCH_SIZE=32):

    loss      = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # lr is the learning rate

    model.train()

    history = convergence_history_MSELoss()
    history.update(model, X_train, Y_train, X_test, Y_test)

    n = X_train.shape[0]   # number of observations in the training data

    # stochastic gradient descent
    for epoch in range(EPOCHS):

        batch_start    = 0
        epoch_shuffler = np.arange(n)
        np.random.shuffle(epoch_shuffler)   # note that torch.utils.data.DataLoader could be used instead (see the sketch after this function)

        while batch_start + BATCH_SIZE < n:
            # get the mini-batch observations
            mini_batch_observations = epoch_shuffler[batch_start:batch_start + BATCH_SIZE]
            var_X_batch = Variable(X_train[mini_batch_observations, :]).float()
            var_Y_batch = Variable(Y_train[mini_batch_observations]).float()

            # gradient descent step
            optimizer.zero_grad()                                           # reset the parameter gradients
            Y_pred_batch = model(var_X_batch)                               # predict y with the current NN parameters
            curr_loss = loss(Y_pred_batch.view(-1), var_Y_batch.view(-1))   # compute the current loss
            curr_loss.backward()                                            # compute the loss gradient w.r.t. all NN parameters
            optimizer.step()                                                # update the NN parameters

            # prepare the next mini-batch of the epoch
            batch_start += BATCH_SIZE

        history.update(model, X_train, Y_train, X_test, Y_test)

    return history
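As noted in the comment above, the manual shuffling could be replaced by a torch.utils.data.DataLoader. A sketch of the inner loop written that way, assuming the same model, loss and optimizer as in fit:

# Sketch: the same mini-batch loop, driven by a DataLoader instead of manual shuffling
from torch.utils.data import TensorDataset, DataLoader

train_ds     = TensorDataset(X_train.float(), Y_train.float())
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)

for x_batch, y_batch in train_loader:                    # one pass over the loader = one epoch
    optimizer.zero_grad()
    y_pred    = model(x_batch)
    curr_loss = loss(y_pred.view(-1), y_batch.view(-1))
    curr_loss.backward()
    optimizer.step()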
5.2 - Get the model¶
In [7]:
model=model_v1( x_train[0,:].shape[0] )
print(model)
model_v1(
  (hidden1): Linear(in_features=13, out_features=64, bias=True)
  (hidden2): Linear(in_features=64, out_features=64, bias=True)
  (hidden3): Linear(in_features=64, out_features=1, bias=True)
)
5.3 - Train the model¶
In [8]:
torch_x_train=torch.from_numpy(x_train)
torch_y_train=torch.from_numpy(y_train)
torch_x_test=torch.from_numpy(x_test)
torch_y_test=torch.from_numpy(y_test)
batch_size = 10
epochs = 100
history=fit(model,torch_x_train,torch_y_train,torch_x_test,torch_y_test,EPOCHS=epochs,BATCH_SIZE = batch_size)
Step 6 - Evaluate the result¶
6.1 - Model evaluation¶
In [9]:
var_x_test = Variable(torch_x_test).float()
var_y_test = Variable(torch_y_test).float()
y_pred = model(var_x_test)
nn_loss = nn.MSELoss()
nn_MAE_loss = nn.L1Loss()
print('x_test / loss : {:5.4f}'.format(nn_loss(y_pred.view(-1), var_y_test.view(-1)).item()))
print('x_test / mae : {:5.4f}'.format(nn_MAE_loss(y_pred.view(-1), var_y_test.view(-1)).item()))
x_test / loss : 12.4627
x_test / mae : 2.5673
6.2 - Training history¶
What was the best result during our training?
In [10]:
df=pd.DataFrame(data=history.history)
df.describe()
Out[10]:
 | loss | mae | val_loss | val_mae
---|---|---|---|---
count | 101.000000 | 101.000000 | 101.000000 | 101.000000 |
mean | 22.222863 | 2.635554 | 27.133244 | 3.249773 |
std | 75.132948 | 2.861175 | 83.131706 | 2.973938 |
min | 4.191096 | 1.577489 | 11.814480 | 2.523123 |
25% | 6.052614 | 1.805854 | 12.124697 | 2.599839 |
50% | 7.975610 | 2.019151 | 12.359540 | 2.654237 |
75% | 12.401917 | 2.378084 | 12.907097 | 2.718711 |
max | 582.181335 | 21.925776 | 625.663635 | 23.338125 |
In [11]:
print("min( val_mae ) : {:.4f}".format( min(history.history["val_mae"]) ) )
min( val_mae ) : 2.5231
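Only the metric is reported here; the weights of that best epoch are not kept. If you wanted to retain them, a common pattern (a sketch, not used in this notebook, with a hypothetical per-epoch val_mae value) is to checkpoint whenever the validation metric improves:

# Sketch: keep the weights of the best epoch
best_val_mae = float('inf')
# ... at the end of each epoch, once val_mae has been computed:
if val_mae < best_val_mae:
    best_val_mae = val_mae
    torch.save(model.state_dict(), 'best_model.pt')   # restore later with model.load_state_dict(torch.load('best_model.pt'))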
In [12]:
fidle.scrawler.history(history, plot={'MAE' :['mae', 'val_mae'],
'LOSS':['loss','val_loss']})
Saved: ./run/PBHPD1/figs/fig_PBHPD1_00
Saved: ./run/PBHPD1/figs/fig_PBHPD1_01
Step 7 - Make a prediction¶
The data must be normalized with the parameters (mean, std) previously used.
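If you start from raw (un-normalized) feature values, they must first be transformed with the training mean and std computed in section 3.2, for example (a sketch with hypothetical raw values):

# Sketch: normalize a raw observation with the training statistics
raw_sample  = np.array([0.2, 0.0, 9.7, 0.0, 0.54, 6.2, 80.0, 3.1, 5.0, 348.0, 19.1, 391.5, 12.0])
norm_sample = ((raw_sample - mean.values) / std.values).reshape(1, 13)   # same mean/std as for training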
In [13]:
my_data = [ 1.26425925, -0.48522739, 1.0436489 , -0.23112788, 1.37120745,
-2.14308942, 1.13489104, -1.06802005, 1.71189006, 1.57042287,
0.77859951, 0.14769795, 2.7585581 ]
real_price = 10.4
my_data=np.array(my_data).reshape(1,13)
In [14]:
torch_my_data=torch.from_numpy(my_data)
var_my_data = Variable(torch_my_data).float()
predictions = model( var_my_data )
print("Prediction : {:.2f} K$".format(predictions[0][0]))
print("Reality : {:.2f} K$".format(real_price))
Prediction : 9.01 K$
Reality : 10.40 K$
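As a general PyTorch habit (a sketch; for this small ReLU-only network it does not change the result), inference is usually run with the model switched to evaluation mode and gradient tracking disabled:

# Sketch: standard inference pattern
model.eval()                    # disables layers such as dropout / batch-norm, if any
with torch.no_grad():           # no gradient bookkeeping needed for inference
    predictions = model(var_my_data)
print("Prediction : {:.2f} K$".format(predictions[0][0]))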