
[K3AE1] - Prepare a noisy MNIST dataset

Episode 1: Preparation of a noisy MNIST dataset

Objectives:

  • Prepare a noisy MNIST dataset, usable with our denoising autoencoder (duration: < 50 s)

What we're going to do:

  • Load the original MNIST dataset
  • Add noise, a lot!
  • Save it :-)

Step 1 - Init and set parameters

1.1 - Init Python

In [1]:
import os
os.environ['KERAS_BACKEND'] = 'torch'

import keras

import numpy as np
import sys

from skimage import io
from skimage.util import random_noise

import modules.MNIST
from modules.MNIST     import MNIST

import fidle

# Init Fidle environment
run_id, run_dir, datasets_dir = fidle.init('K3AE1')


FIDLE - Environment initialization

Version              : 2.3.2
Run id               : K3AE1
Run dir              : ./run/K3AE1
Datasets dir         : /lustre/fswork/projects/rech/mlh/uja62cb/fidle-project/datasets-fidle
Start time           : 22/12/24 21:23:38
Hostname             : r3i6n0 (Linux)
Tensorflow log level : Info + Warning + Error  (=0)
Update keras cache   : False
Update torch cache   : False
Save figs            : ./run/K3AE1/figs (True)
keras                : 3.7.0
numpy                : 2.1.2
sklearn              : 1.5.2
yaml                 : 6.0.2
skimage              : 0.24.0
matplotlib           : 3.9.2
pandas               : 2.2.3
torch                : 2.5.0

1.2 - Parameters

prepared_dataset : Filename of the prepared dataset to create (example : ./data/mnist-noisy.h5)
scale : Fraction of the dataset to use. 1 means 100% of the dataset; set 0.1 for quick tests
progress_verbosity : Verbosity of the progress bar: 0=silent, 1=progress bar, 2=one line

In [2]:
prepared_dataset   = './data/mnist-noisy.h5'
scale              = 1
progress_verbosity = 1

Override parameters (batch mode) - you can ignore this cell

In [3]:
fidle.override('prepared_dataset', 'scale', 'progress_verbosity')
** Overrided parameters : **
scale                : 1
progress_verbosity   : 2

Step 2 - Get original dataset

We load:
clean_data : Original, clean images - this is what we want to obtain at the output of the autoencoder (AE)
class_data : Image classes - not needed here, because the training will be unsupervised
We'll build:
noisy_data : Noisy images - these are the images we will give as input to our AE

In [4]:
clean_data, class_data = MNIST.get_origine(scale=scale)
Dataset loaded.
Normalized.
Reshaped.
Concatenate.
x shape : (70000, 28, 28, 1)
y shape : (70000,)
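
The log lines above (loaded, normalized, reshaped, concatenated) hint at the kind of preparation `MNIST.get_origine` performs. The sketch below is only an assumption about that pipeline, not the actual Fidle helper: the function name `prepare_mnist` and the way `scale` is applied are hypothetical, and the raw arrays are passed in explicitly (in practice they could come from `keras.datasets.mnist.load_data()`).

```python
import numpy as np

def prepare_mnist(x_train, y_train, x_test, y_test, scale=1.0):
    """Hypothetical stand-in for MNIST.get_origine: returns (x, y)."""
    # Normalize pixel values from [0, 255] to [0, 1]
    x_train = x_train.astype('float32') / 255.0
    x_test  = x_test.astype('float32') / 255.0
    # Add a trailing channel axis: (n, 28, 28) -> (n, 28, 28, 1)
    x_train = x_train.reshape(-1, 28, 28, 1)
    x_test  = x_test.reshape(-1, 28, 28, 1)
    # Concatenate the train and test splits into a single dataset
    x = np.concatenate([x_train, x_test], axis=0)
    y = np.concatenate([y_train, y_test], axis=0)
    # Keep only the first `scale` fraction of the samples (assumed meaning of scale)
    n = int(len(x) * scale)
    return x[:n], y[:n]

# Small synthetic arrays standing in for the real MNIST splits
rng  = np.random.default_rng(0)
x_tr = rng.integers(0, 256, size=(60, 28, 28), dtype=np.uint8)
y_tr = rng.integers(0, 10, size=60)
x_te = rng.integers(0, 256, size=(10, 28, 28), dtype=np.uint8)
y_te = rng.integers(0, 10, size=10)

x, y = prepare_mnist(x_tr, y_tr, x_te, y_te, scale=0.1)
print('x shape :', x.shape)
print('y shape :', y.shape)
```

With `scale=1` and the real MNIST splits (60000 + 10000 images), this kind of pipeline would give the (70000, 28, 28, 1) shape shown in the output above.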

Step 3 - Add noise

We add noise to the original images (clean_data) to obtain noisy images (noisy_data).
This takes about 30-40 seconds.

In [5]:
def noise_it(data):
    """Return a noisy copy of data; the input array is left unchanged."""
    new_data = np.copy(data)
    for i,image in enumerate(new_data):
        fidle.utils.update_progress('Add noise : ',i+1,len(data),verbosity=progress_verbosity)
        # Four noise models are applied one after the other to each image
        image=random_noise(image, mode='gaussian', mean=0, var=0.3)
        image=random_noise(image, mode='s&p',      amount=0.2, salt_vs_pepper=0.5)
        image=random_noise(image, mode='poisson')
        image=random_noise(image, mode='speckle',  mean=0, var=0.1)
        new_data[i]=image
    print('Done.')
    return new_data

# ---- Add noise to the clean images (clean_data)
#
noisy_data = noise_it(clean_data)
Add noise :      [########################################] 100.0% of 70000
Done.
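
To make the effect of the first two noise modes more concrete, here is a minimal NumPy-only sketch of additive Gaussian noise followed by salt-and-pepper noise. It is an approximation written for illustration, not scikit-image's implementation; the helper names `add_gaussian` and `add_salt_and_pepper`, the toy image and the seeded generator are all assumptions.

```python
import numpy as np

def add_gaussian(image, rng, mean=0.0, var=0.3):
    """Add Gaussian noise of the given variance, then clip back to [0, 1]."""
    noise = rng.normal(mean, np.sqrt(var), size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)

def add_salt_and_pepper(image, rng, amount=0.2, salt_vs_pepper=0.5):
    """Replace roughly `amount` of the pixels with 1 (salt) or 0 (pepper)."""
    out = image.copy()
    corrupted = rng.random(image.shape) < amount          # which pixels are hit
    salt      = rng.random(image.shape) < salt_vs_pepper  # salt or pepper for each pixel
    out[corrupted & salt]  = 1.0
    out[corrupted & ~salt] = 0.0
    return out

rng = np.random.default_rng(42)

# Toy "digit": a white square on a black 28x28 background, values in [0, 1]
clean = np.zeros((28, 28, 1), dtype='float64')
clean[10:18, 10:18, 0] = 1.0

noisy = add_salt_and_pepper(add_gaussian(clean, rng), rng)
print('shape :', noisy.shape)
print('range :', noisy.min(), noisy.max())
print('fraction of changed pixels :', np.mean(noisy != clean))
```

Clipping after each step keeps the pixel values in [0, 1], which is the range the normalized clean images use.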

Step 4 - Have a look

In [6]:
print('Clean dataset (clean_data) : ',clean_data.shape)
print('Noisy dataset (noisy_data) : ',noisy_data.shape)

fidle.utils.subtitle("Noisy images we'll have in input (or x)")
fidle.scrawler.images(noisy_data[:5], None, indices='all', columns=5, x_size=3,y_size=3, interpolation=None, save_as='01-noisy')
fidle.utils.subtitle('Clean images we want to obtain (or y)')
fidle.scrawler.images(clean_data[:5], None, indices='all', columns=5, x_size=3,y_size=3, interpolation=None, save_as='02-original')
Clean dataset (clean_data) :  (70000, 28, 28, 1)
Noisy dataset (noisy_data) :  (70000, 28, 28, 1)


Noisy images we'll have in input (or x)

Saved: ./run/K3AE1/figs/01-noisy


Clean images we want to obtain (or y)

Saved: ./run/K3AE1/figs/02-original

Step 5 - Shuffle dataset

In [7]:
p = np.random.permutation(len(clean_data))
clean_data, noisy_data, class_data = clean_data[p], noisy_data[p], class_data[p]
print('Shuffled.')
Shuffled.
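
Indexing every array with the same permutation keeps each clean image aligned with its noisy version and its class. The sketch below checks this on toy arrays; the seeded `np.random.default_rng` generator is an assumption added here for reproducibility (the notebook cell uses the unseeded `np.random.permutation`), and the toy arrays are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(123)

# Toy data: sample i has clean value i, noisy value i + 0.5 and label i
clean  = np.arange(10, dtype='float32').reshape(10, 1)
noisy  = clean + 0.5
labels = np.arange(10)

# One permutation, applied to all three arrays
p = rng.permutation(len(clean))
clean_s, noisy_s, labels_s = clean[p], noisy[p], labels[p]

# Each shuffled row still pairs the same clean value, noisy value and label
print('noisy/clean aligned :', np.allclose(noisy_s - clean_s, 0.5))
print('label/clean aligned :', np.array_equal(clean_s[:, 0].astype(int), labels_s))
```

Shuffling each array with its own permutation would break this pairing, which is why a single `p` is reused.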

Step 6 - Save our prepared dataset

In [8]:
MNIST.save_prepared_dataset( clean_data, noisy_data, class_data, filename=prepared_dataset )
Saved.
clean_data shape is :  (70000, 28, 28, 1)
noisy_data shape is :  (70000, 28, 28, 1)
class_data shape is :  (70000,)
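
Before writing the file, it can be worth checking that the three arrays are consistent with each other. The helper below, `check_prepared`, is a hypothetical addition and not part of `modules.MNIST`; it only assumes the shape relationships printed above and pixel values in [0, 1].

```python
import numpy as np

def check_prepared(clean_data, noisy_data, class_data):
    """Raise ValueError if the arrays are inconsistent; return the sample count."""
    if clean_data.shape != noisy_data.shape:
        raise ValueError('clean_data and noisy_data must have the same shape')
    if len(class_data) != len(clean_data):
        raise ValueError('class_data must have one entry per image')
    for name, arr in (('clean_data', clean_data), ('noisy_data', noisy_data)):
        if arr.min() < 0.0 or arr.max() > 1.0:
            raise ValueError(f'{name} has values outside [0, 1]')
    return len(clean_data)

# Toy arrays standing in for the real dataset
clean  = np.zeros((4, 28, 28, 1), dtype='float32')
noisy  = np.full((4, 28, 28, 1), 0.5, dtype='float32')
labels = np.array([3, 1, 4, 1])

n = check_prepared(clean, noisy, labels)
print('Samples checked :', n)
```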
In [9]:
fidle.end()

End time : 22/12/24 21:24:12
Duration : 00:00:34 235ms
This notebook ends here :-)
https://fidle.cnrs.fr

