[K3IMDB3] - Reload and reuse a saved model
Retrieving a saved model to perform sentiment analysis on movie reviews, using Keras 3 and PyTorch
Objectives :
- The objective is to guess whether our personal film reviews are positive or negative based on the analysis of the text.
- For this, we will use our previously saved model.
What we're going to do :
- Prepare our data
- Retrieve our saved model
- Evaluate the result
Step 1 - Init Python stuff
import os
os.environ['KERAS_BACKEND'] = 'torch'
import keras
import json,re
import numpy as np
import fidle
# Init Fidle environment
run_id, run_dir, datasets_dir = fidle.init('K3IMDB3')
FIDLE - Environment initialization
Version              : 2.3.0
Run id               : K3IMDB3
Run dir              : ./run/K3IMDB3
Datasets dir         : /gpfswork/rech/mlh/uja62cb/fidle-project/datasets-fidle
Start time           : 03/03/24 21:11:06
Hostname             : r6i0n6 (Linux)
Tensorflow log level : Warning + Error (=1)
Update keras cache   : False
Update torch cache   : False
Save figs            : ./run/K3IMDB3/figs (True)
keras                : 3.0.4
numpy                : 1.24.4
sklearn              : 1.3.2
yaml                 : 6.0.1
matplotlib           : 3.8.2
pandas               : 2.1.3
torch                : 2.1.1
1.2 - Parameters
The words in the vocabulary are ranked from the most frequent to the rarest.
- vocab_size is the number of words we will keep in our vocabulary (the other words will be considered as unknown).
- review_len is the review length, in tokens: every review is padded or truncated to this length.
- saved_models is the directory where our models were previously saved.
- dictionaries_dir is where we will read our dictionaries from (./data is a good choice).
vocab_size = 10000
review_len = 256
saved_models = './run/K3IMDB2'
dictionaries_dir = './data'
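Because word ids follow the frequency ranking, vocab_size acts as a simple cutoff: ids below vocab_size are kept and every other id is replaced by the "unknown" id (2). The pure-Python sketch below illustrates that rule under the id convention of this notebook (0 = padding, 1 = start, 2 = unknown). limit_vocab is a hypothetical helper, not a Fidle or Keras function; keras.datasets.imdb.load_data(num_words=...) applies a comparable cutoff when it loads the IMDB data.

```python
# Minimal sketch of the vocab_size cutoff (hypothetical helper, not Fidle/Keras API).
# Assumption: ids follow this notebook's convention, 0=<pad>, 1=<start>, 2=<unknown>,
# and larger ids correspond to rarer words.
OOV_CHAR = 2

def limit_vocab(token_ids, vocab_size, oov_char=OOV_CHAR):
    """Replace every id outside the vocabulary (id >= vocab_size) with oov_char."""
    return [t if t < vocab_size else oov_char for t in token_ids]

print(limit_vocab([1, 22, 9, 572, 12345], vocab_size=10000))
# [1, 22, 9, 572, 2]
print(limit_vocab([9999, 10000], vocab_size=10000))
# [9999, 2]
```

Note that the boundary is strict: with vocab_size = 10000, id 9999 is still in the vocabulary while id 10000 is not.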
Override parameters (batch mode) - Just forget this cell
fidle.override('vocab_size', 'review_len', 'saved_models', 'dictionaries_dir')
Step 2 - Preparing our data
2.1 - Our reviews
reviews = [ "This film is particularly nice, a must see.",
"This film is a great classic that cannot be ignored.",
"I don't remember ever having seen such a movie...",
"This movie is just abominable and doesn't deserve to be seen!"]
2.2 - Retrieve dictionaries
Note : This dictionary was generated by the 02-Embedding-Keras notebook.
with open(f'{dictionaries_dir}/word_index.json', 'r') as fp:
    word_index = json.load(fp)
index_word = { i:w for w,i in word_index.items() }
print('Dictionaries loaded. ', len(word_index), 'entries' )
Dictionaries loaded. 88588 entries
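The reverse dictionary index_word is obtained by swapping keys and values. This works directly because JSON object keys are always strings while the values (the ids) come back from json.load as Python ints, so index_word ends up keyed by int ids. A small self-contained sketch on a toy dictionary; the entries are a hypothetical subset, and only the ids of <pad>, <start>, <unknown>, "is" and "film" are taken from the outputs of this notebook. The toy_ prefix keeps it from overwriting the real word_index and index_word.

```python
import json

# Toy stand-in for word_index.json (hypothetical subset, NOT the real file).
toy_json = '{"<pad>": 0, "<start>": 1, "<unknown>": 2, "is": 9, "film": 22}'

toy_word_index = json.loads(toy_json)                        # word -> id (values are int)
toy_index_word = {i: w for w, i in toy_word_index.items()}   # id -> word (keys are int)

def decode_ids(ids):
    """Decode a sequence of ids; '?' marks an id missing from the dictionary."""
    return ' '.join(toy_index_word.get(i, '?') for i in ids)

print(decode_ids([1, 2, 22, 9, 3, 0]))
# <start> <unknown> film is ? <pad>
```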
2.3 - Clean, index and pad
Sentences are split into words, punctuation-only tokens are skipped, each word is replaced by its index, and every sequence is then padded (or truncated) to review_len.
Note : 0 is "padding", 1 is "start" and 2 is "unknown"
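The cell below looks words up in their raw form (w), so in the outputs capitalized or punctuated words such as "This", "I" or "see." come out as <unknown> (id 2). The following sketch shows one possible alternative, under our own assumptions: lowercase each word and strip punctuation before the lookup, but keep apostrophes, since the dictionary does contain entries such as "don't" (id 92 in the outputs). encode_review and the toy ids are hypothetical; with a different cleaning, the ids (and therefore the predictions) would differ from the ones shown in this notebook.

```python
import re

START_CHAR = 1   # <start>, as in the notebook
OOV_CHAR = 2     # <unknown>, as in the notebook

def encode_review(text, word_index, vocab_size):
    """Hypothetical variant of the notebook's cleaning/indexing loop.

    Assumptions (not the notebook's behaviour):
    - words are lowercased before lookup, so "This" can match "this";
    - punctuation is stripped before lookup but apostrophes are kept,
      so "nice," matches "nice" while "don't" stays "don't".
    """
    ids = [START_CHAR]
    for w in text.split():                               # split on any whitespace
        w_clean = re.sub(r"[^a-z0-9']", "", w.lower())
        if not w_clean:                                  # pure punctuation, e.g. "..."
            continue
        idx = word_index.get(w_clean, OOV_CHAR)
        ids.append(idx if idx < vocab_size else OOV_CHAR)   # out of vocabulary -> <unknown>
    return ids

# Hypothetical ids, except "film" (22), "is" (9) and "don't" (92) seen in this notebook
toy_index = {"this": 14, "film": 22, "is": 9, "nice": 325, "don't": 92, "rare": 20000}
print(encode_review("This film is nice, don't miss it!", toy_index, vocab_size=10000))
# [1, 14, 22, 9, 325, 92, 2, 2]
```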
start_char = 1 # Start of a sequence (padding is 0)
oov_char = 2 # Out-of-vocabulary
index_from = 3 # First word id
nb_reviews = len(reviews)
x_data = []
# ---- For all reviews
for review in reviews:
    print('Words are : ', end='')
    # ---- First index must be <start>
    index_review=[start_char]
    print(f'{start_char} ', end='')
    # ---- For all words
    for w in review.split(' '):
        # ---- Clean it
        w_clean = re.sub(r"[^a-zA-Z0-9]", "", w)
        # ---- Not empty ?
        if len(w_clean)>0:
            # ---- Get the index - must be inside dict or is out of vocab (oov)
            #      Note: the raw word w is looked up (not w_clean), so capitalized
            #      or punctuated words ("This", "nice,") fall back to oov_char
            w_index = word_index.get(w, oov_char)
            # ---- Ids >= vocab_size are outside the model's vocabulary
            if w_index>=vocab_size : w_index=oov_char
            # ---- Add the index (now always < vocab_size)
            index_review.append(w_index)
            print(f'{w_index} ', end='')
    # ---- Add the indexed review
    x_data.append(index_review)
    print()
# ---- Padding
x_data = keras.preprocessing.sequence.pad_sequences(x_data, value = 0, padding = 'post', maxlen = review_len)
Words are : 1 2 22 9 572 2 6 215 2
Words are : 1 2 22 9 6 87 356 15 566 30 2
Words are : 1 2 92 377 126 260 110 141 6 2
Words are : 1 2 20 9 43 2 5 152 1833 8 30 2
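With padding='post', zeros are appended after each review up to review_len. The truncating argument is left at its Keras default, 'pre', which means a review longer than review_len would lose its first tokens (including <start>), not its last ones; passing truncating='post' keeps the beginning instead. Our four reviews are short, so only padding applies here. A pure-Python imitation of this behaviour (pad_post is a hypothetical name; Keras itself returns a NumPy array rather than lists):

```python
def pad_post(sequences, maxlen, value=0):
    """Imitate pad_sequences(sequences, maxlen=maxlen, padding='post', value=value).

    truncating stays at the Keras default 'pre': a sequence longer than maxlen
    keeps its LAST maxlen ids, so its beginning is dropped. Assumes maxlen > 0.
    """
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]                             # truncating='pre'
        padded.append(kept + [value] * (maxlen - len(kept)))   # padding='post'
    return padded

print(pad_post([[1, 2, 22], [1, 5, 6, 7, 8, 9]], maxlen=4))
# [[1, 2, 22, 0], [6, 7, 8, 9]]
```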
2.4 - Have a look
def translate(x):
    return ' '.join( [index_word.get(i,'?') for i in x] )

for i in range(nb_reviews):
    imax=np.where(x_data[i]==0)[0][0]+5
    print(f'\nText review {i} :', reviews[i])
    print('tokens vector :', list(x_data[i][:imax]), '(...)')
    print('Translation :', translate(x_data[i][:imax]), '(...)')
Text review 0 : This film is particularly nice, a must see.
tokens vector : [1, 2, 22, 9, 572, 2, 6, 215, 2, 0, 0, 0, 0, 0] (...)
Translation : <start> <unknown> film is particularly <unknown> a must <unknown> <pad> <pad> <pad> <pad> <pad> (...)

Text review 1 : This film is a great classic that cannot be ignored.
tokens vector : [1, 2, 22, 9, 6, 87, 356, 15, 566, 30, 2, 0, 0, 0, 0, 0] (...)
Translation : <start> <unknown> film is a great classic that cannot be <unknown> <pad> <pad> <pad> <pad> <pad> (...)

Text review 2 : I don't remember ever having seen such a movie...
tokens vector : [1, 2, 92, 377, 126, 260, 110, 141, 6, 2, 0, 0, 0, 0, 0] (...)
Translation : <start> <unknown> don't remember ever having seen such a <unknown> <pad> <pad> <pad> <pad> <pad> (...)

Text review 3 : This movie is just abominable and doesn't deserve to be seen!
tokens vector : [1, 2, 20, 9, 43, 2, 5, 152, 1833, 8, 30, 2, 0, 0, 0, 0, 0] (...)
Translation : <start> <unknown> movie is just <unknown> and doesn't deserve to be <unknown> <pad> <pad> <pad> <pad> <pad> (...)
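The display cell computes imax as np.where(x_data[i]==0)[0][0]+5, that is, the position of the first padding 0 plus five. For a review that fills all review_len slots there is no 0: np.where then returns an empty array and indexing it with [0] raises an IndexError. A pure-Python sketch that falls back to the full length instead (visible_length is a hypothetical helper):

```python
def visible_length(seq, pad_value=0, extra=5):
    """Hypothetical helper: where to stop displaying seq.

    Returns the position of the first padding id plus `extra`, capped at len(seq),
    or len(seq) when the sequence contains no padding at all.
    """
    for pos, token in enumerate(seq):
        if token == pad_value:
            return min(pos + extra, len(seq))
    return len(seq)

print(visible_length([1, 2, 22, 9, 0, 0, 0, 0, 0, 0]))   # 9
print(visible_length([1, 2, 22, 9]))                      # 4
```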
Step 3 - Bring back the model
model = keras.models.load_model(f'{saved_models}/models/best_model.keras')
Step 4 - Predict
y_pred = model.predict(x_data, verbose=0)
And the winner is :
for i,review in enumerate(reviews):
    rate = y_pred[i][0]
    opinion = 'NEGATIVE :-(' if rate<0.5 else 'POSITIVE :-)'
    print(f'{review:<70} => {rate:.2f} - {opinion}')
This film is particularly nice, a must see. => 0.59 - POSITIVE :-)
This film is a great classic that cannot be ignored. => 0.74 - POSITIVE :-)
I don't remember ever having seen such a movie... => 0.57 - POSITIVE :-)
This movie is just abominable and doesn't deserve to be seen! => 0.38 - NEGATIVE :-(
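The loop reads a single score per review (y_pred[i][0]) and splits at 0.5, which assumes the saved model ends with one sigmoid unit whose output is the probability that a review is positive. Several scores above sit close to that boundary (0.57 for the rather ambiguous third review). The sketch below reproduces the notebook's rule and, as a purely hypothetical extension, reports scores within a chosen margin of the threshold as UNCERTAIN instead of forcing them into a class; label_review and margin are our own names.

```python
def label_review(rate, threshold=0.5, margin=0.0):
    """Turn a sigmoid score into a label.

    With margin=0.0 this is the notebook's rule (rate < threshold -> NEGATIVE).
    margin > 0 is a hypothetical addition: scores closer than `margin`
    to the threshold are reported as UNCERTAIN.
    """
    if abs(rate - threshold) < margin:
        return 'UNCERTAIN'
    return 'NEGATIVE :-(' if rate < threshold else 'POSITIVE :-)'

rates = [0.59, 0.74, 0.57, 0.38]   # the rounded scores printed above
print([label_review(r) for r in rates])
# ['POSITIVE :-)', 'POSITIVE :-)', 'POSITIVE :-)', 'NEGATIVE :-(']
print([label_review(r, margin=0.1) for r in rates])
# ['UNCERTAIN', 'POSITIVE :-)', 'UNCERTAIN', 'NEGATIVE :-(']
```

The margin value is arbitrary here; choosing it sensibly would require looking at the score distribution on labelled validation data.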
fidle.end()
End time : 03/03/24 21:11:07
Duration : 00:00:01 509ms
This notebook ends here :-)
https://fidle.cnrs.fr