[K3IMDB3] - Reload and reuse a saved model
Retrieving a saved model to perform a sentiment analysis (movie review), using Keras 3 and PyTorch
Objectives :
- The objective is to guess whether our personal film reviews are positive or negative based on the analysis of the text.
- For this, we will use our previously saved model.
What we're going to do :
- Prepare our data
- Retrieve our saved model
- Evaluate the result
Step 1 - Init Python stuff
import os
os.environ['KERAS_BACKEND'] = 'torch'
import keras
import json,re
import numpy as np
import fidle
# Init Fidle environment
run_id, run_dir, datasets_dir = fidle.init('K3IMDB3')
1.2 - Parameters
The words in the vocabulary are ranked from the most frequent to the rarest.
- vocab_size is the number of words we will keep in our vocabulary (the other words will be considered as unknown).
- review_len is the review length
- saved_models is where our models were previously saved
- dictionaries_dir is where our dictionaries were saved. (./data is a good choice)
vocab_size = 10000
review_len = 256
saved_models = './run/K3IMDB2'
dictionaries_dir = './data'
fidle.override('vocab_size', 'review_len', 'saved_models', 'dictionaries_dir')
reviews = [ "This film is particularly nice, a must see.",
            "This film is a great classic that cannot be ignored.",
            "I don't remember ever having seen such a movie...",
            "This movie is just abominable and doesn't deserve to be seen!"]
2.2 - Retrieve dictionaries
Note : This dictionary is generated by the 02-Embedding-Keras notebook.
with open(f'{dictionaries_dir}/word_index.json', 'r') as fp:
    word_index = json.load(fp)
index_word = { i:w for w,i in word_index.items() }
print('Dictionaries loaded. ', len(word_index), 'entries' )
Dictionaries loaded. 88588 entries
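To make the inversion concrete, here is a minimal sketch with a toy dictionary. The entries and the names toy_word_index, toy_index_word and toy_translate are hypothetical; ids 0 to 2 are reserved only to mirror the <pad>, <start> and <unknown> tokens that appear in the translations of this notebook, and do not reproduce the real content of word_index.json.

```python
# Toy dictionary (hypothetical): word -> index, ids 0..2 reserved
toy_word_index = {'<pad>': 0, '<start>': 1, '<unknown>': 2,
                  'film': 3, 'is': 4, 'great': 5}

# Invert it: index -> word
toy_index_word = {i: w for w, i in toy_word_index.items()}

def toy_translate(indices):
    """Turn a list of indices back into text; ids not in the dictionary become '?'."""
    return ' '.join(toy_index_word.get(i, '?') for i in indices)

print(toy_translate([1, 3, 4, 5, 0, 0]))   # <start> film is great <pad> <pad>
```

The inversion only works because each word has a unique index; two words sharing an index would silently overwrite each other.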
2.3 - Clean, index and pad
Phrases are split into words, punctuation is removed, sentence length is limited and padding is added...
Note : 1 is "Start" and 2 is "unknown"
start_char = 1      # Start of a sequence (padding is 0)
oov_char   = 2      # Out-of-vocabulary
index_from = 3      # First word id

nb_reviews = len(reviews)
x_data     = []

# ---- For all reviews
for review in reviews:
    print('Words are : ', end='')
    # ---- First index must be <start>
    index_review = [start_char]
    print(f'{start_char} ', end='')
    # ---- For all words
    for w in review.split(' '):
        # ---- Clean it
        w_clean = re.sub(r"[^a-zA-Z0-9]", "", w)
        # ---- Not empty ?
        if len(w_clean)>0:
            # ---- Get the index - must be inside dict or is out of vocab (oov)
            #      Note : the raw word w is looked up, so capitalized or
            #      punctuated words ('This', 'nice,') are treated as oov
            w_index = word_index.get(w, oov_char)
            if w_index >= vocab_size : w_index = oov_char
            # ---- Add the index (kept as is only if < vocab_size)
            index_review.append(w_index)
            print(f'{w_index} ', end='')
    # ---- Add the indexed review
    x_data.append(index_review)
    print()

# ---- Padding
x_data = keras.preprocessing.sequence.pad_sequences(x_data, value = 0, padding = 'post', maxlen = review_len)
Words are : 1 2 22 9 572 2 6 215 2
Words are : 1 2 22 9 6 87 356 15 566 30 2
Words are : 1 2 92 377 126 260 110 141 6 2
Words are : 1 2 20 9 43 2 5 152 1833 8 30 2
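In the output above, words such as "This" or "nice," come out as 2 (unknown) because the raw token, with its capital letter or punctuation, is what gets looked up. As a possible variant, the sketch below lower-cases each token and strips punctuation before the lookup, keeping apostrophes since words like "don't" are found in the dictionary. The function encode_review and the toy dictionary are hypothetical, the assumption that dictionary keys are lower-case is not verified here, and truncation simply keeps the first review_len tokens, which may differ from what pad_sequences does.

```python
import re

def encode_review(review, word_index, vocab_size, review_len,
                  start_char=1, oov_char=2, pad_char=0):
    """Encode one review as a fixed-length list of word indices."""
    tokens = [start_char]
    for w in review.split(' '):
        # Lower-case and strip punctuation, but keep apostrophes
        # (assumption: dictionary keys are lower-case, e.g. "don't")
        w_clean = re.sub(r"[^a-zA-Z0-9']", "", w).lower()
        if not w_clean:
            continue
        # Words missing from the dictionary, or too rare, become oov
        w_index = word_index.get(w_clean, oov_char)
        if w_index >= vocab_size:
            w_index = oov_char
        tokens.append(w_index)
    # Keep at most review_len tokens, then pad on the right
    tokens = tokens[:review_len]
    return tokens + [pad_char] * (review_len - len(tokens))

# Toy dictionary (hypothetical indices)
toy_word_index = {'this': 14, 'film': 22, 'is': 9, 'nice': 327}
print(encode_review("This film is nice!", toy_word_index,
                    vocab_size=100, review_len=8))
# [1, 14, 22, 9, 2, 0, 0, 0]
```

Here "nice" is in the toy dictionary but its index (327) is beyond the toy vocab_size of 100, so it is encoded as unknown, exactly like a word that is missing altogether.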
2.4 - Have a look
def translate(x):
    return ' '.join( [index_word.get(i,'?') for i in x] )

for i in range(nb_reviews):
    # ---- Show the review up to 5 tokens after the first padding (0)
    imax = np.where(x_data[i]==0)[0][0] + 5
    print(f'\nText review {i} :', reviews[i])
    print('tokens vector :', list(x_data[i][:imax]), '(...)')
    print('Translation :', translate(x_data[i][:imax]), '(...)')
Text review 0 : This film is particularly nice, a must see.
tokens vector : [np.int32(1), np.int32(2), np.int32(22), np.int32(9), np.int32(572), np.int32(2), np.int32(6), np.int32(215), np.int32(2), np.int32(0), np.int32(0), np.int32(0), np.int32(0), np.int32(0)] (...)
Translation : <start> <unknown> film is particularly <unknown> a must <unknown> <pad> <pad> <pad> <pad> <pad> (...)

Text review 1 : This film is a great classic that cannot be ignored.
tokens vector : [np.int32(1), np.int32(2), np.int32(22), np.int32(9), np.int32(6), np.int32(87), np.int32(356), np.int32(15), np.int32(566), np.int32(30), np.int32(2), np.int32(0), np.int32(0), np.int32(0), np.int32(0), np.int32(0)] (...)
Translation : <start> <unknown> film is a great classic that cannot be <unknown> <pad> <pad> <pad> <pad> <pad> (...)

Text review 2 : I don't remember ever having seen such a movie...
tokens vector : [np.int32(1), np.int32(2), np.int32(92), np.int32(377), np.int32(126), np.int32(260), np.int32(110), np.int32(141), np.int32(6), np.int32(2), np.int32(0), np.int32(0), np.int32(0), np.int32(0), np.int32(0)] (...)
Translation : <start> <unknown> don't remember ever having seen such a <unknown> <pad> <pad> <pad> <pad> <pad> (...)

Text review 3 : This movie is just abominable and doesn't deserve to be seen!
tokens vector : [np.int32(1), np.int32(2), np.int32(20), np.int32(9), np.int32(43), np.int32(2), np.int32(5), np.int32(152), np.int32(1833), np.int32(8), np.int32(30), np.int32(2), np.int32(0), np.int32(0), np.int32(0), np.int32(0), np.int32(0)] (...)
Translation : <start> <unknown> movie is just <unknown> and doesn't deserve to be <unknown> <pad> <pad> <pad> <pad> <pad> (...)
Step 3 - Bring back the model
model = keras.models.load_model(f'{saved_models}/models/best_model.keras')
Step 4 - Predict
y_pred = model.predict(x_data, verbose=0)
And the winner is :
for i,review in enumerate(reviews):
    rate = y_pred[i][0]
    opinion = 'NEGATIVE :-(' if rate<0.5 else 'POSITIVE :-)'
    print(f'{review:<70} => {rate:.2f} - {opinion}')
This film is particularly nice, a must see. => 0.52 - POSITIVE :-)
This film is a great classic that cannot be ignored. => 0.68 - POSITIVE :-)
I don't remember ever having seen such a movie... => 0.49 - NEGATIVE :-(
This movie is just abominable and doesn't deserve to be seen! => 0.31 - NEGATIVE :-(
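Two of the scores above (0.52 and 0.49) sit very close to the 0.5 threshold, so the label alone hides how undecided the model is. One possible refinement, sketched below, is to flag such cases explicitly. The helper label_review and its margin parameter are hypothetical and not part of this notebook, and the sketch assumes the model outputs a score between 0 and 1.

```python
def label_review(rate, threshold=0.5, margin=0.05):
    """Map a score in [0, 1] to a label, flagging scores close to the threshold."""
    if abs(rate - threshold) < margin:
        return 'UNCERTAIN'
    return 'POSITIVE' if rate >= threshold else 'NEGATIVE'

# Scores copied from the output above
for rate in (0.52, 0.68, 0.49, 0.31):
    print(f'{rate:.2f} -> {label_review(rate)}')
```

With a margin of 0.05, the first and third reviews are reported as uncertain, while the second and fourth keep their positive and negative labels.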
