
[K3IMDB4] - Reload embedded vectors¶

Retrieving embedded vectors from our trained model, using Keras 3 and PyTorch

Objectives:¶

  • The objective is to retrieve and visualize our embedded vectors
  • For this, we will use our previously saved model.

What we're going to do:¶

  • Retrieve our saved model
  • Extract the vectors and play with them

Step 1 - Init Python stuff¶

In [1]:
import os
os.environ['KERAS_BACKEND'] = 'torch'

import keras

import json
import numpy as np

import fidle

# Init Fidle environment
run_id, run_dir, datasets_dir = fidle.init('K3IMDB4')


FIDLE - Environment initialization

Version              : 2.3.2
Run id               : K3IMDB4
Run dir              : ./run/K3IMDB4
Datasets dir         : /lustre/fswork/projects/rech/mlh/uja62cb/fidle-project/datasets-fidle
Start time           : 22/12/24 21:23:27
Hostname             : r3i6n0 (Linux)
Tensorflow log level : Info + Warning + Error  (=0)
Update keras cache   : False
Update torch cache   : False
Save figs            : ./run/K3IMDB4/figs (True)
keras                : 3.7.0
numpy                : 2.1.2
sklearn              : 1.5.2
yaml                 : 6.0.2
matplotlib           : 3.9.2
pandas               : 2.2.3
torch                : 2.5.0

1.2 - Parameters¶

The words in the vocabulary are ranked from the most frequent to the rarest.

  • vocab_size is the number of words we keep in our vocabulary (all other words are treated as unknown)
  • review_len is the length of the reviews (each review was padded or truncated to this length)
  • saved_models is where our models were previously saved
  • dictionaries_dir is where our dictionaries were previously saved (./data is a good choice)

In [2]:
vocab_size           = 5000
review_len           = 256

saved_models         = './run/K3IMDB2'
dictionaries_dir     = './data'

Override parameters (batch mode) - Just forget this cell

In [3]:
fidle.override('vocab_size', 'review_len', 'saved_models', 'dictionaries_dir')

Step 2 - Get the embedding vectors!¶

2.1 - Load model and dictionaries¶

Note: the word_index dictionary is generated by the 02-Embedding-Keras notebook; index_word is rebuilt from it below.

In [4]:
model = keras.models.load_model(f'{saved_models}/models/best_model.keras')
print('Model loaded.')

with open(f'{dictionaries_dir}/word_index.json', 'r') as fp:
    word_index = json.load(fp)
    index_word = { i:w      for w,i in word_index.items() }
    print('Dictionaries loaded. ', len(word_index), 'entries' )
Model loaded.
Dictionaries loaded.  88588 entries
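
As a quick sanity check (not part of the original notebook), we can verify that the two dictionaries are mutual inverses:

# Illustrative sanity check: word_index and index_word should be mutual inverses
assert index_word[word_index['nice']] == 'nice'
assert all(word_index[index_word[i]] == i for i in list(index_word)[:100])
print('word_index and index_word are consistent.')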

2.2 - Retrieve embeddings¶

In [5]:
embeddings = model.layers[0].get_weights()[0]     # the Embedding layer is the model's first layer
print('Shape of embeddings : ',embeddings.shape)
Shape of embeddings :  (5000, 32)
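
Here we rely on the Embedding layer being the first layer of the model, as it was built in K3IMDB2; the shape (5000, 32) = (vocab_size, embedding dimension) confirms it. If you are unsure of a model's architecture, a more robust sketch is to look the layer up by type:

# More robust variant (illustrative): find the Embedding layer by type
# rather than assuming it is model.layers[0]
embedding_layer = next(l for l in model.layers if isinstance(l, keras.layers.Embedding))
embeddings      = embedding_layer.get_weights()[0]
print('Shape of embeddings : ', embeddings.shape)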

2.3 - Build a nice dictionary¶

In [6]:
# Associate each word of the vocabulary with its embedding vector
word_embedding = { index_word[i]:embeddings[i] for i in range(vocab_size) }

Step 3 - Have a look !¶

Show the embedding of a word:¶

In [7]:
word_embedding['nice']
Out[7]:
array([ 0.21260725,  0.16411522,  0.20545849, -0.14136912, -0.2018573 ,
       -0.18842438,  0.20927402, -0.18139772,  0.13205685,  0.1944659 ,
        0.12543829,  0.13478456,  0.16418412,  0.21914196, -0.21495722,
       -0.17776752,  0.23770906,  0.20715123,  0.19914348,  0.18577246,
       -0.14190526,  0.22035922,  0.19699118,  0.13939948, -0.22374831,
        0.21577328, -0.14003031, -0.19838649,  0.16246769,  0.15905133,
       -0.1445778 ,  0.16018525], dtype=float32)

A few useful functions to play with¶

In [8]:
# Return the L2 distance between 2 words
#
def l2w(w1,w2):
    v1=word_embedding[w1]
    v2=word_embedding[w2]
    return np.linalg.norm(v2-v1)

# Show the L2 distance between 2 words
#
def show_l2(w1,w2):
    print(f'\nL2 between [{w1}] and [{w2}] : ',l2w(w1,w2))

# Display the 14 closest words to a given word
# (searched among the 1000 most frequent words; the first indices are reserved for special tokens)
#
def neighbors(w1):
    dd={}
    for i in range(4, 1000):
        w2=index_word[i]
        dd[w2]=l2w(w1,w2)
    dd = {k: v for k, v in sorted(dd.items(), key=lambda item: item[1])}
    # [1:15] skips the first entry, which is normally the word itself (distance 0)
    print(f'\nNeighbors of [{w1}] : ', list(dd.keys())[1:15])
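
The loop in neighbors() is fine for a search over 1,000 words, but it computes each distance one at a time. A vectorized variant (a sketch, not in the original notebook; neighbors_fast is a hypothetical name) computes all distances at once with NumPy broadcasting:

# Vectorized alternative to neighbors() - illustrative sketch
def neighbors_fast(w1, n=14, search_size=1000):
    v1 = word_embedding[w1]
    # Distances from w1 to the search_size most frequent words, in one shot
    dists = np.linalg.norm(embeddings[:search_size] - v1, axis=1)
    order = np.argsort(dists)
    # Skip the reserved token indices and the word itself
    words = [index_word[i] for i in order if i >= 4 and index_word[i] != w1]
    return words[:n]

print(neighbors_fast('horrible'))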
    

Examples¶

In [9]:
show_l2('nice', 'pleasant')
show_l2('nice', 'horrible')

neighbors('horrible')
neighbors('great')
L2 between [nice] and [pleasant] :  0.70260125

L2 between [nice] and [horrible] :  4.035369

Neighbors of [horrible] :  ['avoid', 'badly', 'annoying', 'save', 'ridiculous', 'worse', 'terrible', 'dull', 'poor', 'mess', 'predictable', 'fails', 'boring', 'lame']

Neighbors of [great] :  ['definitely', 'brilliant', '9', 'enjoyable', 'enjoyed', 'loved', 'surprised', 'fantastic', 'wonderful', 'masterpiece', 'highly', 'fun', 'amazing', 'superb']
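
The objectives also mention visualizing the vectors. One simple way (not shown in this notebook) is to project the 32-dimensional embeddings down to 2D with PCA, using the sklearn and matplotlib versions listed in the environment above. A minimal sketch, reusing the word_embedding dictionary:

# Illustrative 2D visualization of a few embeddings with PCA
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

words  = ['nice', 'pleasant', 'wonderful', 'great', 'horrible', 'terrible', 'boring', 'dull']
vecs   = np.array([word_embedding[w] for w in words])
coords = PCA(n_components=2).fit_transform(vecs)

plt.figure(figsize=(6,4))
plt.scatter(coords[:,0], coords[:,1])
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y))
plt.title('Word embeddings projected to 2D (PCA)')
plt.show()

Positive and negative words should form two visibly separate groups, consistent with the L2 distances shown above.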
In [10]:
fidle.end()

End time : 22/12/24 21:23:27
Duration : 00:00:00 387ms
This notebook ends here :-)
https://fidle.cnrs.fr

