I have two different word-vector models created using the word2vec algorithm. The issue I'm facing is that a few words from the first model are not in the second model. I want to create a third model from the two, in which I can use word vectors from both without losing the meaning and context of the vectors.
Can I do this, and if so, how?
You could potentially translate the vectors for words that appear in only one model into the other model's coordinate space, using the words the two models share to learn a translation function.
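One simple form of such a translation function is a linear map `W`, fit by least squares so that `source_vector @ W` lands close to the matching `target_vector` for every shared word. The sketch below assumes both models are available as plain dicts of word -> numpy array. The helper names `learn_translation_matrix` and `translate_vector` are hypothetical, and this is a generic least-squares sketch, not gensim's own code. Because real word2vec spaces are only approximately linearly related, a translated vector should be treated as an estimate.

```python
import numpy as np

def learn_translation_matrix(src_vecs, tgt_vecs, anchor_words):
    """Hypothetical helper: fit W minimizing ||X_src @ W - X_tgt||^2
    over words present in both models (the 'anchor' words)."""
    X_src = np.array([src_vecs[w] for w in anchor_words])  # shape (n, d_src)
    X_tgt = np.array([tgt_vecs[w] for w in anchor_words])  # shape (n, d_tgt)
    W, *_ = np.linalg.lstsq(X_src, X_tgt, rcond=None)      # shape (d_src, d_tgt)
    return W

def translate_vector(vec, W):
    """Hypothetical helper: map one source-space vector into the target space."""
    return vec @ W
```

You need at least as many anchor words as the source model has dimensions for the fit to be well determined. In practice, many more anchors than that give a more stable map.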
There's a facility to do this in recent gensim versions; see the TranslationMatrix tool. There's a demo Jupyter notebook included in the docs/notebooks directory, viewable online at:
You'd presumably take the larger model (or whichever one is thought to be better, perhaps because it was trained on more data) and translate the smaller number of words it's missing into its space. You'd use as many common-reference 'anchor' words as is practical.
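The whole procedure could be sketched as follows. Keep every vector of the larger ("big") model unchanged. Use all shared words as anchors to fit a least-squares map from the smaller model's space into the big model's space. Then add the translated vectors for words that only the smaller model has. This is a hypothetical helper (`merge_word_vectors`), not a gensim API, and it assumes both models are plain dicts of word -> numpy array. With gensim 4.x `KeyedVectors`, such a dict could be built from `kv.index_to_key`, e.g. `{w: kv[w] for w in kv.index_to_key}`. The two models may have different dimensionalities.

```python
import numpy as np

def merge_word_vectors(big, small):
    """Hypothetical helper: return a new dict holding every word of `big`
    unchanged, plus each word found only in `small`, mapped into big's space
    by a least-squares linear map learned from the shared (anchor) words."""
    anchors = [w for w in small if w in big]  # use every shared word as an anchor
    d_small = len(next(iter(small.values())))
    if len(anchors) < d_small:
        raise ValueError("need at least as many anchor words as dimensions")
    X_small = np.array([small[w] for w in anchors])  # (n, d_small)
    X_big = np.array([big[w] for w in anchors])      # (n, d_big)
    W, *_ = np.linalg.lstsq(X_small, X_big, rcond=None)
    merged = dict(big)  # the big model's own vectors are kept as-is
    for w, vec in small.items():
        if w not in big:
            merged[w] = vec @ W  # translated estimate for a missing word
    return merged
```

The resulting dict contains the union of both vocabularies, all expressed in the big model's coordinate space, so similarity queries across the old and added words are at least comparable.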