[Python] Visualizing Gensim Word Vectors

Gensim is a popular library for natural language processing (NLP) tasks, including word vectorization. Word vectors are numerical representations of words that capture aspects of their meaning: words used in similar contexts tend to end up with nearby vectors. Visualizing these vectors can help us see how the words in a corpus relate to one another.

In this blog post, we will explore how to visualize word vectors generated using Gensim in Python. Let’s get started!

Installing Gensim

You can install Gensim using pip:

pip install gensim

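Make sure you have Python and pip installed on your system before running the above command. The plotting code later in this post also uses matplotlib and scikit-learn, which can be installed the same way (pip install matplotlib scikit-learn).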

Loading Word Vectors

To visualize word vectors using Gensim, we first need to load a pre-trained word vector model. Gensim can load Word2Vec and FastText models directly, and it can read GloVe vectors through its word2vec-format text loader.

Here’s an example of loading a Word2Vec model:

from gensim.models import Word2Vec

# Load the pre-trained Word2Vec model
model = Word2Vec.load("path_to_word2vec_model.bin")

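Replace "path_to_word2vec_model.bin" with the actual path to your Word2Vec model file. Note that Word2Vec.load reads models that were saved with Gensim's own save() method. Vectors stored in the original word2vec binary format are loaded with KeyedVectors.load_word2vec_format(path, binary=True) instead, which returns the vectors directly, so you index them as model[word] rather than model.wv[word].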

Visualizing Word Vectors

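After loading the word vector model, we can visualize the word vectors using techniques such as t-SNE (t-Distributed Stochastic Neighbor Embedding) or PCA (Principal Component Analysis). Both reduce the high-dimensional vectors to two dimensions so that they can be drawn on a scatter plot. The example below uses t-SNE.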

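import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# Get word vectors for a subset of words, skipping any that are not in the vocabulary
words = ['apple', 'banana', 'car', 'bike', 'house']
words = [word for word in words if word in model.wv]
word_vectors = np.array([model.wv[word] for word in words])

# Perform t-SNE dimensionality reduction
# perplexity must be smaller than the number of samples, so the default of 30
# fails for a handful of words; random_state makes the layout reproducible
tsne = TSNE(n_components=2, perplexity=min(30, len(words) - 1), random_state=42)
word_vectors_tsne = tsne.fit_transform(word_vectors)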

# Plot the word vectors in 2D space
plt.figure(figsize=(10, 8))
for i, word in enumerate(words):
    x, y = word_vectors_tsne[i, 0], word_vectors_tsne[i, 1]
    plt.scatter(x, y)
    plt.annotate(word, (x, y), fontsize=12)
plt.grid(True)
plt.show()

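The above code snippet visualizes word vectors for the words ‘apple’, ‘banana’, ‘car’, ‘bike’, and ‘house’ using t-SNE. You can replace the words list with any other words of your choice, as long as they are in the model's vocabulary. Keep in mind that t-SNE is stochastic, so the layout can change between runs unless you fix a random seed, and with only a few points the plot is more of an illustration than an analysis.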
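PCA, the other technique mentioned earlier, can be swapped in with very little change. The following is a minimal sketch using scikit-learn's PCA class. To keep it runnable without a model file, it uses random stand-in vectors (an assumption for illustration only); with a loaded model you would build word_vectors from model.wv as in the t-SNE example.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

words = ["apple", "banana", "car", "bike", "house"]

# Stand-in vectors so the sketch runs without a model file (hypothetical data).
# With a loaded model, use: np.array([model.wv[word] for word in words])
rng = np.random.default_rng(seed=0)
word_vectors = rng.normal(size=(len(words), 50))

# Project the vectors onto their first two principal components.
pca = PCA(n_components=2)
word_vectors_pca = pca.fit_transform(word_vectors)

# Plot each word at its 2D coordinates.
fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(word_vectors_pca[:, 0], word_vectors_pca[:, 1])
for word, (x, y) in zip(words, word_vectors_pca):
    ax.annotate(word, (x, y), fontsize=12)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.grid(True)
# Call plt.show() to display the figure in an interactive session.
```

PCA is a linear projection with no perplexity setting to tune, which makes it a reasonable first look for small word lists. t-SNE tends to separate local clusters more clearly once you have many words.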

Conclusion

Visualizing word vectors can provide meaningful insights into how words are related in a corpus. In this blog post, we explored how to visualize word vectors generated using Gensim in Python. It is worth trying different projection techniques and different word vector models to see which view of your text data is most informative.

Gensim provides a powerful set of tools for word vectorization and much more. I encourage you to dive deeper into its documentation and explore its capabilities.

Happy word vector visualization with Gensim!