[파이썬] Gensim에서의 Knowledge Graph Embeddings

Knowledge Graph Embeddings have become an essential tool in natural language processing and machine learning, allowing us to represent structured knowledge in a continuous, vectorized form. One popular library for generating these embeddings is Gensim, a powerful Python library for topic modeling and document similarity analysis.

In this blog post, we will explore how to use Gensim to generate knowledge graph embeddings from a given knowledge graph.

Installation

First, let’s install Gensim using pip:

pip install gensim

Loading the Knowledge Graph

To start with, we need to load our knowledge graph into Gensim. The graph can be in any format - RDF, CSV, or even a custom representation. Gensim provides a unified API to handle different graph formats.

For example, if we have a knowledge graph in RDF Turtle format, we can load it using the MmCorpus class:

from gensim.corpora import MmCorpus

corpus = MmCorpus('knowledge_graph.ttl')

Training the Embeddings

Once we have loaded our knowledge graph, we can train the embeddings using the Word2Vec model provided by Gensim. This model is typically used for word embeddings, but it can be adapted for knowledge graph embeddings as well.

from gensim.models import Word2Vec

model = Word2Vec(corpus, size=100, window=5, min_count=1, workers=4)

In the above code, we specify the size of the embedding vectors (size=100), the window size for context words (window=5), the minimum count for a word to be included in the vocabulary (min_count=1), and the number of parallel workers for training (workers=4).

Evaluating the Embeddings

To evaluate the quality of the knowledge graph embeddings, we can use various evaluation metrics such as mean average precision and precision at K. Gensim provides a evaluate_word_pairs function to evaluate the embeddings on a given word similarity dataset.

from gensim.test.utils import datapath

similarity_dataset = datapath('similarity_dataset.txt')
result = model.wv.evaluate_word_pairs(datapath)
print(result)

Using the Embeddings

Once we have trained the knowledge graph embeddings, we can use them for various tasks such as entity similarity, relation classification, and link prediction.

entity_vector = model.wv['entity']
similar_entities = model.wv.most_similar('entity')
relation_vector = model.wv['relation']

In the above code, entity_vector represents the vector representation of an entity, similar_entities gives the most similar entities to a given entity, and relation_vector represents the vector representation of a relation.

Conclusion

In this blog post, we have seen how to use Gensim to generate knowledge graph embeddings from a given knowledge graph. Gensim provides a simple and efficient API for loading and training the embeddings, as well as evaluating their quality.

Knowledge graph embeddings have a wide range of applications in natural language processing and machine learning, and Gensim makes it easy to incorporate them into your own projects.