[파이썬] 품사 태깅(Part-of-Speech Tagging)

04 Sep 2023

python

Part-of-Speech (POS) tagging is a process of classifying words into their respective parts of speech (e.g., noun, verb, adjective, etc.) based on their context and relationship with other words in a sentence. It is an essential step in natural language processing (NLP) tasks such as language understanding, text analysis, and information extraction.

Python provides several libraries and tools for performing POS tagging, including NLTK, SpaCy, and TextBlob. These libraries utilize machine learning algorithms and pre-trained models to tag words with their POS labels.

In this blog post, we will focus on using the NLTK library to perform POS tagging in Python. NLTK is a popular library for NLP tasks and provides various functions and resources for text processing.

Installing NLTK

To begin with, make sure you have NLTK installed on your local machine. Open a terminal or command prompt and run the following command:

pip install nltk

Tokenization and POS Tagging with NLTK

Once NLTK is installed, we can proceed with tokenizing a sentence and performing POS tagging. Tokenization is the process of splitting a text into individual words or tokens.

import nltk

# Download required resources for POS tagging
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Example sentence
sentence = "I love using NLTK for natural language processing."

# Tokenize the sentence
tokens = nltk.word_tokenize(sentence)

# Perform POS tagging
pos_tags = nltk.pos_tag(tokens)

# Print the POS tagged words
print(pos_tags)

Running the above code will output the following:

[('I', 'PRP'), ('love', 'VBP'), ('using', 'VBG'), ('NLTK', 'NNP'), ('for', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]

Each word in the sentence is now associated with a POS label. For example, ‘I’ is tagged as ‘PRP’ (personal pronoun), ‘love’ is tagged as ‘VBP’ (verb, non-3rd person singular present), ‘using’ is tagged as ‘VBG’ (verb, gerund or present participle), and so on.

Conclusion

Part-of-Speech (POS) tagging is a fundamental step in natural language processing and helps in extracting meaningful information from text data. Python provides several libraries, such as NLTK, for performing POS tagging efficiently.

In this blog post, we explored how to perform POS tagging using NLTK in Python. We covered the installation process, tokenization, and POS tagging code examples.

By understanding the context and relationship of words, POS tagging enables us to build more advanced NLP applications like sentiment analysis, text classification, and information retrieval.

Remember to experiment with different texts and explore the various POS tags to gain a better understanding of this important NLP concept.