[Python] Building a Natural Language Interface with nltk

Natural Language Processing (NLP) involves the interaction between computers and humans using natural language. With the nltk library, we can easily build a natural language interface in Python. In this blog post, we will explore the steps to set up and start using NLTK for NLP tasks.

1. Installing NLTK

To begin, we need to install the nltk library. Open your terminal or command prompt and run the following command:

pip install nltk
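
To confirm that the installation worked, you can check the installed version from Python (a quick sanity check, not required for anything that follows):

import nltk
print(nltk.__version__)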

2. Importing NLTK

After installation, we can import the nltk library in our Python script as follows:

import nltk

3. Downloading NLTK Resources

NLTK provides various resources such as corpora, tokenizers, stemmers, and parsers. We need to download these resources before using them. To download all resources, open a Python shell and run the following commands:

import nltk
nltk.download('all')

This will download every available resource to your local machine. It is convenient, but the full download is large and can take a while, so a lighter alternative that fetches only the resources used in this post is shown below.
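
If you would rather not download everything, you can fetch just the packages used by the examples in this post. The package names below are the standard NLTK identifiers for the Punkt tokenizer, the averaged perceptron POS tagger, WordNet, and the VADER lexicon; exact names can vary slightly between NLTK versions, so treat this as a sketch:

import nltk

# Download only the resources used in this post
nltk.download('punkt')                        # sentence and word tokenizers
nltk.download('averaged_perceptron_tagger')   # part-of-speech tagger used by pos_tag
nltk.download('wordnet')                      # data for WordNetLemmatizer
nltk.download('vader_lexicon')                # lexicon for SentimentIntensityAnalyzer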

4. Tokenization

Tokenization is the process of breaking text into individual words or sentences. NLTK provides different tokenizers, such as a word tokenizer and a sentence tokenizer. Here’s an example of splitting text into sentences with the sentence tokenizer:

from nltk.tokenize import sent_tokenize

text = "This is a sample sentence. It will be tokenized into sentences."
sentences = sent_tokenize(text)
print(sentences)

The output will be a list of sentences:

['This is a sample sentence.', 'It will be tokenized into sentences.']
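
The word tokenizer works the same way, but splits the text into individual words and punctuation tokens instead:

from nltk.tokenize import word_tokenize

text = "This is a sample sentence."
words = word_tokenize(text)
print(words)
# ['This', 'is', 'a', 'sample', 'sentence', '.']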

5. Part-of-Speech Tagging

Part-of-Speech (POS) tagging is the process of assigning grammatical tags to words in a sentence. NLTK provides a pre-trained POS tagger that we can use. Here’s an example:

from nltk import pos_tag
from nltk.tokenize import word_tokenize

text = "This is a sample sentence."
words = word_tokenize(text)
tags = pos_tag(words)
print(tags)

The output will be a list of tuples, where each tuple contains a word and its corresponding POS tag:

[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('sample', 'JJ'), ('sentence', 'NN'), ('.', '.')]
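
The tags come from the Penn Treebank tagset: DT is a determiner, VBZ a third-person singular present verb, JJ an adjective, and NN a singular noun. If you are unsure what a tag means, NLTK can print its definition, provided the 'tagsets' resource is available (it is included in the 'all' download used earlier):

import nltk

# Print the definition of and examples for the 'NN' tag
nltk.help.upenn_tagset('NN')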

6. Lemmatization

Lemmatization is the process of reducing a word to its base or root form (its lemma). NLTK provides the WordNet-based WordNetLemmatizer for this purpose. One thing to watch out for: the lemmatizer treats every word as a noun unless you pass a part of speech, so lemmatize("running") on its own would just return "running". To reduce the verb to "run", we pass pos="v". Here’s an example:

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
word = "running"
lemma = lemmatizer.lemmatize(word, pos="v")  # lemmatize "running" as a verb
print(lemma)

The output will be the base form of the word:

run
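
In practice, lemmatization works best when combined with the POS tagging from the previous step, since the lemmatizer needs to know each word's part of speech. The helper below (map_to_wordnet_pos is just an illustrative name, not part of NLTK) converts Penn Treebank tags into the constants that WordNetLemmatizer expects:

from nltk import pos_tag
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

def map_to_wordnet_pos(treebank_tag):
    # Map Penn Treebank tags (JJ*, VB*, RB*, NN*, ...) to WordNet POS constants
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    if treebank_tag.startswith('V'):
        return wordnet.VERB
    if treebank_tag.startswith('R'):
        return wordnet.ADV
    return wordnet.NOUN  # default, matching the lemmatizer's own default

lemmatizer = WordNetLemmatizer()
tags = pos_tag(word_tokenize("The cats were running quickly."))
lemmas = [lemmatizer.lemmatize(word, map_to_wordnet_pos(tag)) for word, tag in tags]
print(lemmas)  # roughly: ['The', 'cat', 'be', 'run', 'quickly', '.']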

7. Sentiment Analysis

Sentiment analysis involves determining the sentiment or emotions expressed in a piece of text. NLTK provides a pre-trained sentiment analyzer called VADER (Valence Aware Dictionary and sEntiment Reasoner). Here’s an example of using VADER for sentiment analysis:

from nltk.sentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
text = "I love this product. It's amazing!"
sentiment = analyzer.polarity_scores(text)
print(sentiment)

The output will be a dictionary of sentiment scores similar to the following:

{'neg': 0.0, 'neu': 0.187, 'pos': 0.813, 'compound': 0.5859}
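
The compound value is a normalized score between -1 (most negative) and +1 (most positive), and it is usually the single number to act on. A common rule of thumb (these thresholds come from general VADER usage rather than from NLTK itself) is to treat text scoring above 0.05 as positive, below -0.05 as negative, and anything in between as neutral:

from nltk.sentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def classify_sentiment(text):
    # Map VADER's compound score to a simple positive/negative/neutral label
    compound = analyzer.polarity_scores(text)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(classify_sentiment("I love this product. It's amazing!"))            # positive
print(classify_sentiment("This is the worst purchase I have ever made."))  # negative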

Conclusion

In this blog post, we explored how to set up and use NLTK for natural language processing tasks in Python. We covered tokenization, part-of-speech tagging, lemmatization, and sentiment analysis. With NLTK, you can easily build powerful natural language interfaces and perform various NLP tasks efficiently.