[파이썬] textblob n-그램 추출

TextBlob is a Python library that provides a simple and intuitive API for natural language processing tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and n-gram extraction.

In this blog post, we will focus on n-gram extraction using TextBlob in Python. N-grams are contiguous sequences of n tokens from a given text. They are widely used in text analysis, language modeling, and machine learning tasks.

Installing TextBlob

Before we start with n-gram extraction, let’s first install TextBlob. Open your terminal and run the following command:

pip install textblob

Tokenization with TextBlob

In order to extract n-grams from a text, we need to first tokenize the text into individual tokens. Tokenization is the process of splitting a text into meaningful units or tokens.

Let’s see how we can tokenize a sentence using TextBlob:

from textblob import TextBlob

sentence = "TextBlob is a great library for natural language processing"
blob = TextBlob(sentence)
tokens = blob.words

print(tokens)

Output:

['TextBlob', 'is', 'a', 'great', 'library', 'for', 'natural', 'language', 'processing']

Extracting n-grams with TextBlob

TextBlob provides a simple method called ngrams to extract n-grams from a list of tokens. Let’s see an example of extracting bigrams (2-grams) from a sentence:

from textblob import TextBlob

sentence = "TextBlob is a great library for natural language processing"
blob = TextBlob(sentence)
bigrams = blob.ngrams(n=2)

print(bigrams)

Output:

[WordList(['TextBlob', 'is']), WordList(['is', 'a']), WordList(['a', 'great']), WordList(['great', 'library']), WordList(['library', 'for']), WordList(['for', 'natural']), WordList(['natural', 'language']), WordList(['language', 'processing'])]

As you can see, the ngrams method returns a list of WordLists, where each WordList represents an n-gram.

Conclusion

In this blog post, we have explored how to perform n-gram extraction using TextBlob in Python. N-grams are useful for various text analysis tasks, including language modeling, sentiment analysis, and machine translation. TextBlob provides a convenient and easy-to-use API for performing n-gram extraction, making it a valuable tool for natural language processing tasks.