TextBlob is a Python library that provides a simple and intuitive API for natural language processing tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and n-gram extraction.
In this blog post, we will focus on n-gram extraction using TextBlob in Python. N-grams are contiguous sequences of n tokens from a given text. They are widely used in text analysis, language modeling, and machine learning tasks.
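To make the definition concrete before bringing in TextBlob, here is a minimal plain-Python sketch of the sliding-window idea behind n-grams. The function name simple_ngrams is made up for illustration and is not part of TextBlob, and the whitespace split is only a stand-in for real tokenization.

```python
def simple_ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # A sequence of length L has L - n + 1 windows (none if L < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Naive whitespace tokenization, used here only to keep the sketch short.
tokens = "the quick brown fox".split()

print(simple_ngrams(tokens, 2))
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
print(simple_ngrams(tokens, 3))
# [('the', 'quick', 'brown'), ('quick', 'brown', 'fox')]
```

Four tokens give three bigrams and two trigrams; in general a text of L tokens yields L - n + 1 n-grams.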
Installing TextBlob
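Before we start with n-gram extraction, let’s first install TextBlob. Open your terminal and run the following commands. The second one downloads the NLTK corpora that TextBlob relies on for tokenization:
pip install textblob
python -m textblob.download_corpora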
Tokenization with TextBlob
To extract n-grams from a text, we first need to tokenize it. Tokenization is the process of splitting a text into meaningful units (tokens), such as words.
Let’s see how we can tokenize a sentence using TextBlob's words property, which returns the word tokens with punctuation removed:
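from textblob import TextBlob
sentence = "TextBlob is a great library for natural language processing"
# Wrap the raw string in a TextBlob to access its NLP helpers
blob = TextBlob(sentence)
# words is a WordList of word tokens (punctuation excluded)
tokens = blob.words
print(tokens)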
Output:
['TextBlob', 'is', 'a', 'great', 'library', 'for', 'natural', 'language', 'processing']
Extracting n-grams with TextBlob
TextBlob objects provide a method called ngrams that extracts n-grams from the blob's words. It takes the n-gram size as the argument n, which defaults to 3. Let’s see an example of extracting bigrams (2-grams) from a sentence:
from textblob import TextBlob
sentence = "TextBlob is a great library for natural language processing"
blob = TextBlob(sentence)
# n=2 requests bigrams; omitting n gives trigrams (the default is n=3)
bigrams = blob.ngrams(n=2)
print(bigrams)
Output:
[WordList(['TextBlob', 'is']), WordList(['is', 'a']), WordList(['a', 'great']), WordList(['great', 'library']), WordList(['library', 'for']), WordList(['for', 'natural']), WordList(['natural', 'language']), WordList(['language', 'processing'])]
As you can see, the ngrams method returns a list of WordList objects, where each WordList holds the tokens of one n-gram. Our 9-word sentence yields 8 bigrams.
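A common next step is to count how often each n-gram occurs. The sketch below is one way to do that with collections.Counter. The helper name count_ngrams is hypothetical, and plain Python lists stand in for the WordList objects so that the snippet runs without TextBlob installed. Because WordList is a list subclass, it cannot be used as a dictionary key, so each n-gram is converted to a tuple first.

```python
from collections import Counter


def count_ngrams(ngram_list):
    """Count how often each n-gram occurs.

    Each n-gram is converted to a tuple first, because list-like
    objects (such as TextBlob's WordList) are not hashable.
    """
    return Counter(tuple(gram) for gram in ngram_list)


# Plain lists stand in for the WordList objects returned by blob.ngrams(n=2).
bigrams = [["to", "be"], ["be", "or"], ["or", "not"], ["not", "to"], ["to", "be"]]

counts = count_ngrams(bigrams)
print(counts.most_common(1))
# [(('to', 'be'), 2)]
```

With TextBlob, the same helper should accept the result of blob.ngrams(n=2) directly, since it only requires that each n-gram be iterable.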
Conclusion
In this blog post, we have explored how to tokenize text and extract n-grams using TextBlob in Python. N-grams are useful for various text analysis tasks, including language modeling, sentiment analysis, and machine translation. With a single call to the ngrams method, TextBlob makes n-gram extraction straightforward.