Lemmatization is the process of reducing a word to its base or root form. It helps in standardizing various forms of words and simplifying text analysis. In this blog post, we will explore how to perform lemmatization using the Natural Language Toolkit (NLTK) library in Python.
Installation
Before we start, make sure you have NLTK installed on your system. You can install it using the following command:
pip install nltk
Importing the necessary libraries
We need to import the nltk
library and download the WordNet corpus, which is required for lemmatization.
import nltk
nltk.download('wordnet')
Lemmatization using NLTK
NLTK provides a class called WordNetLemmatizer
which can be used for lemmatization. Let’s see how to use it:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
word = "running"
lemma = lemmatizer.lemmatize(word, pos='v')
print("Word:", word)
print("Lemma:", lemma)
In the above example, we create an instance of the WordNetLemmatizer
class. We then pass the word “running” and the part of speech tag (‘v’ for verb in this case) to the lemmatize
method. The method returns the base form of the word, which in this case is “run”.
Lemmatization with different part of speech tags
NLTK supports lemmatization with different part of speech (POS) tags such as noun, verb, adjective, and adverb.
nouns = ["cat", "dogs", "mice", "feet"]
verbs = ["playing", "running", "ate", "driven", "washes"]
print("Nouns:")
for noun in nouns:
lemma = lemmatizer.lemmatize(noun, pos='n')
print(noun, " --> ", lemma)
print("\nVerbs:")
for verb in verbs:
lemma = lemmatizer.lemmatize(verb, pos='v')
print(verb, " --> ", lemma)
In the above example, we lemmatize a list of nouns and verbs. For nouns, we pass the POS tag ‘n’ to the lemmatize
method, and for verbs, we pass the POS tag ‘v’. The output will show the original word and its lemmatized form.
Conclusion
Lemmatization is a useful technique for text normalization and analysis. In this blog post, we learned how to perform lemmatization using the NLTK library in Python. NLTK provides an easy-to-use WordNetLemmatizer
class to accomplish this task. By leveraging lemmatization, we can simplify text processing and improve the accuracy of our text analysis tasks.