[파이썬] nltk Concordance(일치) 검색

One powerful feature provided by the Natural Language Toolkit (NLTK) library in Python is the ability to perform concordance searches. A concordance search allows us to find all occurrences of a specific word or phrase within a given text. This can be useful for various tasks such as analyzing language patterns, studying word usage, or extracting meaningful insights from textual data.

To perform a concordance search using NLTK, we can make use of the nltk.Text class and its concordance() method. Here’s how it works:

import nltk

# Load the text corpus
corpus = nltk.corpus.gutenberg.words('shakespeare-hamlet.txt')

# Create an NLTK Text object from the corpus
text = nltk.Text(corpus)

# Perform a concordance search
text.concordance('hamlet')

In the code above, we first import the nltk library and load the desired text corpus. In this case, we are using the Shakespeare’s Hamlet play as our example.

Next, we create an nltk.Text object by passing in the loaded corpus. The nltk.Text class provides several methods for text analysis and manipulation.

Finally, we call the concordance() method on the text object, providing the word or phrase we want to search for as an argument. In this example, we are searching for the word ‘hamlet’.

The concordance() method will print out all occurrences of the given word or phrase, along with some context around each occurrence. This context helps us understand how the word is used in the text.

Displaying 25 of 99 matches:
  Hamlet ACT I . Scene I . Elsinore . A platform
 my lord ; I came to see your father ' s funeral
d he , his shoes were old , with blisters on his
e take him in the purging of his soul , when he i
mself ? Horatio . Such was the very armour he had
he tedious minutes I will walk ; by slow degrees
dead , soundly . But wherefore art not in thy tr
in his hands , how came he dead ? I ' ll not be l
er you come most carefully upon your hour . Barn
t was about to speak , when the cock crew . Hor
it boded some strange eruption to our state . Ma
ks . Oh , my lord , if my duty be too bold , my
 dignity of choice , and so are they . Or I Ope
no charge of falsehood can be made but such as he
my most painted word : O heavy burthen ! Pol
commands his son , from me reflects on himsel
and spur my dull revenge ! What is a man , If hi
face . And what make you from Wittenberg , Hora
weed in the cap of youth , yet he beware of it ;
all ;
 to his revenge , when he is fit and season ' d
ance . Marcellus . Good now , sit down , and tel
t nature is ashamed Almost to acknowledge hersel
ord , my lord , the roa
uggs , the shoeing , the rug , and such decorous

As shown in the example output above, the concordance() method displays up to 25 matches by default, along with the contextual words before and after each match. We can tweak the number of matches displayed by passing a lines argument to the concordance() method.

In conclusion, nltk’s concordance search feature is a valuable tool for analyzing and extracting information from textual data. By leveraging this functionality, we can gain deeper insights into word usage, language patterns, and other linguistic aspects of a given text corpus.