[Python] Using the Beautiful Soup 4 `NavigableString` Object

06 Sep 2023

Beautiful Soup is a popular Python library used for web scraping. It simplifies the process of extracting data from HTML and XML documents. In this blog post, we will explore the NavigableString object provided by Beautiful Soup 4.

What is a NavigableString?

In Beautiful Soup, a NavigableString represents a piece of text inside a tag. It is a subclass of Python’s built-in str (unicode under Python 2), so it supports all the usual string operations. On top of that, it is a node in the parse tree, so it also supports tree-navigation attributes such as .parent and .next_sibling.
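As a quick check of the point above, here is a minimal sketch assuming Python 3 and an installed bs4 package. The one-line HTML snippet is made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

# Illustrative one-tag document (not from a real page)
soup = BeautifulSoup("<p>Hello, world</p>", "html.parser")
text = soup.p.string

# It is a NavigableString, and therefore also a str
print(isinstance(text, NavigableString))  # True
print(isinstance(text, str))              # True

# Ordinary string methods work on it
upper = text.upper()
print(upper)                              # HELLO, WORLD

# Unlike a plain str, it knows where it sits in the parse tree
parent_name = text.parent.name
print(parent_name)                        # p
```

Note that string methods such as upper() return a plain str, so the result no longer has a .parent.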

Accessing NavigableString Objects

To access a NavigableString object, we first need to parse the HTML or XML document using Beautiful Soup. Here’s an example of how to do that:

from bs4 import BeautifulSoup

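# A small sample document for illustration; in practice html_doc
# would hold the HTML you loaded or downloaded
html_doc = "<html><body><p>Hello, world!</p></body></html>"
soup = BeautifulSoup(html_doc, 'html.parser')

# Find the first <p> tag in the document
tag_with_text = soup.find('p')

# Access the NavigableString object within the tag
# (.string is None if the tag has more than one child)
text_content = tag_with_text.string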

print(text_content)

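In this example, we use the find method to locate the first p tag in the document. We then access the string attribute of the tag. It returns the NavigableString for the tag's text when that text is the tag's only child (or the only child of a single nested tag). If the tag has several children, for example text mixed with nested tags, string is None; in that case use the strings generator or the get_text method instead.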
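The difference between a tag with a single text child and a tag with mixed content can be sketched as follows. The two-paragraph html_doc here is a made-up example, not the document used above.

```python
from bs4 import BeautifulSoup

# Illustrative input: one plain paragraph, one with a nested <b> tag
html_doc = "<p>Plain text</p><p>Mixed <b>bold</b> text</p>"
soup = BeautifulSoup(html_doc, "html.parser")
first, second = soup.find_all("p")

# Only child is a string, so .string returns it
single = first.string
print(single)            # Plain text

# Several children, so .string is None
mixed = second.string
print(mixed)             # None

# .strings yields every NavigableString under the tag
pieces = list(second.strings)
print(pieces)            # ['Mixed ', 'bold', ' text']

# get_text() joins them into one plain str
joined = second.get_text()
print(joined)            # Mixed bold text
```

Checking for None before using the result of string avoids an AttributeError or TypeError further down when a page's markup is richer than expected.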

Manipulating NavigableString Objects

Once we have access to a NavigableString object, we can perform various operations on it. Some common operations include:

Getting the text content: A NavigableString already is the text, so it can be used wherever a string is expected. To get a plain Python string that no longer carries a reference to the parse tree, convert it with str. For example: text = str(navigable_string)
Modifying the text content: A NavigableString cannot be edited in place, but we can swap it for another string in the tree using the replace_with method. For example: navigable_string.replace_with('New text')
Checking if a string is present: We can check if a certain string exists within a NavigableString object using the in operator. For example: is_present = 'some string' in navigable_string
Getting the length of the string: We can find the length of the text content of a NavigableString object using the len function. For example: length = len(navigable_string)
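The operations listed above can be tried together in one short sketch. The snippet and the variable names are illustrative assumptions rather than part of any particular page.

```python
from bs4 import BeautifulSoup

# Illustrative one-tag document
soup = BeautifulSoup("<p>Old text</p>", "html.parser")
navigable_string = soup.p.string

# Substring check and length behave as for any str
is_present = "Old" in navigable_string
length = len(navigable_string)
print(is_present, length)        # True 8

# Convert to a plain str that is detached from the parse tree
plain = str(navigable_string)
print(type(plain) is str)        # True

# Replace the string inside the tree; the tag now holds the new text
navigable_string.replace_with("New text")
result = str(soup.p)
print(result)                    # <p>New text</p>
```

After replace_with, the old navigable_string object still holds "Old text" but is no longer part of the tree, so read the new value from the tag, not from the old variable.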

Conclusion

In this blog post, we explored the NavigableString object provided by Beautiful Soup 4. We learned how to access and manipulate the text content within HTML or XML tags using this object. Because a NavigableString is both an ordinary string and a node in the parse tree, it is a convenient way to work with text extracted from web documents.