[Python] Using the Beautiful Soup 4 `NavigableString` Object

Beautiful Soup is a popular Python library used for web scraping. It simplifies the process of extracting data from HTML and XML documents. In this blog post, we will explore the NavigableString object provided by Beautiful Soup 4.

What is a NavigableString?

In Beautiful Soup, a NavigableString represents a piece of text within a tag. It is a subclass of Python’s built-in str (unicode on Python 2), so it supports all the usual string methods. It also carries navigation attributes such as parent and next_sibling, which tie the text to its position in the parsed HTML or XML tree.
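To make the relationship with str concrete, here is a minimal sketch. The one-line HTML snippet and the variable names are assumptions chosen for illustration, not part of any real page.

```python
from bs4 import BeautifulSoup, NavigableString

# Illustrative one-tag document (an assumption for this sketch)
soup = BeautifulSoup("<p>Hello, world</p>", "html.parser")
text = soup.p.string

print(type(text).__name__)    # NavigableString
print(isinstance(text, str))  # True: ordinary string methods are available
print(text.upper())           # HELLO, WORLD
print(text.parent.name)       # p: the string knows which tag contains it
```

Because the object is a str, it can be passed to any code that expects plain text, while parent still links it back to the tree.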

Accessing NavigableString Objects

To access a NavigableString object, we first need to parse the HTML or XML document using Beautiful Soup. Here’s an example of how to do that:

from bs4 import BeautifulSoup

# A small sample document; substitute your own HTML here
html_doc = "<html><body><p>Hello, world</p></body></html>"
soup = BeautifulSoup(html_doc, 'html.parser')

# Find the first <p> tag in the document
tag_with_text = soup.find('p')

# Access the NavigableString object within the tag
# (this is None if the tag has more than one child)
text_content = tag_with_text.string

print(text_content)

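In this example, we use the find method to locate the first p tag in the document. We then access the string attribute of the tag, which returns the NavigableString object when the tag's only child is a piece of text. If the tag contains more than one child (for example, text mixed with nested tags), string is None instead.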
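The behavior of the string attribute on tags with mixed content is easy to trip over, so here is a short sketch of it. The sample markup is an assumption for illustration; strings and get_text are standard Beautiful Soup members for collecting text across several children.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the <p> tag has three children (text, <b>, text)
soup = BeautifulSoup("<p>Hello, <b>world</b>!</p>", "html.parser")
p = soup.p

single = soup.b.string     # 'world': <b> has exactly one text child
mixed = p.string           # None: <p> has more than one child
pieces = list(p.strings)   # every NavigableString under <p>, in order
full = p.get_text()        # all text joined into one plain str

print(single, mixed, pieces, full)
```

When a tag may contain nested markup, iterating over strings or calling get_text is usually the safer choice than relying on string.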

Manipulating NavigableString Objects

Once we have access to a NavigableString object, we can perform various operations on it. Some common operations include:
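As a minimal sketch, a few such operations are converting the object to a plain str, applying ordinary string methods, and replacing the text in the tree with replace_with. The sample markup and variable names here are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup with padding around the text
soup = BeautifulSoup("<p>  old text  </p>", "html.parser")
text = soup.p.string

# Convert to a plain str that holds no reference to the parse tree
plain = str(text)

# Ordinary string methods work and return plain str values
trimmed = text.strip()

# A NavigableString cannot be edited in place, but it can be replaced
text.replace_with("new text")

print(trimmed)        # old text
print(soup.p.string)  # new text
print(soup)           # <p>new text</p>
```

Converting with str is worth doing when the text outlives the soup, since a NavigableString keeps a reference to the whole parse tree.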

Conclusion

In this blog post, we explored the NavigableString object provided by Beautiful Soup 4. We learned how to access and manipulate the text content within HTML or XML tags using this object. The NavigableString object is a powerful tool for working with textual data extracted from web documents.