[파이썬] Beautiful Soup 4 태그의 부모와 자식 관계 탐색

06 Sep 2023

beautiful soup

Beautiful Soup is a popular Python library for web scraping. It provides a convenient way to parse HTML and XML documents by converting them into a structured data format. One useful feature of Beautiful Soup is the ability to navigate and explore the parent-child relationships between tags.

In this blog post, we will explore how to use Beautiful Soup 4 to navigate the parent-child relationships of tags.

Installation

To use Beautiful Soup 4, you need to install it first. You can do this using pip, the Python package manager. Open your terminal and run the following command:

pip install beautifulsoup4

HTML structure

Let’s start by examining a sample HTML structure that we will be working with:

<!DOCTYPE html>
<html>
<head>
  <title>Example Page</title>
</head>
<body>
  <div id="content">
    <h1>Welcome to my Page</h1>
    <p>This is a sample paragraph.</p>
    <ul>
      <li>First item</li>
      <li>Second item</li>
    </ul>
  </div>
</body>
</html>

Loading the HTML

To begin, we need to import Beautiful Soup and load our HTML document. Here is an example of how to do that:

from bs4 import BeautifulSoup

# Load the HTML document
html = """
<!DOCTYPE html>
<html>
<head>
  <title>Example Page</title>
</head>
<body>
  <div id="content">
    <h1>Welcome to my Page</h1>
    <p>This is a sample paragraph.</p>
    <ul>
      <li>First item</li>
      <li>Second item</li>
    </ul>
  </div>
</body>
</html>
"""

# Create a Beautiful Soup object
soup = BeautifulSoup(html, 'html.parser')

Navigating the parent-child relationships

Once we have our HTML loaded into Beautiful Soup, we can start navigating the parent-child relationships between tags.

Accessing the parent tag

To access the parent tag of a specific tag, we can use the parent attribute. Here’s an example:

# Find the <h1> tag
heading = soup.find('h1')

# Access the parent tag
parent_div = heading.parent

In this example, we first find the <h1> tag using the find() method. Then, we access its parent tag using the parent attribute. This will give us the <div> tag with the id of “content”.

Accessing the child tags

To access the child tags of a specific tag, we can use the children attribute. Here’s an example:

# Find the <div> tag with id "content"
content_div = soup.find('div', {'id': 'content'})

# Access the child tags
children = content_div.children

In this example, we first find the <div> tag with the id of “content”. Then, we access its child tags using the children attribute. This will give us a list-like object containing the child tags of the <div> tag.

Conclusion

Beautiful Soup 4 provides a convenient way to navigate and explore the parent-child relationships between tags in HTML documents. By using the parent and children attributes, you can easily access the parent and child tags of a specific tag. This can be incredibly useful when web scraping or traversing complex HTML structures.

Remember to install Beautiful Soup 4 using pip before using it in your projects. Happy coding!