[파이썬] Beautiful Soup 4 `next_sibling` 및 `previous_sibling` 사용하기

When web scraping with Beautiful Soup 4, you often come across situations where you need to navigate through the HTML structure. The next_sibling and previous_sibling methods in Beautiful Soup 4 allow you to access the next and previous siblings, respectively, of a given HTML element.

Understanding Siblings

In an HTML document, siblings are elements that share the same parent. For example, consider the following HTML code:

<div>
    <p>Paragraph 1</p>
    <p>Paragraph 2</p>
    <p>Paragraph 3</p>
</div>

In this case, the three <p> elements are siblings because they all reside inside the same parent <div> element.

Using next_sibling

The next_sibling method in Beautiful Soup 4 allows you to access the next sibling of an HTML element. To use it, you first need to select an element using one of Beautiful Soup’s selector methods (e.g., find or find_all). Let’s see an example:

from bs4 import BeautifulSoup

html = '''
<div>
    <p>Paragraph 1</p>
    <p>Paragraph 2</p>
    <p>Paragraph 3</p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
paragraph_1 = soup.find('p', string='Paragraph 1')

next_sibling = paragraph_1.next_sibling
print(next_sibling)

In this example, we select the first <p> element with the text “Paragraph 1”. The next_sibling method returns the next sibling, which, in this case, is the second <p> element with the text “Paragraph 2”. The output will be:

<p>Paragraph 2</p>

Using previous_sibling

Similarly, the previous_sibling method allows you to access the previous sibling of an HTML element. Let’s modify the previous example to demonstrate this:

from bs4 import BeautifulSoup

html = '''
<div>
    <p>Paragraph 1</p>
    <p>Paragraph 2</p>
    <p>Paragraph 3</p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
paragraph_3 = soup.find('p', string='Paragraph 3')

previous_sibling = paragraph_3.previous_sibling
print(previous_sibling)

In this case, we select the last <p> element with the text “Paragraph 3”. The previous_sibling method returns the previous sibling, which, in this case, is the second <p> element with the text “Paragraph 2”. The output will be:

<p>Paragraph 2</p>

Conclusion

The next_sibling and previous_sibling methods in Beautiful Soup 4 provide a convenient way to navigate through the HTML structure and access the adjacent siblings of a given HTML element. By understanding these methods, you can enhance your web scraping capabilities and extract the desired data from complex HTML documents.