[파이썬] Beautiful Soup 4 `next_sibling` 및 `previous

[파이썬] Beautiful Soup 4 `next_sibling` 및 `previous_sibling` 사용하기

06 Sep 2023

beautiful soup

When web scraping with Beautiful Soup 4, you often come across situations where you need to navigate through the HTML structure. The next_sibling and previous_sibling methods in Beautiful Soup 4 allow you to access the next and previous siblings, respectively, of a given HTML element.

Understanding Siblings

In an HTML document, siblings are elements that share the same parent. For example, consider the following HTML code:

<div>
    <p>Paragraph 1</p>
    <p>Paragraph 2</p>
    <p>Paragraph 3</p>
</div>

In this case, the three  elements are siblings because they all reside inside the same parent <div> element.

Using `next_sibling`

The next_sibling method in Beautiful Soup 4 allows you to access the next sibling of an HTML element. To use it, you first need to select an element using one of Beautiful Soup’s selector methods (e.g., find or find_all). Let’s see an example:

from bs4 import BeautifulSoup

html = '''
<div>
    <p>Paragraph 1</p>
    <p>Paragraph 2</p>
    <p>Paragraph 3</p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
paragraph_1 = soup.find('p', string='Paragraph 1')

next_sibling = paragraph_1.next_sibling
print(next_sibling)

In this example, we select the first  element with the text “Paragraph 1”. The next_sibling method returns the next sibling, which, in this case, is the second  element with the text “Paragraph 2”. The output will be:

<p>Paragraph 2</p>

Using `previous_sibling`

Similarly, the previous_sibling method allows you to access the previous sibling of an HTML element. Let’s modify the previous example to demonstrate this:

from bs4 import BeautifulSoup

html = '''
<div>
    <p>Paragraph 1</p>
    <p>Paragraph 2</p>
    <p>Paragraph 3</p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
paragraph_3 = soup.find('p', string='Paragraph 3')

previous_sibling = paragraph_3.previous_sibling
print(previous_sibling)

In this case, we select the last  element with the text “Paragraph 3”. The previous_sibling method returns the previous sibling, which, in this case, is the second  element with the text “Paragraph 2”. The output will be:

<p>Paragraph 2</p>

Conclusion

The next_sibling and previous_sibling methods in Beautiful Soup 4 provide a convenient way to navigate through the HTML structure and access the adjacent siblings of a given HTML element. By understanding these methods, you can enhance your web scraping capabilities and extract the desired data from complex HTML documents.

Understanding Siblings

Using next_sibling

Using previous_sibling

Conclusion

Using `next_sibling`

Using `previous_sibling`