[파이썬] Beautiful Soup 4 태그의 텍스트 내용 가져오기

06 Sep 2023

beautiful soup

Beautiful Soup is a powerful Python library used for web scraping. It allows us to extract specific data from HTML and XML documents. One common use case is to extract the text content from specific tags using Beautiful Soup. In this blog post, we will explore how to accomplish this using Python.

Installing Beautiful Soup

Before we get started, we need to install the Beautiful Soup library. Open your terminal or command prompt and run the following command:

pip install beautifulsoup4

Importing Beautiful Soup

To start using Beautiful Soup in our Python code, we need to import it. We can do this by adding the following line at the top of our Python file:

from bs4 import BeautifulSoup

Loading the HTML content

Next, we need to load the HTML content into Beautiful Soup. We can do this by reading the HTML file or sending an HTTP request to a web page. Once we have the HTML content, we can create a Beautiful Soup object by passing the HTML content and the parser type.

# Read HTML file
with open('example.html') as file:
    html_content = file.read()

# Create Beautiful Soup object
soup = BeautifulSoup(html_content, 'html.parser')

Extracting the text content from tags

Now that we have the Beautiful Soup object, we can start extracting the text content from specific tags. There are several methods available to accomplish this:

Method 1: find() method

The find() method returns the first matching tag that satisfies the given conditions. We can use it to find a specific tag and extract its text content.

# Find the first <h1> tag and extract its text content
h1_tag = soup.find('h1')
h1_text = h1_tag.get_text()
print(h1_text)

Method 2: find_all() method

The find_all() method returns a list of all matching tags that satisfy the given conditions. We can use it to find multiple tags and extract their text content.

# Find all <p> tags and extract their text content
p_tags = soup.find_all('p')
for p_tag in p_tags:
    p_text = p_tag.get_text()
    print(p_text)

Conclusion

Beautiful Soup provides a convenient way to extract text content from HTML and XML documents. By using the find() and find_all() methods, we can easily locate specific tags and extract their text content. This can be extremely useful for web scraping, data extraction, and other data processing tasks. Get started with Beautiful Soup today and harness the power of web scraping!