[파이썬] Beautiful Soup 4 결과를 파일로 저장하기

Beautiful Soup 4 is a powerful Python library for web scraping and parsing HTML and XML documents. It provides convenient ways to extract and manipulate data from web pages. In this tutorial, we will learn how to save the results obtained using Beautiful Soup 4 to a file.

Installing Beautiful Soup 4

Before we begin, make sure you have Beautiful Soup 4 installed. If not, you can install it by running the following command:

pip install beautifulsoup4

Importing the necessary libraries

To get started, we need to import the necessary libraries. In addition to Beautiful Soup 4, we also need the requests library to fetch the web page content. Run the following code to import the libraries:

from bs4 import BeautifulSoup
import requests

Fetching and parsing the web page

Next, we need to fetch the web page content and parse it using Beautiful Soup 4. This library provides different parsers, such as html.parser, lxml, and html5lib. For this example, let’s use the html.parser parser. Run the following code to fetch and parse the web page:

url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

Make sure to replace https://example.com with the actual URL you want to scrape.

Extracting data using Beautiful Soup 4

After parsing the web page, we can use Beautiful Soup 4 to extract the desired data. The library provides various methods to navigate and search the parsed content, such as finding elements by tag name, class, or ID. For example, to extract all the links from the web page, we can use the find_all method:

links = soup.find_all("a")

Saving the results to a file

To save the extracted data to a file, we first need to open a file in write mode. Run the following code to open a file named output.txt and write the extracted data:

with open("output.txt", "w") as file:
    for link in links:
        file.write(link.get("href") + "\n")

This code will save each link’s href attribute to a new line in the file.

Conclusion

In this tutorial, we learned how to save the results obtained using Beautiful Soup 4 to a file. We covered the installation, fetching and parsing a web page, extracting data using Beautiful Soup 4, and saving the extracted data to a file. Beautiful Soup 4 provides a convenient way to scrape data from web pages and perform various transformations on the parsed content. You can explore its documentation to discover more advanced features and use cases. Happy scraping!