Beautiful Soup 4 is a Python library that makes web scraping simple and efficient. It parses HTML and XML documents and provides methods to navigate their structure and extract data from them.
In this blog post, we will focus on using Beautiful Soup 4 to scrape data from a webpage and export it in CSV format. CSV (Comma-Separated Values) is a widely used format for storing tabular data, and it can be easily imported into spreadsheet applications like Microsoft Excel or Google Sheets.
Installation
Before we begin, make sure you have Beautiful Soup 4 installed on your system. Since we will also be fetching pages over HTTP, you will need the requests library as well. You can install both using pip by running the following command in your terminal or command prompt:
pip install beautifulsoup4 requests
Scraping a Webpage
First, we need to import the necessary libraries:
from bs4 import BeautifulSoup
import requests
import csv
Next, we need to fetch the webpage's HTML using the requests library:
url = "https://example.com"
response = requests.get(url)
html_content = response.content
Now that we have the HTML content, we can create a Beautiful Soup object and start parsing the HTML:
soup = BeautifulSoup(html_content, 'html.parser')
To extract the data from the webpage, we need to understand its structure and locate the elements we are interested in. Beautiful Soup provides various methods to search, navigate, and extract data from HTML or XML documents.
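For example, here are a few common lookups (a minimal sketch; the tag names, classes, and selectors below are illustrative, not taken from a real page):
title_text = soup.title.text                # Text of the <title> tag
first_link = soup.find('a')                 # The first <a> element, or None
link_target = first_link['href'] if first_link else None  # Its href attribute
headings = soup.find_all('h2')              # A list of every <h2> element
cards = soup.select('div.card')             # Elements matching a CSS selector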
Once you have located the desired data, you can store it in a structure that is easy to export, such as a list of dictionaries. For example, let’s say we want to scrape a webpage containing a table of products and their prices:
products = []

# Find all table rows
rows = soup.find_all('tr')

# Skip the header row
for row in rows[1:]:
    # Extract product name and price
    name = row.find('td', class_='product-name').text.strip()
    price = row.find('td', class_='product-price').text.strip()

    # Store product in a dictionary
    product = {'name': name, 'price': price}
    products.append(product)
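Before exporting, it can be worth printing a few entries to confirm that the selectors matched what you expected:
# Quick sanity check of the scraped data
for product in products[:3]:
    print(product['name'], '->', product['price'])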
Exporting to CSV
Now that we have our data stored as a list of dictionaries, we can export it to a CSV file.
filename = 'products.csv'

# Open the CSV file in write mode
with open(filename, 'w', newline='') as csvfile:
    fieldnames = ['name', 'price']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    # Write the header row
    writer.writeheader()

    # Write the data rows
    for product in products:
        writer.writerow(product)
The csv.DictWriter class provided by the csv module allows us to easily write a list of dictionaries to a CSV file. We specify the column headers using the fieldnames parameter, write the header row using writer.writeheader(), and then loop through our data and write each row using writer.writerow(product).
That’s it! You have successfully scraped a webpage using Beautiful Soup 4 and exported the data to CSV format. By following this approach, you can automate the process of collecting data from various websites and quickly analyze it using spreadsheet applications.
Remember to handle any potential exceptions and errors that may occur during the scraping process, and be mindful of any website’s terms of service when scraping their data.
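As a minimal sketch of that kind of defensive handling (reusing the same URL and the illustrative table selectors from above), you might wrap the request in a try/except and skip rows that don't match the expected layout:
try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Raise an HTTPError for 4xx/5xx responses
except requests.exceptions.RequestException as exc:
    print(f"Request failed: {exc}")
else:
    soup = BeautifulSoup(response.content, 'html.parser')
    for row in soup.find_all('tr')[1:]:
        name_cell = row.find('td', class_='product-name')
        price_cell = row.find('td', class_='product-price')
        if name_cell is None or price_cell is None:
            continue  # Skip rows missing the expected cells
        products.append({'name': name_cell.text.strip(),
                         'price': price_cell.text.strip()})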
I hope this blog post has helped you get started with web scraping and exporting data using Beautiful Soup 4 in Python. Happy scraping!