[파이썬] requests-html 데이터 저장: CSV 포맷

06 Sep 2023

requests-html

In this blog post, we’ll explore how to save data obtained using the requests-html library in Python to a CSV format. requests-html is a powerful library that allows us to easily scrape and parse HTML pages.

Installation

Before we get started, make sure you have requests-html installed. You can install it using pip:

pip install requests-html

Importing necessary libraries

To begin with, let’s import the required libraries:

from requests_html import HTMLSession
import csv

We’ll be using the HTMLSession class from requests-html to create an object for making HTTP requests.

Making a request

Let’s start by making a request to a webpage using the HTMLSession object. We’ll then extract some data from the page.

session = HTMLSession()
response = session.get('http://example.com')

This code creates an HTMLSession object and makes a GET request to http://example.com. You can replace the URL with the webpage you want to scrape.

Scraping data

Now, let’s assume that we want to scrape some data from the webpage and save it in a CSV file. In this example, we’ll scrape a table that contains information about books.

# Find the table element
table = response.html.find('table')[0]

# Extract table headers
headers = [header.text for header in table.find('th')]

# Extract table rows
rows = []
for row in table.find('tr'):
    data = [cell.text for cell in row.find('td')]
    rows.append(data)

In this code snippet, we use the find method to locate the table element in the HTML response. Then, we iterate over the table rows, extracting the data from each cell and storing it in a list of lists called rows. We also extract the table headers and store them in the headers list.

Writing data to a CSV file

To write the scraped data to a CSV file, we’ll use the csv module in Python.

filename = 'books.csv'

with open(filename, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(headers)  # Write headers to CSV
    writer.writerows(rows)    # Write rows to CSV

Here, we open a new file called books.csv in write mode and create a csv.writer object. We then write the headers to the CSV file using writerow and write the rows to the CSV file using writerows.

Conclusion

In this blog post, we learned how to use the requests-html library to scrape data from a webpage and save it in a CSV format. We covered making a request, extracting data from an HTML table, and writing the data to a CSV file in Python.

Please note that web scraping should be done responsibly and with the permission of the website owner. Always refer to the website’s terms of service and be mindful of the impact your scraping may have on the website’s server.