In this blog post, we’ll explore how to save data obtained using the requests-html library in Python to a CSV format. requests-html is a powerful library that allows us to easily scrape and parse HTML pages.
Installation
Before we get started, make sure you have requests-html installed. You can install it using pip:
pip install requests-html
Importing necessary libraries
To begin with, let’s import the required libraries:
from requests_html import HTMLSession
import csv
We’ll be using the HTMLSession class from requests-html to create an object for making HTTP requests.
Making a request
Let’s start by making a request to a webpage using the HTMLSession object. We’ll then extract some data from the page.
session = HTMLSession()
response = session.get('http://example.com')
This code creates an HTMLSession object and makes a GET request to http://example.com. You can replace the URL with the webpage you want to scrape.
Scraping data
Now, let’s assume that we want to scrape some data from the webpage and save it in a CSV file. In this example, we’ll scrape a table that contains information about books.
# Find the table element
table = response.html.find('table')[0]
# Extract table headers
headers = [header.text for header in table.find('th')]
# Extract table rows
rows = []
for row in table.find('tr'):
data = [cell.text for cell in row.find('td')]
rows.append(data)
In this code snippet, we use the find method to locate the table element in the HTML response. Then, we iterate over the table rows, extracting the data from each cell and storing it in a list of lists called rows. We also extract the table headers and store them in the headers list.
Writing data to a CSV file
To write the scraped data to a CSV file, we’ll use the csv module in Python.
filename = 'books.csv'
with open(filename, 'w', newline='') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(headers) # Write headers to CSV
writer.writerows(rows) # Write rows to CSV
Here, we open a new file called books.csv in write mode and create a csv.writer object. We then write the headers to the CSV file using writerow and write the rows to the CSV file using writerows.
Conclusion
In this blog post, we learned how to use the requests-html library to scrape data from a webpage and save it in a CSV format. We covered making a request, extracting data from an HTML table, and writing the data to a CSV file in Python.
Please note that web scraping should be done responsibly and with the permission of the website owner. Always refer to the website’s terms of service and be mindful of the impact your scraping may have on the website’s server.