In this blog post, we’ll explore how to save data scraped with the requests-html library in Python to a CSV file. The requests-html library makes it easy to fetch and parse HTML pages.
Installation
Before we get started, make sure you have requests-html installed. You can install it using pip:
pip install requests-html
Importing necessary libraries
To begin with, let’s import the required libraries:
from requests_html import HTMLSession
import csv
We’ll be using the HTMLSession class from requests-html to create an object for making HTTP requests, and Python’s built-in csv module to write the output file.
Making a request
Let’s start by making a request to a webpage using an HTMLSession object. We’ll then extract some data from the page.
session = HTMLSession()
response = session.get('http://example.com')
This code creates an HTMLSession object and makes a GET request to http://example.com. You can replace the URL with the webpage you want to scrape.
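Network requests can hang or come back with an error status, so it may be worth wrapping the call above. The following is a minimal sketch rather than part of the original example: the helper name fetch_page, the 10-second default timeout, and the optional session parameter are all assumptions made for illustration.

```python
def fetch_page(url, session=None, timeout=10):
    """Fetch a page and return the response, raising on HTTP errors.

    fetch_page is a hypothetical helper; the default timeout is an assumption.
    """
    if session is None:
        # Imported lazily so that an existing session can be passed in instead.
        from requests_html import HTMLSession
        session = HTMLSession()
    # timeout is forwarded to the underlying requests call
    response = session.get(url, timeout=timeout)
    # Raises an HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    return response
```

With this in place, response = fetch_page('http://example.com') could stand in for the two lines above, and a failed request would surface as an exception instead of an unexpected page.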
Scraping data
Now, let’s assume the page we requested contains an HTML table with information about books, and that we want to save that table to a CSV file.
# Find the table element
table = response.html.find('table')[0]
# Extract table headers
headers = [header.text for header in table.find('th')]
# Extract table rows
rows = []
for row in table.find('tr'):
    data = [cell.text for cell in row.find('td')]
    if data:  # the header row has only th cells, so skip its empty list
        rows.append(data)
In this code snippet, we use the find method to locate the first table element in the HTML response. We extract the table headers from the th cells and store them in the headers list. Then we iterate over the table rows, extracting the text of each td cell and storing it in a list of lists called rows. The header row contains no td cells, so it is skipped rather than added as an empty row.
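Real tables are often uneven: some rows contain no td cells at all, and others have fewer or more cells than the header. Here is a small sketch of one way to tidy the rows list before writing it out. The function name normalize_rows and the sample book data are made up for illustration, and the function assumes headers is not empty.

```python
def normalize_rows(headers, rows):
    """Drop empty rows and make every row match the header width.

    Hypothetical helper; assumes headers is a non-empty list.
    """
    width = len(headers)
    cleaned = []
    for row in rows:
        # Skip rows with no cells or only whitespace
        if not any(cell.strip() for cell in row):
            continue
        # Pad short rows with empty strings and truncate long ones
        padded = (list(row) + [""] * width)[:width]
        cleaned.append(padded)
    return cleaned


# Made-up sample data standing in for scraped values
headers = ["Title", "Author", "Year"]
rows = [[], ["Dune", "Frank Herbert", "1965"], ["Emma", "Jane Austen"]]

print(normalize_rows(headers, rows))
# [['Dune', 'Frank Herbert', '1965'], ['Emma', 'Jane Austen', '']]
```

Padding with empty strings keeps every column aligned with its header in the CSV file, at the cost of hiding which cells were missing in the source page.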
Writing data to a CSV file
To write the scraped data to a CSV file, we’ll use Python’s built-in csv module.
filename = 'books.csv'
with open(filename, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(headers)  # Write headers to CSV
    writer.writerows(rows)  # Write rows to CSV
Here, we open a new file called books.csv in write mode and create a csv.writer object. Passing newline='' lets the csv module handle line endings itself, which avoids blank lines between rows on Windows. We then write the headers with writerow and the data rows with writerows.
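A quick way to check the result is to read the file back and compare it with what was written. The sketch below uses only the standard library. The helper names write_csv and read_csv, the explicit utf-8 encoding, the temporary directory, and the sample rows are assumptions made for illustration.

```python
import csv
import os
import tempfile


def write_csv(path, headers, rows):
    """Write a header row followed by data rows (hypothetical helper)."""
    # newline='' lets the csv module control line endings
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(headers)
        writer.writerows(rows)


def read_csv(path):
    """Read the whole file back as a list of rows (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as csvfile:
        return list(csv.reader(csvfile))


# Made-up sample data; the second title contains a comma on purpose
headers = ["Title", "Author"]
rows = [["Dune", "Frank Herbert"], ["Emma, a Novel", "Jane Austen"]]

# Write into a temporary directory so the check leaves no files behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "books.csv")
    write_csv(path, headers, rows)
    round_trip = read_csv(path)

print(round_trip[0])           # ['Title', 'Author']
print(round_trip[1:] == rows)  # True
```

The title containing a comma survives the round trip because csv.writer quotes such fields automatically, which is the main reason to prefer the csv module over joining strings by hand.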
Conclusion
In this blog post, we learned how to use the requests-html library to scrape data from a webpage and save it in CSV format. We covered making a request, extracting data from an HTML table, and writing the data to a CSV file in Python.
Please note that web scraping should be done responsibly and with the permission of the website owner. Always refer to the website’s terms of service and be mindful of the impact your scraping may have on the website’s server.