The `requests-html` library is a powerful Python library that lets you interact with web pages and extract information using the same techniques as a web browser. It is built on top of `requests` and `lxml`, making it easy to navigate HTML documents, interact with JavaScript-heavy pages, and scrape data from websites. In this blog post, we will explore the different ways you can interact with web pages using `requests-html`.
Installation
To get started, you'll need to install `requests-html`. You can do this by running the following command:

```shell
pip install requests-html
```
Basic HTML scraping
To scrape HTML content from a webpage, you can use the `HTMLSession` class provided by `requests-html`. Here's a simple example that fetches a webpage and extracts all the links from it:
```python
from requests_html import HTMLSession

# Create a session
session = HTMLSession()

# Send a GET request to the webpage
response = session.get('https://example.com')

# Render the JavaScript on the page (downloads Chromium on first run)
response.html.render()

# Extract all the links from the page
links = response.html.links

# Print the links
for link in links:
    print(link)
```
In this example, we create an `HTMLSession` object and use it to send a GET request to example.com. Then, we render the JavaScript on the page using `response.html.render()`. Finally, we extract all the links from the HTML document using `response.html.links`, which returns a set of the hrefs found on the page.
Interacting with JavaScript-heavy pages
One of the great features of `requests-html` is the ability to interact with JavaScript-heavy pages. This is achieved by using a headless browser, `pyppeteer`, under the hood, which allows you to execute JavaScript and interact with dynamic content on the page.

Here's an example that demonstrates how to submit a form on a webpage. `requests-html` doesn't provide a dedicated form-submission API, so we collect the form's fields and POST them ourselves:
```python
from urllib.parse import urljoin

from requests_html import HTMLSession

# Create a session
session = HTMLSession()

# Send a GET request to the page containing the form
response = session.get('https://example.com')

# Render the JavaScript on the page
response.html.render()

# Find the form on the page
form = response.html.find('form', first=True)

# Build the payload from the form's inputs (this keeps any
# pre-filled hidden fields, such as CSRF tokens)
payload = {i.attrs['name']: i.attrs.get('value', '')
           for i in form.find('input') if 'name' in i.attrs}

# Fill in the form fields
payload['username'] = 'myusername'
payload['password'] = 'mypassword'

# Submit the form by POSTing to its action URL
action = form.attrs.get('action', '')
response = session.post(urljoin(response.url, action), data=payload)

# Print the response content
print(response.text)
```
In this example, we fetch a webpage using `session.get()`, render the JavaScript on the page, locate the form element using `response.html.find()`, build a payload dictionary from the form's input fields (filling in our own username and password), and then submit the form by POSTing the payload to the form's action URL with `session.post()`. Finally, we print the response content.
Conclusion
With `requests-html`, you can easily interact with web pages and extract information from them, even if they contain JavaScript-heavy content. Whether you need to scrape data or automate form submission, this library provides a convenient and efficient way to accomplish these tasks in Python. Give it a try and explore its various features and capabilities in your own projects.