The `requests-html` library is a powerful Python library that lets you interact with web pages and extract information using the same techniques as a web browser. It is built on top of `requests` and `lxml`, making it easy to navigate HTML documents, interact with JavaScript-heavy pages, and scrape data from websites. In this blog post, we will explore the different ways you can interact with web pages using `requests-html`.
Installation
To get started, you'll need to install `requests-html`. You can do this by running the following command:

```shell
pip install requests-html
```
Basic HTML scraping
To scrape HTML content from a webpage, you can use the `HTMLSession` class provided by `requests-html`. Here's a simple example that fetches a webpage and extracts all the links from it:
```python
from requests_html import HTMLSession

# Create a session
session = HTMLSession()

# Send a GET request to the webpage
response = session.get('https://example.com')

# Render the JavaScript on the page (downloads Chromium on first run)
response.html.render()

# Extract all the links from the page
links = response.html.links

# Print the links
for link in links:
    print(link)
```
In this example, we create an `HTMLSession` object and use it to send a GET request to example.com. Then, we render the JavaScript on the page using `response.html.render()`. Finally, we extract all the links from the HTML document using `response.html.links`, which returns a set of the hrefs found on the page.
Interacting with JavaScript-heavy pages
One of the great features of `requests-html` is the ability to interact with JavaScript-heavy pages. This is achieved by using a headless browser, `pyppeteer`, under the hood, which allows you to execute JavaScript and interact with dynamic content on the page.

Here's an example that demonstrates how to submit a form on a webpage. `requests-html` doesn't provide a dedicated form-submission API, so we collect the form's fields and POST them ourselves:
```python
from urllib.parse import urljoin

from requests_html import HTMLSession

# Create a session
session = HTMLSession()

# Send a GET request to the page containing the form
response = session.get('https://example.com')

# Render the JavaScript on the page
response.html.render()

# Find the form on the page
form = response.html.find('form', first=True)

# Build the payload from the form's inputs (this keeps any
# pre-filled hidden fields, such as CSRF tokens)
payload = {i.attrs['name']: i.attrs.get('value', '')
           for i in form.find('input') if 'name' in i.attrs}

# Fill in the form fields
payload['username'] = 'myusername'
payload['password'] = 'mypassword'

# Submit the form by POSTing to its action URL
action = form.attrs.get('action', '')
response = session.post(urljoin(response.url, action), data=payload)

# Print the response content
print(response.text)
```
In this example, we fetch a webpage using `session.get()`, render the JavaScript on the page, locate the form element using `response.html.find()`, build a payload dictionary from the form's input fields (filling in our own username and password), and then submit the form by POSTing the payload to the form's action URL with `session.post()`. Finally, we print the response content.
Conclusion
With `requests-html`, you can easily interact with web pages and extract information from them, even if they contain JavaScript-heavy content. Whether you need to scrape data or automate form submission, this library provides a convenient and efficient way to accomplish these tasks in Python. Give it a try and explore its various features and capabilities in your own projects.