[Python] requests-html `render()` Method

06 Sep 2023


In the world of web scraping, it’s common to encounter websites that rely heavily on JavaScript to load content dynamically. Static tools such as requests and BeautifulSoup only see the HTML the server returns, so they miss anything that scripts add afterwards. The requests-html library makes scraping such dynamically loaded pages much easier.

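One of the key features of requests-html is its render() method, which launches headless Chromium (through pyppeteer) to execute JavaScript and render the web page in its final state. This allows you to scrape content from websites that heavily manipulate the DOM using JavaScript. Note that the first call to render() downloads Chromium into your home directory (for example ~/.pyppeteer/), so it takes noticeably longer than later calls.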

Installation

Before we dive into using the render() method, let’s ensure we have requests-html installed. Run the following command:

pip install requests-html

Basic Usage

Using the render() method is straightforward. Here’s an example of how to scrape dynamically loaded content from a web page:

from requests_html import HTMLSession

url = 'https://www.example.com'
session = HTMLSession()
response = session.get(url)
response.html.render()

# Now you can access the rendered HTML
html = response.html.html

In the above code, we first create an HTMLSession object and make a GET request to the desired URL. Then, we call the render() method on the response.html object. This reloads the page in Chromium, executes its JavaScript, and replaces the content of response.html with the rendered version. Finally, we read the rendered HTML from the response.html.html attribute.
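
The same steps can be wrapped in a small helper. The sketch below is one possible arrangement, not part of the library: fetch_rendered_html is a hypothetical name, and the raise_for_status() check, the timeout value, and the session.close() cleanup are additions that the example above does not include. HTMLSession is imported inside the function so that a stand-in session can be passed in for testing.

```python
def fetch_rendered_html(url, session=None, timeout=20.0):
    """Return the HTML of `url` after its JavaScript has run.

    Hypothetical helper: pass an existing HTMLSession to reuse it,
    or leave `session` as None to create and close one automatically.
    """
    owns_session = session is None
    if owns_session:
        # Lazy import: only needed when no session is supplied.
        from requests_html import HTMLSession
        session = HTMLSession()
    try:
        response = session.get(url)
        response.raise_for_status()            # fail early on 4xx/5xx responses
        response.html.render(timeout=timeout)  # reload the page in headless Chromium
        return response.html.html
    finally:
        if owns_session:
            session.close()                    # release the session we created


# Usage (needs network access and the one-time Chromium download):
#   html = fetch_rendered_html('https://www.example.com')
```

Closing the session matters in long-running scripts, because HTMLSession.close() also shuts down the Chromium process that render() started.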

Waiting for Dynamic Content

In some cases, the dynamic content on a web page takes a while to appear after the page has loaded. To give the page’s scripts time to finish, use the render() method’s sleep parameter, which pauses for the given number of seconds after the initial render before the HTML is extracted. The similarly named wait parameter does something different: it is a delay applied before the page is loaded, intended to prevent timeouts.

from requests_html import HTMLSession

url = 'https://www.example.com'
session = HTMLSession()
response = session.get(url)
response.html.render(sleep=5)  # Pause for 5 seconds after the initial render

html = response.html.html

By setting sleep=5, the render() method pauses for 5 seconds after the initial render before extracting the rendered HTML. Adjust the pause according to the specific needs of the website you are scraping. If the page itself is slow to load, raise the timeout parameter (8 seconds by default) as well.
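
A fixed pause is either too short or wastes time. As a rough alternative, the sketch below re-renders the page with a growing pause until a given CSS selector matches. It relies on render()’s sleep parameter (seconds to pause after the initial render) and on find(selector, first=True), which returns None when nothing matches. The function name render_until_present and the '#content' selector are hypothetical.

```python
def render_until_present(html, selector, attempts=3, sleep=2, timeout=20.0):
    """Render `html` until `selector` matches; return the element or None.

    `html` is expected to be a response.html object from requests-html.
    """
    for attempt in range(1, attempts + 1):
        # Pause longer on each retry so slow scripts get more time
        # to insert their content before the HTML is captured.
        html.render(sleep=sleep * attempt, timeout=timeout)
        element = html.find(selector, first=True)
        if element is not None:
            return element
    return None


# Usage (needs network access and the one-time Chromium download):
#   from requests_html import HTMLSession
#   response = HTMLSession().get('https://www.example.com')
#   item = render_until_present(response.html, '#content')
```

Each attempt reloads the page in Chromium, so keep the number of attempts small for sites that are sensitive to repeated requests.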

Additional Features

Apart from rendering and waiting for dynamic content, requests-html provides several other useful features:

CSS Selectors: You can pass a CSS selector to response.html.find() to extract matching elements from the rendered HTML. Check out the requests-html documentation for more information on how to utilize CSS selectors.
JavaScript Evaluation: If you need to execute custom JavaScript code on the rendered page, pass the JavaScript source as a string to the script parameter, for example response.html.render(script='() => document.title'). The argument is the code itself rather than a path to a .js file, and render() returns whatever value the script returns.
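
Both features can be combined in one pass. The following is a minimal sketch, assuming it receives a response.html object: summarize_page and VIEWPORT_SCRIPT are hypothetical names, and the selectors ('a[href]' and 'h1') are only examples. The script is a JavaScript function given as a string, and its return value comes back from render() as a Python object.

```python
# JavaScript source passed to render(); it runs inside the rendered page.
VIEWPORT_SCRIPT = """
() => {
    return {
        width: document.documentElement.clientWidth,
        height: document.documentElement.clientHeight
    }
}
"""


def summarize_page(html):
    """Collect the viewport size, link targets, and first heading of a page."""
    # render() returns the value produced by the script.
    viewport = html.render(script=VIEWPORT_SCRIPT)
    # find() takes a CSS selector and returns a list of matching elements.
    links = [a.attrs["href"] for a in html.find("a[href]")]
    # With first=True, find() returns one element, or None if nothing matches.
    heading = html.find("h1", first=True)
    return {
        "viewport": viewport,
        "links": links,
        "heading": heading.text if heading is not None else None,
    }


# Usage (needs network access and the one-time Chromium download):
#   from requests_html import HTMLSession
#   response = HTMLSession().get('https://www.example.com')
#   print(summarize_page(response.html))
```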

With these features, requests-html makes scraping dynamically loaded web pages a breeze. It helps you overcome the limitations of traditional scraping libraries and enables you to gather valuable data from JavaScript-based websites.

Give requests-html a try in your next web scraping project, and experience the power of rendering dynamic web pages!