[파이썬] requests JavaScript 처리된 웹페이지 요청

07 Sep 2023

requests

In web scraping, sometimes we encounter web pages where the content is dynamically generated using JavaScript. Standard library urllib or urllib2 in Python may not be sufficient to handle such cases. That’s when requests, a popular third-party library, comes to the rescue.

Using requests, we can send HTTP requests to a website and retrieve the response. In this blog post, we will explore how to handle JavaScript-rendered web pages using requests in Python.

Installation and Setup

Before we get started, make sure you have requests library installed. You can install it by running the following command:

pip install requests

Once installed, we can import the requests library in our Python script:

import requests

Making a Basic Request

To send a basic request to a website and retrieve the response, we can use the requests.get() method. Let’s see an example:

import requests

url = "https://example.com"
response = requests.get(url)

print(response.text)

The requests.get() method takes the URL of the website as an argument and returns a Response object. We can access the content of the response using the text attribute.

Handling JavaScript Rendered Pages

When dealing with JavaScript-rendered web pages, the basic requests.get() method alone might not give us the desired result. In such cases, we can use additional libraries like Selenium or MechanicalSoup. These libraries simulate a web browser and execute the JavaScript code, allowing us to retrieve the fully rendered page.

Here’s an example using Selenium:

from selenium import webdriver

url = "https://example.com"

# Set up Selenium webdriver
driver = webdriver.Chrome()
driver.get(url)

# Retrieve the rendered page content
html_content = driver.page_source

# Close the webdriver
driver.quit()

print(html_content)

In this example, we initialize the webdriver and open the URL using driver.get(). We then retrieve the rendered page content using driver.page_source. Finally, we close the webdriver using driver.quit().

Conclusion

Thanks to the requests library, we can easily send HTTP requests to websites and retrieve the HTML response. However, when dealing with JavaScript-rendered web pages, additional libraries like Selenium might be required to retrieve the fully rendered content.

By combining requests with other libraries, we can overcome the challenges of scraping JavaScript-rendered web pages and extract the data we need for our projects. Happy scraping!

Note: Remember to check the terms of service and respect the website’s policies when scraping data from websites.