[파이썬] requests-html 기본적인 HTML 요청

Python provides several libraries for making HTTP requests, and one of the popular choices is the requests library. However, if you need to work with HTML content, the requests-html library is a great choice. It is a combination of requests and beautifulsoup4 libraries, providing a convenient interface for making HTML requests and parsing the response.

In this blog post, we will explore the basic usage of requests-html library and see how to make HTML requests in Python.

Installation

To get started, you need to install the requests-html library. You can install it using pip:

pip install requests-html

Usage

First, let’s import the necessary modules and create an instance of the HTMLSession class:

from requests_html import HTMLSession

session = HTMLSession()

Now, let’s make a basic HTML request to a webpage and extract the HTML content:

response = session.get('https://example.com')

html_content = response.text

The get() method is used to make a GET request to the specified URL. The response object contains various attributes and methods to access the response data.

Next, let’s extract specific elements from the HTML content using CSS selectors:

# Extract all the links on the page
links = response.html.links

# Extract all the images on the page
images = response.html.images

# Extract specific elements using CSS selector
articles = response.html.find('article')

The links attribute returns a set of all the links on the page, while the images attribute returns a set of all the images. The find() method is used to find specific elements based on the CSS selector.

To further interact with the extracted elements, you can use various methods provided by the Element class:

# Get the text content of an element
text = articles[0].text

# Get the value of an attribute
href = links.pop().attrs['href']

# Click a link and follow the redirect
response = session.get(href)

The text attribute returns the text content of an element, while the attrs dictionary provides access to the attributes of an element. You can also use the click() method to simulate clicking a link and follow the redirect.

Conclusion

The requests-html library provides a convenient interface for making HTML requests and extracting information from the response. It combines the power of requests and beautifulsoup4 libraries, making it a great choice for web scraping and HTML parsing in Python.

In this blog post, we have covered the basic usage of requests-html library. By following these examples, you can easily make HTML requests and extract specific elements from the response in your Python projects.