[파이썬] requests-html CSS 선택자 사용하기

06 Sep 2023

requests-html

requests-html

The requests-html library in Python is a powerful tool for web scraping and HTML parsing. One of its key features is the ability to select specific elements from a webpage using CSS selectors.

CSS selectors are a smart and efficient way to navigate and extract information from HTML documents. In this blog post, we will explore how to use CSS selectors with requests-html to easily extract data from webpages.

Installation

To get started, you need to install the requests-html library. You can do this by running the following command:

pip install requests-html

Basic Usage

Let’s start by importing the necessary modules:

from requests_html import HTMLSession

Next, we will create an HTML session object:

session = HTMLSession()

Now, we can use the get method of the session object to retrieve the HTML content of a webpage:

response = session.get('https://example.com')

To extract specific elements from the webpage using CSS selectors, we can use the find or findall methods provided by requests-html.

Searching for Elements

The find method returns the first element that matches the given CSS selector:

element = response.html.find('h1', first=True)
print(element.text)

The findall method returns a list of all elements that match the given CSS selector:

elements = response.html.findall('p')
for element in elements:
    print(element.text)

In both cases, we use the text property to retrieve the inner text of the selected elements.

CSS Selector Examples

Here are some common CSS selectors and their usage with requests-html.

Element Selector: selects all elements of a specific type.

# Select all `a` elements
links = response.html.findall('a')

Class Selector: selects elements with a specific class.

# Select all elements with the `btn` class
buttons = response.html.findall('.btn')

ID Selector: selects an element with a specific ID.

# Select the element with the `banner` ID
banner = response.html.find('#banner', first=True)

Attribute Selector: selects elements with a specific attribute value.

# Select all elements with the `data-id` attribute
elements = response.html.findall('[data-id]')

These are just a few examples of CSS selectors. You can find more information and advanced usage in the requests-html documentation.

Conclusion

Using CSS selectors with requests-html in Python makes web scraping and HTML parsing more efficient and straightforward. With just a few lines of code, you can easily extract specific elements from webpages based on their CSS selectors. This allows you to gather the data you need for your projects or analyses.

Give it a try and explore the power of CSS selectors in requests-html!