In this blog post, we will explore how to use the requests-html library in Python to search for specific content within a web page. The requests-html library is a powerful tool that allows us to make HTTP requests and parse HTML content easily.
Installing requests-html
First, we need to install the requests-html library. Open your terminal or command prompt and run the following command:
pip install requests-html
Making an HTTP request and parsing HTML
To start, let's import the HTMLSession class from the requests_html module and create a session instance:
from requests_html import HTMLSession
session = HTMLSession()
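Because HTMLSession builds on a regular requests session, the usual session-level configuration applies. As a small sketch (the User-Agent string here is just an illustrative placeholder), you can identify your scraper on every request:

# Sent with every request made through this session; the value is a placeholder
session.headers.update({'User-Agent': 'my-scraper/0.1'})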
We can now make a request to a web page and work with its parsed HTML. Note that requests-html exposes the parsed document through the response's html attribute (response.text would only give us the raw markup as a string). For example, let's fetch the Python.org homepage:
response = session.get('https://www.python.org/')
html = response.html
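Before querying the page, it is worth confirming that the request actually succeeded. A minimal sketch using the standard requests error handling that the response inherits:

# Raise an exception for 4xx/5xx responses instead of silently parsing an error page
response.raise_for_status()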
Searching for specific content
Now that we have the parsed HTML, we can use the html.find() method to search for specific elements within the page. This method takes a CSS selector as an argument and returns a list of matching elements.
For example, let’s say we want to find all the links on the Python.org homepage:
links = html.find('a')
for link in links:
    print(link.text)
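requests-html also exposes a page's links directly as sets, which saves the loop when all you need are the URLs: the links property returns hrefs as written in the markup, while absolute_links resolves them against the page URL:

# Every unique link on the page, resolved to an absolute URL
for url in html.absolute_links:
    print(url)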
We can also search for content based on attributes. Let's say we want to find all the images on the page that have a src attribute:
images = html.find('img[src]')
for image in images:
    print(image.attrs['src'])
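The src values printed above are often relative paths. One way to normalize them is with urljoin from the standard library; a short sketch:

from urllib.parse import urljoin

for image in html.find('img[src]'):
    # Resolve relative paths such as /static/img/logo.png against the page URL
    print(urljoin(response.url, image.attrs['src']))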
Advanced searching
The find() method provides various options to refine the search. Here are a few common selector patterns; a combined sketch follows the list:
- Select by class:
html.find('.classname')
- Select by id:
html.find('#idname')
- Select by attribute value:
html.find('[attribute=value]')
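Beyond selectors, find() accepts a containing keyword to filter elements by their text, and first=True to return only the first match (or None). The selector and search text below are illustrative assumptions, not actual python.org markup:

# Hypothetical example: first link with class "widget-link" whose text mentions "Python"
event_link = html.find('a.widget-link', containing='Python', first=True)
if event_link is not None:
    print(event_link.text, event_link.attrs.get('href'))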
For more information on advanced searching options, refer to the requests-html documentation.
Conclusion
In this blog post, we have seen how to use the requests-html library to make HTTP requests and search for specific content within a web page. Built on top of requests, requests-html provides a powerful and convenient way to scrape and extract information from websites.