[Python] Extracting Links with requests-html

In this blog post, we will explore how to extract links from web pages using the requests-html library in Python. requests-html makes it easy to scrape and interact with HTML pages, including ones that generate part of their content with JavaScript.

Installing requests-html

Before we get started, let's make sure we have the requests-html library installed (it supports Python 3.6 and above). We can install it using pip by running the following command:

pip install requests-html

Extracting links

To extract links from a web page, we will use the requests-html library's HTMLSession class together with the absolute_links property of the parsed HTML. Let's take a look at an example:

from requests_html import HTMLSession

# Create an instance of HTMLSession
session = HTMLSession()

# Specify the URL of the web page to extract links from
url = "https://example.com"

# Send a GET request to the web page
response = session.get(url)

# Render the JavaScript on the page
# (the first call downloads a local Chromium build, which can take a while)
response.html.render()

# Extract all links on the page
links = response.html.absolute_links

# Print the extracted links
for link in links:
    print(link)

In the above example, we first create an instance of HTMLSession. We then specify the URL of the web page we want to extract links from and send a GET request to that URL using the get method.
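Because HTMLSession builds on a regular requests session, the get call accepts the usual requests keyword arguments. As a minimal sketch (the User-Agent string and timeout value here are just illustrative), we could pass custom headers like this:

# Any keyword argument accepted by requests can be passed through get,
# e.g. custom headers or a timeout (the values below are illustrative).
response = session.get(
    url,
    headers={"User-Agent": "my-link-extractor/0.1"},
    timeout=10,
)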

After receiving the response, we render the JavaScript on the page using the render method. This is important because some websites may use JavaScript to dynamically generate content.
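If a page loads parts of its content lazily, render accepts a few optional parameters that can help; the values below are only a sketch, so adjust them for the site you are working with:

# Wait a moment after rendering and scroll down a few times so that
# lazily loaded content has a chance to appear (values are illustrative).
response.html.render(sleep=1, scrolldown=3, timeout=20)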

Finally, we use the absolute_links property of the response.html object to extract all links on the page; it returns a set of fully qualified URLs, with relative hrefs resolved against the page URL. We then iterate over the links and print each one.
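If we only need links that stay on the same site, or the href values exactly as they appear in the markup, we can combine these properties with the standard library. The example.com check below is just an assumption for illustration:

from urllib.parse import urlparse

# response.html.links keeps href values as written (possibly relative),
# while absolute_links resolves them against the page URL.
relative_links = response.html.links

# Keep only links that point back to the same host (example.com is illustrative).
same_site = {
    link for link in response.html.absolute_links
    if urlparse(link).netloc == "example.com"
}

for link in sorted(same_site):
    print(link)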

Conclusion

In this blog post, we have learned how to extract links from web pages using the requests-html library in Python. This can be useful when we want to analyze pages for specific information, build a simple crawler, or perform other scraping tasks. The requests-html library provides a simple and efficient way to interact with HTML web pages and retrieve useful data.