[Python] Extracting Image URLs with requests-html

06 Sep 2023

When building a web scraping or data extraction application, one common task is extracting image URLs from web pages. In this blog post, we’ll explore how to achieve this using the requests-html library in Python.

Installing requests-html

Before we start, let’s make sure we have the requests-html library installed. You can install it using pip:

pip install requests-html

Now that we have requests-html installed, let’s dive into how to extract image URLs from web pages.

Importing necessary modules

First, let’s import the HTMLSession class from requests_html:

from requests_html import HTMLSession

Creating a session

With the import in place, we create an instance of the HTMLSession class. The session object sends HTTP requests, and the responses it returns expose the parsed HTML of the page:

session = HTMLSession()

Sending a request and parsing the content

To extract image URLs, we first fetch the web page with the session’s get() method. The response it returns exposes the parsed document through its html attribute. Here’s an example:

response = session.get("https://example.com")
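
A request can fail or hang, and a bare get() call would hand an error page on to the parsing step without complaint. The sketch below wraps the call in a small helper that sets a timeout and raises on HTTP error statuses. The helper name fetch_page and the 10-second default are assumptions for illustration, not part of requests-html; the timeout argument and raise_for_status() come from the underlying requests library that HTMLSession builds on.

```python
def fetch_page(session, url, timeout=10):
    """Fetch url with the given session and fail loudly on HTTP errors.

    fetch_page is a hypothetical helper; session is expected to be an
    HTMLSession (or any object with a requests-style get() method).
    """
    response = session.get(url, timeout=timeout)
    # Raises an exception for 4xx/5xx responses instead of letting
    # an error page reach the parsing step.
    response.raise_for_status()
    return response
```

With a session in hand, response = fetch_page(session, "https://example.com") could stand in for the direct session.get() call.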

Extracting image URLs

Now, let’s extract the image URLs from the parsed HTML content. We can call the .find() method on response.html with the CSS selector "img" to select all image elements on the page. Here’s an example:

image_elements = response.html.find("img")

Iterating over image elements and extracting URLs

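Next, we can iterate over the image elements and read each URL from the 'src' key of the element’s attrs dictionary. An img tag is not guaranteed to carry a src attribute, so we use .get() and skip elements that lack one. Here’s an example:

image_urls = []
for element in image_elements:
    # attrs is a plain dict; .get() avoids a KeyError when src is missing
    src = element.attrs.get('src')
    if src:
        image_urls.append(src)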
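
The src values collected this way come back exactly as written in the page, so they may be relative paths such as /images/logo.png. A minimal sketch of resolving them against the page URL with the standard library’s urljoin follows. The helper name resolve_image_urls is an assumption, and it relies only on each element having an attrs dictionary, as requests-html elements do.

```python
from urllib.parse import urljoin


def resolve_image_urls(elements, base_url):
    """Return absolute image URLs for elements exposing an attrs dict.

    resolve_image_urls is a hypothetical helper, not a requests-html API.
    """
    urls = []
    for element in elements:
        src = element.attrs.get("src")
        if not src:
            continue  # skip <img> tags without a usable src
        # urljoin leaves absolute URLs untouched and resolves relative ones
        urls.append(urljoin(base_url, src))
    return urls
```

Calling resolve_image_urls(image_elements, response.url) would use the final URL of the request as the base. Note that this sketch does not take a <base> tag in the page into account.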

Printing the extracted URLs

Finally, let’s print the extracted image URLs:

for url in image_urls:
    print(url)
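
Pages often reference the same image more than once, so the printed list can contain repeats. One way to drop duplicates while keeping first-seen order is dict.fromkeys. The list below is made-up sample data standing in for image_urls, and unique_urls is a name chosen for illustration.

```python
# Made-up sample standing in for the image_urls list built earlier
image_urls = [
    "https://example.com/a.png",
    "https://example.com/b.png",
    "https://example.com/a.png",
]

# dict keys are unique and keep insertion order (Python 3.7+)
unique_urls = list(dict.fromkeys(image_urls))

for url in unique_urls:
    print(url)
```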

Full code example

Here’s the full code example that puts it all together:

from requests_html import HTMLSession

session = HTMLSession()
response = session.get("https://example.com")

image_elements = response.html.find("img")

image_urls = []
for element in image_elements:
    # Skip <img> tags that have no src attribute
    src = element.attrs.get('src')
    if src:
        image_urls.append(src)

for url in image_urls:
    print(url)

And there you have it! You now know how to extract image URLs from web pages using the requests-html library in Python.

Happy extracting!