When building a web scraping or data extraction application, one common task is extracting image URLs from web pages. In this blog post, we’ll explore how to achieve this using the requests-html library in Python.
Installing requests-html
Before we start, let’s make sure we have the requests-html library installed. You can install it using pip:
pip install requests-html
Now that we have requests-html installed, let’s dive into how to extract image URLs from web pages.
Importing necessary modules
First, let’s import the HTMLSession class from the requests_html package:
from requests_html import HTMLSession
Creating a session
Once we have imported the required modules, we need to create an instance of the HTMLSession class. This session object will allow us to send HTTP requests and parse the HTML content of the web page:
session = HTMLSession()
Sending a request and parsing the content
To extract image URLs, we first send a request to the web page with the session’s get() method. The response it returns exposes the parsed page through its html attribute. Here’s an example:
response = session.get("https://example.com")
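A real request can fail or hang, so it may be worth checking the response before parsing it. The following is a minimal sketch rather than part of the walkthrough itself: the helper name fetch_page and the 10-second default timeout are assumptions chosen for illustration. It relies on HTMLSession.get() accepting the same keyword arguments as requests.Session.get() and on the response providing raise_for_status().

```python
def fetch_page(session, url, timeout=10):
    """Fetch url with the given session and return the response.

    fetch_page is a hypothetical helper; session is expected to be an
    HTMLSession (or any object with a compatible get() method).
    """
    # A timeout keeps the call from waiting indefinitely on a slow server.
    response = session.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx status codes instead of parsing an error page.
    response.raise_for_status()
    return response


# Example usage (requires network access and requests-html):
# from requests_html import HTMLSession
# response = fetch_page(HTMLSession(), "https://example.com")
```

The response returned by this helper can be used exactly like the one from the plain session.get() call above.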
Extracting image URLs
Now, let’s extract the image URLs from the parsed HTML content. We can call the .find() method on response.html with the CSS selector "img" to select all image elements on the page. Here’s an example:
image_elements = response.html.find("img")
Iterating over image elements and extracting URLs
Next, we can iterate over the image elements and read each URL from the element’s attribute dictionary with .attrs['src']. Here’s an example:
image_urls = []
for element in image_elements:
    # Skip <img> tags without a src attribute to avoid a KeyError
    if 'src' in element.attrs:
        image_urls.append(element.attrs['src'])
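The src attribute often holds a relative path such as /images/logo.png rather than a full URL. As a hedged sketch, the standard library’s urllib.parse.urljoin can resolve each value against the page address. The helper name to_absolute_urls and the sample paths below are hypothetical and only serve as an illustration.

```python
from urllib.parse import urljoin


def to_absolute_urls(srcs, page_url):
    """Resolve each src value against page_url (hypothetical helper)."""
    # urljoin leaves absolute URLs unchanged and resolves relative ones.
    return [urljoin(page_url, src) for src in srcs]


# Sample values made up for illustration.
sample = ["/images/logo.png", "photo.jpg", "https://cdn.example.com/a.png"]
absolute = to_absolute_urls(sample, "https://example.com/blog/post")
```

In the walkthrough, image_urls and the URL passed to session.get() would take the place of the sample values.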
Printing the extracted URLs
Finally, let’s print the extracted image URLs:
for url in image_urls:
    print(url)
Full code example
Here’s the full code example that puts it all together:
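from requests_html import HTMLSession

# Create a session and fetch the page
session = HTMLSession()
response = session.get("https://example.com")

# Select every <img> element on the page
image_elements = response.html.find("img")

# Collect the src of each image, skipping tags that have none
image_urls = []
for element in image_elements:
    if 'src' in element.attrs:
        image_urls.append(element.attrs['src'])

for url in image_urls:
    print(url)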
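Pages frequently repeat the same image or embed small images inline as data: URIs, so the raw list may need cleaning before use. Here is a minimal sketch of one possible cleanup step, using only the standard library. The function name clean_image_urls and the sample list are assumptions and not part of the code above.

```python
from urllib.parse import urlparse


def clean_image_urls(urls):
    """Drop duplicates and non-HTTP(S) entries, keeping first-seen order."""
    seen = set()
    cleaned = []
    for url in urls:
        # Keep only http:// and https:// URLs; this skips data: URIs
        # and any relative paths that have not been resolved yet.
        if urlparse(url).scheme not in ("http", "https"):
            continue
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


# Sample values made up for illustration.
sample = [
    "https://example.com/a.png",
    "data:image/gif;base64,R0lGODlhAQABAAAAACw=",
    "https://example.com/a.png",
    "https://example.com/b.png",
]
unique_urls = clean_image_urls(sample)
```

Because relative paths are dropped by this filter, it would make sense to run it only after the URLs have been made absolute.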
And there you have it! You now know how to extract image URLs from web pages using the requests-html library in Python.
Happy extracting!