[파이썬] requests-html 페이지의 메타 정보 추출

06 Sep 2023

requests-html

In this blog post, we will explore how to extract meta information from an HTML page using the requests-html library in Python. The requests-html library is a powerful tool for web scraping and provides an easy-to-use interface for fetching and manipulating HTML content.

To begin, let’s first install the requests-html library by running the following command:

pip install requests-html

Once installed, we can import the necessary modules and create an instance of the HTMLSession class as shown below:

from requests_html import HTMLSession

# Create an instance of HTMLSession
session = HTMLSession()

Next, we can use the session.get() method to make a GET request to the desired webpage:

url = "https://example.com"  # Replace with the URL of the webpage you want to extract meta information from
response = session.get(url)

Once we have obtained the response, we can use the html property to access the HTML content of the webpage:

# Get the HTML content
html_content = response.html

To extract meta information from the HTML content, we can use the find() method and pass in the appropriate CSS selector. In this example, we will extract the title, description, and keywords of the webpage using the respective meta tags:

# Extract the title, description, and keywords
title = html_content.find("meta[property='og:title']", first=True).attrs["content"]
description = html_content.find("meta[property='og:description']", first=True).attrs["content"]
keywords = html_content.find("meta[name='keywords']", first=True).attrs["content"]

Finally, we can print out the extracted meta information:

# Print the extracted meta information
print("Title:", title)
print("Description:", description)
print("Keywords:", keywords)

That’s it! With just a few lines of code, we can easily extract meta information from an HTML page using the requests-html library in Python. This can be useful for various web scraping tasks or when building web crawlers or data mining applications.

Remember to handle any exceptions that may occur during the web scraping process and be mindful of respecting the website’s terms of service and legal guidelines.

Happy coding!