[파이썬] requests-html Open Graph 태그 추출

The Open Graph protocol allows websites to define their metadata in a structured manner, making it easier for social media platforms and other applications to extract relevant information when sharing website links. One of the most commonly used libraries for web scraping in Python is requests-html, which provides a convenient way to extract data from HTML documents.

In this tutorial, we will explore how to use the requests-html library to extract Open Graph tags from a webpage in Python.

Installation

To get started, you need to install the requests-html library. Open your terminal or command prompt and run the following command:

pip install requests-html

Scraping Open Graph Tags

We will begin by importing the necessary modules and creating an instance of the HTMLSession class from the requests-html library.

from requests_html import HTMLSession

# Create an instance of HTMLSession
session = HTMLSession()

Next, we will use the get() method to fetch the HTML content of a webpage.

url = "https://example.com"
response = session.get(url)

Once we have the HTML content, we can use the requests-html library to extract the Open Graph tags. These tags are usually found in the <head> section of the HTML document.

# Extract Open Graph tags
og_tags = response.html.xpath('//meta[@property="og:*"]')

The xpath() method accepts an XPath expression as a parameter and returns a list of elements that match the expression. In this case, we are using the XPath expression //meta[@property="og:*"] to find all <meta> tags with the property attribute starting with “og:”.

Finally, we can loop through the Open Graph tags and extract the desired information, such as the title, description, or image URL.

for tag in og_tags:
    property_name = tag.attrs["property"]
    content = tag.attrs["content"]
    print(f"{property_name}: {content}")

By printing the property and content attributes of each Open Graph tag, we can easily see the extracted metadata.

Conclusion

In this tutorial, we learned how to use the requests-html library in Python to scrape Open Graph tags from webpages. With the ability to extract structured metadata, we can now utilize this information for various purposes such as sharing links on social media platforms or creating custom previews for our website links.