In today’s digital era, search engine optimization (SEO) plays a crucial role in ensuring that your website ranks well on search engine result pages. To achieve this, it’s important to understand how search engines crawl and index web pages. In this blog post, we will explore how to perform an SEO check using the requests-html library in Python.
What is requests-html?
requests-html is a Python library that allows you to easily scrape and interact with web pages. It provides a simple and intuitive interface for making HTTP requests and parsing HTML content. With requests-html, we can retrieve web page data, extract relevant information, and perform various SEO checks.
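As a quick illustration of that interface, here is a minimal sketch that fetches a page and lists every absolute URL it links to (the target URL is just a placeholder):

from requests_html import HTMLSession

session = HTMLSession()
response = session.get("https://example.com")  # placeholder URL

# requests-html parses the page for us; absolute_links collects every
# hyperlink on the page with relative hrefs resolved to absolute URLs
print(response.html.absolute_links)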
Installing requests-html
To install requests-html, you can use pip, the Python package installer. Open your terminal or command prompt and run the following command:
pip install requests-html
Once the installation is complete, you can import the requests_html module into your Python script or environment.
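For example, the class used throughout this post is imported like this:

from requests_html import HTMLSession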
Performing SEO checks with requests-html
Now let’s dive into performing SEO checks using requests-html. We will demonstrate how to retrieve a web page, extract metadata, check for broken links, and analyze HTML structure. Here’s an example code snippet that illustrates these concepts:
from requests_html import HTMLSession
import requests

def perform_seo_check(url):
    # Create an HTML session
    session = HTMLSession()

    # Send a GET request to the specified URL
    response = session.get(url)

    # Extract page metadata; first=True returns None instead of
    # raising an IndexError when a tag is missing
    title = response.html.xpath('//title/text()', first=True)
    description = response.html.xpath('//meta[@name="description"]/@content', first=True)
    keywords = response.html.xpath('//meta[@name="keywords"]/@content', first=True)

    # Check for broken links; absolute_links resolves relative hrefs,
    # which a plain request on the raw href value would fail on
    broken_links = []
    for link in response.html.absolute_links:
        try:
            link_response = session.get(link)
            if link_response.status_code != 200:
                broken_links.append(link)
        except requests.exceptions.RequestException:
            broken_links.append(link)

    # Analyze HTML structure: gather all text nodes except those
    # inside script and style tags
    text_nodes = response.html.xpath('//text()[not(parent::script) and not(parent::style)]')
    html_structure = ' '.join(chunk.strip() for chunk in text_nodes if chunk.strip())

    # Print the results
    print(f"Title: {title}")
    print(f"Description: {description}")
    print(f"Keywords: {keywords}")
    print(f"Broken Links: {broken_links}")
    print(f"HTML Structure: {html_structure}")

url = "https://example.com"
perform_seo_check(url)
In the example above, we first import the HTMLSession class from requests_html. We then define a function perform_seo_check() that takes a URL as input and performs various SEO checks.
We create an HTML session using HTMLSession() and send a GET request to the specified URL. We extract the page metadata, such as the title, description, and keywords, using XPath expressions; passing first=True returns None rather than raising an error when a tag is missing. We check for broken links by iterating through the page’s absolute links and sending a GET request to each one; if the request fails or the response status code is not 200 (OK), we consider the link broken. Finally, we analyze the HTML structure by extracting all text content from the HTML, excluding script and style tags.
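Note that sending a full GET request for every link can be slow on link-heavy pages. A common refinement, sketched below, is to issue HEAD requests (which skip the response body) and fall back to GET when a server rejects HEAD; the helper name check_link() is our own, not part of requests-html:

def check_link(session, link):
    # HEAD avoids downloading the response body; some servers answer
    # it with 405 (Method Not Allowed), so we retry those with GET
    try:
        response = session.head(link, allow_redirects=True)
        if response.status_code == 405:
            response = session.get(link)
        return response.status_code < 400
    except Exception:
        return False

Checking status_code < 400 flags client and server errors (4xx/5xx) rather than only non-200 responses, so links that legitimately answer with other success codes are not marked as broken.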
Running the code with a specific URL will print the results, including the title, description, keywords, broken links, and HTML structure.
Conclusion
Performing SEO checks on your website is essential to ensure better visibility and ranking on search engine result pages. By using the requests-html library in Python, you can easily retrieve web page data, extract relevant information, and perform SEO checks.
Remember to always follow SEO best practices and regularly monitor and optimize your website to improve its search engine ranking.