Scrapy is a powerful web scraping framework written in Python. It provides a convenient and efficient way to extract data from websites. To ensure the reliability and correctness of your Scrapy spider, it is important to write unit tests.
In this blog post, we will discuss how to write unit tests for Scrapy spiders using the pytest framework.
Why Unit Testing?
Unit testing is an essential part of software development. It helps identify and fix bugs early in the development process, ensures code quality, and provides a safety net for future changes and refactoring.
When it comes to web scraping with Scrapy, unit testing becomes even more crucial. Scrapy spiders interact with external websites and parse HTML/XML responses. Without proper testing, it is challenging to ensure that the spider is working as expected and handling different scenarios correctly.
Setting Up the Testing Environment
Before writing unit tests for your Scrapy spiders, you need to set up the testing environment. Here are the steps:
- Create a new directory for your tests, preferably inside your Scrapy project directory.
- Install the necessary testing dependencies, including `pytest` (the test runner), `pytest-asyncio` (handy if your spiders define `async def` callbacks), and `httpretty` (for stubbing HTTP at the socket level); only `pytest` is strictly required for the examples below. You can install them using `pip`:

  ```bash
  $ pip install pytest pytest-asyncio httpretty
  ```

- Create a new Python file inside the tests directory, e.g., `test_spider.py`. This file will contain your unit tests.
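With those steps done, one common arrangement looks like the sketch below. This layout is an assumption for illustration, not a requirement; it simply matches the import paths used later in this post.

```
project/
├── conftest.py      # empty; with pytest's default import mode it puts the
│                    # project root on sys.path, so my_spider.py is importable
├── my_spider.py     # the QuotesSpider shown below
└── tests/
    └── test_spider.py
```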
Writing Unit Tests
To write unit tests for your Scrapy spiders, you need to understand the structure of a typical Scrapy spider. A spider usually consists of the following components:
- A `name` attribute specifying the spider's name.
- A `start_requests` method (or a `start_urls` list) defining how to generate the initial requests.
- One or more `parse` methods (callbacks) for processing the responses.
Here’s an example of a Scrapy spider that retrieves and parses quotes from a website:
```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com"]

    def parse(self, response):
        # Each quote on the page lives in a <div class="quote"> element
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("span small::text").get(),
            }
```
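Before writing tests, you may want to smoke-test the spider by hand. If it lives in a standalone file as sketched above, `scrapy runspider` can execute it without a full Scrapy project (`-O` overwrites the output file on Scrapy 2.4+; use `-o` to append on older versions):

```bash
$ scrapy runspider my_spider.py -O quotes.json
```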
To test this spider, you can use the `pytest` framework. A true unit test should not hit the live site: instead of running the spider through a crawler, call its `parse` method directly with a hand-built `scrapy.http.HtmlResponse`, which keeps the test fast, deterministic, and independent of the network. (Scrapy's `CrawlerRunner` can drive full crawls inside tests, but that is integration testing and requires a running Twisted reactor.) Write individual test cases as separate functions. Here's an example:
```python
import pytest
from scrapy.http import HtmlResponse

from my_spider import QuotesSpider  # adjust the import to your project layout

# A small canned page mirroring the markup the spider expects
SAMPLE_HTML = b"""<html><body>
  <div class="quote">
    <span class="text">A witty saying proves nothing.</span>
    <span>by <small class="author">Voltaire</small></span>
  </div>
</body></html>"""


@pytest.fixture
def response():
    # Build a fake response so the test never touches the network
    return HtmlResponse(url="http://quotes.toscrape.com",
                        body=SAMPLE_HTML, encoding="utf-8")


def test_quotes_spider(response):
    spider = QuotesSpider()
    items = list(spider.parse(response))

    assert len(items) == 1
    assert items[0]["text"] == "A witty saying proves nothing."
    assert items[0]["author"] == "Voltaire"
```
In this example, the `response` fixture builds an `HtmlResponse` from a small canned HTML snippet that mirrors the page structure. The `test_quotes_spider` function instantiates `QuotesSpider`, passes the fake response to `parse`, and asserts on the extracted item. Because nothing is downloaded, the test runs in milliseconds and does not fail when the live site changes, slows down, or goes offline.
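The same pattern makes edge cases cheap to cover. As a sketch (reusing the hypothetical `my_spider` module from above), here is a test for a quote block that lacks an author element; Scrapy's `.get()` returns `None` when a selector matches nothing, and this test pins that behavior down:

```python
from scrapy.http import HtmlResponse

from my_spider import QuotesSpider


def test_quote_without_author():
    # Malformed quote: the <small> author element is missing entirely
    html = b'<div class="quote"><span class="text">Hello</span></div>'
    response = HtmlResponse(url="http://quotes.toscrape.com",
                            body=html, encoding="utf-8")
    items = list(QuotesSpider().parse(response))
    assert items[0]["text"] == "Hello"
    assert items[0]["author"] is None  # .get() yields None on no match
```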
Running the Unit Tests
To run the unit tests, open your terminal and navigate to your project root. Then, execute the `pytest` command:

```bash
$ pytest
```
pytest will automatically discover and run the tests (any files matching `test_*.py`) under the current directory. You should see the test results, indicating whether the tests passed or failed.
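A few stock pytest options are useful while iterating:

```bash
$ pytest -v                                        # verbose: one line per test
$ pytest tests/test_spider.py::test_quotes_spider  # run a single test by node ID
$ pytest -k author                                 # run tests whose names match "author"
```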
Conclusion
Unit testing is crucial for developing reliable and robust Scrapy spiders. In this blog post, we discussed the importance of unit testing in Scrapy projects and learned how to write unit tests using the pytest framework.
By incorporating unit tests into your Scrapy workflow, you can ensure that your spiders are functioning as expected, catch and fix bugs early, and have confidence in the integrity of your scraping code.