[파이썬] Scrapy Callback 함수 사용

06 Sep 2023

Scrapy

Scrapy is a powerful web scraping framework in Python. It allows us to easily extract data from websites by sending HTTP requests, parsing the responses, and handling various web crawling tasks. One of the key features of Scrapy is its ability to use callback functions to handle different stages of the scraping process.

What are callback functions?

In Scrapy, callback functions are functions that are called when certain events occur during the scraping process. These events include when a request is made, when a response is received, when an item is scraped, and so on. By using callback functions, we can customize the behavior of our spider and perform specific actions at different stages.

How to use callback functions in Scrapy?

To use callback functions in Scrapy, we need to define a method inside our spider class and assign it as the callback function for a specific request. Here’s an example:

import scrapy

class MySpider(scrapy.Spider):
    name = 'example_spider'

    # Our callback function
    def parse(self, response):
        # Here, we can write our custom code to extract data from the response
        # For demonstration purposes, let's just print the response body
        print(response.body)

        # We can also make more requests or perform other scraping actions inside the callback
        yield scrapy.Request(url='https://www.example.com/another_page', callback=self.parse_another_page)

    # Another callback function
    def parse_another_page(self, response):
        # Here, we can extract data from the response of the second page
        # For example, let's extract the title of the page
        title = response.css('h1::text').get()
        print("Title:", title)

In the example above, we define two callback functions: parse and parse_another_page. The parse function is the default callback function that is called when a response is received for a request made by our spider. Inside this function, we can write our custom code to extract data from the response and perform other actions.

In the parse function, we print the response body and make another request to a different page by using yield scrapy.Request(). We assign parse_another_page as the callback function for this new request. When the second page’s response is received, the parse_another_page function is called, where we can extract data from the response and perform further actions.

Conclusion

Callback functions are essential in Scrapy for customizing the scraping process and performing specific actions at different stages. By using callback functions, we can extract data, make additional requests, and navigate through multiple pages. Understanding and effectively utilizing callback functions will greatly enhance our web scraping capabilities with Scrapy.