[Python] Configuring Logging in Scrapy

Scrapy is a powerful web scraping framework written in Python. It allows you to extract data from websites in an efficient and structured manner. The logging functionality in Scrapy is a crucial feature that helps you debug and monitor the scraping process. In this blog post, we will explore how to configure logging in Scrapy to better understand the behavior of your spiders.

Basic Logging Configuration

Scrapy uses the built-in Python logging module to handle logging. By default, it logs to the console with a log level of DEBUG. However, you can customize this behavior according to your needs.
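
Because Scrapy builds on the standard library's logging module, a logger created anywhere in your project with logging.getLogger is handled by the same configuration as Scrapy's own messages. A minimal sketch of a module-level logger:

import logging

# This logger propagates to the root handler that Scrapy installs,
# so its messages follow Scrapy's log level and format settings.
logger = logging.getLogger(__name__)
logger.warning('This message goes through the same logging setup as Scrapy itself')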

To configure logging in Scrapy, modify the settings.py file in your Scrapy project and add the following settings:

import logging

LOG_LEVEL = logging.WARNING
LOG_FORMAT = '%(asctime)s [%(name)s] %(levelname)s: %(message)s'
LOG_DATEFORMAT = '%Y-%m-%d %H:%M:%S'
LOG_FILE = 'scrapy.log'

Let’s break down what each setting does:

- LOG_LEVEL: the minimum severity to log; messages below this level are dropped. It accepts the standard logging levels (DEBUG, INFO, WARNING, ERROR, CRITICAL).
- LOG_FORMAT: the layout of each log line, using standard logging placeholders such as %(asctime)s, %(name)s, %(levelname)s, and %(message)s.
- LOG_DATEFORMAT: the format of the %(asctime)s timestamp, using strftime directives.
- LOG_FILE: the path of a file to write logs to; if unset, Scrapy logs to standard error.

Using Loggers in Spiders

Now that we have configured the logging settings, let’s see how to use loggers in Scrapy spiders. You can use the logger attribute provided by the scrapy.Spider class to log messages from your spider code.

Here’s an example:

import scrapy

class MySpider(scrapy.Spider):
    name = 'my_spider'

    def start_requests(self):
        yield scrapy.Request(url='http://example.com', callback=self.parse)

    def parse(self, response):
        self.logger.info('Scraping website: %s', response.url)
        # Scraping code goes here

In the parse method (or any other callback) of your spider, you can use self.logger to log messages at different levels, such as debug, info, warning, or error. These messages are filtered, formatted, and written according to the log level, format, and output file configured in settings.py.
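
For instance, a callback might emit messages at several levels depending on what it finds; the selector check and message texts below are illustrative:

def parse(self, response):
    # DEBUG: low-level detail, hidden unless LOG_LEVEL allows it
    self.logger.debug('Received response with status %s', response.status)
    # INFO: normal progress reporting
    self.logger.info('Scraping website: %s', response.url)
    # WARNING: something unexpected but not fatal
    if not response.css('title'):
        self.logger.warning('No <title> element found on %s', response.url)

With LOG_LEVEL set to logging.WARNING as in the settings above, only the warning message would appear in the output.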

Viewing Logs

To view the logs generated by your Scrapy spider, you can run the spider with the following command:

scrapy crawl my_spider

By default, the logs are displayed in the console (standard error). If you have specified LOG_FILE in the settings.py file, the logs will be written to that file instead.
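
You can also adjust logging for a single run using Scrapy's global command-line options, without editing settings.py (run.log below is an illustrative file name):

scrapy crawl my_spider -L WARNING
scrapy crawl my_spider --logfile run.log
scrapy crawl my_spider --nolog

The -L (or --loglevel) option overrides LOG_LEVEL, --logfile overrides LOG_FILE, and --nolog disables logging output entirely.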

Conclusion

Configuring logging in Scrapy is essential for monitoring and debugging your web scraping operations. By customizing the log level, format, and output file, you can have better control over the logging behavior of your spiders. Using the logger attribute in your spiders allows you to add informative log messages throughout your scraping code.

Feel free to experiment with different log levels, formats, and even multiple loggers to suit your specific requirements. Happy scraping!
