[Python] Setting Up Real-Time Notifications in Scrapy

In this blog post, we will set up real-time notifications for Scrapy spiders using Python. Scrapy is a popular open-source web scraping framework for extracting data from websites. Real-time notifications let you monitor your spiders and receive a message whenever a specific event occurs, such as a spider starting or finishing.

For the purpose of this tutorial, we will focus on setting up real-time notifications using Telegram, a widely used messaging platform. However, you can adapt these steps to use any other messaging platform or notification service.

Prerequisites

Before we proceed, make sure you have Python and Scrapy installed, the requests library (used below to call the Telegram API), and a Telegram account.

Setting up the Telegram bot

  1. Start by creating a new bot on Telegram using BotFather. Follow its instructions to obtain the bot's API token.
  2. Once you have the token, create a new channel or group on Telegram where you want to receive the notifications.
  3. Add the bot to the channel or group as an admin.
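
The chat ID needed later in settings.py is not displayed in the Telegram app. One way to find it, assuming the bot has already received at least one message or channel post in the target chat, is to call the Bot API's getUpdates method and read the chat object from each update. The sketch below is only an illustration; the helper names `extract_chat_ids` and `fetch_updates` are hypothetical and not part of any library.

```python
import json
import urllib.request


def extract_chat_ids(updates):
    """Collect {chat_id: title} from the JSON returned by getUpdates."""
    chats = {}
    for update in updates.get("result", []):
        # Group messages arrive under "message", channel posts under
        # "channel_post", and membership changes under "my_chat_member".
        for key in ("message", "channel_post", "my_chat_member"):
            chat = update.get(key, {}).get("chat")
            if chat:
                chats[chat["id"]] = chat.get("title") or chat.get("username") or ""
    return chats


def fetch_updates(bot_token):
    """Call getUpdates for the given bot token (performs a network request)."""
    url = f"https://api.telegram.org/bot{bot_token}/getUpdates"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Usage (after posting a message in the channel or group):
#     print(extract_chat_ids(fetch_updates("<your-telegram-bot-token>")))
```

If the result is empty, post a new message in the chat after adding the bot and run the call again.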

Configuring Scrapy for real-time notifications

  1. Create a new Scrapy project or navigate to an existing project.
  2. Open the settings.py file in your Scrapy project folder.
  3. Add the following lines to the file:
# Telegram credentials used by the spider to send notifications
TELEGRAM_BOT_TOKEN = '<your-telegram-bot-token>'
TELEGRAM_CHAT_ID = '<your-telegram-chat-id>'

# Optional: browser-like User-Agent for crawl requests (not required for notifications)
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

Replace <your-telegram-bot-token> with the API token for your Telegram bot, and <your-telegram-chat-id> with the ID of your Telegram channel or group.

  4. Save the settings.py file.
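
Hardcoding the bot token in settings.py makes it easy to commit the secret to version control by accident. Because settings.py is an ordinary Python module, one alternative is to read both values from environment variables. This is a minimal sketch; the helper `read_telegram_settings` is a hypothetical name, and using the same names for the environment variables as for the settings is an assumption.

```python
import os


def read_telegram_settings(environ=os.environ):
    """Return (token, chat_id) from environment variables, or empty strings."""
    token = environ.get("TELEGRAM_BOT_TOKEN", "")
    chat_id = environ.get("TELEGRAM_CHAT_ID", "")
    return token, chat_id


# In settings.py, instead of the literal placeholder strings:
TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID = read_telegram_settings()
```

With this variant, export both variables in the shell before running the spider.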

Adding real-time notifications to your Scrapy spider

  1. Open the spider file where you want to enable real-time notifications.
  2. Import the necessary libraries:
import requests
import scrapy
from scrapy import signals
  3. Add the following methods to your spider class:
class MySpider(scrapy.Spider):
    name = 'my_spider'

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(MySpider, cls).from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_opened(self, spider):
        message = f"Spider {spider.name} has started."
        self.send_telegram_notification(message)

    def spider_closed(self, spider, reason):
        message = f"Spider {spider.name} has finished with reason: {reason}."
        self.send_telegram_notification(message)

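    def send_telegram_notification(self, message):
        # Credentials come from settings.py via the crawler settings.
        bot_token = self.settings.get('TELEGRAM_BOT_TOKEN')
        chat_id = self.settings.get('TELEGRAM_CHAT_ID')
        api_url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
        payload = {
            'chat_id': chat_id,
            'text': message,
        }
        try:
            # Blocking call; the timeout keeps a slow API from stalling the crawl.
            response = requests.post(api_url, data=payload, timeout=10)
        except requests.RequestException as exc:
            self.logger.warning('Failed to send Telegram notification: %s', exc)
            return
        if response.status_code != 200:
            self.logger.warning(
                'Failed to send Telegram notification (HTTP %s).',
                response.status_code,
            )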
  4. Save the spider file.

Testing the real-time notifications

  1. Open your command-line interface and navigate to the Scrapy project folder.
  2. Start your Scrapy spider as usual with the command scrapy crawl my_spider.
  3. Check your Telegram channel or group for the notifications.
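
If no message arrives, it can help to check the token and chat ID outside Scrapy first. The sketch below sends a single message through the same sendMessage endpoint using only the standard library. The function names `build_send_message_request` and `send_test_message` are hypothetical, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request


def build_send_message_request(bot_token, chat_id, text):
    """Build the POST request for the Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=data, method="POST")


def send_test_message(bot_token, chat_id, text, opener=urllib.request.urlopen):
    """Send one message and return True if Telegram reports success."""
    request = build_send_message_request(bot_token, chat_id, text)
    with opener(request, timeout=10) as resp:
        return bool(json.load(resp).get("ok"))


# Usage:
#     send_test_message("<your-telegram-bot-token>", "<your-telegram-chat-id>", "test")
```

With the default opener, an invalid token or chat ID surfaces as a urllib.error.HTTPError rather than a False return value, which usually points directly at the misconfigured setting.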

Congratulations! You have successfully set up real-time notifications in Scrapy using Python and Telegram. You can customize the notifications by modifying the messages and adding more events to listen for in the spider.
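
As one illustration of listening for more events, the sketch below adds a handler for Scrapy's spider_error signal, which is sent when a spider callback raises an exception. The helper `format_error_message` and the class `ErrorNotificationMixin` are hypothetical names, and the handler assumes the `send_telegram_notification` method from the spider above. The signal wiring is shown as a comment so the block can be read and run without a crawler.

```python
def format_error_message(spider_name, url, error_text, limit=200):
    """Build a short notification text for a failed callback."""
    if len(error_text) > limit:
        error_text = error_text[:limit] + "..."
    return f"Spider {spider_name} failed on {url}: {error_text}"


class ErrorNotificationMixin:
    """Mix into the spider; expects send_telegram_notification() to exist."""

    def spider_error(self, failure, response, spider):
        # failure is a twisted.python.failure.Failure wrapping the exception.
        message = format_error_message(
            spider.name, response.url, failure.getErrorMessage()
        )
        self.send_telegram_notification(message)


# In from_crawler, next to the existing connect() calls:
#     crawler.signals.connect(spider.spider_error, signal=signals.spider_error)
```

Truncating the error text keeps messages short; a spider that fails on many pages would otherwise flood the chat with long tracebacks.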

Keep in mind that this is just one approach to setting up real-time notifications. You can explore other platforms or services for notifications based on your requirements.

Happy web scraping!