[파이썬] requests-html 데이터베이스 연동

06 Sep 2023

requests-html

In today’s blog post, we will explore how to connect and interact with a database using the requests-html library in Python. Requests-html is a powerful Python library that allows you to easily scrape websites and extract data. However, in this article, we will focus on leveraging its capabilities to connect and interact with databases.

Why Connect `requests-html` with a Database?

Requests-html is mainly used for web scraping, but it can also be utilized to retrieve data from a database. By connecting requests-html with a database, you can automate the process of extracting data from the web and directly storing it in a database, saving you time and effort.

Prerequisites

Before we get started, ensure that you have the following prerequisites:

Python installed on your machine
requests-html library installed (use pip install requests-html to install it)

Connecting to a Database

To connect requests-html with a database, you will need to install an appropriate database driver for your specific database. For example, if you are using MySQL, you will need to install the mysql-connector-python driver.

Once you have the necessary driver installed, you can establish a connection to the database using the following code snippet:

import requests_html
import mysql.connector

# Establish a connection to the database
database = mysql.connector.connect(
    host="localhost",
    user="your_username",
    password="your_password",
    database="your_database"
)

# Create a cursor object to interact with the database
cursor = database.cursor()

Make sure to replace "your_username", "your_password", and "your_database" with your actual database credentials.

Executing Database Queries with `requests-html`

Now that we have established a database connection, we can execute queries to retrieve or manipulate data. Let’s take a look at an example of executing a query to retrieve data from a table:

# Execute a select query
query = "SELECT * FROM table_name"
cursor.execute(query)

# Get all rows from the result set
result = cursor.fetchall()

# Loop through each row and extract data
for row in result:
    # Do something with the data
    print(row)

Replace "table_name" with the name of the table you want to retrieve data from.

Storing Web Data in a Database

Now, let’s see an example of how we can scrape data from a website using requests-html and store it directly into a database:

from requests_html import HTMLSession

# Create an HTML session
session = HTMLSession()

# Fetch the webpage
response = session.get("https://example.com")

# Extract relevant information from the webpage using CSS selectors
title = response.html.find("h1", first=True).text
description = response.html.find("p", first=True).text

# Insert the extracted data into the database
query = "INSERT INTO table_name (title, description) VALUES (%s, %s)"
values = (title, description)
cursor.execute(query, values)

# Commit the changes to the database
database.commit()

Replace "https://example.com" with the URL of the website from which you want to scrape data. Also, adjust the table and column names in the INSERT INTO query based on your database schema.

Conclusion

In this article, we learned how to connect requests-html with a database in Python. We saw how to establish a connection, execute queries, and store web data directly into a database using requests-html’s web scraping capabilities. By leveraging this integration, you can automate the process of extracting and storing data from the web, making your data management tasks more efficient.

Why Connect requests-html with a Database?