[파이썬] requests-html 데이터 저장: JSON 포맷

Python’s requests-html library is a powerful tool for web scraping and parsing HTML content. Once you have extracted the desired data using requests-html, you may want to save it in a structured format like JSON for further analysis or storage. In this blog post, we will explore how to save requests-html data in JSON format using Python.

Installing requests-html

Before we start, make sure you have the requests-html library installed on your system. You can install it using pip by running the following command:

pip install requests-html

Importing the necessary modules

To begin, we need to import the required modules. In this case, we will be using requests-html and the built-in json module in Python. Include the following lines of code at the beginning of your script:

from requests_html import HTMLSession
import json

Making a request and extracting data

Next, we need to perform a request and extract the required data using requests-html. Here’s a simple example of how you can do that:

session = HTMLSession()
response = session.get('https://example.com')

# Extracting data using CSS selectors
titles = response.html.find('h1')

In this example, we create an HTMLSession object and use it to send a GET request to the specified URL. We then use CSS selectors to extract the desired data (in this case, h1 tags) from the response.

Saving data in JSON format

Now that we have extracted the data, we can save it in JSON format. The json module in Python provides methods to convert Python objects into JSON strings and write them to a file. Here’s an example of how you can save the extracted data to a JSON file:

data = {'titles': [title.text for title in titles]}

# Save data in JSON format
with open('data.json', 'w') as file:
    json.dump(data, file)

In this example, we create a dictionary called data and store the extracted titles within it. We then open a file named “data.json” in write mode and use the json.dump() method to write the data dictionary as a JSON object to the file.

Conclusion

By combining the power of requests-html and the built-in json module in Python, you can easily scrape and save web data in JSON format. This enables you to process and analyze the extracted data efficiently. Remember to handle errors and exceptions when dealing with web scraping, and respect the website’s terms of service. Happy coding!