[Python] Saving requests-html Data: XML Format

In this blog post, we will explore how to save data obtained with the requests-html library as XML using Python. requests-html is a Python library for fetching web pages and extracting data from them with CSS selectors.

For this example, let’s assume we have already installed requests-html using pip. If you haven’t installed it yet, you can do so by running the following command:

pip install requests-html

Let’s get started with an example.

First, let’s import the necessary libraries and create an instance of the HTMLSession class from requests-html:

from requests_html import HTMLSession

session = HTMLSession()

Next, we will use the get method of the session object to send a GET request to a webpage and retrieve its HTML content:

url = "https://example.com"

response = session.get(url)
# Stop early if the server returned an error status (4xx or 5xx)
response.raise_for_status()

Now that we have obtained the HTML content, we can search for specific elements using CSS selectors:

# Extract all the paragraphs from the webpage
paragraphs = response.html.find("p")

# Extract the text content of the first paragraph,
# falling back to an empty string if the page has no <p> elements
first_paragraph = paragraphs[0].text if paragraphs else ""

With the data extracted, we can now save it in XML format. For this purpose, we will use the xml.etree.ElementTree module:

import xml.etree.ElementTree as ET

# Create a root element for the XML document
root = ET.Element("data")

# Create a child element for the first paragraph text
paragraph_element = ET.SubElement(root, "paragraph")
paragraph_element.text = first_paragraph
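
To store every paragraph rather than only the first, one possible extension is a loop that adds one child element per paragraph. The sketch below is a minimal illustration, not part of the original example: the helper name build_paragraph_tree and the index attribute are hypothetical choices, and the hard-coded paragraph_texts list stands in for the text scraped from the page so that the snippet runs without a network connection.

```python
import xml.etree.ElementTree as ET


def build_paragraph_tree(paragraph_texts):
    """Build a <data> element with one <paragraph> child per text (hypothetical helper)."""
    root = ET.Element("data")
    for index, text in enumerate(paragraph_texts, start=1):
        # The index attribute is an illustrative addition; attribute values must be strings
        child = ET.SubElement(root, "paragraph", index=str(index))
        child.text = text
    return root


# Stand-in for the scraped data, e.g. [p.text for p in response.html.find("p")]
paragraph_texts = ["First paragraph.", "Second with <angle> & ampersand."]

root = build_paragraph_tree(paragraph_texts)
xml_string = ET.tostring(root, encoding="unicode")
print(xml_string)
```

Note that ElementTree escapes characters such as & and < in element text when serializing, so scraped text does not need to be escaped by hand.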

We can add more elements to the XML document if desired. Once we have added all the necessary data, we can save it to a file using the ElementTree API:

# Create an ElementTree object
tree = ET.ElementTree(root)

# Save the XML document to a file
tree.write("data.xml", encoding="utf-8", xml_declaration=True)
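
By default, write puts the whole document on a single line after the declaration. If a more readable file is preferred, one option on Python 3.9 or later is to call ET.indent before writing. The sketch below assumes a placeholder paragraph and a temporary output path (data_pretty.xml is a hypothetical file name) so that it can run on its own.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Placeholder document standing in for the scraped data
root = ET.Element("data")
ET.SubElement(root, "paragraph").text = "Sample paragraph text."

tree = ET.ElementTree(root)
# Insert newlines and two-space indentation in place (requires Python 3.9+)
ET.indent(tree, space="  ")

# Write to a temporary directory so the example leaves the working directory untouched
out_path = os.path.join(tempfile.mkdtemp(), "data_pretty.xml")
tree.write(out_path, encoding="utf-8", xml_declaration=True)

with open(out_path, encoding="utf-8") as f:
    pretty_xml = f.read()
print(pretty_xml)
```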

And that’s it! We have successfully saved the extracted data in XML format. You can now view the contents of the data.xml file.
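
One way to check the result programmatically is to parse the file back and compare it with what was written. The following is a small round-trip sketch under stated assumptions: the sample texts and the temporary file location are placeholders, not data from the original example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Placeholder texts, including non-ASCII and special characters
sample_texts = ["Café & crème", "Plain text"]

root = ET.Element("data")
for text in sample_texts:
    ET.SubElement(root, "paragraph").text = text

# Save to a temporary location using the same write call as in the post
path = os.path.join(tempfile.mkdtemp(), "data.xml")
ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

# Parse the file back and collect the text of each <paragraph> element
parsed_root = ET.parse(path).getroot()
texts = [p.text for p in parsed_root.findall("paragraph")]
print(texts)
```

If the parsed texts match the originals, the file is well-formed and the encoding was handled correctly.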

To summarize, we used the requests-html library to scrape data from a webpage and saved it in XML format with Python's xml.etree.ElementTree module. This technique is useful when you want to store scraped data in a structured form that other tools can read. Feel free to experiment and extend the example to suit your specific use case.