[파이썬] 웹 스크래핑과 동물 정보 추출

01 Sep 2023

python

웹 스크래핑은 웹 사이트에서 데이터를 추출하는 프로세스를 말합니다. 이러한 데이터 추출은 다양한 용도로 사용될 수 있으며, 동물 정보 추출도 그 중 하나입니다. 동물 정보 추출은 동물 종류, 특성, 이미지 등과 같은 다양한 정보를 웹에서 수집하여 활용하는 작업을 의미합니다.

Python은 웹 스크래핑에 매우 효과적인 프로그래밍 언어로, 다양한 라이브러리를 제공하여 웹 스크래핑 작업을 쉽게 수행할 수 있습니다. 이번 블로그 포스트에서는 Python을 사용하여 동물 정보를 추출하는 예제 코드를 소개하겠습니다.

라이브러리 설치

Python에서 웹 스크래핑을 위해 beautifulsoup4와 requests 라이브러리를 설치해야 합니다. 아래의 명령어를 사용하여 라이브러리를 설치하세요.

pip install beautifulsoup4
pip install requests

웹 스크래핑 코드 예제

아래의 예제 코드는 ‘https://www.animalinfo.com’ 웹 사이트에서 동물 정보를 추출하는 코드입니다. 해당 웹 사이트는 다양한 동물에 대한 정보를 제공하고 있습니다.

import requests
from bs4 import BeautifulSoup

url = 'https://www.animalinfo.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

animals = soup.select('.animal-card')

for animal in animals:
    name = animal.select_one('.animal-name').text
    species = animal.select_one('.animal-species').text
    description = animal.select_one('.animal-description').text
    image_url = animal.select_one('.animal-image')['src']
    
    print(f'Name: {name}')
    print(f'Species: {species}')
    print(f'Description: {description}')
    print(f'Image URL: {image_url}')
    print('---')

위의 코드는 웹 사이트에서 .animal-card 클래스를 가진 요소를 선택하고, 각 요소에서 동물의 이름, 종류, 설명 및 이미지 URL을 추출합니다. 이후 각 동물의 정보를 출력합니다.

출력 결과는 다음과 같이 나타날 것입니다.

Name: Lion
Species: Panthera leo
Description: The lion is a large felid of the genus Panthera in the family Felidae.
Image URL: https://www.animalinfo.com/images/lion.jpg
---
Name: Elephant
Species: Loxodonta
Description: Elephants are large mammals of the family Elephantidae and the order Proboscidea.
Image URL: https://www.animalinfo.com/images/elephant.jpg
---
...

위의 예제 코드를 참고하여 웹 스크래핑을 수행하고, 원하는 동물 정보를 추출해보세요. 웹 스크래핑을 통해 다양한 동물 정보를 쉽게 수집할 수 있습니다.