[python] Beautiful Soup로 웹 페이지에서 첫 번째 태그 찾기

30 Nov 2023

python

웹 스크래핑(웹 페이지에서 정보 추출)을 할 때 가장 많이 사용되는 라이브러리 중 하나인 Beautiful Soup를 사용하여 웹 페이지에서 첫 번째 태그를 찾아보겠습니다.

Beautiful Soup는 HTML 및 XML 문서의 파싱에 사용되며, 웹 페이지의 요소를 쉽게 탐색하고 조작할 수 있도록 도와줍니다.

라이브러리 설치

먼저, Beautiful Soup를 사용하기 위해 라이브러리를 설치해야 합니다. 아래의 명령을 사용하여 설치할 수 있습니다.

pip install beautifulsoup4

예제 코드

다음은 Beautiful Soup를 사용하여 웹 페이지에서 첫 번째 태그를 찾는 예제 코드입니다.

import requests
from bs4 import BeautifulSoup

# 웹 페이지의 URL 주소
url = "https://example.com"

# requests를 사용하여 웹 페이지의 HTML 데이터 가져오기
response = requests.get(url)
html_data = response.text

# BeautifulSoup 객체 생성
soup = BeautifulSoup(html_data, 'html.parser')

# 첫 번째 태그 찾기
first_tag = soup.find()

# 태그 출력
print(first_tag)

위의 코드에서는 requests 모듈을 사용하여 웹 페이지의 HTML 데이터를 가져옵니다. 그 후, BeautifulSoup 객체를 생성하여 HTML 데이터를 파싱합니다. find() 메서드를 사용하여 원하는 태그를 찾을 수 있습니다. 이후 print() 함수를 사용하여 첫 번째 태그를 출력합니다.

실행 결과

위의 예제 코드를 실행하면 HTML 문서에서 첫 번째 태그를 찾아 출력할 수 있습니다.

예를 들어, https://example.com 웹 페이지의 첫 번째 태그를 출력해보면 다음과 같이 나올 수 있습니다.

<html>
<head>
<title>Example Domain</title>
</head>
<body>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</body>
</html>

참고 자료

Beautiful Soup 공식 문서: https://www.crummy.com/software/BeautifulSoup/bs4/doc/