[python] Beautiful Soup로 웹 페이지의 모든 텍스트 길이 계산하기

30 Nov 2023

python

개요

Beautiful Soup은 Python의 라이브러리로, HTML 및 XML 문서를 파싱하고 조작하는 데 사용됩니다. 이 라이브러리를 사용하여 웹 페이지의 모든 텍스트의 길이를 계산하는 방법을 알아보겠습니다.

절차

필요한 패키지를 가져옵니다.

from bs4 import BeautifulSoup
import requests

웹 페이지의 HTML을 가져옵니다.

url = "https://www.example.com"
response = requests.get(url)
html = response.text

BeautifulSoup 객체를 생성합니다.

soup = BeautifulSoup(html, 'html.parser')

모든 텍스트 요소를 가져옵니다.

text_elements = soup.find_all(text=True)

텍스트 길이를 계산합니다.

total_length = 0
for element in text_elements:
    total_length += len(element.strip())

결과를 출력합니다.

print("웹 페이지의 모든 텍스트 길이는 {}입니다.".format(total_length))

예제 코드

from bs4 import BeautifulSoup
import requests

url = "https://www.example.com"
response = requests.get(url)
html = response.text

soup = BeautifulSoup(html, 'html.parser')

text_elements = soup.find_all(text=True)

total_length = 0
for element in text_elements:
    total_length += len(element.strip())

print("웹 페이지의 모든 텍스트 길이는 {}입니다.".format(total_length))

참고 자료

Beautiful Soup Documentation