[Python] requests-html SSL Certificate Verification

In this blog post, we will discuss how to handle SSL (Secure Sockets Layer) certificate verification when using the requests-html library in Python.

Background

SSL, along with its successor TLS (Transport Layer Security), is a security protocol that encrypts the connection between a client and a server so that data in transit stays private. The server proves its identity by presenting a certificate. Python HTTP clients such as requests, on which requests-html is built, check that certificate by default to validate the authenticity of the server.
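To see this default without making a network request, the standard library's ssl module can be inspected directly. This is only a sketch of the stdlib defaults; requests-html goes through requests and urllib3, which build their own context but follow the same verify-by-default policy.

```python
import ssl

# Build the client-side context the standard library recommends by default.
context = ssl.create_default_context()

# Both checks are enabled out of the box: the certificate chain must be
# valid, and the certificate must match the requested hostname.
chain_is_verified = context.verify_mode == ssl.CERT_REQUIRED
hostname_is_checked = context.check_hostname

print(chain_is_verified, hostname_is_checked)
```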

The requests-html Library

requests-html is a Python library that allows you to easily scrape and interact with HTML web pages. It provides a convenient way to make HTTP requests and parse HTML content.

SSL Verification in requests-html

By default, requests-html validates SSL certificates when making requests. However, there might be cases where you need to disable SSL certificate verification explicitly, such as when the certificate is self-signed or when testing in a development environment.

To disable SSL verification, you can pass the verify parameter as False when instantiating the HTMLSession object:

from requests_html import HTMLSession

session = HTMLSession(verify=False)

By setting verify to False, you inform requests-html not to perform SSL certificate validation.

Handling Invalid SSL Certificates

If you encounter a website with an invalid SSL certificate, you have two options: skip certificate verification for that site, or tell requests-html to trust the certificate explicitly.

To ignore certificate verification for a specific request, you can pass the verify parameter as False when making the request:

url = 'https://example.com'  # placeholder; replace with the target URL
response = session.get(url, verify=False)

Keep in mind that disabling SSL certificate verification introduces security risks, as it allows connections to potentially untrustworthy or compromised servers.
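One way to limit that risk is to keep verification disabled only for the few lines that need it. The following is a minimal sketch, assuming a session object that exposes the verify attribute inherited from requests.Session; the helper name verification_disabled is hypothetical and not part of requests-html.

```python
from contextlib import contextmanager


@contextmanager
def verification_disabled(session):
    """Turn off certificate verification on `session` for one block only.

    `verification_disabled` is a hypothetical helper, not part of
    requests-html. It relies only on the `verify` attribute that
    sessions inherit from requests.Session.
    """
    previous = session.verify
    session.verify = False
    try:
        yield session
    finally:
        # Restore the earlier setting even if the request raised.
        session.verify = previous
```

With an HTMLSession named session, a request made inside `with verification_disabled(session):` runs with verify set to False, and the session's previous verify value is put back when the block exits.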

If you want to trust a specific SSL certificate without switching verification off, point the session at a CA bundle that contains it. Set the session.verify attribute to the path of a PEM file holding the self-signed certificate or the CA certificate that issued it:

session.verify = '/path/to/certificate.pem'

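With session.verify set to a path, requests-html validates server certificates against that file instead of the default CA bundle. The file replaces the default trust store rather than extending it, so sites whose certificates are signed by CAs missing from it will fail verification.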
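A mistyped bundle path typically only surfaces as an error when the first request is made. The sketch below checks the path up front; use_ca_bundle is a hypothetical helper name, and it assumes only that the session has a verify attribute, as HTMLSession does.

```python
from pathlib import Path


def use_ca_bundle(session, bundle_path):
    """Point `session.verify` at a CA bundle, failing early if it is missing.

    `use_ca_bundle` is a hypothetical helper, not a requests-html API.
    """
    path = Path(bundle_path)
    # requests accepts a PEM file or a directory of hashed certificates.
    if not (path.is_file() or path.is_dir()):
        raise FileNotFoundError(f"CA bundle not found: {path}")
    session.verify = str(path)
    return session
```

Calling use_ca_bundle(session, '/path/to/certificate.pem') has the same effect as the assignment above, but raises FileNotFoundError immediately if the file does not exist.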

Conclusion

Handling SSL certificate verification correctly in requests-html keeps communication between your Python script and web resources secure and reliable. Although verification can be disabled, do so only when necessary, such as against a self-signed certificate in a development environment, and evaluate the trustworthiness of the server first. Where possible, prefer trusting a specific certificate through session.verify over turning verification off.

The code samples in this post assume that requests-html is installed (pip install requests-html) and that HTMLSession has been imported from requests_html before a session is created.