[파이썬] Pillow 이미지 콘텐츠 기반 검색

When working with images in Python, the Pillow library is a popular choice due to its extensive functionality and ease of use. One valuable feature of Pillow is its ability to perform content-based image search, allowing us to find similar images based on their visual content rather than metadata or textual descriptions.

In this blog post, we will explore how to use Pillow for content-based image search in Python. We will assume that you have already installed the Pillow library, which can be done by running pip install pillow in your command line or terminal.

Preparing the Image Dataset

Before we can perform content-based image search, we need to create a dataset of images that we can search through. The dataset should consist of a collection of images and associated labels or filenames.

We can create a simple image dataset by downloading a set of images using Python’s urllib module and saving them in a specific directory. In this example, let’s suppose we have a directory named dataset where we have saved our images.

import os
import urllib.request

# Image URLs to download
image_urls = [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg",
    "https://example.com/image3.jpg",
    # Add more image URLs as needed
]

# Directory to save the images
dataset_dir = "dataset"

# Create the directory if it doesn't exist
os.makedirs(dataset_dir, exist_ok=True)

# Download and save the images
for i, url in enumerate(image_urls):
    filename = f"image{i+1}.jpg"
    urllib.request.urlretrieve(url, os.path.join(dataset_dir, filename))

print("Images downloaded and saved successfully.")

With the image dataset prepared, we can now perform content-based image search using Pillow. The basic idea is to calculate image descriptors for each image in the dataset and then compare these descriptors to find similar images.

There are various image descriptors that can be used, such as color histograms and local feature descriptors like SIFT or SURF. In this example, let’s use a simple approach based on image hashing using the dhash algorithm.

from PIL import Image, ImageChops

def calculate_dhash(image):
    # Convert the image to grayscale
    image = image.convert("L")

    # Resize the image to a fixed size for consistency
    image = image.resize((8, 8), Image.ANTIALIAS)

    # Calculate the difference hash (dhash) of the image
    dhash = sum([
        (1 << (x + y * 8)) if image.getpixel((x, y)) > image.getpixel((x + 1, y)) else 0
        for x in range(8)
        for y in range(8)
    ])

    return dhash

# Load the query image for content-based search
query_image = Image.open("query_image.jpg")

# Calculate the dhash of the query image
query_dhash = calculate_dhash(query_image)

# Search for similar images in the dataset
similar_images = []
for filename in os.listdir(dataset_dir):
    if filename.endswith(".jpg"):
        image_path = os.path.join(dataset_dir, filename)
        image = Image.open(image_path)
        image_dhash = calculate_dhash(image)

        # Compare the dhash values to find similar images
        if bin(query_dhash ^ image_dhash).count("1") <= 5:
            similar_images.append(image_path)

print("Similar images found:", similar_images)

In the code above, we define the calculate_dhash function to calculate the dhash of an image. We then load the query image for content-based search, calculate its dhash, and compare it with the dhash of each image in the dataset. The threshold of 5 in the bin(query_dhash ^ image_dhash).count("1") <= 5 condition determines the similarity level.

Conclusion

Content-based image search is a powerful technique that allows us to find similar images based on their visual content. In this blog post, we explored how to perform content-based image search using the Pillow library in Python. We prepared an image dataset and used the image hashing approach to find similar images.

With Pillow’s extensive capabilities, you can further enhance the content-based image search workflow by incorporating advanced image descriptors or using machine learning techniques. Pillow provides a solid foundation for building image search applications and exploring the world of computer vision.

Start experimenting with Pillow’s content-based image search capabilities in your Python projects and unlock new possibilities in working with images.