[파이썬] PyTorch `torchvision` 활용

07 Sep 2023

PyTorch

PyTorch is a popular open-source machine learning framework that is widely used for developing deep learning models. One of its key features is the torchvision package, which provides a wide range of tools and functionalities for working with image data.

In this blog post, we will explore some of the useful features provided by torchvision and learn how to leverage them in Python.

Installation

To use torchvision, you need to have PyTorch installed. You can install PyTorch and torchvision using the following command:

pip install torch torchvision

Make sure you have the correct version of PyTorch installed that matches your Python environment.

Image Transformation

torchvision provides a convenient way to perform various transformations on images. These transformations can be used to preprocess your image data before feeding it into a PyTorch model.

Let’s start by importing the necessary libraries:

import torch
import torchvision
from torchvision import transforms

Next, we can define a sequence of transformations that we want to apply to our images. For example, we can use the transforms.Compose class to combine multiple transformations:

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

In the above code, we resize our image to 256x256 pixels, crop it to 224x224 pixels at the center, convert it to a PyTorch tensor, and normalize its pixel values.

Once we have defined our transformation sequence, we can apply it to an image as follows:

image = Image.open("example.jpg")
transformed_image = transform(image)

Dataset Loading

torchvision provides several pre-built datasets that you can use for training and evaluation. These datasets include famous image classification benchmarks such as MNIST, CIFAR-10, and ImageNet.

To load a dataset, you can use the datasets module:

train_dataset = torchvision.datasets.MNIST(root="./data", train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.MNIST(root="./data", train=False, transform=transform, download=True)

In the above code, we load the MNIST dataset and apply the previously defined transformation to both the training and testing sets.

DataLoader

The torchvision package also provides the DataLoader class, which allows you to efficiently load and iterate over your dataset during training. The DataLoader class provides features like shuffling, batching, and parallel loading.

Here’s an example of how to create a DataLoader:

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)

In the above code, we create DataLoader objects for the training and testing datasets. We specify the batch size and whether to shuffle the samples.

Model Pre-training

torchvision provides pre-trained models that you can use for various tasks such as image classification, object detection, and semantic segmentation. The pre-trained models are trained on large-scale datasets like ImageNet and achieve state-of-the-art performance.

To load a pre-trained model, you can use the models module:

model = torchvision.models.resnet50(pretrained=True)

In the above code, we load a pre-trained ResNet-50 model. You can choose from a variety of pretrained models available in torchvision.models.

Conclusion

In this blog post, we have explored some of the key features provided by torchvision in PyTorch. We learned how to perform image transformations, load datasets, create data loaders, and utilize pre-trained models.

torchvision simplifies the process of working with image data in PyTorch and provides a convenient and efficient way to build powerful deep learning models.