파이썬과 유전 알고리즘을 함께 사용하는 데이터 압축 알고리즘 구현

01 Oct 2023

python

개요

데이터 압축은 대용량의 데이터를 작게 압축하여 저장하거나 전송하는 기술입니다. 이번에는 파이썬 프로그래밍 언어와 유전 알고리즘을 조합하여 데이터 압축 알고리즘을 구현하는 방법에 대해 살펴보겠습니다.

유전 알고리즘 개요

유전 알고리즘은 생물의 진화 과정을 모방하여 최적의 해(solution)를 찾는 최적화 알고리즘입니다. 유전 알고리즘은 유전자의 개념을 사용하여 해의 집합을 표현하고, 자연 선택, 교차, 돌연변이 등의 연산을 통해 해의 품질을 개선해 나갑니다.

데이터 압축 알고리즘 구현

데이터 압축 알고리즘은 입력 데이터를 압축하여 출력 데이터로 변환하는 과정을 가지며, 여러가지 방법이 있습니다. 이번에는 유한 알파벳을 사용하는 간단한 데이터 압축 알고리즘을 구현해보겠습니다.

def compress_data(input_data):
    compressed_data = ""
    current_count = 1

    for i in range(1, len(input_data)):
        if input_data[i] == input_data[i - 1]:
            current_count += 1
        else:
            compressed_data += input_data[i - 1] + str(current_count)
            current_count = 1
    
    compressed_data += input_data[-1] + str(current_count)
    
    return compressed_data

위의 코드는 입력 데이터에서 연속으로 반복되는 알파벳과 그 개수를 압축하여 출력하는 기능을 가지고 있습니다. 예를 들어, 입력 데이터 “AAAABBBCCDAA”를 압축하면 “A4B3C2D1A2”와 같이 압축된 결과를 반환합니다.

이제 유전 알고리즘을 사용하여 압축된 데이터의 길이를 최적화해보겠습니다. 유전 알고리즘을 적용하기 위해 다음의 단계를 따릅니다:

초기 해(population)를 랜덤으로 생성합니다.
해의 품질을 평가하는 함수를 만듭니다.
해의 적합도를 기반으로 자연 선택을 수행하는 함수를 만듭니다.
선택된 해를 기반으로 교차 및 돌연변이 연산을 수행하는 함수를 만듭니다.
지정한 세대(iteration)만큼 2~4단계를 반복합니다.

유전 알고리즘을 사용하여 데이터 압축 알고리즘을 최적화하는 코드는 다음과 같습니다:

import random

def evaluate_fitness(compressed_data):
    target_length = 10  # 목표로 하는 압축된 데이터의 길이
    return abs(len(compressed_data) - target_length)

def selection(population, num_parents):
    fitness_scores = [evaluate_fitness(compressed_data) for compressed_data in population]
    parents = []

    for _ in range(num_parents):
        max_fitness_index = fitness_scores.index(max(fitness_scores))
        parents.append(population[max_fitness_index])
        fitness_scores[max_fitness_index] = -1
    
    return parents

def crossover(parents):
    population_size = len(parents)
    offspring = []

    for _ in range(population_size):
        parent1 = random.choice(parents)
        parent2 = random.choice(parents)
        crossover_point = random.randint(1, len(parent1) - 1)
        offspring.append(parent1[:crossover_point] + parent2[crossover_point:])
    
    return offspring

def mutation(offspring):
    population_size = len(offspring)
    mutated_offspring = []
    mutation_rate = 0.1

    for i in range(population_size):
        mutated = ""
        for j in range(len(offspring[i])):
            if random.random() < mutation_rate:
                mutated += random.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ")  # 알파벳에서 랜덤하게 선택
            else:
                mutated += offspring[i][j]
        
        mutated_offspring.append(mutated)
    
    return mutated_offspring

def compress_data_genetic_algorithm(input_data, num_generations):
    population_size = 10
    num_parents = 5
    current_population = [input_data] * population_size

    for _ in range(num_generations):
        parents = selection(current_population, num_parents)
        offspring = crossover(parents)
        mutated_offspring = mutation(offspring)
        current_population = parents + mutated_offspring
    
    best_compressed_data = min(current_population, key=evaluate_fitness)
    
    return best_compressed_data

위의 코드에서 compress_data_genetic_algorithm 함수는 입력 데이터와 세대(iteration) 수를 받아 최적화된 데이터 압축 결과를 반환합니다. 최적화된 압축 결과는 목표로 하는 압축된 데이터의 길이에 가깝도록 유전 알고리즘에 의해 생성된 해중 가장 적합한 해입니다.

결론

파이썬과 유전 알고리즘을 함께 사용하여 데이터 압축 알고리즘을 구현해보았습니다. 이를 통해 유전 알고리즘을 다른 최적화 문제에 적용하는데 도움이 될 것입니다. 압축 알고리즘은 실제로 더 복잡하고 고도화된 기술이지만, 이 예시를 통해 유전 알고리즘의 개념과 활용 방법을 이해할 수 있습니다.

#python #유전알고리즘 #데이터압축