[파이썬] scipy 데이터 스무딩

Data smoothing is a commonly used technique in data analysis to reduce noise and visualize trends within the data. In Python, one of the popular libraries for data smoothing is Scipy. Scipy provides various functions and tools for manipulating and analyzing scientific data.

What is data smoothing?

Data smoothing involves applying a mathematical algorithm to remove noise and irregularities from a dataset, making it easier to identify patterns and trends. It is particularly useful when working with data that contains random fluctuations or outliers.

Smoothing with Scipy

Scipy offers several functions for data smoothing, the most commonly used ones are scipy.signal.savgol_filter() and scipy.ndimage.gaussian_filter().

Savitzky-Golay smoothing

The Savitzky-Golay filter is a widely used data smoothing technique that uses a sliding window to fit a polynomial function to a subset of data points, and then takes the average of the values within the window to smooth the data.

Here’s an example of how to apply the Savitzky-Golay filter using the scipy.signal.savgol_filter() function:

import numpy as np
from scipy.signal import savgol_filter

# Generate a noisy sine wave
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x) + np.random.normal(0, 0.1, 100)

# Apply Savitzky-Golay smoothing
smoothed = savgol_filter(y, window_length=11, polyorder=2)

# Plot the original and smoothed data
plt.plot(x, y, label='Original')
plt.plot(x, smoothed, label='Smoothed')
plt.legend()
plt.show()

The window_length parameter determines the size of the sliding window, and the polyorder parameter determines the degree of the polynomial used for fitting.

Gaussian smoothing

Gaussian smoothing, also known as Gaussian blur, is another popular data smoothing technique that uses a Gaussian filter to blur the data. It replaces each pixel in the data with a weighted average of its neighboring pixels, where the weights are determined by a Gaussian distribution.

Here’s an example of how to apply Gaussian smoothing using the scipy.ndimage.gaussian_filter() function:

import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter

# Generate a noisy 2D array
data = np.random.randn(100, 100)
noisy_data = data + np.random.normal(0, 0.1, (100, 100))

# Apply Gaussian smoothing
smoothed_data = gaussian_filter(noisy_data, sigma=2)

# Plot the original and smoothed data
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
ax1.imshow(noisy_data)
ax1.set_title('Original')
ax2.imshow(smoothed_data)
ax2.set_title('Smoothed')
plt.show()

The sigma parameter determines the standard deviation of the Gaussian filter, which controls the blurring effect. Higher values of sigma result in stronger smoothing.

Conclusion

Data smoothing is a valuable technique in data analysis for reducing noise and revealing underlying patterns. Scipy provides a variety of functions for data smoothing, including the Savitzky-Golay filter and Gaussian smoothing. By applying these techniques in Python, you can effectively visualize trends within your data and make more accurate analyses.