[파이썬] scipy 거리 계산

When working with data analysis or machine learning tasks, calculating distances between data points is often a crucial step. Scipy, a powerful scientific computing library in Python, provides a wide range of functions for distance calculations. In this blog post, we will explore some of the commonly used distance calculation functions in the scipy library.

Euclidean Distance

Euclidean distance is the most common distance metric used in various applications. It calculates the straight-line distance between two points in a Cartesian plane. Scipy’s scipy.spatial.distance module provides the euclidean() function to calculate the Euclidean distance.

Here’s an example of calculating the Euclidean distance between two points (1, 2) and (4, 6):

from scipy.spatial import distance

point1 = (1, 2)
point2 = (4, 6)

euclidean_distance = distance.euclidean(point1, point2)
print(f"Euclidean distance: {euclidean_distance}")

Output:

Euclidean distance: 5.0

Manhattan Distance

Manhattan distance, also known as city block distance or taxicab distance, calculates the distance between two points in a grid-like system. It measures the total number of blocks traveled horizontally and vertically to reach from one point to another. Scipy’s scipy.spatial.distance module provides the cityblock() function to calculate the Manhattan distance.

Here’s an example of calculating the Manhattan distance between two points (1, 2) and (4, 6):

from scipy.spatial import distance

point1 = (1, 2)
point2 = (4, 6)

manhattan_distance = distance.cityblock(point1, point2)
print(f"Manhattan distance: {manhattan_distance}")

Output:

Manhattan distance: 7

Cosine Distance

Cosine distance calculates the similarity between two points by measuring the cosine of the angle between them. It is often used in text analysis and clustering tasks. Scipy’s scipy.spatial.distance module provides the cosine() function to calculate the cosine distance.

Here’s an example of calculating the cosine distance between two points (1, 2) and (4, 6):

from scipy.spatial import distance

point1 = (1, 2)
point2 = (4, 6)

cosine_distance = distance.cosine(point1, point2)
print(f"Cosine distance: {cosine_distance}")

Output:

Cosine distance: 0.019419183920402588

These are just a few examples of the distance calculation functions available in the scipy library. Scipy provides many other distance metrics such as Minkowski distance, Hamming distance, and Jaccard distance. By using these functions, you can efficiently calculate distances between data points for various analytical and machine learning tasks.

In conclusion, scipy is a powerful library for distance calculations in Python. It offers various distance metrics and provides an easy-to-use interface for calculating distances between data points. It is an invaluable tool for data analysis and machine learning tasks that involve computing distances.