[Python] SciPy CSR Format

SciPy is a popular open-source Python library for scientific computing that provides efficient implementations of many numerical algorithms. One of the storage formats it supports for sparse matrices is Compressed Sparse Row (CSR).

In the CSR format, a sparse matrix is stored in three arrays: data, indices, and indptr. Let’s take a closer look at each of these arrays.

data array: This one-dimensional array stores the non-zero elements of the sparse matrix row by row: all entries of row 0 first, then row 1, and so on.

indices array: This array stores the column index of each corresponding non-zero element in the data array. The indices are also stored in a one-dimensional array.

indptr array: This array stores, for each row, the position in the data and indices arrays where that row's entries begin. It has one more element than the matrix has rows: the entries of row i occupy positions indptr[i] up to (but not including) indptr[i+1], and the final element equals the total number of stored elements.
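To make the role of the three arrays concrete, here is a minimal pure-Python sketch that expands them back into a dense matrix. The helper name csr_to_dense and the small 3x4 matrix are made up for illustration and are not part of SciPy; the sketch also assumes no (row, column) position is stored twice.

```python
def csr_to_dense(data, indices, indptr, n_cols):
    """Expand CSR arrays into a dense list-of-lists matrix (illustrative helper)."""
    n_rows = len(indptr) - 1  # indptr holds one start per row plus a final end marker
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        # The entries of this row live in the half-open range [start, end)
        start, end = indptr[row], indptr[row + 1]
        for k in range(start, end):
            dense[row][indices[k]] = data[k]
    return dense


# Hypothetical 3x4 matrix whose middle row is empty
data = [10, 20, 30]
indices = [0, 3, 2]
indptr = [0, 2, 2, 3]

dense = csr_to_dense(data, indices, indptr, n_cols=4)
for row in dense:
    print(row)
# [10, 0, 0, 20]
# [0, 0, 0, 0]
# [0, 0, 30, 0]
```

Note how the empty row shows up as two equal consecutive values in indptr (2 and 2), so its range contains no entries.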

To create a sparse matrix in CSR format, use the scipy.sparse module. The following example builds a 3x3 matrix directly from the three arrays:

import scipy.sparse as sp

# Define the data, indices, and indptr arrays
data = [1, 2, 3, 4, 5, 6]
indices = [0, 2, 1, 0, 2, 1]
indptr = [0, 2, 3, 6]

# Create the CSR matrix
matrix = sp.csr_matrix((data, indices, indptr), shape=(3, 3))

print(matrix)

Output:

  (0, 0)	1
  (0, 2)	2
  (1, 1)	3
  (2, 0)	4
  (2, 2)	5
  (2, 1)	6

In the example above, we first define the data, indices, and indptr arrays. With indptr = [0, 2, 3, 6], row 0 owns the first two entries, row 1 the third, and row 2 the last three. We then pass the three arrays as a tuple, together with the shape of the matrix, to the csr_matrix class from the scipy.sparse module.

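Finally, we print the sparse matrix, which lists each stored element with its (row, column) position. The entries of row 2 appear in the order given in the indices array (columns 0, 2, 1), because CSR does not require the column indices within a row to be sorted. The exact layout of the printed output can differ between SciPy versions.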
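As a quick check, the same matrix can be converted to a dense NumPy array and its internal arrays inspected. This is a small sketch using the toarray method and the data, indices, indptr, nnz, and has_sorted_indices attributes of csr_matrix; the variable names are chosen for illustration.

```python
import scipy.sparse as sp

data = [1, 2, 3, 4, 5, 6]
indices = [0, 2, 1, 0, 2, 1]
indptr = [0, 2, 3, 6]
matrix = sp.csr_matrix((data, indices, indptr), shape=(3, 3))

# Dense view of the same matrix
dense = matrix.toarray()
print(dense)
# [[1 0 2]
#  [0 3 0]
#  [4 6 5]]

print(matrix.nnz)  # number of stored elements: 6

# Row 2 was given with column indices 0, 2, 1, so the indices are not sorted
was_sorted = matrix.has_sorted_indices

# Sort the column indices within each row, in place
matrix.sort_indices()
print(matrix.indices)  # [0 2 1 0 1 2]
print(matrix.data)     # [1 2 3 4 6 5]
```

Sorting reorders data together with indices, so the matrix itself is unchanged; only its internal layout becomes canonical.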

The CSR format is particularly well suited to arithmetic and to matrix-vector products, and it supports fast row slicing. Column slicing is comparatively slow, and changing the sparsity structure (for example, inserting new non-zero entries) is expensive. SciPy provides many functions and operations that work directly on CSR matrices, which makes the format a practical choice for large, sparse datasets.
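A few common operations on a CSR matrix might look like the sketch below. The matrix and vector values are chosen for illustration only, and the matrix is built from a dense array purely for brevity.

```python
import numpy as np
import scipy.sparse as sp

# Build a small CSR matrix from a dense array (fine for a demo,
# wasteful for genuinely large matrices)
matrix = sp.csr_matrix(np.array([[1, 0, 2],
                                 [0, 3, 0],
                                 [4, 6, 5]]))
vector = np.array([1, 1, 1])

# Matrix-vector product; the result is a dense 1-D array
product = matrix @ vector
print(product)  # [ 3  3 15]

# Row slicing returns a 1x3 sparse matrix
last_row = matrix[2, :]
print(last_row.toarray())  # [[4 6 5]]

# Scalar multiplication keeps the result sparse
doubled = matrix * 2
print(doubled.toarray()[0])  # [2 0 4]

# Sum over all entries
total = matrix.sum()
print(total)  # 21
```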

Because a CSR matrix keeps only its non-zero entries and two index arrays rather than every zero, computations on mostly-empty matrices can use far less memory and skip the zeros entirely.
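To get a rough feel for the memory savings, the sketch below compares the bytes used by a dense array with the bytes used by the three CSR arrays. The 1000x1000 diagonal matrix is an arbitrary illustration, and the exact CSR size depends on the index dtype SciPy selects, so only the dense size is given as a number.

```python
import numpy as np
import scipy.sparse as sp

n = 1000

# Dense 1000x1000 float64 matrix with ones on the diagonal only
dense = np.zeros((n, n), dtype=np.float64)
dense[np.arange(n), np.arange(n)] = 1.0

sparse = sp.csr_matrix(dense)

dense_bytes = dense.nbytes  # 1000 * 1000 * 8 = 8000000
sparse_bytes = (sparse.data.nbytes
                + sparse.indices.nbytes
                + sparse.indptr.nbytes)

print("stored elements:", sparse.nnz)  # 1000
print("dense bytes:", dense_bytes)
print("CSR bytes:", sparse_bytes)
```

In practice a large sparse matrix would be assembled without ever materializing the dense array; the dense version appears here only so the two sizes can be compared.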