[pandas] DataFrame 함수 2

04 Dec 2020

pandas

DataFrame의 분석을 위한 함수들(2)

Pandas에 있는 DataFrame에 관한 함수들에 대해서 알아본다.

1. sort

DataFrame을 정렬하기 위해서는 두가지 방법만 알아두면 된다.

예제로 다음의 DataFrame을 사용한다.

sort_index : index 기준(values가 아닌)으로 정렬한다. 옵션으로 axis와 ascending을 줄 수 있다.

display(df.sort_index()) # option default : axis = 0, ascending = True

display(df.sort_index(ascending=False)) # 내림차순

display(df.sort_index(axis = 1 ,ascending = False)) # column 내림차순

sort_values : values기준(index가 아닌)으로 정렬한다. 옵션으로 axis, ascending과 by를 줄 수 있다. index의 순서는 바뀌지 않는다.

display(df.sort_values(by='B'))  # B 값들의 오름차순 기준으로 정렬한다.

display(df.sort_values(by=['B','D'])) # B값 다음으로 D값들의 기준으로 정렬한다.

2. unique

DataFrame의 행이나 열을 뽑아내 (즉, Series로 뽑아진것) 유일한 성분들을 list를 배출한다.

print(df.iloc[1,:].unique())
## [0 1 7]
print(df['B'].unique())
## [8 0 9 2 7]

isin

DataFrame의 행이나 열에 (즉, Series안에) 어떠한 성분들이 들어 있는지 없는지 True/False로 구성된 Series로 배출한다.

print(df['A'].isin([1]))
## 2020-09-13    False
## 2020-09-14    False
## 2020-09-15    False
## 2020-09-16    False
## 2020-09-17    False
## 2020-09-18     True
## Freq: D, Name: A, dtype: bool

3. value_counts

DataFrame의 행이나 열을 뽑아내 (즉, Series로 뽑아진것) 성분들의 개수를 Series를 배출한다.

print(df.iloc[1].value_counts())
## 0    2
## 7    1
## 1    1
## Name: 2020-09-14 00:00:00, dtype: int64