[파이썬] statsmodels 기초 통계 함수

Statsmodels is a powerful library in Python that provides a wide range of statistical models and functions for data analysis and modeling. In this blog post, we will explore some of the basic statistical functions offered by Statsmodels. These functions can be used to perform various statistical calculations, such as descriptive statistics, hypothesis testing, and linear regression.

Before we dive into the functions, make sure you have installed Statsmodels using the following command:

pip install statsmodels

Descriptive Statistics

Statsmodels provides several functions to calculate descriptive statistics of a dataset. Let’s consider a simple example using a randomly generated dataset:

import numpy as np
import statsmodels.api as sm

# Generate a random dataset
np.random.seed(42)
data = np.random.randn(100)

# Calculate mean
mean = sm.mean(data)
print("Mean:", mean)

# Calculate standard deviation
std_dev = sm.std(data)
print("Standard Deviation:", std_dev)

# Calculate variance
variance = sm.var(data)
print("Variance:", variance)

In the code above, we import the statsmodels.api module as sm and generate a random dataset using NumPy. The mean(), std() and var() functions from Statsmodels are used to calculate the mean, standard deviation, and variance of the dataset, respectively.

Hypothesis Testing

Statsmodels also provides various functions to perform hypothesis testing. Let’s consider an example where we want to test whether the mean of a dataset is significantly different from a given value:

import numpy as np
import statsmodels.api as sm

# Generate a random dataset
np.random.seed(42)
data = np.random.randn(100)

# Perform one-sample t-test
t_statistic, p_value = sm.stats.ttest_1samp(data, popmean=0)
print("T-statistic:", t_statistic)
print("P-value:", p_value)

In the code above, we use the ttest_1samp() function from Statsmodels to perform a one-sample t-test. The function returns the t-statistic and p-value. You can specify the population mean using the popmean parameter.

Linear Regression

Statsmodels provides comprehensive support for linear regression models. Let’s consider a simple example of performing a simple linear regression on a dataset:

import numpy as np
import statsmodels.api as sm

# Generate random dataset
np.random.seed(42)
X = np.random.randn(100, 1)
y = 2 * X + np.random.randn(100, 1) * 0.5

# Add constant to X matrix
X = sm.add_constant(X)

# Fit the linear regression model
model = sm.OLS(y, X)
results = model.fit()

# Print regression summary
print(results.summary())

In the code above, we use the OLS() function from Statsmodels to specify and fit the linear regression model. The add_constant() function is used to add a constant term to the feature matrix X.

After fitting the model, the summary() method provides a detailed summary of the regression results, including coefficients, p-values, and goodness-of-fit measures.

Conclusion

Statsmodels provides a rich set of functions for performing basic statistical calculations in Python. In this blog post, we explored some of the fundamental statistical functions offered by Statsmodels, including descriptive statistics, hypothesis testing, and linear regression. These functions can serve as a foundation for in-depth statistical analysis and modeling in various domains.

Keep in mind that this is just a brief introduction to statsmodels, and there are many more advanced statistical techniques and models available in the library. Make sure to refer to the official documentation and explore further to leverage the full potential of statsmodels for your statistical analysis tasks.