[Python] OLS (Ordinary Least Squares) in statsmodels

Statsmodels is a powerful Python library that provides a wide range of statistical models and tools for analyzing data. One of the most commonly used methods in statsmodels is Ordinary Least Squares (OLS), which is a technique used to estimate the parameters in a linear regression model. In this blog post, we will explore how to perform OLS using statsmodels in Python.

Installation

To get started with statsmodels, you need to install it. Open your terminal and use the following command to install statsmodels using pip:

pip install statsmodels

Importing the necessary modules

Once statsmodels is installed, you can start by importing the required modules in your Python script or Jupyter Notebook:

import numpy as np
import statsmodels.api as sm

The numpy module is used for numerical operations and array manipulation, while statsmodels.api provides the OLS functionality we need.

Data preparation

Before performing OLS, we need to prepare the data. Let’s assume we have a dataset with two variables: X and y. X represents the independent variable(s), and y represents the dependent variable. We’ll use numpy to create some dummy data for demonstration purposes:

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

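In this case, X and y have an exactly linear relationship (y = 2X), so the fit should recover an intercept of about 0 and a slope of about 2. In general, the goal of OLS is to estimate the coefficients that minimize the sum of squared residuals, where each residual is the difference between an observed value and the value the model predicts.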
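To make the least-squares objective concrete, here is a minimal sketch that solves the same problem with NumPy alone, before statsmodels is involved. The names A, coef, and ssr are illustrative and are not part of the original post.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

# Design matrix: a column of ones (intercept) next to the predictor.
A = np.column_stack([np.ones_like(x), x])

# Least-squares solution of A @ coef ≈ y, i.e. the coef minimizing ||A @ coef - y||^2.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef

# Sum of squared residuals at the solution.
residuals = y - A @ coef
ssr = float(residuals @ residuals)

print(intercept, slope, ssr)
```

Because the dummy data lie exactly on a line, the slope comes out at 2 and the intercept and sum of squared residuals are zero up to floating-point error. Real data would leave a positive residual sum.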

Performing OLS

sm.OLS does not include an intercept by default, so to fit a model with an intercept we need to add a constant column to the X matrix ourselves. This can be done using the sm.add_constant() function:

X = sm.add_constant(X)

Next, we can initialize the OLS model and fit it to our data:

model = sm.OLS(y, X)
result = model.fit()

The model.fit() method estimates the model parameters and returns a results object (RegressionResults), which holds the estimated coefficients along with various statistics about the fitted model.

Interpreting the results

Once the OLS model is fitted, we can access various statistics and information about the model. For example, we can use the result.summary() method to print a summary of the model:

print(result.summary())

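The summary provides information such as the R-squared value, coefficients, standard errors, t-statistics, and p-values for each variable in the model. These statistics can help us assess the significance and quality of the regression model. Keep in mind that the dummy data here fit a line perfectly and contain only five observations, so the R-squared will be 1 and the standard errors will be essentially zero, and statsmodels may emit warnings about the small sample size. Real data with noise gives more informative output.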

Conclusion

In this blog post, we learned how to perform Ordinary Least Squares (OLS) using statsmodels in Python. OLS is a fundamental technique for estimating linear regression models and understanding the relationship between variables. By using statsmodels, we can easily implement OLS and access various statistics and information about the fitted model.