[파이썬] statsmodels 민감도 분석

Sensitivity analysis is a technique used in statistics to understand how the variation in one variable affects the outcome of another variable. In the field of data analysis, sensitivity analysis can help identify the most influential variables and their impact on the model. In this blog post, we will explore how to perform sensitivity analysis using the statsmodels library in Python.

Installation

Before we begin, let’s make sure we have the statsmodels library installed. Use the following command to install it:

pip install statsmodels

Data Preparation

To demonstrate the sensitivity analysis, we need a dataset. For this example, let’s use the “Auto” dataset from the statsmodels library. The dataset contains information about various car models, including their MPG (miles per gallon) and other variables like horsepower, weight, and cylinders.

First, we need to import the required libraries and load the dataset:

import statsmodels.api as sm
import pandas as pd

# Load the Auto dataset
data = sm.datasets.get_rdataset('mtcars').data

Sensitivity Analysis using OLS Regression

To perform sensitivity analysis, we will start by fitting an Ordinary Least Squares (OLS) regression model using the sm.OLS function. This will allow us to understand the relationship between the predictor variables (independent variables) and the target variable (dependent variable).

Here’s an example of fitting an OLS regression model to predict MPG using variables like horsepower, weight, cylinders, and others:

# Define the dependent variable and independent variables
dependent_variable = 'mpg'
independent_variables = ['horsepower', 'weight', 'cylinders']

# Fit the OLS regression model
model = sm.OLS(data[dependent_variable], sm.add_constant(data[independent_variables]))
results = model.fit()

# Print the regression results
print(results.summary())

The above code will provide a summary of the regression results, including the coefficients, p-values, and standard errors of the variables. These results help us understand the relationship between the predictor variables and the target variable.

Sensitivity Analysis using Confidence Intervals

Another approach to perform sensitivity analysis is by using confidence intervals. Confidence intervals provide a range of values within which the true population parameter is likely to fall. By examining the confidence intervals, we can assess the sensitivity of the model to changes in the independent variables.

To calculate the confidence intervals, we can use the conf_int method of the regression results:

# Calculate the confidence intervals for the coefficients
confidence_intervals = results.conf_int()

# Print the confidence intervals
print(confidence_intervals)

The conf_int method returns a DataFrame containing the lower and upper bounds of the confidence intervals for each variable. By comparing the range of the confidence intervals, we can identify the variables that have the most significant impact on the model’s predictions.

Conclusion

Sensitivity analysis plays a crucial role in understanding the impact of variables on the outcome of a statistical model. In this blog post, we explored how to perform sensitivity analysis using the statsmodels library in Python. We demonstrated fitting an OLS regression model and calculating the confidence intervals to assess the sensitivity of the model to different variables.

Remember, sensitivity analysis helps us identify the most influential variables in a model and provides insights into their impact on the predicted outcomes. It is an essential step in analyzing and interpreting statistical models.

Thanks for reading! Feel free to leave a comment or ask any questions you may have.