[Python] statsmodels Log-Likelihood

In statistical modeling, the log-likelihood function plays a crucial role. It measures, on a logarithmic scale, how probable the observed data are under a specific statistical model, viewed as a function of the model's parameters. By maximizing the log-likelihood, we can estimate the parameter values that best fit the observed data.
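
For intuition, consider observations drawn from a normal distribution: the log-likelihood is simply the sum of the log densities of the observations, and it is maximized by the sample mean and the (biased) sample standard deviation. Below is a minimal sketch of this computation using numpy and scipy; the variable names are chosen purely for illustration.

import numpy as np
from scipy.stats import norm

# Simulated sample from a normal distribution
np.random.seed(0)
data = np.random.normal(loc=5.0, scale=2.0, size=100)

# Maximum likelihood estimates for a normal model:
# the sample mean and the biased sample standard deviation
mu_hat = data.mean()
sigma_hat = data.std()  # ddof=0 gives the MLE of sigma

# Log-likelihood is the sum of log densities at the estimates
log_likelihood = norm.logpdf(data, loc=mu_hat, scale=sigma_hat).sum()
print(f"Log-likelihood at the MLE: {log_likelihood:.2f}")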

In Python, the statsmodels library provides convenient functions to calculate and maximize the log-likelihood of different statistical models. Let’s explore some examples to understand how to use the statsmodels library for log-likelihood estimation.

Example 1: Linear Regression

First, let’s consider a simple linear regression model. We have a target variable y and a set of independent variables X to predict y. We assume that y follows a normal distribution with mean X * beta and constant variance.

import numpy as np
import statsmodels.api as sm

# Simulated dataset: two predictors plus an intercept
np.random.seed(42)
X = np.random.randn(100, 2)
X = sm.add_constant(X)  # Prepend a column of ones for the intercept
beta = np.array([1, 2, 3])  # True coefficients: intercept 1, slopes 2 and 3
epsilon = np.random.randn(100)  # Standard normal noise
y = np.dot(X, beta) + epsilon

# Fit linear regression model
model = sm.OLS(y, X)
results = model.fit()

# Calculate log-likelihood
log_likelihood = results.llf
print(f"Log-likelihood: {log_likelihood:.2f}")

In the example above, we first create a simulated dataset with a constant term and two independent variables. We then fit an ordinary least squares (OLS) regression model to the data with sm.OLS and obtain the log-likelihood of the fitted model from the llf attribute of the results object.
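
To see where this number comes from, you can reproduce it by hand: for OLS, llf is the Gaussian log-likelihood evaluated at the fitted coefficients and at the maximum likelihood estimate of the error variance (the residual sum of squares divided by n). A short sketch that continues the example above:

# Reproduce results.llf by hand (continuing the example above)
n = len(y)
residuals = y - results.fittedvalues
sigma2_mle = np.sum(residuals ** 2) / n  # MLE of the error variance (divide by n, not n - k)

manual_llf = -n / 2 * (np.log(2 * np.pi) + np.log(sigma2_mle) + 1)
print(f"Manual log-likelihood: {manual_llf:.2f}")
print(f"statsmodels llf:       {results.llf:.2f}")  # the two values should agree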

Example 2: Logistic Regression

Next, let’s consider a binary logistic regression model. We have an outcome variable y that follows a Bernoulli distribution, and a set of independent variables X to predict the probability of y = 1.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated dataset: two predictors and a binary outcome
np.random.seed(42)
df = pd.DataFrame({'x1': np.random.randn(100),
                   'x2': np.random.randn(100),
                   'y': np.random.binomial(1, 0.5, 100)})

# Fit logistic regression model
model = smf.logit(formula='y ~ x1 + x2', data=df)
results = model.fit()

# Calculate log-likelihood
log_likelihood = results.llf
print(f"Log-likelihood: {log_likelihood:.2f}")

In this example, we create a simulated dataset with two independent variables x1 and x2 and a binary outcome variable y. We fit a logistic regression model to the data with smf.logit, using a formula to specify the predictors, and obtain the log-likelihood of the fitted model from the llf attribute of the results object.
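
As with the linear model, this value can be reconstructed from its definition: the Bernoulli log-likelihood is the sum over observations of y * log(p) + (1 - y) * log(1 - p), where p is the predicted probability of y = 1. A short sketch continuing the example above:

# Reproduce results.llf from the predicted probabilities (continuing the example above)
p = results.predict(df)  # predicted P(y = 1) for each row
y_obs = df['y']

manual_llf = np.sum(y_obs * np.log(p) + (1 - y_obs) * np.log(1 - p))
print(f"Manual log-likelihood: {manual_llf:.2f}")
print(f"statsmodels llf:       {results.llf:.2f}")  # the two values should agree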

In conclusion, the statsmodels library provides powerful tools for calculating and maximizing the log-likelihood function in Python. By leveraging these tools, you can estimate the parameters of various statistical models and make meaningful inferences from your data, for example by comparing nested models with a likelihood-ratio test.
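
A minimal sketch of such a likelihood-ratio test, reusing the logistic regression data from Example 2: twice the difference in log-likelihoods between the full model and a nested model without x2 is compared against a chi-squared distribution with degrees of freedom equal to the number of restricted parameters.

from scipy.stats import chi2
import statsmodels.formula.api as smf

# Likelihood-ratio test: does x2 improve the fit of the logistic model?
full = smf.logit('y ~ x1 + x2', data=df).fit(disp=0)
restricted = smf.logit('y ~ x1', data=df).fit(disp=0)

lr_stat = 2 * (full.llf - restricted.llf)        # likelihood-ratio statistic
df_diff = full.df_model - restricted.df_model    # number of restricted parameters (1 here)
p_value = chi2.sf(lr_stat, df_diff)

print(f"LR statistic: {lr_stat:.2f}, p-value: {p_value:.3f}")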