[파이썬] scipy 가설 검정

In the field of statistics, hypothesis testing is a fundamental technique used to make inferences about a population based on a sample. It involves formulating a null and alternative hypothesis, collecting data, and performing statistical tests to determine the probability of observing the data under the null hypothesis.

In this blog post, we will explore how to perform hypothesis testing using the scipy library in Python. We will cover the steps involved in conducting hypothesis tests and provide examples using different statistical tests available in scipy.

Steps in Hypothesis Testing

The general steps involved in hypothesis testing include:

  1. Formulating the hypotheses: Start by defining the null hypothesis (H0) and the alternative hypothesis (H1). The null hypothesis typically represents the status quo or no difference/relationship, while the alternative hypothesis represents the claim or assertion we want to test.

  2. Choosing the significance level: The significance level (usually denoted as α) determines the probability of rejecting the null hypothesis when it is actually true. Commonly used values for α are 0.05 and 0.01.

  3. Collecting and analyzing data: Collect a sample from the population of interest and calculate relevant statistics. These statistics will be used to determine whether to reject or fail to reject the null hypothesis.

  4. Performing the statistical test: Based on the sample data and the chosen test, calculate the test statistic and determine the p-value. The p-value represents the probability of obtaining the observed sample results under the assumption that the null hypothesis is true.

  5. Drawing conclusions: Based on the p-value and the significance level, make a decision about the null hypothesis. If the p-value is less than α, the null hypothesis is rejected in favor of the alternative hypothesis. Otherwise, the null hypothesis is not rejected.

Examples using scipy

Now, let’s dive into some examples of hypothesis testing using scipy.

One-sample t-test

Suppose we have a sample of exam scores (group A) and we want to test whether the mean score is significantly different from a given value. We can use the one-sample t-test from scipy.stats to perform this test.

import scipy.stats as stats

# Sample data
scores = [78, 85, 89, 92, 81, 95, 88, 75, 80, 86]

# Perform one-sample t-test
test_result = stats.ttest_1samp(scores, 80)

print("t-statistic:", test_result.statistic)
print("p-value:", test_result.pvalue)

Two-sample t-test

Consider a scenario where we have two groups of students (group A and group B) and we want to compare their mean exam scores. We can use the two-sample t-test to determine if there is a significant difference between the means of the two groups.

import scipy.stats as stats

# Sample data
group_a_scores = [78, 85, 89, 92, 81]
group_b_scores = [85, 90, 92, 80, 88]

# Perform two-sample t-test
test_result = stats.ttest_ind(group_a_scores, group_b_scores)

print("t-statistic:", test_result.statistic)
print("p-value:", test_result.pvalue)

Chi-square test

Suppose we have a dataset containing categorical variables and we want to test whether there is a significant association between the variables. We can use the chi-square test from scipy.stats to perform this test.

import scipy.stats as stats
import numpy as np

# Contingency table
observed = np.array([[10, 20, 30],
                     [40, 50, 60],
                     [70, 80, 90]])

# Perform chi-square test
chi2_stat, p_val, _, _ = stats.chi2_contingency(observed)

print("Chi-square statistic:", chi2_stat)
print("p-value:", p_val)

Conclusion

scipy provides a wide range of statistical functions for conducting hypothesis tests in Python. In this blog post, we explored some examples of hypothesis testing using scipy.stats. By following the steps of hypothesis testing and using the appropriate test from scipy, we can make inferences about populations based on sample data.