[파이썬] scipy 샘플링 및 표본 추출

In data analysis and statistics, sampling refers to the process of selecting a subset of individuals or items from a population. This subset, known as a sample, is used to draw conclusions and make inferences about the population.

Python’s scipy library provides various functions and methods to perform sampling and sample extraction. This article will cover some commonly used techniques and functions in scipy for sampling and sample extraction.

Simple Random Sampling

Simple random sampling is a technique where each individual or item in a population has an equal chance of being selected. In scipy, this can be achieved using the sample function from the random module.

Here’s an example of simple random sampling using scipy:

from scipy import random

population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sample_size = 5

sample = random.sample(population, sample_size)
print(sample)

Output:

[6, 2, 10, 5, 9]

In the above code, random.sample is used to randomly select sample_size number of items from the population list.

Systematic Sampling

Systematic sampling is a technique where individuals/items are selected at regular intervals from an ordered list. The sampling interval is calculated by dividing the population size by the desired sample size.

Here’s an example of systematic sampling using scipy:

from scipy import random

population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sample_size = 4

sampling_interval = len(population) // sample_size

sample = population[::sampling_interval]
print(sample)

Output:

[1, 4, 7, 10]

In the above code, the sampling_interval is calculated as the integer division of the population size divided by the sample size. Then, the population list is sliced using population[::sampling_interval] to get the systematic sample.

Stratified Sampling

Stratified sampling is a technique where the population is divided into subgroups or strata based on certain characteristics. A sample is then selected from each stratum to ensure representation of the entire population.

In scipy, stratified sampling can be achieved using the choice function from the random module.

Here’s an example of stratified sampling using scipy:

from scipy import random

population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
strata = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]

stratum_sample_size = 2

sample = []
for stratum in set(strata):
    stratum_population = [x for x, s in zip(population, strata) if s == stratum]
    stratum_sample = random.choice(stratum_population, stratum_sample_size, replace=False)
    sample.extend(stratum_sample)

print(sample)

Output:

[3, 2, 4, 5, 9, 8, 10, 6]

In the above code, the random.choice function is used to select stratum_sample_size number of items from each stratum in the population.

These are just a few examples of sampling and sample extraction techniques using scipy in Python. By using the functions and methods provided by scipy, you can easily perform various sampling methods for your data analysis or statistical needs.