In data analysis and statistics, sampling refers to the process of selecting a subset of individuals or items from a population. This subset, known as a sample, is used to draw conclusions and make inferences about the population.
Python’s scipy library provides various functions and methods to perform sampling and sample extraction. This article will cover some commonly used techniques and functions in scipy for sampling and sample extraction.
Simple Random Sampling
Simple random sampling is a technique where each individual or item in a population has an equal chance of being selected. In scipy, this can be achieved using the sample
function from the random
module.
Here’s an example of simple random sampling using scipy:
from scipy import random
population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sample_size = 5
sample = random.sample(population, sample_size)
print(sample)
Output:
[6, 2, 10, 5, 9]
In the above code, random.sample
is used to randomly select sample_size
number of items from the population
list.
Systematic Sampling
Systematic sampling is a technique where individuals/items are selected at regular intervals from an ordered list. The sampling interval is calculated by dividing the population size by the desired sample size.
Here’s an example of systematic sampling using scipy:
from scipy import random
population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sample_size = 4
sampling_interval = len(population) // sample_size
sample = population[::sampling_interval]
print(sample)
Output:
[1, 4, 7, 10]
In the above code, the sampling_interval
is calculated as the integer division of the population size divided by the sample size. Then, the population
list is sliced using population[::sampling_interval]
to get the systematic sample.
Stratified Sampling
Stratified sampling is a technique where the population is divided into subgroups or strata based on certain characteristics. A sample is then selected from each stratum to ensure representation of the entire population.
In scipy, stratified sampling can be achieved using the choice
function from the random
module.
Here’s an example of stratified sampling using scipy:
from scipy import random
population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
strata = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
stratum_sample_size = 2
sample = []
for stratum in set(strata):
stratum_population = [x for x, s in zip(population, strata) if s == stratum]
stratum_sample = random.choice(stratum_population, stratum_sample_size, replace=False)
sample.extend(stratum_sample)
print(sample)
Output:
[3, 2, 4, 5, 9, 8, 10, 6]
In the above code, the random.choice
function is used to select stratum_sample_size
number of items from each stratum in the population.
These are just a few examples of sampling and sample extraction techniques using scipy in Python. By using the functions and methods provided by scipy, you can easily perform various sampling methods for your data analysis or statistical needs.