# 2 Sampling

In this exercise, you should revisit the sampling techniques we discussed
during the lecture, namely simple random sampling, clustered sampling, and
stratified sampling. To get started, load the dataset SN_list_large.csv using
pandas. The data frame contains date of discovery, magnitude (-log brightness),
position, and SN type. We are not interested in the positions at all, so
remove this column first. Also, we are only interested in types Ia, II, and IIn.
Eventually, we want to sample magnitudes from supernovae. For this, we want to
use random.choices() from the Python random library, which draws with replacement.

In [45]:
import pandas as pd

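# read_csv already returns a DataFrame; the first row holds the column names
df = pd.read_csv("SN_list_large.csv", header=0)
# Positions are not needed for this exercise
df = df.drop(columns=['SN Position'])
# Keep only the supernova types of interest
types = ['Ia', 'II', 'IIn']
df = df[df['Type'].isin(types)].copy()  # copy, so the assignment below acts on an independent frame
df['Date'] = pd.to_datetime(df['Date'], format='%Y %m %d')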
df.head(10)

Unnamed: 0,Date,Mag.,Type
0,2015-02-07,19.1,IIn
1,2015-12-16,17.8,Ia
2,2015-12-12,17.3,IIn
4,2015-12-07,15.9,Ia
5,2015-12-02,18.1,II
7,2015-11-28,16.7,II
8,2015-11-30,17.6,Ia
9,2015-10-03,16.8,II
10,2015-10-30,17.2,Ia
11,2015-07-12,15.9,Ia


In [34]:
df.tail(3)

Unnamed: 0,Date,Mag.,Type
4803,2000 01 22,16.2,II
4805,2000 01 11,16.5,Ia
4806,2000 01 01,16.5,Ia
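
Since the magnitude column is a negative logarithm of brightness, larger values mean fainter objects. The following minimal sketch illustrates this with the standard astronomical convention m = -2.5 log10(F / F_ref). The function name `magnitude` and the reference flux of 1.0 are assumptions made for illustration; they are not part of the dataset.

```python
import math

def magnitude(flux, flux_ref=1.0):
    """Magnitude of a source with the given flux, relative to flux_ref.

    flux_ref = 1.0 is an arbitrary, hypothetical zero point.
    """
    return -2.5 * math.log10(flux / flux_ref)

# A source that is 100 times fainter has a magnitude that is larger by 5.
delta = magnitude(0.01) - magnitude(1.0)
print(round(delta, 6))
```

In the table above, a supernova with Mag. 19.1 is therefore fainter than one with Mag. 15.9.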


## 2.1 Simple random sampling

Select all SN of relevant type, put their magnitudes in a list, and take 100
random samples from it.

In [41]:
import random as rd

magnitudes = df['Mag.'].tolist()
sample = rd.choices(magnitudes, k=100)
sample

[19.2,
 18.5,
 16.9,
 18.3,
 15.7,
 17.4,
 17.0,
 15.8,
 16.5,
 17.5,
 14.8,
 18.0,
 16.1,
 17.2,
 19.7,
 22.7,
 18.0,
 16.4,
 19.1,
 16.8,
 17.8,
 22.0,
 19.0,
 20.9,
 19.5,
 22.7,
 17.0,
 17.0,
 17.4,
 20.1,
 14.5,
 19.5,
 17.2,
 20.5,
 17.0,
 16.0,
 23.0,
 22.2,
 14.8,
 21.5,
 23.2,
 17.5,
 22.4,
 14.5,
 15.3,
 22.1,
 19.0,
 17.3,
 22.0,
 18.1,
 22.0,
 17.5,
 23.3,
 18.8,
 15.9,
 15.4,
 22.5,
 21.3,
 15.9,
 18.4,
 15.6,
 21.4,
 16.7,
 21.8,
 17.1,
 18.0,
 17.7,
 19.7,
 16.4,
 16.9,
 18.6,
 19.1,
 15.4,
 15.0,
 17.5,
 19.1,
 18.0,
 18.3,
 17.6,
 21.8,
 18.1,
 17.5,
 12.8,
 21.9,
 17.5,
 17.5,
 21.6,
 20.6,
 15.1,
 17.4,
 17.3,
 16.5,
 19.5,
 18.0,
 16.7,
 19.4,
 12.2,
 20.7,
 22.8,
 18.8]
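
Note that `random.choices()` draws with replacement, so the same supernova can appear more than once in the sample. If each object should be drawn at most once, `random.sample()` is an alternative. The sketch below contrasts the two on a short, made-up list that stands in for `df['Mag.'].tolist()`; the seed value is also an arbitrary choice.

```python
import random as rd

# Hypothetical stand-in for df['Mag.'].tolist(); the values are made up.
magnitudes = [15.9, 16.5, 17.2, 17.8, 18.1, 19.1, 20.4, 22.0]

rng = rd.Random(42)  # seeded generator, so the draw is repeatable

# With replacement: an entry can be drawn several times,
# and k may exceed the population size.
with_replacement = rng.choices(magnitudes, k=12)

# Without replacement: every entry is drawn at most once,
# so k must not exceed len(magnitudes).
without_replacement = rng.sample(magnitudes, k=5)

print(len(with_replacement), len(without_replacement))
```

For a population of several thousand supernovae and a sample of 100, the difference between the two is small, but it matters for small clusters or strata.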

## 2.2 Clustered sampling

We assume that the universe does not change its supernova rate, and we also
assume that we do not get better or worse at detecting them (which is actually
wrong). Nevertheless, cluster the SN of all relevant types in time: each cluster
should cover 2 years. Create 5 clusters and sample 20 SN from each cluster.

In [59]:
start_year = 2000
end_year = 2009
cluster_span = 2
clusters = {}

for cluster_start in range(start_year, end_year + 1, cluster_span):
    # Select the SN discovered in [cluster_start, cluster_start + cluster_span)
    years = df['Date'].dt.year
    cluster_df = df[(years >= cluster_start) & (years < cluster_start + cluster_span)]
    magnitudes = cluster_df['Mag.'].tolist()
    sample = rd.choices(magnitudes, k=20)
    clusters[f'{cluster_start}-{cluster_start + cluster_span}'] = sample

clusters

{'2000-2002': [15.9,
  15.3,
  22.5,
  16.5,
  17.8,
  16.5,
  17.1,
  17.2,
  21.1,
  24.8,
  19.5,
  18.1,
  16.8,
  18.5,
  17.4,
  24.5,
  17.1,
  21.0,
  22.4,
  16.6],
 '2002-2004': [17.2,
  22.4,
  16.6,
  17.0,
  17.1,
  16.5,
  18.3,
  22.0,
  22.7,
  15.5,
  16.5,
  16.3,
  18.0,
  20.5,
  17.4,
  18.8,
  17.4,
  17.2,
  16.8,
  17.5],
 '2004-2006': [15.9,
  17.8,
  15.0,
  14.8,
  18.3,
  16.5,
  17.5,
  15.5,
  17.5,
  24.1,
  16.5,
  18.3,
  19.2,
  23.5,
  17.2,
  15.9,
  17.8,
  14.9,
  17.5,
  23.5],
 '2006-2008': [17.1,
  16.9,
  22.4,
  17.8,
  20.0,
  16.2,
  20.0,
  15.3,
  19.2,
  17.4,
  17.2,
  18.5,
  17.4,
  19.7,
  17.8,
  17.8,
  17.5,
  18.2,
  22.6,
  18.0],
 '2008-2010': [24.3,
  18.0,
  24.3,
  21.0,
  16.2,
  20.0,
  23.7,
  15.5,
  20.4,
  16.5,
  14.5,
  14.7,
  15.5,
  18.2,
  17.3,
  17.2,
  14.8,
  16.1,
  24.5,
  20.4]}
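
As a cross-check of the time clustering, the same idea can be sketched by assigning each row an integer cluster index and grouping on it. The toy catalogue below is invented for illustration (dates and magnitudes are not taken from SN_list_large.csv), and the sample size of 3 per cluster is chosen only to keep the output short.

```python
import pandas as pd

# Hypothetical toy catalogue; all values are made up.
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2000-03-01", "2001-07-15", "2002-01-20", "2003-11-02",
        "2004-05-09", "2005-09-30", "2006-02-14", "2007-12-25",
        "2008-06-06", "2009-10-10", "2012-04-01",
    ]),
    "Mag.": [16.2, 17.5, 18.1, 15.9, 19.3, 17.0, 16.8, 20.1, 18.4, 17.7, 21.0],
})

start_year, cluster_span, n_clusters = 2000, 2, 5

# Integer cluster index: 0 for 2000 and 2001, 1 for 2002 and 2003, ...
cluster_id = (toy["Date"].dt.year - start_year) // cluster_span
# Drop rows that fall outside the five clusters (here the 2012 entry)
in_range = cluster_id.between(0, n_clusters - 1)

clusters = {}
for cid, group in toy[in_range].groupby(cluster_id[in_range]):
    first = start_year + int(cid) * cluster_span
    # Same half-open label convention as above, e.g. '2000-2002'
    label = f"{first}-{first + cluster_span}"
    # Draw with replacement, as random.choices() does
    clusters[label] = group["Mag."].sample(n=3, replace=True, random_state=0).tolist()

print(clusters)
```

Because every row gets exactly one cluster index, each cluster can only contain supernovae from its own two-year window.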

## 2.3 Stratified sampling

We believe that there is a difference in the brightness distribution among
different SN types. Therefore, create a stratum for each SN type (Ia, II, IIn)
and sample 33 from each stratum.

In [7]:
stratum_samples = {}
for sn_type in ['Ia', 'II', 'IIn']:
    # One stratum per SN type
    stratum = df[df['Type'] == sn_type]
    stratum = stratum['Mag.'].tolist()
    # Draw 33 magnitudes (with replacement) from this stratum
    sample = rd.choices(stratum, k=33)
    stratum_samples[sn_type] = sample

stratum_samples

{'Ia': [17.2,
  16.0,
  17.7,
  21.3,
  15.1,
  21.5,
  15.8,
  22.4,
  17.8,
  17.1,
  16.3,
  17.0,
  22.7,
  22.5,
  16.1,
  20.4,
  19.6,
  15.8,
  16.6,
  21.9,
  20.8,
  18.6,
  22.9,
  22.0,
  22.2,
  22.5,
  23.0,
  18.2,
  21.8,
  18.2,
  20.7,
  18.6,
  18.7],
 'II': [19.2,
  16.7,
  18.5,
  21.9,
  17.4,
  18.8,
  17.8,
  16.9,
  18.2,
  16.0,
  18.7,
  18.1,
  17.7,
  17.3,
  20.2,
  18.1,
  18.9,
  17.0,
  18.6,
  18.2,
  21.6,
  18.3,
  17.8,
  17.5,
  19.0,
  18.0,
  18.1,
  17.7,
  18.2,
  16.8,
  19.7,
  16.0,
  17.7],
 'IIn': [17.3,
  18.0,
  18.2,
  13.5,
  15.8,
  17.5,
  18.2,
  17.6,
  16.7,
  17.0,
  17.6,
  17.8,
  18.0,
  17.8,
  18.2,
  18.5,
  18.8,
  18.1,
  18.8,
  17.6,
  17.0,
  19.1,
  18.8,
  17.5,
  16.5,
  20.5,
  15.7,
  16.3,
  18.2,
  16.7,
  17.0,
  16.4,
  17.6]}
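
The stratified draw can also be written with pandas alone, using `groupby(...).sample(...)`, which is available in pandas 1.1 and later. The sketch below is only an illustration on a made-up toy table; the per-stratum sample size of 4 and the `random_state` value are arbitrary assumptions.

```python
import pandas as pd

# Hypothetical toy table; types follow the exercise, magnitudes are made up.
toy = pd.DataFrame({
    "Type": ["Ia", "Ia", "Ia", "II", "II", "II", "IIn", "IIn", "IIn"],
    "Mag.": [16.0, 17.2, 18.6, 17.4, 18.1, 18.8, 17.0, 17.6, 18.2],
})

# Draw the same number of rows from every stratum, with replacement
stratified = toy.groupby("Type").sample(n=4, replace=True, random_state=1)

# Number of sampled rows per stratum and the mean magnitude in each
counts = stratified["Type"].value_counts().to_dict()
means = stratified.groupby("Type")["Mag."].mean()

print(counts)
print(means)
```

Drawing an equal number from each stratum, as in the exercise, makes the types directly comparable, but the pooled sample no longer reflects the relative frequency of the types in the catalogue.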