# Configuration

In [1]:
DATASET_2016_10a_FILENAME = "RML2016.10a_dict.pkl"

# Imports

In [2]:
import pickle
import pandas as pd
import h5py

# Dataset

Define **signals** as a list of I/Q pairs:

`signals = [ [[I1],[Q1]], [[I2],[Q2]], ..., [[IN],[QN]] ]`

The dataset is a dictionary keyed by `(modulation label, SNR value)` pairs, where each value is a list of **signal** samples:

`{ (modulation_label1, SNR1): signals1, ..., (modulation_labelN, SNRN): signalsN }`

Example of an element of the dictionary:

`(b'BPSK', -18): [ [[0.1, 0.2, 0.3, ...],[0.1, 0.2, 0.3, ...]], [[0.2, 0.3, 0.4, ...],[0.2, 0.3, 0.4, ...]], ... ]`

So it's possible to unpack all the signals and produce a series of tuples like this one:

`(signal, modulation_label, SNR)`

where **signal** is `signal = [[I1],[Q1]]`

Example of an element of the final dataset:

`([[0.1, 0.2, 0.3, ...],[0.1, 0.2, 0.3, ...]], 'BPSK', -18)`

In [3]:
dataset_filename = DATASET_2016_10a_FILENAME

dataset = []

with open(dataset_filename, "rb") as dataset_file:
    # encoding="bytes" keeps the dictionary keys as bytes, e.g. (b'BPSK', -18)
    data_dict = pickle.load(dataset_file, encoding="bytes")

# for each (modulation label, SNR) key and its list of signals
for (modulation_bytes, snr), signals in data_dict.items():
    # decode the modulation label from bytes to str
    modulation = modulation_bytes.decode("utf-8")

    # for each I/Q signal sample
    for signal in signals:
        # save the tuple (signal, modulation_label, snr) in the list
        dataset.append((signal, modulation, snr))
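
The unpacking logic can be checked without the pickle file by running it on a toy dictionary. This is a minimal sketch, assuming each dictionary value is a NumPy array of shape `(n_signals, 2, n_samples)`; the helper name `unpack_dataset` and the toy values are hypothetical and not part of the dataset.

```python
import numpy as np


def unpack_dataset(data_dict):
    """Flatten {(label_bytes, snr): signals} into (signal, label, snr) tuples."""
    rows = []
    for (label_bytes, snr), signals in data_dict.items():
        label = label_bytes.decode("utf-8")  # b'BPSK' -> 'BPSK'
        for signal in signals:  # each signal has shape (2, n_samples): [I, Q]
            rows.append((signal, label, snr))
    return rows


# Toy stand-in for the real dictionary: 3 signals of 4 samples per key.
toy_dict = {
    (b"BPSK", -18): np.zeros((3, 2, 4)),
    (b"QPSK", 2): np.ones((3, 2, 4)),
}

toy_rows = unpack_dataset(toy_dict)
print(len(toy_rows))         # 6 tuples: 2 keys x 3 signals
print(toy_rows[0][1:])       # ('BPSK', -18)
print(toy_rows[0][0].shape)  # (2, 4)
```

Each tuple keeps the order `(signal, modulation_label, snr)`, which matches the column order assigned to the DataFrame.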

Creating the pandas DataFrame

In [4]:
dataset_df = pd.DataFrame(data=dataset)

# pandas aesthetics

signal_column_dataframe_name = 'Signal'
modulation_label_column_dataframe_name = 'Modulation_Label'
snr_column_dataframe_name = 'SNR'

dataset_df.columns = [signal_column_dataframe_name, modulation_label_column_dataframe_name, snr_column_dataframe_name]

The final dataset as a DataFrame:

In [5]:
dataset_df

| | Signal | Modulation_Label | SNR |
|---|---|---|---|
| 0 | [[-0.0059014712, -0.0023458179, -0.00074506126... | QPSK | 2 |
| 1 | [[0.0050326153, 0.00094379985, -0.0018932355, ... | QPSK | 2 |
| 2 | [[0.0052390713, 0.0073890695, 0.007276459, 0.0... | QPSK | 2 |
| 3 | [[-0.0019859935, -0.0071501383, -0.00527185, -... | QPSK | 2 |
| 4 | [[0.006674405, 0.0028359746, 0.005630027, 0.00... | QPSK | 2 |
| ... | ... | ... | ... |
| 219995 | [[0.0062732296, -0.0050519477, 0.006672171, 0.... | BPSK | -18 |
| 219996 | [[-0.003903181, -0.0015884301, -0.00633375, 2.... | BPSK | -18 |
| 219997 | [[-0.0105958255, 0.005601244, -0.012161784, 0.... | BPSK | -18 |
| 219998 | [[-0.002136606, 0.00995837, 0.0059440527, -0.0... | BPSK | -18 |


## Analysis

### Simple rows count

The dataset has 220,000 rows (11 modulation labels × 20 SNR values × 1,000 signals each).

In [6]:
dataset_df.count()

Signal              220000
Modulation_Label    220000
SNR                 220000
dtype: int64

### Values balance between modulation labels

The dataset is perfectly balanced across classes: every modulation label has the same number of samples (20,000).

In [7]:
dataset_df.groupby([modulation_label_column_dataframe_name]).count()

| Modulation_Label | Signal | SNR |
|---|---|---|
| 8PSK | 20000 | 20000 |
| AM-DSB | 20000 | 20000 |
| AM-SSB | 20000 | 20000 |
| BPSK | 20000 | 20000 |
| CPFSK | 20000 | 20000 |
| GFSK | 20000 | 20000 |
| PAM4 | 20000 | 20000 |
| QAM16 | 20000 | 20000 |
| QAM64 | 20000 | 20000 |
| QPSK | 20000 | 20000 |


### Values balance between SNR

The dataset is also perfectly balanced across SNR values: every SNR value has the same number of samples (11,000).

In [8]:
dataset_df.groupby([snr_column_dataframe_name]).count()

| SNR | Signal | Modulation_Label |
|---|---|---|
| -20 | 11000 | 11000 |
| -18 | 11000 | 11000 |
| -16 | 11000 | 11000 |
| -14 | 11000 | 11000 |
| -12 | 11000 | 11000 |
| -10 | 11000 | 11000 |
| -8 | 11000 | 11000 |
| -6 | 11000 | 11000 |
| -4 | 11000 | 11000 |
| -2 | 11000 | 11000 |


### Values balance between modulation labels and SNR

The dataset is perfectly balanced across labels and SNR values jointly: every (label, SNR) pair has the same number of samples (1,000).

In [9]:
# pd.set_option('display.max_rows', None)
dataset_df.groupby([modulation_label_column_dataframe_name, snr_column_dataframe_name]).count()

| Modulation_Label | SNR | Signal |
|---|---|---|
| 8PSK | -20 | 1000 |
| 8PSK | -18 | 1000 |
| 8PSK | -16 | 1000 |
| 8PSK | -14 | 1000 |
| 8PSK | -12 | 1000 |
| ... | ... | ... |
| WBFM | 10 | 1000 |
| WBFM | 12 | 1000 |
| WBFM | 14 | 1000 |
| WBFM | 16 | 1000 |
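
Reading the group counts by eye works for small tables, but the balance claim can also be checked programmatically. This is a minimal sketch on a toy DataFrame with the same column names; the helper `is_balanced` and the toy values are hypothetical.

```python
import pandas as pd

# Toy stand-in for dataset_df (hypothetical values); the Signal column is
# omitted because only the label and SNR columns matter for counting.
toy_df = pd.DataFrame({
    "Modulation_Label": ["BPSK", "BPSK", "BPSK", "BPSK", "QPSK", "QPSK", "QPSK", "QPSK"],
    "SNR": [-18, -18, 2, 2, -18, -18, 2, 2],
})


def is_balanced(df, columns):
    """Return True when every observed group defined by `columns` has the same size."""
    group_sizes = df.groupby(columns).size()
    return group_sizes.nunique() == 1


print(is_balanced(toy_df, ["Modulation_Label"]))         # True
print(is_balanced(toy_df, ["SNR"]))                      # True
print(is_balanced(toy_df, ["Modulation_Label", "SNR"]))  # True
print(is_balanced(toy_df.iloc[:-1], ["SNR"]))            # False: one row dropped
```

Note that `groupby` only reports combinations that actually occur, so this check would not flag a (label, SNR) pair that is missing entirely.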


# Signals

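How can the signals be plotted? According to [PySDR](https://pysdr.org/content/sampling.html#quadrature-sampling), a signal is expressed in terms of its in-phase (I) and quadrature (Q) components as: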

$$ x(t) = I\cos(2\pi f t) + Q\sin(2\pi f t) $$
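
The equation can be evaluated sample by sample with NumPy. This is a minimal sketch, not the notebook's own method: the sample rate, the carrier frequency, and the toy I/Q values are assumptions chosen for illustration, since the dataset stores only the I and Q samples.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
sample_rate = 1_000.0  # samples per second
carrier_freq = 50.0    # carrier frequency f in Hz
n_samples = 128

# Toy I/Q components standing in for one row of the Signal column.
i_samples = np.full(n_samples, 0.5)
q_samples = np.full(n_samples, -0.5)

# Time of each sample, then x(t) = I cos(2 pi f t) + Q sin(2 pi f t).
t = np.arange(n_samples) / sample_rate
phase = 2 * np.pi * carrier_freq * t
x = i_samples * np.cos(phase) + q_samples * np.sin(phase)

print(x.shape)  # (128,)
print(x[0])     # 0.5, since cos(0) = 1 and sin(0) = 0
```

The arrays `t` and `x` can then be passed to a plotting call such as `matplotlib.pyplot.plot(t, x)`.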

Take the first signal, which has QPSK modulation and SNR = 2:

In [10]:
signal = dataset_df[signal_column_dataframe_name][0]
# signal

Separate the I and Q components of the signal.

In [11]:
I, Q = signal[0], signal[1]

In [12]:
I

array([-5.90147125e-03, -2.34581786e-03, -7.45061261e-04, -5.34572452e-03,
       -5.78941777e-03, -3.69683490e-03, -4.97868750e-03, -6.56572822e-03,
       -9.04932246e-03, -4.83668642e-03, -1.00837136e-02, -4.53815702e-03,
       -4.31498839e-03, -5.13423281e-03, -6.07567281e-03,  1.18665886e-03,
       -4.65670088e-03, -6.95332745e-03, -6.66823424e-03, -6.43977243e-03,
       -3.82532272e-03, -8.38821847e-03, -1.01344110e-02, -6.90073194e-03,
       -9.62839276e-03, -1.55354582e-03, -2.88469438e-03, -4.51788818e-03,
        3.41027649e-03,  7.41052255e-03,  3.35769332e-03,  7.62627879e-03,
        8.82679410e-03,  3.42824613e-03,  1.84083998e-03,  6.41621463e-03,
       -1.63305740e-04, -2.24135863e-03, -5.19226259e-03, -3.63920978e-03,
       -1.01316329e-02, -6.39987178e-03, -6.06458448e-03, -7.66557641e-03,
       -3.44835571e-03,  4.42530581e-04,  2.56719789e-03,  4.74519981e-03,
        4.66336496e-03,  6.47741836e-03,  8.53952859e-03,  4.98457067e-03,
        1.83550685e-04,  

In [13]:
Q

array([-0.00779554, -0.00781637, -0.00401967, -0.00511351, -0.00593952,
       -0.0065699 , -0.00558479, -0.00529769,  0.00021024, -0.00604725,
       -0.00705299, -0.00768376, -0.00682943, -0.00526323, -0.00428441,
       -0.00823529, -0.00887949, -0.00665625, -0.00873264, -0.00415313,
       -0.00815829, -0.00602711, -0.01298266, -0.00686788, -0.00674923,
       -0.00403722, -0.00778409, -0.00531385,  0.00321187, -0.00500479,
        0.00121511,  0.00072439,  0.00443489,  0.0083125 ,  0.00883208,
        0.0059255 ,  0.00833821,  0.00718797,  0.00816119,  0.00870452,
        0.00650418,  0.00439436,  0.00282486,  0.00216367,  0.00520329,
        0.00740604,  0.00053031,  0.00502639,  0.00479635,  0.00892057,
        0.00727959,  0.00410889, -0.00164091,  0.00032166, -0.00435043,
       -0.00534027, -0.00672173, -0.00410643, -0.00531335, -0.00456619,
       -0.00476122, -0.00262099,  0.00264574,  0.00791668,  0.00810155,
        0.00856092,  0.00586885,  0.0090829 ,  0.00278104, -0.00