
# Jupyter notebook for the case study (using Python 3)

## Task 1

**_1) Setup_**

importing libraries to process data: the pandas package to work with DataFrames and the NumPy package for math / linear algebra.

In [3]:
import pandas as pd
import numpy as np

defining dataset names. The names can be changed to load other datasets.

In [4]:
# small datasets for testing; the next cell overrides these names with the full datasets
name_dataset_0 = 'small_app_dataset.csv' # 'app_dataset.csv'
name_dataset_1 = 'small_dataset_1.csv' # 'dataset_1.csv'
name_dataset_2 = 'small_dataset_2.csv' # 'dataset_2.csv'

In [5]:
name_dataset_0 = 'app_dataset.csv'
name_dataset_1 = 'dataset_1.csv'
name_dataset_2 = 'dataset_2.csv'

defining key names

In [6]:
key1 = 'key1'
key2 = 'key2'
key_names = [key1, key2]

loading the CSV formatted datasets as pandas dataframes

In [7]:
dataset_0 = pd.read_csv(name_dataset_0, sep=';')
dataset_1 = pd.read_csv(name_dataset_1, sep=';')
dataset_2 = pd.read_csv(name_dataset_2, sep=';')
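Some values in the later outputs (for example '614,85') look like numbers written with a decimal comma, which pandas reads as text by default. A minimal sketch of how `read_csv` could parse them as floats is below. The CSV content is a made-up stand-in, and whether `decimal=','` suits every column of the real files is an assumption I have not checked.

```python
import io
import pandas as pd

# Toy stand-in for one of the CSV files (hypothetical content, not the real data).
csv_text = "key1;v4;v5\n88;614,85;1\n139;4975,86;2\n"

# Default parsing: '614,85' is not recognised as a number and stays text.
as_text = pd.read_csv(io.StringIO(csv_text), sep=';')

# With decimal=',' the same column is parsed as floating point numbers.
as_number = pd.read_csv(io.StringIO(csv_text), sep=';', decimal=',')

print(as_text['v4'].dtype)
print(as_number['v4'].dtype)
```

If only a few columns use decimal commas, converting those columns after loading may be the safer option.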

**_2) Investigating the datasets - checking how many rows, columns and elements they have_**

function to print the number of columns, rows and elements for each dataset

In [8]:
def print_col_row_and_cell_count(df):
    row_count, column_count = df.shape
    element_count = column_count*row_count
    print('column count:  ', column_count)
    print('row count:     ', row_count)
    print('element count: ', element_count)
    print()

row, column and element counts for each dataset (including NA values)

In [9]:
print('1) dataset 0')
print_col_row_and_cell_count(dataset_0)
print('2) dataset 1')
print_col_row_and_cell_count(dataset_1)
print('3) dataset 2')
print_col_row_and_cell_count(dataset_2)

1) dataset 0
column count:   5
row count:      798
element count:  3990

2) dataset 1
column count:   169
row count:      14571
element count:  2462499

3) dataset 2
column count:   37
row count:      10137
element count:  375069



**_3) Joining the datasets_**

In [10]:
dataset_0_and_1 = pd.merge(dataset_0, dataset_1, how='left', on=key2)

In [11]:
dataset_full_not_cleaned = pd.merge(dataset_0_and_1, dataset_2, how='left', on=key1)

In [12]:
dataset_full_not_cleaned.to_csv('output_dataset_full_not_cleaned.csv')

In [13]:
print('dataset_full - before cleaning NAs')
print_col_row_and_cell_count(dataset_full_not_cleaned)

dataset_full - before cleaning NAs
column count:   209
row count:      798
element count:  166782
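A quick sanity check on the joins could be sketched as follows. The toy frames are made up. `validate='many_to_one'` assumes the key is unique in the right-hand table, which I have not verified for dataset_1 and dataset_2 (pandas raises a `MergeError` if it is not). `indicator=True` adds a `_merge` column showing which rows found a match.

```python
import pandas as pd

# Toy stand-ins for the master table and one of the lookup tables (hypothetical data).
left = pd.DataFrame({'key2': [1, 2, 3], 'response': [0, 1, 0]})
right = pd.DataFrame({'key2': [1, 2], 'v4': [10.0, 20.0]})

# A left join keeps every row of the left table. validate checks that the key is
# unique on the right side, and indicator records where each row came from.
merged = pd.merge(left, right, how='left', on='key2',
                  validate='many_to_one', indicator=True)

match_counts = merged['_merge'].value_counts()
print(match_counts)
```

Rows marked `left_only` have no match in the right-hand table, so all columns coming from that table are NA for them.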



**_4) Removing NA containing columns and rows. Saving the final dataset to CSV file_**

Function to deal with NA values. It drops rows and then columns whose share of non-NA values is below a given threshold. By default the threshold is 20% for columns and 5% for rows.

In [14]:
def drop_rows_and_cols_with_NA_below_thresholds(input_df, key_names=key_names, col_thresh=0.20, row_thresh=0.05):
    # key_names is currently unused; it is kept so that existing calls keep working
    df = input_df.copy(deep=True)

    number_of_cols = len(df.columns)
    row_threshold_integer = round(row_thresh * number_of_cols)
    df = df.dropna(axis=0, thresh=row_threshold_integer) # dropping rows whose non-NA cell count is below the threshold

    number_of_rows = len(df)
    col_threshold_integer = round(col_thresh * number_of_rows)
    df = df.dropna(axis=1, thresh=col_threshold_integer) # dropping columns whose non-NA cell count is below the threshold
    return df
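To see how the fractional thresholds turn into the integer `thresh` argument of `dropna`, here is a small worked sketch on a toy frame. The data and the threshold values 0.6 and 0.5 are illustrative only, not the notebook defaults.

```python
import numpy as np
import pandas as pd

# Toy frame: non-NA cells per row are 5, 4, 3 and 2.
toy = pd.DataFrame([[1, 2, 3, 4, 5],
                    [3, 4, 5, 1, np.nan],
                    [6, 4, 5, np.nan, np.nan],
                    [1, 2, np.nan, np.nan, np.nan]],
                   columns=['key1', 'A', 'B', 'C', 'D'])

row_thresh, col_thresh = 0.6, 0.5  # illustrative values

# Rows first: keep rows with at least round(0.6 * 5) = 3 non-NA cells.
min_cells_per_row = round(row_thresh * toy.shape[1])
rows_kept = toy.dropna(axis=0, thresh=min_cells_per_row)

# Then columns: keep columns with at least round(0.5 * 3) = 2 non-NA cells
# among the rows that are left.
min_cells_per_col = round(col_thresh * len(rows_kept))
result = rows_kept.dropna(axis=1, thresh=min_cells_per_col)

print(result.shape)  # (3, 4): the last row and column D are dropped
```

Note that Python 3's `round` rounds exact halves to the nearest even integer (`round(2.5)` is 2), so a threshold that lands exactly on .5 may come out one lower than expected.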

In [15]:
dataset_full_clean = drop_rows_and_cols_with_NA_below_thresholds(dataset_full_not_cleaned, col_thresh=0.20, row_thresh=0.05)

In [16]:
print('dataset_full_clean - after some columns and rows with many missing values are removed')
print_col_row_and_cell_count(dataset_full_clean)

dataset_full_clean - after some columns and rows with many missing values are removed
column count:   62
row count:      779
element count:  48298



Saving the final dataset as a CSV file

In [17]:
dataset_full_clean.to_csv('output_dataset_full_clean.csv')

**_5) Observations on data integrity_**

Overall, we see that a lot of data is not used. The joined table has 798 rows (the same as the 'master' dataset_0, because dataset_0 is the left side of the left outer joins). Dataset1 has 14571 rows and dataset2 has 10137. Since the response variable is available only for these 798 rows, we have to ignore most of the rows from dataset1 and dataset2.

On top of that, there are a lot of missing values (NA), especially in dataset1. The combined dataset has 209 columns before the columns with many NAs are removed. After removing them with the 20% threshold, only 62 columns remain. [UPDATE - provide counts of NAs in each table. Maybe update the print function to show NA cells as well]
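Counting NA cells per table could look roughly like the sketch below. `print_na_counts` is a hypothetical helper, not part of the notebook, and the toy frame is made up.

```python
import numpy as np
import pandas as pd

def print_na_counts(df):
    """Print and return the number and share of NA cells (hypothetical helper)."""
    na_count = int(df.isna().sum().sum())
    element_count = df.shape[0] * df.shape[1]
    na_share = na_count / element_count if element_count else 0.0
    print('NA cell count: ', na_count)
    print('NA cell share: ', round(na_share, 3))
    return na_count, na_share

# Toy frame with 3 NA cells out of 9.
toy = pd.DataFrame({'key1': [1, 2, 3],
                    'A': [1.0, np.nan, np.nan],
                    'B': ['x', None, 'z']})

na_count, na_share = print_na_counts(toy)

# Share of NA values per column, highest first.
na_share_per_column = toy.isna().mean().sort_values(ascending=False)
print(na_share_per_column)
```

The per-column shares show directly which columns would fall under a given column threshold.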

**_6) TODO: handle NA values in some other way_**

## Task 2

**_1) Setup_**

Defining the name of the target variable

In [18]:
target = 'response'

Defining a function to get all column names except for the target and key columns. This allows dataframes to be analyzed dynamically, without knowing in advance exactly which columns they have.

In [19]:
def get_col_names_without_target_and_keys(dataframe, key_names = key_names, target = target):
    all_column_names_set = set(dataframe)
    col_names_without_target_and_keys = all_column_names_set - set(key_names) - set([target])
    return list(col_names_without_target_and_keys)
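Because the function above goes through a `set`, the returned names come back in arbitrary order. If the original column order matters, an order-preserving variant could be sketched like this. `get_col_names_in_order` is a hypothetical name, and the default key and target names are copied from the cells above.

```python
import pandas as pd

def get_col_names_in_order(dataframe, key_names=('key1', 'key2'), target='response'):
    """Return column names without keys and target, in the dataframe's own order."""
    excluded = set(key_names) | {target}
    return [col for col in dataframe.columns if col not in excluded]

# Toy frame with only column names (made-up subset of the real columns).
toy = pd.DataFrame(columns=['key1', 'v001', 'key2', 'response', 'v4'])
feature_cols = get_col_names_in_order(toy)
print(feature_cols)  # ['v001', 'v4']
```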

We want to determine which factors are the most important in predicting the target variable (response). Many variables still have too many NAs, so I will use a more aggressive column threshold (60%) to remove columns/factors with many missing values. Otherwise, we would introduce too much bias by imputing them all.

In [20]:
dataset_full = drop_rows_and_cols_with_NA_below_thresholds(dataset_full_not_cleaned, col_thresh=0.60, row_thresh=0.05)
print_col_row_and_cell_count(dataset_full)

column count:   40
row count:      779
element count:  31160



**_2) Imputing remaining missing values_**

There are still many missing values. In order to use Machine Learning models in Task 2 and 3, we have to get rid of them. In the previous step I removed some of them. The remaining ones will be imputed.

In [40]:
dataset_full_np = dataset_full.values

In [29]:
import Orange

In [41]:
dataset_full_orange = Orange.data.Table(dataset_full_np)

ValueError: could not convert string to float: 'U'

In [None]:
# TODO: take only text columns
# TODO: impute missing values
# TODO: normalize all data
# TODO: one hot encoder / create dummies for text information
# TODO: train regularized regression
# TODO: dimensionality reduction
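The `ValueError` above comes from text columns that cannot be converted to float. The first steps in the list (imputing, normalizing, creating dummies) could be sketched with pandas alone, as below. The toy frame, the median / most frequent value imputation and the z-score scaling are my assumptions for illustration. The regression and dimensionality reduction steps are not covered.

```python
import numpy as np
import pandas as pd

# Toy frame with one numeric and one text column (made-up values).
toy = pd.DataFrame({
    'v5': [1.0, np.nan, 2.0, 6.0],
    'v204': ['business', 'residential', None, 'residential'],
})

numeric_cols = toy.select_dtypes(include='number').columns
text_cols = toy.columns.difference(numeric_cols)

prepared = toy.copy()

# Impute: median for numeric columns, most frequent value for text columns.
prepared[numeric_cols] = prepared[numeric_cols].fillna(prepared[numeric_cols].median())
for col in text_cols:
    prepared[col] = prepared[col].fillna(prepared[col].mode().iloc[0])

# Normalize numeric columns to zero mean and unit standard deviation.
# (A constant column would give a zero standard deviation and should be dropped first.)
prepared[numeric_cols] = (
    (prepared[numeric_cols] - prepared[numeric_cols].mean()) / prepared[numeric_cols].std()
)

# One-hot encode the text columns.
prepared = pd.get_dummies(prepared, columns=list(text_cols))
print(prepared)
```

After these steps every column is numeric and free of NAs, so the frame can be converted to a float array.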

## Task 4

1) Deal with the imbalanced dataset. Out of 798 observations, the response variable is 0 in 645 and 1 in 153. This is not a severe imbalance, but prediction accuracy might improve if it were addressed. (a) The simplest approach is to randomly remove 492 rows where the response is 0, which gives a balanced dataset with 153 observations of each class. (b) A somewhat better approach is to put more weight on observations where the response is 1; each such observation would weigh about 4.2 (645/153). (c) Use one of the many other approaches for dealing with imbalanced datasets.

2) Columns v173, v175 and v177 contain some date information. It would be good to understand what these dates mean and then extract useful features from them, for example duration, or start and end time in hours, days, months, etc. Such features could help make better predictions.


3) I am mainly removing columns with many NAs. For rows I was more conservative: I removed only those that had all NA values except for the key columns. It might be beneficial to apply a threshold and remove rows that have too many missing values (similarly to what I did with columns).

4) Use better techniques for dimensionality reduction

5) Use SVM for sparse datasets
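Approaches (a) and (b) from point 1 could be sketched as follows. The toy frame only reproduces the class counts quoted above (645 zeros and 153 ones); the fixed random seed and the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy response column with the class counts quoted above.
toy = pd.DataFrame({'response': [0] * 645 + [1] * 153})

# (a) Random undersampling: keep all ones and an equally sized random sample of zeros.
minority = toy[toy['response'] == 1]
majority = toy[toy['response'] == 0].sample(n=len(minority), random_state=0)
balanced = pd.concat([minority, majority]).sample(frac=1, random_state=0)  # shuffle rows

# (b) Observation weights: each row with response 1 weighs 645 / 153, about 4.2.
class_counts = toy['response'].value_counts()
weight_for_ones = class_counts.loc[0] / class_counts.loc[1]
sample_weight = np.where(toy['response'] == 1, weight_for_ones, 1.0)

print(len(balanced), round(weight_for_ones, 1))
```

Many scikit-learn estimators accept such weights through a `sample_weight` argument of `fit`, which avoids throwing away the 492 majority rows.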

In [42]:
dataset_full_with_dummies = dataset_full.copy(deep=True)

In [None]:
dataset_full_with_dummies.ge

In [36]:
dataset_full_np2

array([[4, 15, 35, ..., 'No', 'business', 'U'],
       [88, 13, 21, ..., 'No', 'residential', 'U'],
       [139, 13, 66, ..., 'No', 'residential', 'U'],
       ..., 
       [13032, 19, 22, ..., 'No', 'wifi', 'U'],
       [13036, 19, 22, ..., 'No', 'wifi', 'U'],
       [13066, 23, 26, ..., 'No', 'wifi', 'U']], dtype=object)

In [37]:
len(dataset_full_np2)

779

In [38]:
dataset_full_np2.shape

(779, 40)

In [39]:
dataset_full_np2[0]

array([4, 15, 35, 4.0, 0, '1400', 1.0, '1400', 1.0, nan, nan,
       '1998-04-28 20:00:00', 6991.0, '2017-06-19 13:30:01', 0.0,
       '2017-06-19 13:30:00', 'Verified', '500 Moderate', 500.0, 2.0, 28.0,
       4.0, 'Moderate Fraud Risk', 3.0, 'Fraud Score 301 to 600', 1.0, 1.0,
       'Yes', 'Yes', 'Moderate', 3.0, 508.0, 3.0, 'Moderate', 311.0,
       'Moderate By Proxy Reputation And Country Code', 'Good', 'No',
       'business', 'U'], dtype=object)

In [33]:
dataset_full_np.shape

AttributeError: 'function' object has no attribute 'shape'

In [32]:
len(dataset_full_np)

TypeError: object of type 'method' has no len()

In [34]:
type(dataset_full_np)

method

In [27]:
dataset_full_np

<bound method NDFrame.as_matrix of       key1  v001  v002  key2  response        v4    v5       v14   v29  v120  \
0        4    15    35     4         0      1400   1.0      1400   1.0   NaN   
1       88    13    21    34         1       500   1.0    614,85   1.0   NaN   
2      139    13    66   808         0   4975,86   2.0   4975,86  27.0   NaN   
3      148    13    20    65         0       NaN   NaN       NaN   NaN   6.0   
4      159    13    21   312         1       NaN   NaN       NaN   NaN   2.0   
5      162    13    43  6218         0      2500   1.0   3225,38  37.0  10.0   
6      175    14    27   241         1  13632,02   6.0  11675,93  38.0   4.0   
7      215    14    29   107         1       748   1.0   1277,33   5.0   3.0   
8      246    14    33   102         0       NaN   NaN       NaN  28.0   NaN   
9      280    14    21   216         0       NaN   NaN       NaN   5.0   1.0   
10     330    15    35   183         0      3000   2.0   3298,14   2.0   2.0   
11   

In [28]:
dataset_full

Unnamed: 0,key1,v001,v002,key2,response,v4,v5,v14,v29,v120,...,v196,v197,v198,v199,v200,v201,v202,v203,v204,v172.1
0,4,15,35,4,0,1400,1.0,1400,1.0,,...,3.0,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,business,U
1,88,13,21,34,1,500,1.0,"614,85",1.0,,...,3.0,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,U
2,139,13,66,808,0,"4975,86",2.0,"4975,86",27.0,,...,4.0,510.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,U
3,148,13,20,65,0,,,,,6.0,...,3.0,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,cellular,U
4,159,13,21,312,1,,,,,2.0,...,3.0,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,cellular,U
5,162,13,43,6218,0,2500,1.0,"3225,38",37.0,10.0,...,3.0,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,U
6,175,14,27,241,1,"13632,02",6.0,"11675,93",38.0,4.0,...,3.0,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,U
7,215,14,29,107,1,748,1.0,"1277,33",5.0,3.0,...,3.0,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,U
8,246,14,33,102,0,,,,28.0,,...,3.0,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,U
9,280,14,21,216,0,,,,5.0,1.0,...,3.0,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,U


In [None]:
dataset_0_and_1

In [None]:
dataset_full

In [None]:
len(dataset_full)

In [None]:
dataset_0.shape[1]

In [None]:
dataset_example = dataset_1.copy(deep=True)
dataset_example.shape

In [None]:
#dataset_example = dataset_example.dropna(axis=0, how='all', subset=all_columns_no_key)

In [None]:
#dataset_example

In [None]:
list(dataset_0)

In [None]:
dataset_0

In [None]:
#set(dataset_1)

In [None]:
dataset_1

In [None]:
#list(dataset_2)

In [None]:
dataset_2

In [44]:
 df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3'],
                        'C': ['C0', 'C1', 'C2', 'C3'],
                        'D': ['D0', 'D1', 'D2', 'D3']},  index=[0, 1, 2, 3])

In [45]:
df1

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


In [48]:
pd.get_dummies(df1)

Unnamed: 0,A_A0,A_A1,A_A2,A_A3,B_B0,B_B1,B_B2,B_B3,C_C0,C_C1,C_C2,C_C3,D_D0,D_D1,D_D2,D_D3
0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0
1,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0
2,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0
3,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1


In [46]:
na_df = pd.DataFrame([[1, 2, 3, 4, 5], [3, 4, 5, 1, np.nan],
                    [6, 4, 5, np.nan, np.nan], [1, 2, np.nan, np.nan, np.nan], 
                    [1, 7, np.nan, np.nan, np.nan], [1, 7, np.nan, np.nan, np.nan],
                    [1, 7, np.nan, np.nan, np.nan], [1, 7, np.nan, np.nan, np.nan],
                    [1, 7, np.nan, np.nan, np.nan], [1, 7, np.nan, np.nan, np.nan]],
                    columns=['key1','A','B','C','D'])
na_df

Unnamed: 0,key1,A,B,C,D
0,1,2,3.0,4.0,5.0
1,3,4,5.0,1.0,
2,6,4,5.0,,
3,1,2,,,
4,1,7,,,
5,1,7,,,
6,1,7,,,
7,1,7,,,
8,1,7,,,
9,1,7,,,


In [47]:
na_df4 = na_df.copy(deep=True)
na_df4 = drop_rows_and_cols_with_NA_below_thresholds(na_df4, key_names=key_names, col_thresh=0.1, row_thresh=0.04)
na_df4

Unnamed: 0,key1,A,B,C,D
0,1,2,3.0,4.0,5.0
1,3,4,5.0,1.0,
2,6,4,5.0,,
3,1,2,,,
4,1,7,,,
5,1,7,,,
6,1,7,,,
7,1,7,,,
8,1,7,,,
9,1,7,,,


In [None]:
na_df4 = na_df.copy(deep=True)
na_df4 = na_df4.dropna(axis=1, thresh=1) # dropping NA columns
na_df4

In [None]:
na_df2=na_df.copy(deep=True)
na_df2 = na_df2.dropna(axis=0, how='all',subset={'B','C','A'})
na_df2

In [None]:
na_df_columns = list(na_df.columns)
na_df_columns

In [None]:
na_df_columns_set=set(na_df_columns)
na_df_columns_set

In [None]:
list(na_df_columns_set)

In [None]:
set(na_df2)

In [None]:
na_df3=na_df.copy(deep=True)
drop_NA_only_columns_and_rows(na_df3)

In [None]:
[1,2,3] - [1,2]

In [None]:
set([1,2,3]) - set([1,2,4])

In [None]:
#old drop NA function v1
def drop_NA_only_columns_and_rows(input_df, key_names=key_names):
    df = input_df.copy(deep=True)
    df_columns = set(df)
    df_columns_without_keys = df_columns - set(key_names)
    df = df.dropna(axis=0, how='all', subset=df_columns_without_keys) # dropping rows that have all NA values except for keys
    df = df.dropna(axis=1, how='all') # dropping NA columns
    return df

In [None]:
#old drop NA function v2
def drop_rows_with_NA_only_and_cols_with_NA_below_threshold(input_df, key_names=key_names, threshold_percent=0.20):
    df = input_df.copy(deep=True)
    
    df_columns = set(df)
    df_columns_without_keys = df_columns - set(key_names)
    df = df.dropna(axis=0, how='all', subset=df_columns_without_keys) # dropping rows that have all NA values except for keys
    
    number_of_rows = len(df)
    threshold_integer = round(threshold_percent * number_of_rows)
    df = df.dropna(axis=1, thresh=threshold_integer) # dropping columns whose non-NA cell count is below the threshold
    return df

In [61]:
dataset_full_clean

Unnamed: 0,key1,v001,v002,key2,response,v4,v5,v9,v12,v14,...,v197,v198,v199,v200,v201,v202,v203,v204,v171.1,v172.1
0,4,15,35,4,0,1400,1.0,,1.0,1400,...,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,business,01,U
1,88,13,21,34,1,500,1.0,,1.0,"614,85",...,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,01,U
2,139,13,66,808,0,"4975,86",2.0,,,"4975,86",...,510.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,01,U
3,148,13,20,65,0,,,,,,...,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,cellular,01,U
4,159,13,21,312,1,,,,,,...,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,cellular,01,U
5,162,13,43,6218,0,2500,1.0,,,"3225,38",...,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,01,U
6,175,14,27,241,1,"13632,02",6.0,2.0,,"11675,93",...,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,01,U
7,215,14,29,107,1,748,1.0,,,"1277,33",...,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,01,U
8,246,14,33,102,0,,,,,,...,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,17,U
9,280,14,21,216,0,,,,,,...,508.0,3.0,Moderate,311.0,Moderate By Proxy Reputation And Country Code,Good,No,residential,01,U


In [97]:
list(dataset_full_clean.columns)

['key1',
 'v001',
 'v002',
 'key2',
 'response',
 'v4',
 'v5',
 'v9',
 'v12',
 'v14',
 'v23',
 'v24',
 'v25',
 'v27',
 'v28',
 'v29',
 'v32',
 'v33',
 'v34',
 'v36',
 'v37',
 'v105',
 'v106',
 'v109',
 'v112',
 'v113',
 'v116',
 'v117',
 'v119',
 'v120',
 'v122',
 'v123',
 'v173',
 'v174',
 'v175',
 'v176',
 'v177',
 'v178',
 'v179',
 'v180',
 'v181',
 'v182',
 'v183',
 'v184',
 'v185',
 'v186',
 'v191',
 'v192',
 'v193',
 'v194',
 'v195',
 'v196',
 'v197',
 'v198',
 'v199',
 'v200',
 'v201',
 'v202',
 'v203',
 'v204',
 'v171.1',
 'v172.1']

In [78]:
col_names_no_key_no_target = get_col_names_without_target_and_keys(dataset_full_clean)

In [79]:
dataset_full_clean[col_names_no_key_no_target]

Unnamed: 0,v109,v9,v183,v001,v199,v12,v196,v175,v119,v176,...,v120,v27,v191,v29,v4,v193,v204,v200,v171.1,v181
0,,,4.0,15,Moderate,1.0,3.0,2017-06-19 13:30:01,,0.0,...,,1400,1.0,1.0,1400,Yes,business,311.0,01,2.0
1,,,4.0,13,Moderate,1.0,3.0,2017-06-21 11:35:53,,0.0,...,,500,1.0,1.0,500,Yes,residential,311.0,01,2.0
2,,,3.0,13,Moderate,,4.0,2017-06-22 14:13:01,12.0,0.0,...,,,1.0,27.0,"4975,86",Yes,residential,311.0,01,2.0
3,1.0,,4.0,13,Moderate,,3.0,2017-06-06 12:12:50,,14.0,...,6.0,,1.0,,,Yes,cellular,311.0,01,2.0
4,,,4.0,13,Moderate,,3.0,2017-05-10 13:26:48,,41.0,...,2.0,,1.0,,,Yes,cellular,311.0,01,2.0
5,,,3.0,13,Moderate,,3.0,2013-04-30 07:00:00,16.0,1514.0,...,10.0,,1.0,37.0,2500,Yes,residential,311.0,01,2.0
6,,2.0,4.0,14,Moderate,,3.0,2017-01-15 18:36:12,4.0,156.0,...,4.0,,1.0,38.0,"13632,02",Yes,residential,311.0,01,2.0
7,6.0,,4.0,14,Moderate,,3.0,2017-06-21 12:49:50,1.0,0.0,...,3.0,,1.0,5.0,748,Yes,residential,311.0,01,2.0
8,4.0,,4.0,14,Moderate,,3.0,2017-06-21 12:49:14,,0.0,...,,100,1.0,28.0,,Yes,residential,311.0,17,2.0
9,,,4.0,14,Moderate,,3.0,2017-06-21 15:24:50,2.0,0.0,...,1.0,,1.0,5.0,,Yes,residential,311.0,01,2.0


In [80]:
dataset_full_clean[col_names_no_key_no_target].shape

(798, 59)

In [81]:
dataset_full_clean.shape

(798, 62)

In [84]:
col_names_no_key_no_target

['v109',
 'v9',
 'v183',
 'v001',
 'v199',
 'v12',
 'v196',
 'v175',
 'v119',
 'v176',
 'v123',
 'v25',
 'v002',
 'v173',
 'v32',
 'v28',
 'v195',
 'v122',
 'v106',
 'v202',
 'v5',
 'v113',
 'v184',
 'v34',
 'v112',
 'v172.1',
 'v23',
 'v178',
 'v186',
 'v37',
 'v192',
 'v182',
 'v117',
 'v174',
 'v24',
 'v197',
 'v198',
 'v33',
 'v179',
 'v36',
 'v116',
 'v177',
 'v201',
 'v194',
 'v105',
 'v185',
 'v180',
 'v203',
 'v14',
 'v120',
 'v27',
 'v191',
 'v29',
 'v4',
 'v193',
 'v204',
 'v200',
 'v171.1',
 'v181']