<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#2)-Data-Wrangling" data-toc-modified-id="2)-Data-Wrangling-1">2) Data Wrangling</a></span><ul class="toc-item"><li><span><a href="#2.1)-Load-data-and-start-explore" data-toc-modified-id="2.1)-Load-data-and-start-explore-1.1">2.1) Load data and start explore</a></span></li><li><span><a href="#2.2)-Check-missing-values-and-duplicates" data-toc-modified-id="2.2)-Check-missing-values-and-duplicates-1.2">2.2) Check missing values and duplicates</a></span></li><li><span><a href="#2.3)-Take-all-features-and-two-target-variables-and-further-explore." data-toc-modified-id="2.3)-Take-all-features-and-two-target-variables-and-further-explore.-1.3">2.3) Take all features and two target variables and further explore.</a></span></li><li><span><a href="#2.4)-Transform-two-target-variables-to-binary-classification" data-toc-modified-id="2.4)-Transform-two-target-variables-to-binary-classification-1.4">2.4) Transform two target variables to binary classification</a></span></li></ul></li></ul></div>

# 2) Data Wrangling

## 2.1) Load data and start explore 
- After reading the data description on the website, it is not clear whether the features are continuous or categorical. Let's import the data and look at the features more closely.
- The dataset is available at https://archive.ics.uci.edu/ml/datasets/Drug+consumption+%28quantified%29

In [1]:
# Import packages
import numpy as np
import pandas as pd

In [2]:
# Load data from the url
drug = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00373/drug_consumption.data", header=None)

In [3]:
# Shape of dataset (should be (1885, 32))
drug.shape

(1885, 32)

In [4]:
drug.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22,23,24,25,26,27,28,29,30,31
0,1,0.49788,0.48246,-0.05921,0.96082,0.126,0.31287,-0.57545,-0.58331,-0.91699,...,CL0,CL0,CL0,CL0,CL0,CL0,CL0,CL2,CL0,CL0
1,2,-0.07854,-0.48246,1.98437,0.96082,-0.31685,-0.67825,1.93886,1.43533,0.76096,...,CL4,CL0,CL2,CL0,CL2,CL3,CL0,CL4,CL0,CL0
2,3,0.49788,-0.48246,-0.05921,0.96082,-0.31685,-0.46725,0.80523,-0.84732,-1.6209,...,CL0,CL0,CL0,CL0,CL0,CL0,CL1,CL0,CL0,CL0
3,4,-0.95197,0.48246,1.16365,0.96082,-0.31685,-0.14882,-0.80615,-0.01928,0.59042,...,CL0,CL0,CL2,CL0,CL0,CL0,CL0,CL2,CL0,CL0
4,5,0.49788,0.48246,1.98437,0.96082,-0.31685,0.73545,-1.6334,-0.45174,-0.30172,...,CL1,CL0,CL0,CL1,CL0,CL0,CL2,CL2,CL0,CL0


In [5]:
drug.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1885 entries, 0 to 1884
Data columns (total 32 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       1885 non-null   int64  
 1   1       1885 non-null   float64
 2   2       1885 non-null   float64
 3   3       1885 non-null   float64
 4   4       1885 non-null   float64
 5   5       1885 non-null   float64
 6   6       1885 non-null   float64
 7   7       1885 non-null   float64
 8   8       1885 non-null   float64
 9   9       1885 non-null   float64
 10  10      1885 non-null   float64
 11  11      1885 non-null   float64
 12  12      1885 non-null   float64
 13  13      1885 non-null   object 
 14  14      1885 non-null   object 
 15  15      1885 non-null   object 
 16  16      1885 non-null   object 
 17  17      1885 non-null   object 
 18  18      1885 non-null   object 
 19  19      1885 non-null   object 
 20  20      1885 non-null   object 
 21  21      1885 non-null   object 
 22  

Check var2 (column 1), "age", to see whether it has the same distribution as described in the documentation.

In [6]:
unique = drug[1].unique()
np.sort(unique)

array([-0.95197, -0.07854,  0.49788,  1.09449,  1.82213,  2.59171])

In [None]:
drug[1].value_counts()

-0.95197    643
-0.07854    481
 0.49788    356
 1.09449    294
 1.82213     93
 2.59171     18
Name: 1, dtype: int64

- Yes, the distribution matches the one given on the dataset web page.
- Open question: should this feature be treated as continuous or categorical?
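One possible middle ground for the question above is to treat age as an ordered categorical. The sketch below is only an illustration on a small hand-made sample (the `age_value` series is an assumption standing in for `drug[1]`); the six levels are the distinct values printed above.

```python
import pandas as pd

# Illustrative sample of quantified age values; in the notebook this would be drug[1]
age_value = pd.Series([0.49788, -0.07854, 0.49788, -0.95197, 2.59171])

# The six distinct values found above, in ascending order
levels = [-0.95197, -0.07854, 0.49788, 1.09449, 1.82213, 2.59171]

# Treat the values as ordered categories rather than as a continuous measurement
age_ordinal = pd.Categorical(age_value, categories=levels, ordered=True)

# Integer rank of each observation (0 = lowest quantified value)
print(age_ordinal.codes)
```

Keeping the order lets later steps sort or compare age bands without implying that the gaps between the quantified values are meaningful.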

Check var14 (column 13), "Alcohol", and var15 (column 14) to see whether they have the same distributions as on the web page.

In [7]:
np.sort(drug[13].unique())

array(['CL0', 'CL1', 'CL2', 'CL3', 'CL4', 'CL5', 'CL6'], dtype=object)

In [8]:
drug[13].value_counts()

CL5    759
CL6    505
CL4    287
CL3    198
CL2     68
CL1     34
CL0     34
Name: 13, dtype: int64

In [9]:
drug[14].value_counts()

CL0    976
CL2    243
CL1    230
CL3    198
CL6    102
CL4     75
CL5     61
Name: 14, dtype: int64

Yes, both have the same distributions as on the web page.
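The comparisons above were done by eye. A minimal sketch of an automated check follows; the helper name `matches_documented_counts` is hypothetical, and the documented counts would have to be typed in by hand from the dataset page. The toy series below only stands in for a column such as `drug[13]`.

```python
import pandas as pd

def matches_documented_counts(column, documented):
    """Return True when the value counts of `column` equal the documented counts."""
    observed = column.value_counts().to_dict()
    return observed == documented

# Toy column; in the notebook this would be a column such as drug[13] (Alcohol)
toy = pd.Series(["CL5", "CL5", "CL6", "CL0"])

# Counts as they would be copied from the documentation (made up for this toy column)
documented = {"CL5": 2, "CL6": 1, "CL0": 1}

print(matches_documented_counts(toy, documented))
```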

## 2.2) Check missing values and duplicates

In [10]:
# Check missing values
drug.isnull().sum().sort_values(ascending=False)

0     0
1     0
30    0
29    0
28    0
27    0
26    0
25    0
24    0
23    0
22    0
21    0
20    0
19    0
18    0
17    0
16    0
15    0
14    0
13    0
12    0
11    0
10    0
9     0
8     0
7     0
6     0
5     0
4     0
3     0
2     0
31    0
dtype: int64

In [11]:
# Check uniqueness of ID number
duplicates = drug.duplicated(subset=0, keep=False)
drug[duplicates]

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22,23,24,25,26,27,28,29,30,31


In [12]:
# alternatively 
duplicates.sum()

0

In [16]:
# alternatively
drug[0].unique().shape

(1885,)

In [22]:
# All ID numbers are unique. Now check whether any two IDs have identical values in all other columns,
# by setting 'subset' to every column except column 0
duplicates = drug.duplicated(subset=drug.columns.difference([0]), keep=False)
drug[duplicates]

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22,23,24,25,26,27,28,29,30,31


## 2.3) Take all features and two target variables and further explore.
- We will focus on Amyl (var16), because it is a 'party drug', and Cannabis (var19), because it is everywhere now.
- Rename the columns so that the names are meaningful.
- Most features in the dataset were originally categorical. The dataset owners quantified them, so the dataset we have contains quantified features. While some quantified features make sense (for example, personality traits), others do not (for example, country and ethnicity). We will convert some features (age, gender, education, country, ethnicity) back to a categorical data type, while still keeping their quantified values.

In [32]:
# Drop the drugs that we do not want. We will keep two drugs: columns 15 and 18.
# So we will drop columns 13, 14, 16, 17 and 19-31
# Create a list of these column numbers
col_list = [13, 14, 16, 17] + list(range(19, 32))
col_list

[13, 14, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]

In [33]:
# Drop these columns
drug_1 = drug.drop(col_list, axis=1).copy()

In [34]:
# Check drug_1 columns
drug_1.columns

Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 18], dtype='int64')

In [37]:
# Rename the column names
col_name = (['ID', 'Age_value', 'Gender_value', 'Education_value', 'Country_value', 'Ethnicity_value',
            'Nscore', 'Escore', 'Oscore', 'Ascore', 'Cscore', 'Impulsive', 'SS', 'Amyl', 'Cannabis'])

In [38]:
col_name

['ID',
 'Age_value',
 'Gender_value',
 'Education_value',
 'Country_value',
 'Ethnicity_value',
 'Nscore',
 'Escore',
 'Oscore',
 'Ascore',
 'Cscore',
 'Impulsive',
 'SS',
 'Amyl',
 'Cannabis']

In [39]:
drug_1.columns = col_name
drug_1.head()

Unnamed: 0,ID,Age_value,Gender_value,Education_value,Country_value,Ethnicity_value,Nscore,Escore,Oscore,Ascore,Cscore,Impulsive,SS,Amyl,Cannabis
0,1,0.49788,0.48246,-0.05921,0.96082,0.126,0.31287,-0.57545,-0.58331,-0.91699,-0.00665,-0.21712,-1.18084,CL0,CL0
1,2,-0.07854,-0.48246,1.98437,0.96082,-0.31685,-0.67825,1.93886,1.43533,0.76096,-0.14277,-0.71126,-0.21575,CL2,CL4
2,3,0.49788,-0.48246,-0.05921,0.96082,-0.31685,-0.46725,0.80523,-0.84732,-1.6209,-1.0145,-1.37983,0.40148,CL0,CL3
3,4,-0.95197,0.48246,1.16365,0.96082,-0.31685,-0.14882,-0.80615,-0.01928,0.59042,0.58489,-1.37983,-1.18084,CL0,CL2
4,5,0.49788,0.48246,1.98437,0.96082,-0.31685,0.73545,-1.6334,-0.45174,-0.30172,1.30612,-0.21712,-0.21575,CL1,CL3


In [40]:
# Create categorical variable 'Age' and look at distributions
# Define function to create new column 'Age'
def f(row):
    if row['Age_value'] == -0.95197:
        val = 'age18_24'
    elif row['Age_value'] == -0.07854:
        val = 'age25-34'
    elif row['Age_value'] == 0.49788:
        val = 'age35-44'
    elif row['Age_value'] == 1.09449:
        val = 'age45-54'
    elif row['Age_value'] == 1.82213:
        val = 'age55-64'
    elif row['Age_value'] == 2.59171:
        val = 'age65+'
    else:
        val = 'Check Age_value'
    return val

In [42]:
# Create new column 'Age'
drug_1['Age'] = drug_1.apply(f, axis=1)
drug_1.filter(regex='^Age', axis=1).head()

Unnamed: 0,Age_value,Age
0,0.49788,age35-44
1,-0.07854,age25-34
2,0.49788,age35-44
3,-0.95197,age18_24
4,0.49788,age35-44


In [43]:
# Look at distribution of Age
drug_1['Age'].value_counts()
# Distribution should be the same as in dataset description by the owners.

age18_24    643
age25-34    481
age35-44    356
age45-54    294
age55-64     93
age65+       18
Name: Age, dtype: int64
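As an alternative to the row-wise `apply` with the function `f`, the same lookup can be sketched with a dictionary and `Series.map`. This is only an illustration on a toy frame (the frame `toy` and the value `9.99` are assumptions); the dictionary reuses the value/label pairs from `f`.

```python
import pandas as pd

# Quantified value -> label, using the same pairs as the function f above
AGE_LABELS = {
    -0.95197: "age18_24",
    -0.07854: "age25-34",
    0.49788: "age35-44",
    1.09449: "age45-54",
    1.82213: "age55-64",
    2.59171: "age65+",
}

# Toy frame standing in for drug_1; 9.99 is a deliberately invalid value
toy = pd.DataFrame({"Age_value": [0.49788, -0.07854, -0.95197, 9.99]})

# Vectorised lookup; values missing from the dictionary become NaN
toy["Age"] = toy["Age_value"].map(AGE_LABELS)

# Rows that failed to map, the counterpart of the 'Check Age_value' fallback
unmapped = toy[toy["Age"].isna()]
print(unmapped)
```

The lookup relies on exact float equality, just like the `if`/`elif` chain, so it assumes the values are read from the file with exactly these digits.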

In [44]:
# Save drug_1 to hard disk
drug_1.to_csv('../data/drug_1.csv')

In [45]:
# Create new categorical features 'Gender'
drug_1['Gender'] = np.where(drug_1['Gender_value'] > 0, 'Female', 'Male')
drug_1.filter(regex='^Gender', axis=1).sample(5)

Unnamed: 0,Gender_value,Gender
655,-0.48246,Male
141,0.48246,Female
1422,0.48246,Female
1518,-0.48246,Male
1113,-0.48246,Male


In [46]:
# Look at distribution of Gender
drug_1['Gender'].value_counts()
# Distribution should be the same as in dataset description by the owners.

Male      943
Female    942
Name: Gender, dtype: int64

In [49]:
# Create categorical var "Edu". Thanh manually created a link file in Excel. We will load the link dataframe and left join it to
# our dataframe drug_1
edu = pd.read_excel('../raw_data/Education.xlsx')
edu

Unnamed: 0,Education_value,Education_level,Education,Edu_comment
0,-2.43591,1,Left school before 16,Left school before 16 years 28 1.49%
1,-1.7379,2,Left school at 16,Left school at 16 years 99 5.25%
2,-1.43719,3,Left school at 17,Left school at 17 years 30 1.59%
3,-1.22751,4,Left school at 18,Left school at 18 years 100 5.31%
4,-0.61113,5,Some college but no degree,"Some college or university, no certificate or ..."
5,-0.05921,6,Professional certificate,Professional certificate/ diploma 270 14.32%
6,0.45468,7,University degree,University degree 480 25.46%
7,1.16365,8,Masters degree,Masters degree 283 15.01%
8,1.98437,9,Doctorate degree,Doctorate degree 89 4.72%


In [51]:
# left join to drug_1
drug_1 = drug_1.merge(edu[['Education_value', 'Education_level', 'Education']], on='Education_value', how='left')
print(drug_1.shape)
drug_1.filter(regex='^Education', axis=1).sample(10)

(1885, 19)


Unnamed: 0,Education_value,Education_level,Education
1645,1.16365,8,Masters degree
1814,-0.61113,5,Some college but no degree
863,-0.61113,5,Some college but no degree
1494,1.16365,8,Masters degree
267,-0.05921,6,Professional certificate
1877,-1.43719,3,Left school at 17
1530,-0.61113,5,Some college but no degree
1073,-1.43719,3,Left school at 17
733,0.45468,7,University degree
845,0.45468,7,University degree


In [52]:
# Look at distribution of Education
drug_1['Education'].value_counts()
# Distribution should be the same as in dataset description by the owners.

Some college but no degree    506
University degree             480
Masters degree                283
Professional certificate      270
Left school at 18             100
Left school at 16              99
Doctorate degree               89
Left school at 17              30
Left school before 16          28
Name: Education, dtype: int64
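Because the join key is a floating-point value typed by hand into the link file, it may be worth guarding the merge. The sketch below uses toy frames (`left`, `link`, and the key `0.12345` are assumptions, not data from the notebook) to show two pandas options: `validate`, which raises an error if the link table repeats a key, and `indicator`, which flags rows that found no match.

```python
import pandas as pd

# Toy stand-ins for drug_1 and the Education link table
left = pd.DataFrame({"ID": [1, 2, 3],
                     "Education_value": [-0.05921, 1.98437, 0.12345]})
link = pd.DataFrame({
    "Education_value": [-0.05921, 1.98437],
    "Education": ["Professional certificate", "Doctorate degree"],
})

# validate='many_to_one' raises MergeError if the link table repeats a key;
# indicator=True adds a '_merge' column showing where each row came from
merged = left.merge(link, on="Education_value", how="left",
                    validate="many_to_one", indicator=True)

# Rows whose key found no partner in the link table
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))
```

In the notebook, an empty `unmatched` frame together with an unchanged row count would indicate that every quantified value was found in the link file.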

In [53]:
# Create categorical var "Country". Thanh manually created a link file. We will load the link dataframe and left join it to
# our dataframe drug_1
country = pd.read_excel('../raw_data/Country.xlsx')
country

Unnamed: 0,Country_value,Country,Country_comment
0,-0.09765,Australia,Australia 54 2.86%
1,0.24923,Canada,Canada 87 4.62%
2,-0.46841,New Zealand,New Zealand 5 0.27%
3,-0.28519,Other,Other 118 6.26%
4,0.21128,Republic of Ireland,Republic of Ireland 20 1.06%
5,0.96082,UK,UK 1044 55.38%
6,-0.57009,USA,USA 557 29.55%


In [54]:
# left join to drug_1
drug_1 = drug_1.merge(country[['Country_value', 'Country']], on='Country_value', how='left')
print(drug_1.shape)
drug_1.filter(regex='^Country', axis=1).sample(10)

(1885, 20)


Unnamed: 0,Country_value,Country
591,-0.28519,Other
70,0.96082,UK
725,0.96082,UK
923,-0.57009,USA
1215,-0.57009,USA
1536,-0.57009,USA
1420,-0.57009,USA
1831,-0.57009,USA
1081,0.96082,UK
327,0.96082,UK


In [55]:
# Look at distribution of Country
drug_1['Country'].value_counts()
# Distribution should be the same as in dataset description by the owners.

UK                     1044
USA                     557
Other                   118
Canada                   87
Australia                54
Republic of Ireland      20
New Zealand               5
Name: Country, dtype: int64

In [57]:
# Create categorical var "Ethnicity". Thanh manually created a link file. We will load the link dataframe and left join it to
# our dataframe drug_1
ethnicity = pd.read_excel('../raw_data/Ethnicity.xlsx')
ethnicity

Unnamed: 0,Ethnicity_value,Ethnicity,Ethnicity_cases,Ethnicity_fraction
0,-0.50212,Asian,26,0.0138
1,-1.10702,Black,33,0.0175
2,1.90725,Mixed-Black/Asian,3,0.0016
3,0.126,Mixed-White/Asian,20,0.0106
4,-0.22166,Mixed-White/Black,20,0.0106
5,0.1144,Other,63,0.0334
6,-0.31685,White,1720,0.9125


In [58]:
# left join to drug_1
drug_1 = drug_1.merge(ethnicity[['Ethnicity_value', 'Ethnicity']], on='Ethnicity_value', how='left')
print(drug_1.shape)
drug_1.filter(regex='^Ethnicity', axis=1).sample(10)

(1885, 21)


Unnamed: 0,Ethnicity_value,Ethnicity
931,-0.31685,White
3,-0.31685,White
125,-0.31685,White
1877,-0.31685,White
470,-0.22166,Mixed-White/Black
565,-0.31685,White
1095,-0.31685,White
1684,-0.31685,White
15,-0.31685,White
62,-0.31685,White


In [59]:
# Look at distribution of Ethnicity
drug_1['Ethnicity'].value_counts()
# Distribution should be the same as in dataset description by the owners.

White                1720
Other                  63
Black                  33
Asian                  26
Mixed-White/Asian      20
Mixed-White/Black      20
Mixed-Black/Asian       3
Name: Ethnicity, dtype: int64

In [63]:
# Create variable 'Age_level' that is similar to 'Education_level'
# Define function to create new column 'Age_level'
def f_1(row):
    if row['Age_value'] == -0.95197:
        val = 1
    elif row['Age_value'] == -0.07854:
        val = 2
    elif row['Age_value'] == 0.49788:
        val = 3
    elif row['Age_value'] == 1.09449:
        val = 4
    elif row['Age_value'] == 1.82213:
        val = 5
    elif row['Age_value'] == 2.59171:
        val = 6
    else:
        val = 0
    return val

In [64]:
# Create new column 'Age_level'
drug_1['Age_level'] = drug_1.apply(f_1, axis=1)
drug_1.filter(regex='^Age', axis=1).sample(10)

Unnamed: 0,Age_value,Age,Age_level
890,-0.07854,age25-34,2
791,-0.07854,age25-34,2
1216,-0.07854,age25-34,2
99,-0.07854,age25-34,2
311,-0.07854,age25-34,2
1496,1.09449,age45-54,4
88,0.49788,age35-44,3
818,-0.95197,age18_24,1
1296,-0.07854,age25-34,2
369,-0.95197,age18_24,1


In [65]:
# Look at distribution of Age_level
drug_1['Age_level'].value_counts()
# Distribution should be the same as in dataset description by the owners.

1    643
2    481
3    356
4    294
5     93
6     18
Name: Age_level, dtype: int64

In [73]:
# Let's check the distribution of all the personality features
col_list = ['Nscore', 'Escore', 'Oscore', 'Ascore', 'Cscore', 'Impulsive', 'SS']
for col in col_list:
    print(col)
    print(drug_1[col].value_counts().sort_index())

Nscore
-3.46436     1
-3.15735     1
-2.75696     7
-2.52197     4
-2.42317     3
-2.34360     4
-2.21844    10
-2.05048    16
-1.86962    24
-1.69163    31
-1.55078    26
-1.43907    29
-1.32828    35
-1.19430    56
-1.05308    57
-0.92104    65
-0.79151    70
-0.67825    60
-0.58016    61
-0.46725    87
-0.34799    78
-0.24649    68
-0.14882    76
-0.05188    69
 0.04257    73
 0.13606    67
 0.22393    63
 0.31287    66
 0.41667    80
 0.52135    61
 0.62967    77
 0.73545    49
 0.82562    51
 0.91093    37
 1.02119    67
 1.13281    27
 1.23461    49
 1.37297    40
 1.49158    24
 1.60383    27
 1.72012    17
 1.83990    20
 1.98437    15
 2.12700    11
 2.28554    10
 2.46262     6
 2.61139     3
 2.82196     5
 3.27393     2
Name: Nscore, dtype: int64
Escore
-3.27393      2
-3.00537      1
-2.72827      6
-2.53830      3
-2.44904      3
-2.32338      8
-2.21069      5
-2.11437      9
-2.03972      4
-1.92173     21
-1.76250     23
-1.63340     23
-1.50796     32
-1.37639     38


## 2.4) Transform two target variables to binary classification
The two target variables Amyl and Cannabis take values from CL0 to CL6. Let's transform them to a binary classification: CL0 ("Never Used") and CL1 ("Used over a Decade Ago") are considered "Non-user", and all other classes form the class "User".
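The rule described above can be sketched as one reusable helper. This is a minimal illustration, not the notebook's own code: the name `add_binary_target` and the toy frame are assumptions, while the column suffixes `_binary` and `_user` follow the naming used in this section.

```python
import numpy as np
import pandas as pd

# CL0 = never used, CL1 = used over a decade ago
NON_USER_CLASSES = ["CL0", "CL1"]

def add_binary_target(df, drug):
    """Add '<drug>_binary' (label) and '<drug>_user' (0/1) columns to df."""
    is_non_user = df[drug].isin(NON_USER_CLASSES)
    df[f"{drug}_binary"] = np.where(is_non_user, "Non-user", "User")
    df[f"{drug}_user"] = (~is_non_user).astype(int)
    return df

# Toy frame standing in for drug_1
toy = pd.DataFrame({"Amyl": ["CL0", "CL2", "CL1", "CL6"]})
toy = add_binary_target(toy, "Amyl")
print(toy)
```

Calling the helper once per drug keeps the label column and the 0/1 dummy in sync, since both come from the same boolean mask.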

In [75]:
# Look at value_counts of the two drugs
col_list = ['Amyl', 'Cannabis']
for col in col_list:
    print(col)
    print(drug_1[col].value_counts().sort_index())

Amyl
CL0    1305
CL1     210
CL2     237
CL3      92
CL4      24
CL5      14
CL6       3
Name: Amyl, dtype: int64
Cannabis
CL0    413
CL1    207
CL2    266
CL3    211
CL4    140
CL5    185
CL6    463
Name: Cannabis, dtype: int64


In [86]:
# Create 'Amyl_binary'
drug_1['Amyl_binary'] = np.where(drug_1['Amyl'].isin(['CL0', 'CL1']), 'Non-user', 'User')

In [88]:
# Create dummy variable 'Amyl_user'
drug_1['Amyl_user'] = np.where(drug_1['Amyl'].isin(['CL0', 'CL1']), 0, 1)
drug_1.filter(regex='^Amyl', axis=1).sample(10)

Unnamed: 0,Amyl,Amyl_binary,Amyl_user
228,CL1,Non-user,0
462,CL2,User,1
115,CL2,User,1
134,CL0,Non-user,0
250,CL0,Non-user,0
165,CL0,Non-user,0
113,CL0,Non-user,0
788,CL4,User,1
675,CL2,User,1
735,CL0,Non-user,0


In [89]:
# Look at value_counts of the two new variables
col_list = ['Amyl_binary', 'Amyl_user']
for col in col_list:
    print(col)
    print(drug_1[col].value_counts().sort_index())

Amyl_binary
Non-user    1515
User         370
Name: Amyl_binary, dtype: int64
Amyl_user
0    1515
1     370
Name: Amyl_user, dtype: int64


In [90]:
# Create 'Cannabis_binary' and dummy variable 'Cannabis_user'
drug_1['Cannabis_binary'] = np.where(drug_1['Cannabis'].isin(['CL0', 'CL1']), 'Non-user', 'User')
drug_1['Cannabis_user'] = np.where(drug_1['Cannabis'].isin(['CL0', 'CL1']), 0, 1)
drug_1.filter(regex='^Cannabis', axis=1).sample(10)

Unnamed: 0,Cannabis,Cannabis_binary,Cannabis_user
500,CL2,User,1
253,CL2,User,1
862,CL6,User,1
296,CL0,Non-user,0
1354,CL3,User,1
142,CL1,Non-user,0
1236,CL0,Non-user,0
1352,CL6,User,1
98,CL1,Non-user,0
48,CL5,User,1


In [91]:
# Look at value_counts of the two new variables
col_list = ['Cannabis_binary', 'Cannabis_user']
for col in col_list:
    print(col)
    print(drug_1[col].value_counts().sort_index())

Cannabis_binary
Non-user     620
User        1265
Name: Cannabis_binary, dtype: int64
Cannabis_user
0     620
1    1265
Name: Cannabis_user, dtype: int64


In [92]:
# Data wrangling should now be finished; inspect the final dataframe
drug_1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1885 entries, 0 to 1884
Data columns (total 26 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   ID               1885 non-null   int64  
 1   Age_value        1885 non-null   float64
 2   Gender_value     1885 non-null   float64
 3   Education_value  1885 non-null   float64
 4   Country_value    1885 non-null   float64
 5   Ethnicity_value  1885 non-null   float64
 6   Nscore           1885 non-null   float64
 7   Escore           1885 non-null   float64
 8   Oscore           1885 non-null   float64
 9   Ascore           1885 non-null   float64
 10  Cscore           1885 non-null   float64
 11  Impulsive        1885 non-null   float64
 12  SS               1885 non-null   float64
 13  Amyl             1885 non-null   object 
 14  Cannabis         1885 non-null   object 
 15  Age              1885 non-null   object 
 16  Gender           1885 non-null   object 
 17  Education_leve

In [93]:
# Save drug_1 to hard disk
drug_1.to_csv('../data/drug_1.csv')

Below are Thanh's research notes on the variables in the dataset.  
In the end, Thanh chose Amyl (var16), because it is a 'party drug', and Cannabis (var19), because it is everywhere now.

1) Variable description \
var1 (column0): ID number \
var2 (column1): Age \
var3: Gender (why quantified? should not be treated as a continuous variable) \
var4: Education \
var5: Country (should not be treated as a continuous variable) \
var6: Ethnicity (why quantified? should not be treated as a continuous variable) \
var7: Nscore = Neuroticism (trait characterized by instability, anxiety, etc.; neurosis, easily agitated) \
var8: Escore = Extraversion (the directing of one's interest outwards, esp. towards social contacts) \
var9: Oscore = Openness to experience \
var10: Ascore = Agreeableness (pleasantness, likeability, approval, agreement) \
var11: Cscore = Conscientiousness (dedication, acting according to one's conscience, uprightness) \
var12: Impulsive = impulsiveness, doing things suddenly without thinking carefully \
var13: SS = sensation seeking \
var14: Alcohol \
var15: Amphet = amphetamines https://medlineplus.gov/ency/patientinstructions/000792.htm \
Amphetamines are drugs. They can be legal or illegal. They are legal when they are prescribed by a doctor and used to treat health problems such as obesity, narcolepsy, or attention deficit hyperactivity disorder (ADHD). Using amphetamines can lead to addiction.

Amphetamines are illegal when they are used without a prescription to get high or improve performance. In this case, they are known as street, or recreational drugs, and using them can lead to addiction. This article describes this aspect of amphetamines.

var16: Amyl - 'party drug', depressant; https://adf.org.au/drug-facts/amyl-nitrite/

var17: Benzos - Familiar names include Valium and Xanax. They are some of the most commonly prescribed medications in the United States. When people without prescriptions obtain and take these drugs for their sedating effects, use turns into abuse. https://www.webmd.com/mental-health/addiction/benzodiazepine-abuse#1

var18: Caff = caffeine \
var19: Cannabis =  Marijuana, yarndi, pot, weed, hash, dope, gunja, joint, stick, chronic, cone, choof, mull, 420, dabs, dabbing, BHO.

var20: Choc = chocolate \
var21: Cocaine \
var22: Crack \
var23: Ecstasy (XTC, X, Adam, E, Roll, A, 007, Molly) https://kidshealth.org/en/parents/drugs-ecstasy.html \
Ecstasy is a stimulant drug that can cause hallucinations. It is known as a designer drug because it was created for the purpose of making someone feel high. The drug is popular with teens and young adults who go to clubs, concerts, or "rave" parties.

var24: Heroin \
var25: Ketamine https://adf.org.au/drug-facts/ketamine/ \
Ketamine is used by medical practitioners and veterinarians as an anaesthetic. It’s sometimes used illegally by people to get high.

Ketamine is a dissociative drug, which means it acts on different chemicals in the brain to produce visual and auditory distortion, and a detachment from reality.

When it’s sold illegally, ketamine usually comes as a white or off-white powder. It can also be made into pills, or dissolved in a liquid.

Clinical trials and studies are assessing ketamine as a treatment for depression. Early indications show good results.

var26: Legal highs https://www.themanorclinic.com/addiction-treatment/legal-high-rehab/legal-high-addiction-symptoms
New psychoactive and illicit substances (‘legal highs’) are synthetic substances that have been specifically designed to imitate the effects of other illegal drugs including cocaine, ecstasy/MDMA and certain prescription medications. Because these drugs contain a whole host of unknown and untested chemicals, they are associated with a wide range of symptoms and effects, which can vary depending on the type of so-called ‘legal high’ that you are addicted to, as well as how much of the drug you have been consuming.

var27: LSD https://adf.org.au/drug-facts/lsd/
LSD (lysergic acid diethylamide) is a synthetic chemical, made from a substance found in ergot, which is a fungus that infects rye (grain).

LSD belongs to a group of drugs known as psychedelics. When small doses are taken, it can produce mild changes in perception, mood and thought. Larger doses may produce visual hallucinations and distortions of space and time.

Sometimes, what is sold as LSD can actually be other chemicals such as NBOMe or the 2C family of drugs (part of the new psychoactive substances). These can be quite dangerous, as their quality is inconsistent. Taking too much of these other substances can be fatal with a number of deaths having been reported.

var28: Meth = methadone. Methadone is part of a category of drugs called opioids. German doctors created it during World War II. When it came to the United States, doctors used it to treat people with extreme pain. Today, you might also get it as part of a treatment program for an addiction to heroin or narcotic painkillers.
https://www.webmd.com/mental-health/addiction/what-is-methadone#1

var29: Mushrooms = magic mushrooms. Psilocybin or magic mushrooms are naturally occurring and are consumed for their hallucinogenic effects.

They are psychedelic drugs, which means they can affect all the senses, altering a person’s thinking, sense of time and emotions. Psychedelics can cause a person to hallucinate, seeing or hearing things that do not exist or are distorted.

https://adf.org.au/drug-facts/psilocybin/

var30: Nicotine \
var31: Semer - fictitious drug

var32: VSA = volatile substance abuse. Domestic products such as spray deodorants, glue, lighter refills and spray air fresheners can be used as drugs.

Volatile substance use may be defined as the deliberate inhalation of volatile compounds to produce psychoactive effects. These compounds have few characteristics in common, other than their intoxication effects and the behavioural effects they produce. Such volatile substances are often referred to as inhalants, a term which encompasses a diverse group of psychoactive chemicals that are defined by the route of administration, rather than their mechanism of action on the central nervous system or psychoactive effects.

The use of volatile substances is unlike most other forms of drug use in that it involves various compounds contained in readily accessible domestic or commercial products. These compounds, that are safe when used for their intended purposes, may cause intoxication and in some cases death when their vapours are deliberately concentrated and inhaled.

A specific subgroup of volatile substances — alkyl nitrites — are used on the dance club scene because they cause relaxation of vascular smooth muscle and produce a ‘rush’, or to enhance a sexual experience. They are generally known as ‘poppers’ and can be found on the ‘street’ market in bars and clubs. In some countries, they are available in sex shops and ‘head’ shops.

https://www.emcdda.europa.eu/publications/drug-profiles/volatile_en

