We're going to examine a dataset, compiled by a data scientist, of listings scraped from a dark web drug marketplace:
https://docs.google.com/spreadsheets/d/14GbheDtl0a1uXZ4SJiy3rH93t4RUTv5IVhxSMHbJ7yU/edit?usp=sharing
Download it locally as an Excel file so we can analyze it.

In [3]:
#import our very important pandas library, which lets us treat
#a data object like a spreadsheet
import pandas as pd

In [5]:
#pandas' read_excel function comes in very handy to 
#create a new dataframe out of the spreadsheet.
#make sure your .xlsx file is in the same directory as your
#Jupyter notebook
drugs = pd.read_excel('dream_market_cocaine_listings.xls.xlsx')

In [6]:
#use the drugs dataframe's .info() method
#to list every column, how many non-empty rows each one has,
#and the type of data it holds.
drugs.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1504 entries, 10 to 4611
Data columns (total 63 columns):
product_title              1504 non-null object
ships_from_to              1504 non-null object
grams                      1504 non-null float64
quality                    1504 non-null float64
btc_price                  1504 non-null float64
cost_per_gram              1504 non-null float64
cost_per_gram_pure         1504 non-null float64
escrow                     1504 non-null int64
product_link               1504 non-null object
vendor_link                1504 non-null object
vendor_name                1504 non-null object
successful_transactions    1504 non-null int64
rating                     1504 non-null float64
ships_from                 1504 non-null object
ships_to                   1504 non-null object
ships_to_US                1504 non-null bool
ships_from_US              1504 non-null bool
ships_to_NL                1504 non-null bool
ships_from_NL              150

In [15]:
#use the dataframe's .head() method to look at the first few rows
drugs.head()

Unnamed: 0,product_title,ships_from_to,grams,quality,btc_price,cost_per_gram,cost_per_gram_pure,escrow,product_link,vendor_link,...,ships_to_SE,ships_from_SE,ships_to_CO,ships_from_CO,ships_to_CN,ships_from_CN,ships_to_PL,ships_from_PL,ships_to_GR,ships_from_GR
10,!!!!!INTRO OFFER!!!!! 1GR COCAINE 90%,NL → EU,1.0,90.0,0.02577,0.02577,0.028633,1,http://lchudifyeqm4ldjj.onion/viewProduct?offe...,http://lchudifyeqm4ldjj.onion/contactMember?me...,...,False,False,False,False,False,False,False,False,False,False
11,!!!!!INTRO OFFER!!!!! 2GR COCAINE 90%,NL → EU,2.0,90.0,0.0515,0.02575,0.028611,1,http://lchudifyeqm4ldjj.onion/viewProduct?offe...,http://lchudifyeqm4ldjj.onion/contactMember?me...,...,False,False,False,False,False,False,False,False,False,False
14,!!!INTRO!!! 0.5G COCAINE 89% - STRAIGHT FROM T...,NL → EU,0.5,89.0,0.01649,0.03298,0.037056,1,http://lchudifyeqm4ldjj.onion/viewProduct?offe...,http://lchudifyeqm4ldjj.onion/contactMember?me...,...,False,False,False,False,False,False,False,False,False,False
20,!1G! C O L O M B I A N C O C A I N E - 89% PURITY,FR → EU,1.0,89.0,0.0412,0.0412,0.046292,1,http://lchudifyeqm4ldjj.onion/viewProduct?offe...,http://lchudifyeqm4ldjj.onion/contactMember?me...,...,False,False,False,False,False,False,False,False,False,False
41,** 1 Gram 87% Pure Uncut Colombian Cocaine **,NL → WW,1.0,87.0,0.034,0.034,0.03908,1,http://lchudifyeqm4ldjj.onion/viewProduct?offe...,http://lchudifyeqm4ldjj.onion/contactMember?me...,...,False,False,False,False,False,False,False,False,False,False


In [14]:
#we want to know which country has the highest rated cocaine
#in the marketplace
#first, let's see how often each rating on the five-point scale appears
#note that the .value_counts() method on the rating column sorts
#the results by count in descending order by default, which is what we want
drugs.rating.value_counts()

5.00    260
4.94    112
4.88    101
4.97     82
4.96     79
4.91     71
4.89     70
4.84     61
4.87     58
4.95     53
4.98     51
4.99     43
4.86     39
4.93     37
4.76     36
4.90     33
4.58     29
4.69     29
4.72     26
4.85     25
4.92     25
4.63     20
4.66     18
4.80     16
4.39     16
4.62     14
4.74     14
4.78     13
4.55     11
4.82     11
4.70     11
4.60      9
4.83      8
4.73      7
4.75      7
4.71      6
4.68      3
Name: rating, dtype: int64
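
To make the default ordering concrete, here is a minimal sketch on a small made-up Series (the values are hypothetical and not taken from the spreadsheet). It shows that `.value_counts()` orders by how often each value occurs, and that chaining `.sort_index()` would order by the rating itself instead.

```python
import pandas as pd

# hypothetical ratings, only for illustration
ratings = pd.Series([4.8, 4.8, 4.8, 4.9, 4.9, 5.0])

# default: the most frequent rating comes first
by_count = ratings.value_counts()

# alternative: order by the rating value itself, highest first
by_rating = ratings.value_counts().sort_index(ascending=False)

print(by_count)
print(by_rating)
```

With the real data, the first form answers "which rating is most common?" and the second answers "how many listings sit at each rating, from best to worst?".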

In [19]:
#if we're interested in countries, we should see what the possible values are
#in the ships_from column. Now this data didn't come with a dictionary,
#so we don't necessarily know what all the abbreviations mean
#but that's a problem we'll set aside for now.
drugs.ships_from.unique()

array([u'NL', u'FR', u'GB', u'DE', u'US', u'AU', u'ES', u'BE', u'EU',
       u'WW', u'IT', u'CA', u'BR', u'CZ', u'SE', u'CH', u'CN'], dtype=object)
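
Two of these codes, EU and WW, do not look like countries. Assuming they stand for "European Union" and "worldwide" (the data has no dictionary, so this is a guess), a sketch of how such rows could be set aside before comparing countries might look like this. The small dataframe below is hypothetical.

```python
import pandas as pd

# hypothetical miniature of the ships_from and rating columns
listings = pd.DataFrame({
    'ships_from': ['NL', 'EU', 'DE', 'WW', 'US'],
    'rating': [4.9, 4.8, 5.0, 4.7, 4.95],
})

# assumption: these two codes are regions, not countries
non_countries = ['EU', 'WW']

# .isin() marks rows whose code is in the list; ~ flips the mask
countries_only = listings[~listings.ships_from.isin(non_countries)]

print(countries_only.ships_from.unique())
```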

In [20]:
#let's see which country has the most listings in the data
#with the .value_counts() method of the ships_from column 
#of the drugs dataframe
drugs.ships_from.value_counts()

NL    496
DE    376
GB    251
US    112
EU     72
FR     70
AU     40
BE     25
CA     19
WW     16
ES     11
IT      7
BR      5
CN      1
CH      1
SE      1
CZ      1
Name: ships_from, dtype: int64

In [41]:
#now, we want to calculate an average rating for each country
#so first group by the ships_from column
#then use the .mean() method to calculate the averages of the
#rating and quality columns (we select them explicitly, because newer
#versions of pandas raise an error when .mean() meets text columns).
#then sort the values in descending order (if ascending is False, it must
#be descending), first by the rating column and, in case of a tie, then by
#the quality column

drugs.groupby('ships_from')[['rating', 'quality']].mean() \
  .sort_values(['rating', 'quality'], ascending=False).rating
    
#the backslash lets me continue my one line of code on a new line
#to make it easier to read

ships_from
CN    5.000000
SE    5.000000
CH    5.000000
BR    5.000000
BE    4.980800
CA    4.967368
CZ    4.950000
AU    4.941250
EU    4.936944
IT    4.935714
US    4.908393
WW    4.907500
FR    4.906857
GB    4.891195
DE    4.870160
NL    4.861149
ES    4.798182
Name: rating, dtype: float64
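
The four countries tied at 5.0 (CN, SE, CH, BR) are also among those with the fewest rows in the ships_from counts above, so their averages rest on very little data. One way to keep that in view is to show the count next to the mean. The sketch below uses a small hypothetical dataframe, and the `min_listings` threshold is an arbitrary choice for illustration, not something the data prescribes.

```python
import pandas as pd

# hypothetical listings, only for illustration
listings = pd.DataFrame({
    'ships_from': ['NL', 'NL', 'NL', 'DE', 'DE', 'CH'],
    'rating': [4.8, 4.9, 5.0, 4.7, 4.9, 5.0],
})

# average rating and number of listings per country, side by side
summary = listings.groupby('ships_from').rating.agg(['mean', 'count'])

# min_listings is an arbitrary threshold chosen for this sketch
min_listings = 2
reliable = summary[summary['count'] >= min_listings] \
    .sort_values('mean', ascending=False)

print(summary)
print(reliable)
```

Here CH has the highest mean but only one listing, so it drops out of `reliable` while NL and DE remain.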