## Using Python's NumPy and Pandas for data analysis

The **[Pandas](https://pandas.pydata.org/)** module makes it easy to analyze data sets organized in rows and columns (like a spreadsheet) through a high-level data frame structure.

Pandas depends on **[NumPy](http://www.numpy.org/)**, which provides low-level data structures such as multidimensional arrays.

That is why it is common to import both the pandas and numpy libraries whenever you need Pandas, as shown:

In [1]:
import numpy as np
import pandas as pd
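
To make the relationship between the two libraries concrete, here is a minimal sketch. The numbers and column names are made up for illustration and do not come from the tweet data set. It wraps a two-dimensional NumPy array in a data frame and then recovers the underlying array with to_numpy().

```python
import numpy as np
import pandas as pd

# A low-level NumPy array: 3 rows, 2 columns (made-up numbers)
arr = np.array([[0, 3], [1, 12], [4, 12]])

# Wrap the array in a DataFrame to get labelled columns
demo = pd.DataFrame(arr, columns=['replies', 'retweets'])

# The data is still available as a plain NumPy array
back = demo.to_numpy()

print(demo)
print(type(back).__name__, back.shape)
```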

<hr>
    
## Let's learn through examples

### 1) Creating a dataframe in the code

We can use pd.DataFrame() to put any data we want directly into the code as shown:

In [2]:
# You can create your own data right in the code using DataFrame(). 
df = pd.DataFrame(
    [
        ['twtrid1','1:25 PM - 28 Nov 2018','text 1',0,3,2],
        ['twtrid2','3:00 PM - 26 Nov 2018','text 2',1,12,14],
        ['twtrid3','7:14 PM - 8 Nov 2018','text 3',4,12,17],
        ['twtrid4','10:33 AM - 12 Nov 2018','text 4',0,3,2],
        ['twtrid5','8:55 AM - 30 Nov 2018','text 5',6,10,6]
    ],

    # The 'index' argument gives each row a specific label, like the row numbers in an Excel sheet
    index = [1,2,3,4,5],

    # The 'columns' argument gives the columns their names
    columns = ['tweeter','time','text','replies','retweets','favorites'])

df

Unnamed: 0,tweeter,time,text,replies,retweets,favorites
1,twtrid1,1:25 PM - 28 Nov 2018,text 1,0,3,2
2,twtrid2,3:00 PM - 26 Nov 2018,text 2,1,12,14
3,twtrid3,7:14 PM - 8 Nov 2018,text 3,4,12,17
4,twtrid4,10:33 AM - 12 Nov 2018,text 4,0,3,2
5,twtrid5,8:55 AM - 30 Nov 2018,text 5,6,10,6
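
As an alternative sketch, the same kind of data frame can be built from a dictionary in which each key is a column name and each value is the list of entries in that column. The variable name df_from_dict and the shortened data are illustrative assumptions.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
df_from_dict = pd.DataFrame(
    {
        'tweeter': ['twtrid1', 'twtrid2', 'twtrid3'],
        'replies': [0, 1, 4],
        'retweets': [3, 12, 12],
    },
    index=[1, 2, 3],  # row labels, as in the example above
)

print(df_from_dict)
```

Which form to use is mostly a matter of taste: the list-of-rows form reads like a spreadsheet, while the dictionary form keeps each column's values together.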


<hr>

### 2) Importing data from an external file

#### We can also read data from an external source such as a file using read_csv()

In [3]:
filename = 'data/tweet_data.csv'
df = pd.read_csv(filename)
df.head() #We use head() as it will display only the first 5 records (for easier viewing)

Unnamed: 0,tweeter_id,avatar_url,tw_time,tw_text,tw_replies,tw_retweets,tw_favorites
0,GSTICseries,https://pbs.twimg.com/profile_images/875623570...,1:25 PM - 28 Nov 2018,All fueled up? Afternoon session (Waste)#Water...,0,1,2
1,11ionArt,https://pbs.twimg.com/profile_images/106785384...,11:20 AM - 28 Nov 2018,DID YOU KNOW? 5 Ongoing Trends that Will Resha...,0,1,0
2,G2H2_Geneva,https://pbs.twimg.com/profile_images/778233965...,6:08 AM - 28 Nov 2018,Global health disruptors: Millennium developme...,1,2,6
3,AnnaRHaskins,https://pbs.twimg.com/profile_images/941757829...,1:35 AM - 28 Nov 2018,In class today I referenced the Millennium Dev...,3,2,92
4,PensiveTM,https://pbs.twimg.com/profile_images/104786562...,6:08 PM - 27 Nov 2018,The true extent of global poverty and hunger: ...,0,1,1
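
If you do not have the CSV file at hand, the following sketch can be run on its own. It assumes a small made-up CSV text (the user names and tweets are invented) and reads it from memory with io.StringIO. It also shows two optional read_csv() parameters: usecols to load only some columns and nrows to load only the first rows.

```python
import io
import pandas as pd

# Made-up CSV content that reuses some of the column names from the tutorial
csv_text = """tweeter_id,tw_text,tw_replies,tw_retweets,tw_favorites
user_a,first tweet,0,1,2
user_b,second tweet,1,2,6
user_c,third tweet,3,2,92
"""

# read_csv() accepts a file path or any file-like object
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['tweeter_id', 'tw_retweets'],  # load only these columns
    nrows=2,                                # load only the first two data rows
)

print(sample)
```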


### 3) Get information about the data set (types, index, columns, values)
#### With the .dtypes, .index, .columns and .values attributes you can get information about your data set as shown below.

In [4]:
#show data types 
df.dtypes

tweeter_id      object
avatar_url      object
tw_time         object
tw_text         object
tw_replies       int64
tw_retweets      int64
tw_favorites     int64
dtype: object

In [5]:
#show index
df.index

RangeIndex(start=0, stop=135, step=1)

In [6]:
#show columns
df.columns

Index(['tweeter_id', 'avatar_url', 'tw_time', 'tw_text', 'tw_replies',
       'tw_retweets', 'tw_favorites'],
      dtype='object')

#### You can also get the actual values of a specific row by its position, as shown

In [8]:
#show values of the first row
df.values[0]

array(['GSTICseries',
       'https://pbs.twimg.com/profile_images/875623570436575233/tndS3CqJ_bigger.jpg',
       '1:25 PM - 28 Nov 2018',
       'All fueled up? Afternoon session (Waste)#Water as a resource. Special focus on SDG 6. Purpose: review progress of water coming towards #SDGs, reflecting on millennium development goals, and what is the role of science, technology and innovation in reaching these goals. #GSTICpic.twitter.com/vXeKzTUBsG',
       0, 1, 2], dtype=object)
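
Besides .values, pandas offers the .iloc and .loc indexers for picking rows. The sketch below uses a made-up data frame with index labels 10, 20 and 30 (an assumption for illustration) to show the difference: .iloc selects by position, while .loc selects by label.

```python
import pandas as pd

# Made-up data with non-default index labels to make the difference visible
demo = pd.DataFrame(
    {'tweeter_id': ['user_a', 'user_b', 'user_c'], 'tw_retweets': [1, 2, 42]},
    index=[10, 20, 30],
)

first_by_position = demo.iloc[0]            # first row, whatever its label is
row_by_label = demo.loc[20]                 # row whose index label is 20
single_value = demo.loc[30, 'tw_retweets']  # one cell: row label 30, column 'tw_retweets'

print(first_by_position)
print(row_by_label)
print(single_value)
```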

### 4) Specific columns to use
#### You can easily specify which columns of the data frame you wish to use and ignore the rest as shown:


In [9]:
#Let's say we want to see only the texts and the number of retweets; then we can specify those columns as shown
df1 = df[['tw_text','tw_retweets']]
df1.head()

Unnamed: 0,tw_text,tw_retweets
0,All fueled up? Afternoon session (Waste)#Water...,1
1,DID YOU KNOW? 5 Ongoing Trends that Will Resha...,1
2,Global health disruptors: Millennium developme...,2
3,In class today I referenced the Millennium Dev...,2
4,The true extent of global poverty and hunger: ...,1
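
One detail that is easy to miss is the double pair of brackets. As a small sketch on made-up data: a single column name in single brackets returns a Series, while a list of column names (double brackets) returns a DataFrame, even when the list has only one entry.

```python
import pandas as pd

# Made-up data reusing the tutorial's column names
demo = pd.DataFrame({'tw_text': ['a', 'b'], 'tw_retweets': [1, 2], 'tw_favorites': [0, 5]})

as_series = demo['tw_retweets']    # single brackets: one column as a Series
as_frame = demo[['tw_retweets']]   # double brackets: a DataFrame with one column

print(type(as_series).__name__)
print(type(as_frame).__name__)
```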


### 5) Sorting rows
#### You can also sort the data by the values of any column (e.g., retweets) using the sort_values() method as shown:

In [10]:
#Pass ascending=False to sort in descending order (most retweeted first). Notice that the index is now out of order, since each row keeps its original index label after sorting
df1.sort_values('tw_retweets', ascending=False).head()


Unnamed: 0,tw_text,tw_retweets
65,What is Millennium Development Goals and what ...,42
45,"""SHEER WILLPOWER! We were the ONLY Sub-Saharan...",34
30,"How well can you pronounce ""Millennium Develop...",28
105,Did the world actually achieve the Millennium ...,28
108,Sustainable Development Goals:\n\nStarted 2015...,11


In [11]:
#If you want to fix the index, you can do a reset and drop the old index using .reset_index(drop=True)
df_reindexed=df1.sort_values('tw_retweets', ascending=False).reset_index(drop=True)
df_reindexed.head()


Unnamed: 0,tw_text,tw_retweets
0,What is Millennium Development Goals and what ...,42
1,"""SHEER WILLPOWER! We were the ONLY Sub-Saharan...",34
2,"How well can you pronounce ""Millennium Develop...",28
3,Did the world actually achieve the Millennium ...,28
4,Sustainable Development Goals:\n\nStarted 2015...,11
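
As a possible shortcut, sort_values() also accepts ignore_index=True, which renumbers the rows in the same step. This sketch assumes pandas 1.0 or newer (where that parameter was introduced) and uses made-up data.

```python
import pandas as pd

# Made-up data reusing the tutorial's column names
demo = pd.DataFrame({'tw_text': ['a', 'b', 'c'], 'tw_retweets': [2, 42, 7]})

# Sort in descending order and renumber the rows 0, 1, 2 in one call
sorted_demo = demo.sort_values('tw_retweets', ascending=False, ignore_index=True)

print(sorted_demo)
```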


### 6) Sorting by multiple columns:
#### To do that, you can pass a list of column names and a list stating whether the sorting is ascending or descending for each column. Both lists must have the same number of elements.

In [12]:
#Sort by retweets (descending) and then by the tweet text (ascending, i.e., in alphabetical order)
df2=df1.sort_values(by=['tw_retweets','tw_text'], ascending=[False,True]).reset_index(drop=True)
df2.head()

Unnamed: 0,tw_text,tw_retweets
0,What is Millennium Development Goals and what ...,42
1,"""SHEER WILLPOWER! We were the ONLY Sub-Saharan...",34
2,Did the world actually achieve the Millennium ...,28
3,"How well can you pronounce ""Millennium Develop...",28
4,"""There is a huge different between sustainable...",11
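
The effect of the second sort column is easiest to see on a tiny made-up example with a tie. In this sketch two rows share 28 retweets, so their order is decided by the text column.

```python
import pandas as pd

# Made-up data: 'banana' and 'apple' are tied on retweets
demo = pd.DataFrame({
    'tw_text': ['banana', 'apple', 'cherry'],
    'tw_retweets': [28, 28, 42],
})

# Retweets descending first; ties are broken by text in ascending order
sorted_demo = demo.sort_values(
    by=['tw_retweets', 'tw_text'],
    ascending=[False, True],
).reset_index(drop=True)

print(sorted_demo)
```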


### 7) Selecting a subset of a data frame
#### You can also extract only the top or bottom rows, or any range of rows. This can be done using the ***.head()*** and ***.tail()*** methods and bracket slicing ***[a:b]*** as shown:

In [13]:
#Print the top 10 tweets in terms of retweets (ties sorted alphabetically by text)
df2.head(10)

Unnamed: 0,tw_text,tw_retweets
0,What is Millennium Development Goals and what ...,42
1,"""SHEER WILLPOWER! We were the ONLY Sub-Saharan...",34
2,Did the world actually achieve the Millennium ...,28
3,"How well can you pronounce ""Millennium Develop...",28
4,"""There is a huge different between sustainable...",11
5,Sustainable Development Goals:\n\nStarted 2015...,11
6,Atiku campaigns told Nigeria is developmental ...,10
7,A1. SDGs are goals formulated to build on the ...,9
8,GLOBAL FRAMEWORKS IN OPERATION:\n\nMillennium ...,9
9,The 17 Sustainable development goals build on ...,9


In [14]:
#Alternatively, you can always use df2[:10] to achieve the same result:
df2[:10]

Unnamed: 0,tw_text,tw_retweets
0,What is Millennium Development Goals and what ...,42
1,"""SHEER WILLPOWER! We were the ONLY Sub-Saharan...",34
2,Did the world actually achieve the Millennium ...,28
3,"How well can you pronounce ""Millennium Develop...",28
4,"""There is a huge different between sustainable...",11
5,Sustainable Development Goals:\n\nStarted 2015...,11
6,Atiku campaigns told Nigeria is developmental ...,10
7,A1. SDGs are goals formulated to build on the ...,9
8,GLOBAL FRAMEWORKS IN OPERATION:\n\nMillennium ...,9
9,The 17 Sustainable development goals build on ...,9


In [15]:
#Print the last 5 tweets in terms of retweets (sorted alphabetically)
df2.tail(5)

Unnamed: 0,tw_text,tw_retweets
130,Who are the ignorant youths who worked with @a...,0
131,atiku IS OUTDATED and his PLAN is too! How can...,0
132,“It is aimed at transforming the amnesty benef...,0
133,“The Philippines then could not pass the mille...,0
134,“The un’s Millennium Development Goals include...,0


In [16]:
#Similarly, you can always use df2[-5:] (using '-' denotes starting from the end) to achieve the same result:
df2[-5:]

Unnamed: 0,tw_text,tw_retweets
130,Who are the ignorant youths who worked with @a...,0
131,atiku IS OUTDATED and his PLAN is too! How can...,0
132,“It is aimed at transforming the amnesty benef...,0
133,“The Philippines then could not pass the mille...,0
134,“The un’s Millennium Development Goals include...,0


In [17]:
#Print the rows at positions 10 to 18:
df2[10:19] #notice how the end position (19) is not included in the final output

Unnamed: 0,tw_text,tw_retweets
10,.@MinisterRw_Edu :One of the key strategic pri...,8
11,"""SHEER WILLPOWER!\nWe were the ONLY Sub-Sahara...",7
12,"Hello Mr Abubakar ,i know you are tired of my ...",7
13,Millennium Development Goals ended in 2015\n\n...,7
14,The 17 Sustainable development goals build on ...,7
15,Good morning beautiful people. The rest of you...,6
16,Kenya started action on Millennium Development...,5
17,A2. We might also say its an extension of the ...,4
18,Cuba is one of the most successful countries i...,4
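
Bracket slicing on a data frame works by position. A more explicit way to say the same thing is .iloc, as in this sketch on a made-up data frame of 20 rows.

```python
import pandas as pd

# Made-up data frame with 20 rows numbered 0..19
demo = pd.DataFrame({'n': range(20)})

with_brackets = demo[10:19]     # positions 10 to 18 (19 is excluded)
with_iloc = demo.iloc[10:19]    # the same rows, selected explicitly by position

print(with_iloc)
```

Using .iloc makes it clear to the reader that positions, not index labels, are meant.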


### 8) Getting max, min, sum and mean values
#### There are specific functions that pandas has to get the above values easily as shown:

In [18]:
#Print the maximum number of retweets that any tweet received
df['tw_retweets'].max()

42

In [19]:
#Print the minimum number of retweets that any tweet received
df['tw_retweets'].min()

0

In [20]:
#Print the mean (average) number of retweets per tweet
df['tw_retweets'].mean()

2.3407407407407406

In [21]:
#You can also round off the number using the round() function
round(df['tw_retweets'].mean())

2

In [22]:
#Sometimes, it is useful to use the median, e.g., when there is an outlier. This can be done as follows:
df2['tw_retweets'][:10].median() #We select the numeric column first, because the text column has no median

11.0
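
To see why the median can be the better choice, consider this sketch with made-up retweet counts in which one tweet went viral. The single outlier pulls the mean far away from the typical value, while the median barely moves.

```python
import pandas as pd

# Made-up retweet counts: four ordinary tweets and one outlier
retweets = pd.Series([0, 1, 2, 2, 95])

mean_value = retweets.mean()      # (0 + 1 + 2 + 2 + 95) / 5
median_value = retweets.median()  # middle value of the sorted numbers

print(mean_value)    # 20.0
print(median_value)  # 2.0
```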

### 9) Descriptive statistics
#### When you need some basic statistical information about numeric columns in your dataset, use describe() as shown:

In [23]:
df.describe()

Unnamed: 0,tw_replies,tw_retweets,tw_favorites
count,135.0,135.0,135.0
mean,0.414815,2.340741,4.059259
std,1.373443,6.01759,11.285814
min,0.0,0.0,0.0
25%,0.0,0.0,0.0
50%,0.0,0.0,0.0
75%,0.0,2.0,3.0
max,14.0,42.0,92.0
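
describe() is not limited to numbers. As a small sketch on a made-up text column: when called on a column of strings, it reports the number of entries, the number of unique values, the most frequent value and how often it occurs.

```python
import pandas as pd

# Made-up user names; 'user_a' appears twice
names = pd.Series(['user_a', 'user_b', 'user_a'], name='tweeter_id')

summary = names.describe()  # rows: count, unique, top, freq

print(summary)
```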


### 10) Find unique values
#### You sometimes only want to know the unique values that exist in a column. To do so, use unique() as shown:

In [24]:
#Find the number and values of unique Twitter usernames in the full dataset
unique_tweeters=df['tweeter_id'].unique()
print(f"There are {len(unique_tweeters)} unique users as shown below:")
#enumerate() gives us a running number together with each username
for i, tweeter in enumerate(unique_tweeters):
    print(i," ",tweeter)

There are 114 unique users as shown below:
0   GSTICseries
1   11ionArt
2   G2H2_Geneva
3   AnnaRHaskins
4   PensiveTM
5   drcamarin
6   AlanBenstock
7   globalpeacef1
8   egumboslav
9   MalariaPapers
10   CheltColGeog
11   60Macc
12   kiplangatLedama
13   faziarizvi
14   Tarif_Naqsh
15   IfeanyiUddin
16   HallidayInc
17   Gbolarin_
18   Rwanda_Edu
19   larrykipz
20   tjfas
21   chris_m_h
22   DJStrumfels
23   justadded
24   RoyalKenyah
25   Dmarigiri_
26   teddyeugene
27   kunleMyk
28   AdesugbaDipo
29   radiokimilili
30   ChefKeidi
31   nicktubechannel
32   MadiSharma1
33   TheMacAnon
34   GodoStoyke
35   Asamoh_
36   ForeverEritrea
37   RaimeMarina
38   mfascinari
39   Future_Cities
40   MakaraNuel
41   Underground_RT
42   ClimateWed
43   Qwame_Dankwah
44   adex0057
45   BankWindhoek
46   MSTCDC
47   WORBLO
48   Ody_johnson
49   USF_Economics
50   Karovoni
51   thebardogbamola
52   MouriceAgumba
53   WaleMicaiah
54   SaleemAmusa
55   UgochiJaneTrump
56   realjohnafolabi
57   onayink
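
Two related methods are often handy here. In this sketch on made-up user names, nunique() returns just the number of unique values, and value_counts() returns how often each value occurs, with the most frequent value first.

```python
import pandas as pd

# Made-up user names; 'user_a' tweeted three times
demo = pd.DataFrame({'tweeter_id': ['user_a', 'user_b', 'user_a', 'user_c', 'user_a']})

n_unique = demo['tweeter_id'].nunique()     # number of distinct users
counts = demo['tweeter_id'].value_counts()  # tweets per user, most frequent first

print(n_unique)
print(counts)
```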

### 11) Group by (equivalent of pivot tables in MS Excel)
#### When you wish to group rows by the values of a particular column in order to calculate a count, sum, etc. for each group, pandas offers a handy method called groupby(), which can be used as shown:

In [25]:
#Get the total number of tweets that were sent by each user:
df.groupby(['tweeter_id']).agg({'tw_text': ['count']})

tweeter_id,tw_text (count)
11ionArt,1
60Macc,1
AdesugbaDipo,1
AlanBenstock,1
AnnaRHaskins,1
Asamoh_,1
BankWindhoek,1
BethuTongai,1
BrookingsGlobal,2
Change_Maldives,1


In [26]:
#You can also do the same using sum, for example to sum up the total number of retweets for each user:
df.groupby(['tweeter_id']).agg({'tw_retweets': ['sum']})

tweeter_id,tw_retweets (sum)
11ionArt,1
60Macc,0
AdesugbaDipo,1
AlanBenstock,0
AnnaRHaskins,2
Asamoh_,1
BankWindhoek,0
BethuTongai,0
BrookingsGlobal,3
Change_Maldives,0


In [27]:
#If you have multiple columns to aggregate, you can simply add them to the agg() function as shown:
df.groupby(['tweeter_id']).agg({'tw_text':['count'],'tw_retweets': ['sum'],'tw_replies':['sum'],'tw_favorites':['sum']})

tweeter_id,tw_text (count),tw_retweets (sum),tw_replies (sum),tw_favorites (sum)
11ionArt,1,1,0,0
60Macc,1,0,0,0
AdesugbaDipo,1,1,0,0
AlanBenstock,1,0,0,0
AnnaRHaskins,1,2,3,92
Asamoh_,1,1,1,2
BankWindhoek,1,0,0,1
BethuTongai,1,0,1,0
BrookingsGlobal,2,3,0,3
Change_Maldives,1,0,1,0
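
When only one column needs to be aggregated, a shorter form is possible. This sketch, on made-up data, selects the column after groupby() and calls sum() directly. The result is a Series indexed by user.

```python
import pandas as pd

# Made-up data: 'user_a' has two tweets
demo = pd.DataFrame({
    'tweeter_id': ['user_a', 'user_b', 'user_a'],
    'tw_retweets': [1, 5, 2],
})

# Total retweets per user
totals = demo.groupby('tweeter_id')['tw_retweets'].sum()

print(totals)
```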


### 11) Sort a grouped data frame
#### Once a grouped data frame is produced, you can sort it with the sort_values() method explained earlier as shown:

In [28]:
#This is more complex, as it requires passing a custom function to apply() and then sorting the values.
def my_agg(x):
    names = { #dictionary of aggregated values; each key becomes a column name
        'Tweets': x['tw_text'].count(), #count the number of tweets and name the column 'Tweets'
        'Replies':  x['tw_replies'].sum(), #add up the number of replies and name the column 'Replies'
        'Retweets': x['tw_retweets'].sum(), #add up the number of retweets and name the column 'Retweets'
        'Favorites': x['tw_favorites'].sum()} #add up the number of favorites and name the column 'Favorites'
    return pd.Series(names, index=['Tweets', 'Replies', 'Retweets','Favorites']) #return a pandas Series; 'index' fixes the order of the columns

grouped_df=df.groupby('tweeter_id').apply(my_agg)

#Let's see if the group is properly created and structured
grouped_df.head() 

tweeter_id,Tweets,Replies,Retweets,Favorites
11ionArt,1,0,1,0
60Macc,1,0,0,0
AdesugbaDipo,1,0,1,0
AlanBenstock,1,0,0,0
AnnaRHaskins,1,3,2,92


In [29]:
#The next step is simply to sort the newly generated data frame as shown earlier
#Here, we sort in descending order by the number of favorites, then tweets, replies and retweets

grouped_df.sort_values(by=['Favorites','Tweets','Replies','Retweets'], ascending=[False,False,False,False])


tweeter_id,Tweets,Replies,Retweets,Favorites
AnnaRHaskins,1,3,2,92
woye1,1,5,42,56
Underground_RT,1,0,34,54
MaxCRoser,1,2,28,36
iameneji,1,2,6,35
atabinore,1,0,9,18
adex0057,1,0,10,17
SDGsNGA,2,3,13,14
ForeverEritrea,1,1,7,13
MSTCDC,1,0,11,13
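
A custom function is not the only way to get named columns. As an alternative sketch, assuming pandas 0.25 or newer, agg() supports "named aggregation", where each keyword is the new column name and its value is a (column, function) pair. The data below is made up and the variable name summary is illustrative.

```python
import pandas as pd

# Made-up data reusing the tutorial's column names
demo = pd.DataFrame({
    'tweeter_id': ['user_a', 'user_b', 'user_a'],
    'tw_text': ['t1', 't2', 't3'],
    'tw_replies': [0, 1, 2],
    'tw_retweets': [1, 5, 2],
    'tw_favorites': [3, 10, 4],
})

summary = (
    demo.groupby('tweeter_id')
    .agg(
        Tweets=('tw_text', 'count'),        # number of tweets per user
        Replies=('tw_replies', 'sum'),      # total replies per user
        Retweets=('tw_retweets', 'sum'),    # total retweets per user
        Favorites=('tw_favorites', 'sum'),  # total favorites per user
    )
    # Sort by favorites, then by number of tweets, highest first
    .sort_values(by=['Favorites', 'Tweets'], ascending=[False, False])
)

print(summary)
```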


### 12) Filtering data in a dataset
#### Occasionally, you would like to only select rows that meet a particular condition (similar to the MS Excel filter method)
#### This is easily done in pandas using the query() method as shown:

In [30]:
#Filter out all those tweets that do not have any replies and show the rest
df.query('tw_replies > 0').reset_index(drop=True)

Unnamed: 0,tweeter_id,avatar_url,tw_time,tw_text,tw_replies,tw_retweets,tw_favorites
0,G2H2_Geneva,https://pbs.twimg.com/profile_images/778233965...,6:08 AM - 28 Nov 2018,Global health disruptors: Millennium developme...,1,2,6
1,AnnaRHaskins,https://pbs.twimg.com/profile_images/941757829...,1:35 AM - 28 Nov 2018,In class today I referenced the Millennium Dev...,3,2,92
2,faziarizvi,https://pbs.twimg.com/profile_images/537347315...,5:04 PM - 26 Nov 2018,"""One of the United Nations’ Millennium Develop...",1,0,0
3,Gbolarin_,https://pbs.twimg.com/profile_images/105372443...,7:35 PM - 24 Nov 2018,3\nThe SDGs are an offshoot of the now outdate...,1,2,0
4,Rwanda_Edu,https://pbs.twimg.com/profile_images/919159851...,7:32 PM - 23 Nov 2018,.@MinisterRw_Edu :One of the key strategic pri...,2,8,7
5,tjfas,https://pbs.twimg.com/profile_images/990293406...,5:30 PM - 22 Nov 2018,"Politics apart my people, how is it possible u...",1,0,0
6,chris_m_h,https://pbs.twimg.com/profile_images/956196343...,4:18 PM - 22 Nov 2018,"The Strategic Development Goals are audacious,...",1,4,8
7,RoyalKenyah,https://pbs.twimg.com/profile_images/106073457...,7:29 AM - 22 Nov 2018,The 17 Sustainable development goals build on ...,1,9,9
8,kunleMyk,https://pbs.twimg.com/profile_images/103889242...,6:32 AM - 22 Nov 2018,"How well can you pronounce ""Millennium Develop...",14,28,5
9,nicktubechannel,https://pbs.twimg.com/profile_images/105331498...,3:27 PM - 21 Nov 2018,These 17 Goals build on the successes of the M...,1,1,3


In [32]:
#You can also combine conditions with boolean operators (and, or), for example to keep only the tweets that have at least 10 retweets or at least 10 favorites
df.query('tw_retweets >=10 or tw_favorites>=10').reset_index(drop=True)

Unnamed: 0,tweeter_id,avatar_url,tw_time,tw_text,tw_replies,tw_retweets,tw_favorites
0,AnnaRHaskins,https://pbs.twimg.com/profile_images/941757829...,1:35 AM - 28 Nov 2018,In class today I referenced the Millennium Dev...,3,2,92
1,kunleMyk,https://pbs.twimg.com/profile_images/103889242...,6:32 AM - 22 Nov 2018,"How well can you pronounce ""Millennium Develop...",14,28,5
2,MadiSharma1,https://pbs.twimg.com/profile_images/610387093...,11:40 AM - 21 Nov 2018,BANGLADESH - EXCEEDING EXPECTATIONS - Happy to...,0,2,13
3,ForeverEritrea,https://pbs.twimg.com/profile_images/102607938...,2:52 AM - 21 Nov 2018,"""SHEER WILLPOWER!\nWe were the ONLY Sub-Sahara...",1,7,13
4,Underground_RT,https://pbs.twimg.com/profile_images/104788960...,1:05 PM - 20 Nov 2018,"""SHEER WILLPOWER! We were the ONLY Sub-Saharan...",0,34,54
5,adex0057,https://pbs.twimg.com/profile_images/105539528...,9:21 AM - 20 Nov 2018,Atiku campaigns told Nigeria is developmental ...,0,10,17
6,MSTCDC,https://pbs.twimg.com/profile_images/953607386...,5:25 AM - 20 Nov 2018,"""There is a huge different between sustainable...",0,11,13
7,woye1,https://pbs.twimg.com/profile_images/100845526...,11:13 AM - 19 Nov 2018,What is Millennium Development Goals and what ...,5,42,56
8,ayo_david,https://pbs.twimg.com/profile_images/100249482...,10:44 AM - 19 Nov 2018,Misleading. It's talking about Human developme...,0,3,10
9,MaxCRoser,https://pbs.twimg.com/profile_images/991373737...,1:21 PM - 6 Nov 2018,Did the world actually achieve the Millennium ...,2,28,36
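
The same filter can also be written without query(), using a boolean mask. In this sketch on made-up numbers, each condition goes in its own parentheses and the conditions are combined with | (or) or & (and). Both forms should select the same rows.

```python
import pandas as pd

# Made-up data reusing the tutorial's column names
demo = pd.DataFrame({
    'tw_retweets': [2, 28, 3, 0],
    'tw_favorites': [92, 5, 10, 1],
})

# Filter with a query string
with_query = demo.query('tw_retweets >= 10 or tw_favorites >= 10')

# Same filter with a boolean mask; note the parentheses and the | operator
with_mask = demo[(demo['tw_retweets'] >= 10) | (demo['tw_favorites'] >= 10)]

print(with_mask)
```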


### 13) Writing DataFrames to CSV files
#### To write dataframes to CSV files, you can always use the to_csv() function as shown below:

In [33]:
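#Let's write the contents of df to 'output.csv'
df.to_csv('output.csv')

#And now let's read it back and confirm that all is in order. Note that to_csv() also saved the index,
#which comes back as an extra 'Unnamed: 0' column; pass index=False to to_csv() to leave it out
pd.read_csv('output.csv').head()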


Unnamed: 0.1,Unnamed: 0,tweeter_id,avatar_url,tw_time,tw_text,tw_replies,tw_retweets,tw_favorites
0,0,GSTICseries,https://pbs.twimg.com/profile_images/875623570...,1:25 PM - 28 Nov 2018,All fueled up? Afternoon session (Waste)#Water...,0,1,2
1,1,11ionArt,https://pbs.twimg.com/profile_images/106785384...,11:20 AM - 28 Nov 2018,DID YOU KNOW? 5 Ongoing Trends that Will Resha...,0,1,0
2,2,G2H2_Geneva,https://pbs.twimg.com/profile_images/778233965...,6:08 AM - 28 Nov 2018,Global health disruptors: Millennium developme...,1,2,6
3,3,AnnaRHaskins,https://pbs.twimg.com/profile_images/941757829...,1:35 AM - 28 Nov 2018,In class today I referenced the Millennium Dev...,3,2,92
4,4,PensiveTM,https://pbs.twimg.com/profile_images/104786562...,6:08 PM - 27 Nov 2018,The true extent of global poverty and hunger: ...,0,1,1
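
The extra 'Unnamed: 0' column is the saved index. A minimal sketch of how to avoid it, on made-up data and writing to an in-memory buffer instead of a real file (an assumption made so that the example leaves no files behind):

```python
import io
import pandas as pd

# Made-up data reusing the tutorial's column names
demo = pd.DataFrame({'tweeter_id': ['user_a', 'user_b'], 'tw_retweets': [1, 2]})

buffer = io.StringIO()            # stands in for a file such as 'output.csv'
demo.to_csv(buffer, index=False)  # index=False leaves the index out of the CSV

buffer.seek(0)                    # rewind before reading the text back
round_trip = pd.read_csv(buffer)

print(round_trip)
```

With a real file, the equivalent call would be demo.to_csv('output.csv', index=False).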


### Great!
#### We have now learned the most important pandas functions that allow you to do some basic data analysis.
#### To learn more, check the documentation [here](https://pandas.pydata.org/pandas-docs/stable/)

# Exercise:

### **Task:** 

Write a script that reads the CSV you created in the last workshop and creates 2 CSV files as follows:
- CSV file 1 should contain the list of tweets sorted by number of retweets and then by number of favorites (top retweets first).
- CSV file 2 should contain the list of tweeters (users) and number of tweets they each published, the number of total retweets and favorites they each got. The data in the file should be sorted by the total number of favorites, then retweets, then replies (highest first).

Upload the script and the two CSV files to the designated submission (inlämning) link on Studiewebben.