# 1.0 Uploading files from your local file system


In [0]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

# 2.0 Modules


## 2.1 Introduction

In 2006, Daniel Ek and Martin Lorentzon changed the music industry. In an industry laden with illegal downloading, Ek and Lorentzon set out to create a service that would reduce piracy and also generate profits for labels and musicians. That service is called [Spotify](https://www.spotify.com/us/).

Spotify is a subscription-based music streaming service that gives users access to millions of songs and content from artists around the world. Users around the world pay a monthly subscription fee and have unlimited access to the music on Spotify. As of January 2018, Spotify has over 70 million paying users. For the unfamiliar, here's what the application interface looks like:

<center>
<img width="600" src="https://drive.google.com/uc?export=view&id=1eb6ygchrBFQ5FFKRRh7G-IlhomXN0nSX">
</center>

Since Spotify has been growing in popularity, artists have started using the platform more frequently to promote their music. An artist who has made the top 100 list on Spotify has likely reached the upper echelons of music success. In this mission, to better understand the qualities of successful tracks on Spotify, we'll be answering two questions:

- **What are the average total streams for each song in the top 100?**
- **Which song was the most popular song of 2017?**


To answer these questions, we'll use the [Spotify's Worldwide Daily Song Ranking](https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking) dataset to analyze the popular songs of 2017-2018:


| | Track Name | Artist | Position | Streams |
|----|--------------------------|---------------|---------|------------|
| 77 | Sign of the Times | Harry Styles | 756325 | 503894417 |
| 92 | Photograph | Ed Sheeran | 1525708 | 441132246 |
| 70 | Look What You Made Me Do | Taylor Swift | 335837 | 562562226 |
| 36 | Scared to Be Lonely | Martin Garrix | 1074560 | 866104216 |
| 13 | Attention | Charlie Puth | 560536 | 1112777364 |


Before we read in the full data, let's start by writing a function that calculates the average value in the streams column for just these 5 songs.


**Exercise**

<img width="150" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">

1. Write a function called **average()** that:
  - Takes in a list as an argument.
  - Returns the average of the list.
2. Pass **top5_streams** into **average()** and assign the result to **total_average**.


In [0]:
# Stream column for top 5 songs only
top5_streams = [2993988783, 1829621841, 1460802540, 1386258295, 1311243745]

# Put your code here

## 2.2 Introduction to Modules

In the previous cell, we wrote our own implementation that found the average of a list. Luckily, we don't have to write our own implementation every time: someone else has likely already written one and packaged it as a **module**.

A module is a collection of functions and variables that have been bundled together in a single file. This file is generally centered around a specific theme. There are modules that focus on [math](https://docs.python.org/3/library/math.html) operations, audio files ([audioop](https://docs.python.org/3/library/audioop.html)), image processing ([pillow](https://pillow.readthedocs.io/en/latest/)) and many more.

Modules improve the readability of our code by abstracting away the implementation while allowing us to understand exactly what the code does. Let's take an example of **sum()**:

```python
l = [66,44,22]
sum(l)
```

Here, we can't see the code underneath the **sum()**. However, we know exactly what the code does in one line.

To load a module, we'll use the **import** statement. For readability, it's usually a good idea to import the modules we'll need at the beginning of our script.


When importing a module, we get access to all the functions and variables within the module. Let's look at a sample implementation of two functions, **total()** and **exp()**, that mirror the built-in **sum()** and the **math** module's **exp()**:

```python
def total(values):
    # Add up every number in the list
    result = 0
    for num in values:
        result += num
    return result

def exp(x):
    # Approximate e**x using a rounded value of e
    return 2.718281**x
```

Rather than write our own implementation, we could use the functions in just two lines of code. Whenever we use a module, we don't need to know how the code is implemented to use it:

<img width="600" src="https://drive.google.com/uc?export=view&id=1wcsD7YADFPoqEAhjT2UiiI8c-rfz5Df7">

To use a specific function within our math module, we'll use the dot notation followed by the name of the function:

```python
module.function()
```
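As a minimal sketch of the dot notation, here is the **math** module in use. The variable names `root` and `circle_area` are chosen only for illustration:

```python
import math

# module.function(): call sqrt() from the math module
root = math.sqrt(16)

# Modules can also expose variables, reached with the same dot notation
circle_area = math.pi * 2 ** 2

print(root)  # 4.0
print(circle_area)
```

The same pattern applies to any module: the name before the dot says where the function lives, and the name after the dot says which function to run.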

Returning to our earlier example, **sum()** is a built-in function:

```python
l = [66,44,22]
sum(l)
```

There's no need to call a specific module to access the function. Here is the list of all the [built-ins](https://docs.python.org/3/library/functions.html).

Popular Python modules have documentation describing the names of the functions and variables we can use within the module. Generally, whenever you use a module, it's good practice to reference the documentation. In this mission, we'll be using the [statistics](https://docs.python.org/3/library/statistics.html) and [math](https://docs.python.org/3/library/math.html) modules. Well-documented modules allow us to re-use the implementation without needing to understand the code ourselves.

As a review of everything covered in this section:
- Use **import** to load a module. Load modules at the beginning of a script.
- To use a function within a module, remember **module.function()**.
- When using a [built-in](https://docs.python.org/3/library/functions.html) function, you do not need to add a module name in front of the function.




**Exercise**

<img width="150" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">


1. Import the **statistics** module.
  - Within the **statistics** module, use the function **mean()** to calculate the mean of **top5_streams**. 
  - Store the result in **average**

In [0]:
# statistics is part of the Python 3 standard library (3.4+), so no pip install is needed

In [9]:
top5_streams = [2993988783, 1829621841, 1460802540, 1386258295, 1311243745]

# put your code here
import statistics as stat

# Store the result in average, as the exercise asks, then display it
average = stat.mean(top5_streams)
average

1796383040.8

## 2.3 Loading our data using the CSV module

In the previous two sections, we calculated the average of the top 5 songs. Although this gives us useful information, we'd like to incorporate more data to better gauge the average streams for the top songs on Spotify.

To incorporate more data, we'll need to load a CSV file containing the top 100 songs on Spotify. In the previous course, we learned how to work with CSV files by:

- Opening a file
- Reading the contents of that file into a string
- Splitting the string on the newline character
- Splitting each line on the comma character

Now that we understand how modules work, we can use the **csv** module instead. This module has a function called **reader()** which takes a file object as an argument and returns an object that represents our data. We'll cover objects later in this course, but for now, we'll convert the object to a list and use the result.

To read data from a file called **"sample.csv"**, we first import the csv module:

```python
import csv
```

Next, we open the file:

```python
f = open("sample.csv", "r")
```

"r" stands for read-only mode. Then, we call the module's **reader()** function:

```python
csvreader = csv.reader(f)
```

Finally, we convert the result to a list:

```python
my_data = list(csvreader)
```

**list()** is a built-in function in Python. Built-in functions are functions built into the Python language. These functions are available at any time, without needing to load a module. Here is the list of all the [built-ins](https://docs.python.org/3/library/functions.html).
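Putting the steps above together, a minimal end-to-end sketch might look like the following. The file contents and the temporary location are made up for illustration, so that the example can run without an existing **"sample.csv"**:

```python
import csv
import os
import tempfile

# Create a tiny stand-in for "sample.csv" (contents are invented for this sketch)
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as out:
    out.write("Track Name,Artist\nShape of You,Ed Sheeran\n")

# Open the file, wrap it in a reader, and convert the result to a list
f = open(path, "r", newline="")
csvreader = csv.reader(f)
my_data = list(csvreader)
f.close()  # release the file once the rows are in memory

print(my_data)
```

Each row comes back as a list of strings, so numeric columns still need to be converted with **int()** before doing any arithmetic.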

Let's load **"top100.csv"** into Python, a dataset containing information on the top 100 songs from 2017.

**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">

1. Read **"top100.csv"** into a list variable named **music** using the **csv** module.



In [14]:
import csv

music = list(csv.reader(open("top100.csv","r")))
music[:4]

[['Track Name', 'Artist', 'Position', 'Streams'],
 ['Shape of You', 'Ed Sheeran', '301513', '2993988783'],
 ['Despacito - Remix', 'Luis Fonsi', '477232', '1829621841'],
 ['Despacito (Featuring Daddy Yankee)', 'Luis Fonsi', '816152', '1460802540']]

## 2.4 Understanding the Namespace

When we use a statement like **import statistics**, we're adding the name **statistics** to our namespace, and through that name we can reach every variable and function in the module. A **namespace** is a dictionary that contains all the names of the variables and functions we can refer to in our code.

When we load a module, its name is stored in the namespace, and its associated functions and variables become available through that name. When we create new variables or write new functions, we're adding them to our namespace as well.

Let's take the following variable assignment:

```python
a = 10
```

When assigning 10 to the variable **a**, we are storing this name in our namespace, so we can use **a** at any point later in the script. Similarly, when we import a module with **import math**, the name **math** is stored in our namespace and gives us access to every function and variable within that module:

<img width="500" src="https://drive.google.com/uc?export=view&id=1TcwuFS82VEU6cp4EKkyk8hhuVbcMTCcB">


Since **print()** is a **built-in** function, the interpreter automatically stores **print()** into our namespace. As a result, we have access to **print()** anywhere in our script.

To see all the variables & functions in the namespace, run the **dir()** function by itself:

```python
dir()
```

Running **dir()** by itself will produce a list of values that may look like this:

```python
['In', 'Out', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__name__', '__package__', '_dh', '_i', '_i1', '_ih', '_ii', '_iii', '_oh', '_sh', 'exit', 'get_ipython', 'quit']
```

For now, we won't worry about what each of these means. These are all the valid names within our current workspace. However, we can also use **dir()** to list the valid names for a specific variable or module in our workspace. Let's use the **dir()** function on math:

```python
import math 

dir(math)
```

This returns:

```python
['__doc__',
 '__file__',
 '__name__',
 '__package__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',

 ........
 ]
```

**acos()** is a function we can now call as **math.acos()**, since importing **math** made everything in the module available. Now, let's check what functions we have access to when we load the statistics module!
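As a small sketch of how an import shows up in the namespace, the check below uses **dir()** both ways. The variable names `module_name_listed` and `function_listed` are placeholders for this example:

```python
import math

# The module's name is now one of the names in our namespace
module_name_listed = "math" in dir()

# The module's functions are listed under the module itself
function_listed = "acos" in dir(math)

print(module_name_listed)  # True
print(function_listed)     # True
print(math.acos(1))        # 0.0
```

This is why the function is called as **math.acos()**: the interpreter first looks up **math** in our namespace and then finds **acos** inside the module.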

**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">

1. Import **statistics**.
2. Print **dir()** with no object to display all the attributes in the namespace.
3. Print all the attributes related to statistics using **dir()**.

In [0]:
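# statistics ships with Python 3, so no installation is needed

# put your code here
import statistics

# All the names in the current namespace
print(dir())

# All the names defined by the statistics module
print(dir(statistics))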

## 2.5 Cleaning Our Data

So far, we've learned how to load modules. Returning to our original questions:

- What are the average total streams for each song in the top 100?
- Which song was the most popular song of 2017?

To answer these questions, we'll first need to clean up our current dataset. We'll do this by extracting the names of the tracks and the number of streams out of our current dataset.


**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">


1. Extract the name of the track and the number of streams out of the dataset:
  - Create two empty lists named **stream_numbers** and **track_names**.
  - Loop through each song in **music**, skipping the first row of column headers.
  - Extract the name of each song and append to **track_names**.
  - Extract the number of streams for each song, convert the value to an integer, and append to **stream_numbers.**



In [19]:
import csv
f = open("top100.csv","r")
music = list(csv.reader(f))

# put your code here
stream_numbers = []
track_names = []

for stream in music[1:]:
  track_names.append(stream[0])
  stream_numbers.append(int(stream[-1]))

print(track_names[:3])
print(stream_numbers[:3])

['Shape of You', 'Despacito - Remix', 'Despacito (Featuring Daddy Yankee)']
[2993988783, 1829621841, 1460802540]


## 2.6 Writing Modular Code

We've seen how to access modules written by other people. If we flip this, what if we wanted to write a module for someone else to use? How would we structure our code?

Let's look at the Python code to find the sum of a list and length of a list in non-function form:


<img width="200" src="https://drive.google.com/uc?export=view&id=1escZv9vdN6dUTl3QuJqjyjeCZxzuIMN-">

If we wanted to re-use this code, a simple method would be to copy/paste the code. However, let's say we're working at a company where 1000 people would like to use our code. To make our code re-useable, we could transform our code into **functions**:


<img width="400" src="https://drive.google.com/uc?export=view&id=1HE5sL1C5MLFYVjuzWrVBL3-wd_1r6Lpm">


Transforming our code into functions allows us to hide our logic. Others who use our functions do not need to understand how they were implemented. We've used functions like **print()** without knowing how the underlying code works.
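As a rough text version of this idea, the sketch below wraps the sum and length logic in two functions. The names `list_total()` and `list_length()` are hypothetical and chosen to avoid shadowing the built-ins **sum()** and **len()**, which already do the same job:

```python
def list_total(values):
    """Return the sum of the numbers in values."""
    total = 0
    for num in values:
        total += num
    return total


def list_length(values):
    """Return how many items values contains."""
    count = 0
    for _ in values:
        count += 1
    return count


l = [66, 44, 22]
print(list_total(l))   # 132
print(list_length(l))  # 3
```

Anyone calling these functions only needs to know what goes in and what comes out, not how the loop inside works.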

To use functions from the script, they can import the module:

<img width="600" src="https://drive.google.com/uc?export=view&id=1kMm-T1Ps1mmwYIseBzuuY5tsiqg6sWHu">


Keep in mind that the red is not a part of the script. The red is there to show you how we're organizing the code. This process of breaking our code into re-usable components is called **modular programming**. There are two common strategies for modular programming:

- Transforming our code into functions
- Using object-oriented programming (we'll go into this later in the course).

Modular programming enables programmers to divide up their code and debug pieces of code independently. To write effective modular code, there are **3 R's** to writing clean code:

- **Readability**: How can I write readable code?
- **Re-usability**: How can I write re-usable code?
- **Reliability**: How can I write reliable code?

Let's get started!


**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">

1. Using the spaghetti code, modularize the code into the following set of functions:
  - **read_data()**: this function reads the CSV file and converts the file object into a list.
  - **get_data()**: this function extracts the stream numbers and track name from the list.

After modularizing the code, print **dir()** to check that your functions are stored in the namespace.

In [0]:
import csv


def read_data(filename):
  return list(csv.reader(open(filename,"r")))

def get_data(lists):
  stream_numbers = []
  track_names = []

  # Use the list passed in as an argument rather than the global music variable
  for song in lists[1:]:
    stream_numbers.append(int(song[3]))
    track_names.append(song[0])

  return stream_numbers, track_names

music = read_data("top100.csv")
stream, names = get_data(music)



In [22]:
music[:3]

[['Track Name', 'Artist', 'Position', 'Streams'],
 ['Shape of You', 'Ed Sheeran', '301513', '2993988783'],
 ['Despacito - Remix', 'Luis Fonsi', '477232', '1829621841']]

In [23]:
stream[:3]

[2993988783, 1829621841, 1460802540]

In [0]:
dir()

## 2.7 Local and Global Variables

Now that we've written modular code, let's return to our sample code and examine a few components of this code:

<img width="250" src="https://drive.google.com/uc?export=view&id=1eK_WvkX7qqsvJR4QkxL9zNYeDx-kMlGZ">


Here, we've defined two variables of **int** type and one variable of **list** type. Notice the positioning of the total and count variables compared to the **l** list.


We might ask:

- Does the location where we define our variable matter?

Let's examine the total variable. We've defined total within the **sum()** function. Within our code, we can access the total variable within the local namespace:

<img width="250" src="https://drive.google.com/uc?export=view&id=1vKWv-fUFCkXVOx0moUPIEhMO778D2LKP">

However, if we try to access this variable outside of the accessible area, we'll get an error:

<img width="250" src="https://drive.google.com/uc?export=view&id=1PVaFYV6DFcGodG9HO3D3YBn0JCFoig3Q">


Variables defined within the **sum()** function are called local variables. Local variables can't be accessed outside the function.

On the other hand, we've defined **l** as a list of numbers. We've defined this outside the function, which means this list is accessible throughout the entire script. This list lives in the **global namespace**. We can access the values both inside and outside the function:

<img width="250" src="https://drive.google.com/uc?export=view&id=1baTdEg7RcgHlhgOtRsr5HHmhroLY87LE">

This is called a **global variable**. For now, we'll lean towards using more local variables than global variables. Tracking and reading programs using mainly local variables is slightly easier. However, there will be many cases where we'll need to re-use an object, which will require a global variable.
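The difference can be sketched in a few lines. The function name `list_total()` is hypothetical, and the example assumes a fresh session in which no global variable named `total` already exists:

```python
l = [66, 44, 22]  # global: defined outside any function


def list_total():
    total = 0      # local: exists only while the function runs
    for num in l:  # the global list is readable inside the function
        total += num
    return total


result = list_total()
print(result)  # 132

try:
    print(total)  # the local name is not visible out here
except NameError as err:
    message = str(err)
    print(message)
```

The function can read the global list, but the name `total` disappears as soon as the function returns, which is why the lookup outside the function fails.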



**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">


- Re-write the **read_data()** function, so that the filename is a global variable containing **"top100.csv"**.
- Call **read_data()** and store this in the variable **f**.

In [0]:
def read_data(filename):
    f = open(filename)
    return list(csv.reader(f))
  
# put your code here

## 2.8 Using Programming Paradigms

In this section, we've introduced a paradigm of programming called **modular programming**. In programming, there are many [types of paradigms](https://en.wikipedia.org/wiki/Comparison_of_programming_paradigms). Each paradigm has its own strengths and weaknesses.

Modular programming is a broad, over-arching paradigm that contains many other paradigms. Within this Python Intermediate course, we'll introduce two of them: **functional programming** and **object-oriented programming**. Functional programming is common in data science, and you'll get plenty of exposure to it in this section.

In this step, we also used the functional programming paradigm. In functional programming, we decompose problems into a set of functions. Each function operates on the input and produces an output.

<img width="600" src="https://drive.google.com/uc?export=view&id=1taQ0Fe5gnuKNhcAhRp83VxHJaYAxX3TX">

A simple pipeline written in functions might look like:

```python
import math 

x = 7
def exp(x):
    return math.exp(x)

def fraction(x):
    return 1/x

x = exp(x)
x = fraction(x)
```

After running this code, **x** would hold approximately:

```python
0.000911881965555
```

Let's transform our previous functions into a mini-pipeline!


**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">

Using the current functions, create a mini-pipeline in the following order:
  - Read the data from **"top100.csv"** using **read_data()**. Store this in music.
  - Take the result, and pass this into **get_data()** to extract the correct format. Store this in **stream_numbers**, **track_names**.
  - Pass **stream_numbers** into **average()** and store the result in average.

In [0]:
def read_data(filename):
    f = open(filename)
    return list(csv.reader(f))


def get_data(data):
    list1 = []
    list2 = []
    for x in data[1:]:
        list1.append(int(x[3]))
        list2.append(x[0])
    return list1, list2

def ceil(data):
    ceiling = 0
    for x in data:
        if x > ceiling:
            ceiling = x
    return ceiling

def average(data):
    total = 0
    for x in data:
        total += x
    return total/len(data)
  
# put your code here

## 2.9 Importing using an Alias

In the second screen of this section, we learned how modules work and how to import a module:

<img width="600" src="https://drive.google.com/uc?export=view&id=1BjwfYWIeJ-Ke2rNfr1Fy6UcpaAd5hKPp">

However, modules sometimes have long names. This means we have to type the full module name every time we use any of its objects. Instead, we can give the module name an alias:

```python
import module_name as m
```

As a result, we can access the functions using the alias instead of the full module name:

```python
m.function1()
m.function2()
```
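For a concrete sketch, here is the **math** module imported under the alias `m`. The alias itself is an arbitrary choice for this example:

```python
import math as m

# The alias m now stands in for the full module name
print(m.sqrt(49))    # 7.0
print(m.floor(2.7))  # 2
```

After this import, only the alias is defined in the namespace, so the functions are reached through `m` rather than through the original module name.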


**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">


1. Import the **statistics** module as **s**.
  - Within the **statistics** module, there is a function called **stdev()** which finds the standard deviation of a group of numbers. Find the standard deviation of **stream_numbers** and store the result in **variation**.
  - The standard deviation helps us quantify the amount of deviation in the group. We'll dive deeper into statistical analysis later in this track.

In [0]:
# put your code here

## 2.10 Importing Specific Objects

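If we're using only a few functions from a module, typing the module name in front of every call adds clutter. Python still loads the whole module either way, so the benefit of a narrower import is shorter, more explicit code rather than saved memory.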

We can specify which functions we'd like to use in our import:

```python
from module import function_1, function_2
```

After importing these functions, we won't need to include the module name when calling the function:

```python
function_1()
function_2()
```

Generally, if we know what functions we want to use, it's better practice to import specific function names.
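A short sketch of this import style, again using the **math** module as a stand-in:

```python
from math import sqrt, floor

# No module prefix is needed for the imported names
print(sqrt(81))     # 9.0
print(floor(9.99))  # 9
```

One trade-off to keep in mind is that names imported this way can silently replace functions of the same name that we defined earlier, so it helps to keep the list of imported names short and explicit.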

**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">

1. From the **statistics** module, import the [mean](https://docs.python.org/3/library/statistics.html#statistics.mean), [stdev](https://docs.python.org/3/library/statistics.html#statistics.stdev) and [median](https://docs.python.org/3/library/statistics.html#statistics.median) functions to the global namespace.
  - Call **mean()** on **stream_numbers** and store this in **average**.
  - Call **stdev()** on **stream_numbers** and store this in **variation**.
  - Call **median()** on **stream_numbers** and store this in **med**.
2. Examining these three values should give us a good feel for what the distribution looks like.

## 2.11 Next steps

In this section, we've learned:

- The 3 R's of Modular Programming: Readability, Re-usability, Reliability.
- How to use modules
- The basics of functional programming
- Local and Global Variable Scopes
- Different ways of importing modules

Now that we've gotten a holistic understanding of programming paradigms and good programming habits, let's dive into additional tools to add to your arsenal.

# 3.0 Iterations and List Comprehensions

In the previous lessons we learned how to iterate over multiple values using a **for** loop. To review, let's look at a **for** loop in action:

```python
streams = [57,62,63,99,142]
average = 84

diff = []
for num in streams:
    diff.append(num - average)
```

Assuming the average number of music streams is **84**, we wrote three lines of code to compare the number of music streams against the average:

```python
diff = []
for num in streams:
    diff.append(num - average)
```

In this section, we'll show you how to rewrite this expression in **one line of code**.

We'll be using the same Spotify worldwide ranking dataset. Throughout this mission, we'll attempt to answer one question:

<img width="400" src="https://drive.google.com/uc?export=view&id=1c3KmRv2N3KqgGk-i1K60AYGw0gBsacfg">

In our quest to find the dominant artist of 2017, we'll be using the same [Spotify's WorldWide Daily Song Ranking dataset](https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking). Using this dataset, we'll learn:

- How to transform a list of strings into a dictionary of counts.
- How to transform a three-line for loop into one beautiful line of code.
- How to write functions quickly and succinctly.
- How to bypass an error within our code.

Let's get started!

**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">

1. Use the **csv** module to read **"top100.csv"** into a list and assign to **music**.
2. Preview the first few rows of the dataset to get a feel for what the column names are and what the data looks like.
3. To find the most dominant artist of 2017, we'll need to extract the artist name:
  - Create a new list called **artists** and extract the artist name from our dataset.
  - Loop through **music** and append the artist name to **artists**.

In [27]:
# put your code here
import csv

# Keep the header row in music; skip it when extracting the artists
music = list(csv.reader(open("top100.csv","r")))

artists = []
for row in music[1:]:
    artists.append(row[1])
artists[:3]

['Ed Sheeran', 'Luis Fonsi', 'Luis Fonsi']

## 3.2 Extract the Artists Using a List Comprehension

In the previous section, we wrote 3 lines of code to extract the artist names from our **music** dataset:

```python
artists = []
for row in music[1:]:
    artists.append(row[1])
```

However, we can re-write this for loop in one line of code using a **list comprehension**. A list comprehension is a concise way of creating lists.

Let's take a look at an example of a **for** loop that calculates the difference between the values and the average:

```python
streams = [57,62,63,99,142]
average = 84

diff = []
for num in streams:
    diff.append(num - average)
```

We've created a new list called **diff**. Now, if we wanted to write the equivalent in a list comprehension:

```python
diff = [(num-average) for num in streams]
```

Let's see the different components of a for loop converted into a list comprehension. Let's start by looping through our list:

<img width="500" src="https://drive.google.com/uc?export=view&id=1oRiLRrSDFbqVjMqiKoez9veCzbYwrdz1">


Here, **streams** is an **iterable**: a data structure that Python can step through one value at a time. Whenever we loop over an iterable, Python automatically returns each individual value in turn and assigns it to the loop variable, **num** in this case. Read more about this concept [here](https://docs.python.org/3/tutorial/classes.html#iterators).

Now, let's define what we'd like to transform each value into:


<img width="500" src="https://drive.google.com/uc?export=view&id=1ZXJAfIhgb3AxyIrWa2Y2bVs9Y6EV3p4O">

Finally, let's replace the **append()** call with square brackets, which collect the results into the new list:

<img width="500" src="https://drive.google.com/uc?export=view&id=1iJW9nHgeBFSVWlkLv1Q_jWpvQ_hom7SF">
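To check that the two forms really are equivalent, the sketch below builds the list both ways and compares the results. The names `diff_loop` and `diff_lc` are chosen only for this comparison:

```python
streams = [57, 62, 63, 99, 142]
average = 84

# for-loop version
diff_loop = []
for num in streams:
    diff_loop.append(num - average)

# list comprehension version: [expression for variable in iterable]
diff_lc = [num - average for num in streams]

print(diff_lc)               # [-27, -22, -21, 15, 58]
print(diff_loop == diff_lc)  # True
```

Reading the comprehension from left to right gives the expression to compute, then the loop variable, then the iterable being looped over.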

Now that we understand list comprehensions, let's rewrite the code from the last screen's exercise as a list comprehension.

**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">

1. Convert the previous **for** loop into a list comprehension.
2. Store this list in **artists_lc.**

In [0]:
# put your code here

## 3.3 Getting the Artist Count Using a Function

We've extracted the artist names from **music** into a separate list named **artists**. Our next step is to find the number of times each artist name appears in our dataset. Here's what the first 5 values of **artists** look like:

```python
['Ed Sheeran',
 'Luis Fonsi',
 'Luis Fonsi',
 'The Chainsmokers',
 'Kendrick Lamar',
 ]
```

We'll first write our own function for counting. In the next screen, we'll make this calculation using a pre-existing module.

**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">

1. Write a function called **counter()**. The function should do the following:
  - Accept a list of artists as an argument.
  - Build a dictionary with the unique counts for each artist:
    - The key should be the artist name
    - The value should be the associated count for that artist
  - Return a dictionary with the artist name as the key and the count of the artist as the value.
2. Pass in **artists** to the **counter()** function and store the returned result in **counts**.

In [0]:
# put your code here

## 3.4 Getting the Artist Count Using Collections

In the previous section, we wrote our own **counter()** function to practice creating our own functions. In most real-world scenarios, it makes more sense to use a module built into the Python language.

So far, we've used lists and dictionaries to solve specific problems. Lists and dictionaries are **data structures** that organize data in specific ways. In a list, the data is organized by an incrementing integer index (**0** to **n-1**). In a dictionary, the data is organized by arbitrary keys that we can specify.

The [collections](https://docs.python.org/3.3/library/collections.html) module contains a Counter object that we can use to replicate the same functionality. We can use the Counter object to calculate the number of occurrences for each value within a data structure.

The returned Counter object behaves very similarly to a dictionary, but contains other useful methods. To use **Counter()**, just pass in any iterable object to the object's constructor. Here's an example where we pass in a string value:

```python
Counter("hello")
```

Running this code returns a **Counter** object:

```python
Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})
```

You'll notice that the display lists the most common values first rather than following the order of the letters in the string. Now, let's pass in a list:

```python
l = ["a","a","a","b"]
Counter(l)
```

This will return:

```python
Counter({'a': 3, 'b': 1})
```
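To illustrate how a **Counter** behaves like a dictionary while adding a few conveniences, here is a brief sketch. The variable name `letters` is a placeholder:

```python
from collections import Counter

letters = Counter(["a", "a", "a", "b"])

print(letters["a"])            # 3, looked up like a dictionary
print(letters["z"])            # 0, missing keys count as zero instead of raising KeyError
print(letters.most_common(1))  # [('a', 3)]
```

The **most_common()** method is one of the extra methods a plain dictionary does not have; it returns the values and their counts ordered from most to least frequent.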

Let's start by using the **Counter()** function to create a **Counter** object representing all of the artist names.

**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">


1. From the **collections** library, import the function **Counter()**.
2. Create a **Counter** object from the values in **artists** and assign the result to **artist_counts**.

In [0]:
# put your code here
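from collections import Counter

# Assign the Counter to artist_counts, as the exercise asks, then display it
artist_counts = Counter(artists)
artist_counts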

## 3.5 Looping Through Counts Using Items()

To extract the top value, we'll first convert our dictionary into a list of lists. To make this conversion, we'll use the **dict.items()** method, which gives us the dictionary's contents as (key, value) tuples that we can loop over.

A **method** is a function specific to an object. We'll be diving deeper into creating methods later on, when we learn about object-oriented programming. The main difference between a method and function is the way they are used:

<img width="600" src="https://drive.google.com/uc?export=view&id=1gUxSkw53-XhPuQd7q-dI680pUtChTayM">


We call a function by its name, pass in an argument, and get data back from the function. Data passed to a function is passed explicitly, meaning that we name the argument the function operates on: **sum(artist_list)**. Here, **artist_list** is explicitly passed.

A method behaves like a function, but it is associated with a specific object. The main difference is that the object itself is passed implicitly, so we do not need to specify it as an argument: **list.append(1)**. Calling **list.append()** passes the list to **append()** automatically, and we only need to explicitly pass the value we want to append: **1**.

We'll have a better understanding of methods when we write our own classes later in this course.
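The contrast can be sketched with a small list of numbers. The values are made up, and the name `artist_list` is reused from the text above purely for continuity:

```python
artist_list = [3, 1, 2]

# Function: the list is passed explicitly as an argument
total = sum(artist_list)

# Method: called on the list itself, so only the new value is passed
artist_list.append(4)

print(total)        # 6
print(artist_list)  # [3, 1, 2, 4]
```

Note that **sum()** returned a new value and left the list alone, while **append()** changed the list it was called on.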

The **dict.items()** method exposes the key-value pairs in a dictionary as a sequence of (key, value) tuples:

```python
dictionary = ({key:value, key:value})

resulting_structure = [(key, value), (key, value)]
```

Let's look at an example:

```python
sample = Counter({'21 Savage': 1, 'Alessia Cara': 1})
```

Then, we'll call the **items()** method on sample:

```python
sample.items()
```

In Python 3, this returns a view of the key-value tuples:

```python
dict_items([('21 Savage', 1), ('Alessia Cara', 1)])
```

Then, we can loop through each tuple in this list, and append it to a new list to create a list of artist names and counts.

```python
sample_list = []
for first_value, second_value in sample.items():
    # Convert each (key, value) tuple to a list and add it
    sample_list.append([first_value, second_value])
```

**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">

1. Create an empty list and assign it to **artist_counts_list**.
2. Use the **dict.items()** method to get the key, value pairs of the counts dictionary as tuples.
3. Write a for loop that iterates over those tuples:
  - Create a list from the 2 values in the tuple.
  - Append that list to **artist_counts_list**.
4. Display **artist_counts_list** using the **print()** function.


In [29]:
from collections import Counter
artist_counts = Counter(artists)

# Add your code here
artist_counts_list = [[key, value] for key,value in artist_counts.items()]
artist_counts_list[:3]

[['David Guetta', 1], ['Luis Fonsi', 2], ['Zion & Lennox', 1]]

## 3.6 Using a List Comprehension

In the previous screen, we used a **for** loop to create the new list of lists. Now that we understand the concept of list comprehensions, let's convert our for loop into a list comprehension. To review, here's how a **for** loop converts into a list comprehension:


<img width="500" src="https://drive.google.com/uc?export=view&id=1ajinlMH6YfSdUAXHIaCmycI2vKXrKiC8">
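The same conversion can be sketched in code. The example below uses a made-up list of titles and a different task (measuring title lengths) so that both forms can be compared side by side.

```python
# Hypothetical sample data
titles = ['HUMBLE.', 'Havana', 'Thunder']

# Version 1: a for loop that appends one value at a time
lengths_loop = []
for title in titles:
    lengths_loop.append(len(title))

# Version 2: the same logic written as a list comprehension
lengths_comp = [len(title) for title in titles]

print(lengths_loop)
print(lengths_comp)
```

Both versions produce the same list; the comprehension simply folds the loop header and the appended expression into one line.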


**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">


1. Convert the **for** loop from the previous exercise into a list comprehension that assigns the result to **artist_counts_two** instead.
2. Display **artist_counts_two** using the **print()** function.






In [0]:
from collections import Counter
artist_counts = Counter(artists)
artist_counts_list = []
for artist, count in artist_counts.items():
    artist_counts_list.append([artist,count])
    
# put your code here

## 3.7 Sorting A List of Lists


Now that we have our list of artist names and counts, to find the dominant artist of 2017, we'll need to:

- Sort our list in descending order by number of top 100 appearances.
- Extract the value at the first index.

To sort a list of values, we'll use the **list.sort()** method:

```python
streams = [54,33,76,99,123]
streams.sort()
```

When we call **list.sort()**, we should not store the result in a variable like so: **streams = streams.sort()**. This method modifies the associated list directly and returns **None** instead of a new list, so that assignment would replace our list with **None**.

```python
streams = [54,33,76,99,123]
streams.sort()
print(streams)
```

This prints all the values in sorted order:

```python
[33, 54, 76, 99, 123]
```
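To make the in-place behavior concrete, here is a short sketch that captures what **sort()** gives back and contrasts it with the built-in **sorted()** function, which leaves the original list alone. The variable names **result**, **original**, and **ordered** are only for illustration.

```python
streams = [54, 33, 76, 99, 123]

# list.sort() reorders the list in place and returns None
result = streams.sort()
print(result)
print(streams)

# sorted() returns a new list and leaves the original untouched
original = [54, 33, 76, 99, 123]
ordered = sorted(original)
print(original)
print(ordered)
```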


**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">


1. Call the **sort()** method on the **artist_counts_list** nested list.
2. Select the first list from **artist_counts_list** and assign to **first_artist**.
  - Is this actually the top artist? Head to the next step to read more.

In [0]:
# put your code here
artist_counts_list.sort()
artist_counts_list

## 3.8 Specifying a Key When Sorting a List of Lists

In the previous screen, **artist_counts_list.sort()** sorted our list of lists in alphabetical order:

```python
[['21 Savage', 1], ['Alessia Cara', 1], ['Avicii', 1], ['Axwell /\\ Ingrosso', 1], ['Big Sean', 1], ['Bruno Mars', 2], ['CNCO', 1], ['Calvin Harris', 2], ['Camila Cabello', 1], ['Cardi B', 1], ['Charlie Puth', 1], ['Cheat Codes', 1], ['Childish Gambino', 1], ['Chris Jeday', 1], ['Clean Bandit', 2], ['DJ Khaled', 2],
.......
 ```
 
By default, if the values in the list are strings, the Python interpreter sorts them in alphabetical order, with uppercase letters ordered before lowercase ones (which is why **'CNCO'** comes before **'Calvin Harris'** above). Since we were sorting a list of lists, the interpreter compared the lists by the value at the first index, which was a string.

If the data type is an **int** or **float**, the interpreter will automatically sort the numbers from lowest to highest:

```python
sample = [4,2,5,6,2,5]
sample.sort()
```

After sorting, **sample** contains:

```python
[2, 2, 4, 5, 5, 6]
```
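The default ordering can be checked on a few small lists. This is a sketch with made-up values: the first list shows how strings compare, and the second shows that nested lists are compared by their first element, with later elements only breaking ties.

```python
# Strings: uppercase letters sort before lowercase letters
names = ['Calvin Harris', 'CNCO', 'avicii', 'Bruno Mars']
names.sort()
print(names)

# Nested lists: compared by index 0 first, then index 1 on ties
pairs = [['Drake', 3], ['Drake', 1], ['Avicii', 1]]
pairs.sort()
print(pairs)
```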

However, in our scenario, sorting our **counts** list by artist name doesn't tell us the dominant artist of 2017.

Instead of sorting the list of lists by the values at index 0 (artist names), we want to sort by the values at index 1 (number of top 100 appearances for that artist).

The **key** parameter lets us specify a custom function for sorting. Python will pass each list in the list of lists into this function and use that for sorting. Let's look at a sample list of lists:

```python
sample = [
            [1,2,3,4,5],
            [4,4,5],
            [3,2]
         ]
```

Because **sample** has lists of varying lengths, we may be interested in sorting by the length of these lists. We can accomplish that by passing in the **len** function to **key**:

```python
sample.sort(key = len)
```

Let's see how **.sort()** will sort a list of lists by the len key:



<img width="600" src="https://drive.google.com/uc?export=view&id=1wFnZGkZceFHURBOq9tXjN0fhWEzFsKdi">


After calculating the length for each value, each value will be sorted:


<img width="600" src="https://drive.google.com/uc?export=view&id=1uyNjNxvCLE-saR1cJZwXiH6VZ2jQzPTt">


Printing **sample** after sorting it this way would display:

```python
[[3,2], [4, 4, 5], [1, 2, 3, 4, 5]]
```

Then, if we'd like to sort in descending order, we can add the **reverse** parameter:

```python
sample.sort(key = len, reverse = True)
```

**sample** would now be:

```python
[[1, 2, 3, 4, 5], [4, 4, 5], [3, 2]]
```

To determine the top artist, we can write a function that just returns each list's value at index 1 (the number of top 100 appearances for that artist).
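As a rough sketch of the idea, the example below sorts a made-up list of **[title, streams]** pairs with a named key function. Both **songs** and **by_streams()** are hypothetical names used only for this illustration.

```python
# Hypothetical sample data: [title, streams]
songs = [['Song A', 120], ['Song B', 300], ['Song C', 45]]

def by_streams(song):
    # Return the value at index 1, which sort() will compare
    return song[1]

# Pass the function itself (no parentheses) as the key
songs.sort(key=by_streams, reverse=True)
print(songs)
```

Note that **key** receives the function object, so we write **key=by_streams** rather than **key=by_streams()**.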


**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">



1. Sort **artist_counts_list** by the number of top 100 appearances by:
  - Using the key parameter and specifying the **by_count()** function.
  - Setting the parameter **reverse** to **True**.
2. Use indexing to select the first item of **artist_counts_list**. Assign the item to **top_artist**.



In [0]:
def by_count(artists):
    return artists[1]

# put your code here

## 3.9 Creating An Anonymous Function

In the previous section, we used the key parameter of the **sort()** method. We learned that the key parameter takes in a function:

```python
def by_count(artists):
    return artists[1]
  
artist_counts_list.sort(key=by_count, reverse=True)
```

In the previous section, we defined **by_count** and passed it to the **key** parameter. This took us about three lines of code. In Python, there are two ways of writing functions. We learned the first way, using **def**, in the previous Python course. Similar to how we condensed our **for** loop into a list comprehension, we can also reduce the number of lines in a function using the **lambda** keyword.

A lambda function is a small anonymous function:

```python
f = lambda x: x + 1
```

The equivalent using def:

```python
def f(x):
    return x + 1
```

Lambda functions have a shortened notation (**lambda x** instead of **def f(x)**) and no function name of their own. This makes lambda functions useful for short, throwaway functions that we don't plan on re-using later, and a natural choice for the **key** parameter when calling **list.sort()**.
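Here is a minimal sketch of a lambda used as a sort key, next to the equivalent **def** version. The **songs** list and the **by_streams()** helper are made up for illustration.

```python
# Hypothetical sample data: [title, streams]
songs = [['Song A', 120], ['Song B', 300], ['Song C', 45]]

# Named function version
def by_streams(song):
    return song[1]

sorted_with_def = sorted(songs, key=by_streams, reverse=True)

# Lambda version: the same logic written inline, with no name
sorted_with_lambda = sorted(songs, key=lambda song: song[1], reverse=True)

print(sorted_with_def == sorted_with_lambda)
print(sorted_with_lambda[0])
```

The sketch uses **sorted()** so the original **songs** list stays unchanged and both results can be compared.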



**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">

1. Sort **artists_counts_lol** by the number of top 100 appearances using a **lambda** function.
2. Select the top list in **artists_counts_lol** and assign it to **lambda_top_artist**.



In [33]:
import csv
from collections import Counter
f = open("top100.csv","r")
music = list(csv.reader(f))

artists = [item[1] for item in music[1:]]
artists_counts_lol = [ [key, value] for key, value in Counter(artists).items()]

# put your code here
artists_counts_lol.sort(key=lambda x: x[1], reverse=True)
artists_counts_lol

[['Ed Sheeran', 5],
 ['The Chainsmokers', 3],
 ['Post Malone', 3],
 ['Drake', 3],
 ['Maroon 5', 3],
 ['Luis Fonsi', 2],
 ['The Weeknd', 2],
 ['Kendrick Lamar', 2],
 ['Bruno Mars', 2],
 ['ZAYN', 2],
 ['Calvin Harris', 2],
 ['Imagine Dragons', 2],
 ['Travis Scott', 2],
 ['DJ Khaled', 2],
 ['Martin Garrix', 2],
 ['Khalid', 2],
 ['Clean Bandit', 2],
 ['David Guetta', 1],
 ['Zion & Lennox', 1],
 ['Lauv', 1],
 ['Chris Jeday', 1],
 ['Zedd', 1],
 ['Childish Gambino', 1],
 ['Martin Jensen', 1],
 ['J Balvin', 1],
 ['Dua Lipa', 1],
 ['Taylor Swift', 1],
 ['Cardi B', 1],
 ['Julia Michaels', 1],
 ['Lil Pump', 1],
 ['Hailee Steinfeld', 1],
 ['Maluma', 1],
 ['KYLE', 1],
 ['NF', 1],
 ['Justin Bieber', 1],
 ['Rita Ora', 1],
 ['Katy Perry', 1],
 ['Alessia Cara', 1],
 ['Louis Tomlinson', 1],
 ['Miley Cyrus', 1],
 ['Portugal. The Man', 1],
 ['Danny Ocean', 1],
 ['Axwell /\\ Ingrosso', 1],
 ['CNCO', 1],
 ['Shawn Mendes', 1],
 ['Selena Gomez', 1],
 ['Future', 1],
 ['Lil Uzi Vert', 1],
 ['Logic', 1],
 ['Nial

## 3.10 Creating a Pipeline Using Modularization

So far, we've added to our code one step at a time. If we were to aggregate the code, this would look like spaghetti code:

```python
import csv
from collections import Counter

f = open("top100.csv", "r")
music = list(csv.reader(f))

# Pull the artist name (column 1) from every row after the header
artists = [row[1] for row in music[1:]]

artist_dict = Counter(artists)
artist_counts = [[key, value] for key, value in artist_dict.items()]

artist_counts.sort(key = lambda x: x[1], reverse=True)
```

In the previous section, we learned about **modularization**. Now, let's take it a step further by modularizing our code into a **pipeline**. A pipeline takes in an input, performs a specific set of actions, then produces an output:

<img width="600" src="https://drive.google.com/uc?export=view&id=1G9XEPAUS5PijOoYT7XVKvhal9ky7rnec">


By transforming our spaghetti code into a pipeline, we can be confident that feeding the pipeline additional data will produce the desired result. Each pipeline component feeds data into the next component.

We're creating a pipeline that takes in a list object and returns the most dominant artist. Let's transform our spaghetti code into a pipeline!
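To show the general shape of a pipeline without giving away the exercise, here is a small sketch on a different task: totaling streams from a few in-memory lines. The function names **parse_rows()**, **to_numbers()**, and **total_streams()**, as well as the data, are hypothetical.

```python
def parse_rows(lines):
    """Split each comma-separated line into [title, streams]."""
    return [line.split(',') for line in lines]

def to_numbers(rows):
    """Convert the streams column from str to int."""
    return [[title, int(streams)] for title, streams in rows]

def total_streams(rows):
    """Add up the streams column."""
    return sum(streams for _, streams in rows)

# Hypothetical input data
raw_lines = ['Song A,100', 'Song B,250', 'Song C,50']

# Each step's output is the next step's input
rows = parse_rows(raw_lines)
numeric_rows = to_numbers(rows)
total = total_streams(numeric_rows)
print(total)
```

Because each function only depends on its input, any step can be tested or replaced on its own.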


**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">

1. Build a 3 function pipeline that re-creates the work we did in this mission.
2. Create a **read_data()** function that:
  - Accepts a filename string as its sole parameter.
  - Reads the file into a list and returns the list representation.
3. Create a **clean_data()** function that:
  - Accepts the list representation of the data as its sole parameter.
  - Uses multiple lines of code to convert this list into a list of lists (as we did earlier).
  - Returns the list of lists representation of the data.
4. Uncomment the commented code when you're ready to run the full pipeline!

In [0]:
# Add your functions here

# Uncomment when ready
# music_as_list = read_data("top100.csv")
# sorted_lol = clean_data(music_as_list)

## 3.11 How to deal with errors

Now that we've built a data pipeline, we can pass in unseen data through our pipeline. However, if our dataset contains one erratic row, the interpreter will halt execution of the function or pipeline and return an error.

In most cases, we should dive in and fix the code causing the error. However, if the problem doesn't occur frequently or there's a chance of an unexpected error occurring, we can use a **try/except** statement. A **try/except** statement is a control flow statement, similar to **if-else**, that redirects the execution of code if the code runs into an error.

Let's say we wanted to find the total number of streams in a list by adding every value in a list:


<img width="500" src="https://drive.google.com/uc?export=view&id=1rMhpcByvdTfOueowHPyECXUXZJPLFYrb">

We can't add **"NULL"** to the running total since we'd be adding a **str** to an **int**. This raises a **TypeError** that halts execution of our code. Instead of altering the values of streams, we can add a **try/except** statement:


<img width="500" src="https://drive.google.com/uc?export=view&id=1OS-SC5B7wo0hy0_7vVMSwjBhEjvU-VGr">


We'll see both the message "Error Occured" and the total of 211.
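For readers who can't view the images, the pattern can be sketched as follows. The values in **streams** are made up (chosen so the total matches the 211 mentioned above), and catching **TypeError** specifically is an assumption about which error we want to handle.

```python
# Hypothetical values; the "NULL" entry stands in for a bad row
streams = [100, 'NULL', 61, 50]

total = 0
for value in streams:
    try:
        # Fails with a TypeError when value is a str
        total += value
    except TypeError:
        # Report the problem and move on to the next value
        print('Error Occured')

print(total)
```

Naming the exception type keeps unrelated bugs from being silently swallowed.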

In [0]:
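import csv

f = open("top100.csv", "r")
music = list(csv.reader(f))
f.close()

cleaned_list = []
for row in music[1:]:
  try:
    # Keep the track name, artist, and streams (converted to a number)
    cleaned_list.append([row[0],row[1],float(row[-1])])
  except (ValueError, IndexError):
    # Skip rows with a missing column or a non-numeric streams value
    pass
print(cleaned_list)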

[['Shape of You', 'Ed Sheeran', 2993988783.0], ['Despacito - Remix', 'Luis Fonsi', 1829621841.0], ['Despacito (Featuring Daddy Yankee)', 'Luis Fonsi', 1460802540.0], ['Something Just Like This', 'The Chainsmokers', 1386258295.0], ['HUMBLE.', 'Kendrick Lamar', 1311243745.0], ['Unforgettable', 'French Montana', 1289150890.0], ['rockstar', 'Post Malone', 1260181617.0], ["I'm the One", 'DJ Khaled', 1254196301.0], ["It Ain't Me (with Selena Gomez)", 'Kygo', 1190339348.0], ['XO TOUR Llif3', 'Lil Uzi Vert', 1171827725.0], ["That's What I Like", 'Bruno Mars', 1136379512.0], ['New Rules', 'Dua Lipa', 1119944498.0], ['I Don\xe2\x80\x99t Wanna Live Forever (Fifty Shades Darker) - From "Fifty Shades Darker (Original Motion Picture Soundtrack)"', 'ZAYN', 1115034686.0], ['Attention', 'Charlie Puth', 1112777364.0], ['Mi Gente', 'J Balvin', 1091656642.0], ['Congratulations', 'Post Malone', 1082624976.0], ['Thunder', 'Imagine Dragons', 1067732868.0], ['Havana', 'Camila Cabello', 1042161672.0], ['Stay (

## 3.12 Passing new data into our pipeline

Let's finish the pipeline we've built so far by adding one last function.

**Exercise**

<img width="60" src="https://drive.google.com/uc?export=view&id=1QoTRiOtUzjnbRL7Ue5uPxKse03tE1tPe">


1. Create a **top_artist()** function that:
  - Accepts the sorted list of lists representation of the data.
  - Selects the first list and returns it (corresponding to the top artist).
2. Uncomment the code that calls the functions when you're ready.

In [0]:
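import csv
from collections import Counter

def read_data(filename):
    # Open the file in a with block so it is closed automatically
    with open(filename, "r") as f:
        music = list(csv.reader(f))
    return music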

def clean_data(csv_list):
    artists = [row[1] for row in csv_list[1:]]
    artist_dict = Counter(artists)
    artist_counts_list= [[key,value] for key,value in artist_dict.items()]
    artist_counts_list.sort(key=lambda x: x[1], reverse=True)
    return artist_counts_list

# Add your function here

# Uncomment when ready
# music_as_list = read_data("top100.csv")
# sorted_lol = clean_data(music_as_list)
# most_popular_artist = top_artist(sorted_lol)