## 1. Introduction to Classes and Objects


In this mission, we introduce the concepts behind classes and objects. On your path to becoming proficient in Python, you have been using classes and objects all along without knowing it. Python is an **object-oriented programming** language, and everything in Python is an object. Integers, floats, strings, and every other value you work with are objects, and each one is created from a class.

Anytime you assign an integer, a string, or a value of another type to a variable, you are creating an object from the corresponding class.


>```python
count = 8
foo = "bar"
```
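One way to check this for yourself is to ask Python which class each value was created from. The short sketch below reuses the two variables from the example above and relies only on the built-in `type()` and `isinstance()` functions.

```python
count = 8
foo = "bar"

# type() returns the class an object was created from.
print(type(count))  # <class 'int'>
print(type(foo))    # <class 'str'>

# isinstance() checks whether an object is an instance of a given class.
print(isinstance(count, int))  # True
print(isinstance(foo, str))    # True
```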

<img width="300" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0azRoY2hrOUt6b3M">

Think of a class as the blueprint used to construct objects. Objects built from the same **blueprint** share the same functions (called **methods**). For example, when we create a list object, every list will have the <span style="background-color: #F9EBEA; color:##C0392B">append</span> method defined by the list class.

>```python
my_list = [1, 2, 3]
my_list.append(4)
# my_list object uses the append method which is defined by the list class.
```
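To make the link between the object and its class more concrete, the sketch below calls the same method in two ways: on the object, and directly on the list class with the object passed in explicitly. The variable `other_list` is only there to show that a second list gets the same method.

```python
my_list = [1, 2, 3]

# The usual way: call the method on the object.
my_list.append(4)

# The same method looked up on the list class, with the object passed in explicitly.
list.append(my_list, 5)

print(my_list)  # [1, 2, 3, 4, 5]

# Any other list object has the same method available.
other_list = ["a"]
other_list.append("b")
print(other_list)  # ['a', 'b']
```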

A class bundles up logically grouped functions and variables (called **attributes**) that we can use anywhere in our code. The reason to use classes is similar to the reason to use modules, but instead of requiring a separate file for each grouping, we can add multiple classes to a single file. This promotes code abstraction, which saves us from writing the same code over and over again.

Recall from the previous mission that we used the <span style="background-color: #F9EBEA; color:##C0392B">csv module</span> to parse a csv file. As you may remember, all we did with those csv files was read them and load the rows into a list. Unfortunately, within the <span style="background-color: #F9EBEA; color:##C0392B">csv module</span>, there are no helper functions for doing any actual analysis.

Additional functionality would have to be written on our own. One way is to write a function that can take in the csv data as a parameter and then run some analysis. But then where do we keep this function? What if we want to run multiple functions on the same dataset? Using a class, we can bundle up the csv data with common methods and share it across our code.

For this mission, we'll be working on creating a dataset object that can take in any csv file and expose methods to query it. The example data set we will be working with is a list of National Football League (NFL) games. Here are the first three rows:

| Year | Week | Winner              | Loser               |
|------|------|---------------------|---------------------|
| 2009 | 1    | Pittsburgh Steelers | Tennessee Titans    |
| 2009 | 1    | Minnesota Vikings   | Cleveland Browns    |
| 2009 | 1    | New York Giants     | Washington Redskins |

Each row in our data set represents a game. Here's a description of the columns:

- <span style="background-color: #F9EBEA; color:##C0392B">year</span> the game took place
- <span style="background-color: #F9EBEA; color:##C0392B">week</span> of the season (out of 17 total weeks)
- <span style="background-color: #F9EBEA; color:##C0392B">winner</span> the winning team
- <span style="background-color: #F9EBEA; color:##C0392B">loser</span> the losing team

## 2. Defining the Dataset Class

To create a class in Python, we use the keyword <span style="background-color: #F9EBEA; color:##C0392B">class</span> followed by the desired class name. The class name, by convention, is in <span style="background-color: #F9EBEA; color:##C0392B">PascalCase</span> where the first letter of every word is capitalized. Here is how you would define the dataset class:

>```python
class Dataset:
    def __init__(self):
        self.type = "csv"
```

Let's break this down piece by piece. Within the declaration of the <span style="background-color: #F9EBEA; color:##C0392B">class</span> we have written what looks like the function syntax for the method <span style="background-color: #F9EBEA; color:##C0392B">\_\_init\_\_()</span>. Remember from before that internal functions of a class are called methods. Before we explain what <span style="background-color: #F9EBEA; color:##C0392B">\_\_init\_\_()</span> does, let's see the syntax on how to create a dataset object.

>```python
new_dataset = Dataset()
```

This <span style="background-color: #F9EBEA; color:##C0392B">new_dataset</span> variable refers to an instance of the Dataset class. When creating this object, the Python interpreter calls the special <span style="background-color: #F9EBEA; color:##C0392B">\_\_init\_\_()</span> method we defined to initialize it. Python first creates the new object, and then <span style="background-color: #F9EBEA; color:##C0392B">\_\_init\_\_()</span> sets the attributes on that instance.

<img width="400" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0TTFJWXVmZXNrS2M">


To access the attributes of an instance, we use dot notation like the following:

>```python
print(new_dataset.type)   # prints out "csv"
```


Let's take a look back at where we defined the <span style="background-color: #F9EBEA; color:##C0392B">\_\_init\_\_()</span> method. You can see that we had to give the <span style="background-color: #F9EBEA; color:##C0392B">\_\_init\_\_()</span> method a parameter, self.

Python uses this <span style="background-color: #F9EBEA; color:##C0392B">self</span> variable to refer to the created object so you can interact with the instance data. If you didn't have <span style="background-color: #F9EBEA; color:##C0392B">self</span>, then the method wouldn't know where to store the internal data you wanted to keep. By convention, <span style="background-color: #F9EBEA; color:##C0392B">self</span> is the name used for the instance, even though it's possible to name it whatever you want. It is highly recommended to use <span style="background-color: #F9EBEA; color:##C0392B">self</span> because virtually every Python project follows this convention.

When creating an object, you never have to worry about passing in a <span style="background-color: #F9EBEA; color:##C0392B">self</span> object on instantiation since this is done automatically by the Python interpreter. If this were not the case, then it would look something like the following:

>```python
new_dataset = Dataset(new_dataset)
```

This looks odd and it would be tedious to do every time you create an object so thankfully Python will do it for you.
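The same automatic passing of the instance happens on every method call, not only on instantiation. As a minimal sketch, the example below adds a hypothetical `describe()` method (not part of the mission's class) and calls it both the normal way and through the class with the instance passed in by hand.

```python
class Dataset:
    def __init__(self):
        self.type = "csv"

    def describe(self):
        # Hypothetical helper method, used only for this illustration.
        return "a " + self.type + " dataset"


new_dataset = Dataset()

# Normal call: Python passes new_dataset in as self.
implicit = new_dataset.describe()

# Equivalent call through the class, passing the instance ourselves.
explicit = Dataset.describe(new_dataset)

print(implicit)              # a csv dataset
print(implicit == explicit)  # True
```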

<br>
<div class="alert alert-info">
<b>Exercise Start.</b>
</div>

**Description**: 

>1. Create a **class** called **Dataset**.
2. Inside the **class**, create a **type attribute**. Assign the value "csv" to it.
3. Create an instance of the Dataset class, and assign it to the variable **dataset**.
4. Print the **type** attribute of the **dataset** instance.

## 3. Passing Additional Arguments to the Initializer

Before, we saw that we can initialize objects with attributes that we explicitly set on the <span style="background-color: #F9EBEA; color:##C0392B">self</span> variable. Because the value is hard-coded in the initializer, every object we create starts with the same value. The problem with this is that we often want each new object to hold its own data (like the rows from a particular csv file).

To dynamically add data to our dataset object on instantiation, we have to add an additional argument to the <span style="background-color: #F9EBEA; color:##C0392B">\_\_init\_\_()</span> method (remembering to keep <span style="background-color: #F9EBEA; color:##C0392B">self</span> first!).

>```python
class Dataset:
    def __init__(self, data):
        self.data = data
```

Then, to create an object with the data, we pass it in when instantiating the class. A reminder from last mission is that we can use the csv module to read a data set from a **csv** file:

>```python
import csv

f = open("somefile.csv", 'r')
csvreader = csv.reader(f)
csv_data = list(csvreader)
csv_dataset = Dataset(csv_data)
```
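The snippet above never closes the file. A common alternative is a `with` block, which closes it automatically once the rows are read. The sketch below uses `io.StringIO` as a stand-in for a real file so that it runs anywhere; the lowercase header names and the two sample rows are assumptions made for the illustration.

```python
import csv
import io


class Dataset:
    def __init__(self, data):
        self.data = data


# Stand-in for open("somefile.csv", "r"); the contents are sample rows only.
sample_file = io.StringIO(
    "year,week,winner,loser\n"
    "2009,1,Pittsburgh Steelers,Tennessee Titans\n"
    "2009,1,Minnesota Vikings,Cleveland Browns\n"
)

# The with block closes the file-like object once the rows have been read.
with sample_file as f:
    csv_data = list(csv.reader(f))

csv_dataset = Dataset(csv_data)
print(csv_dataset.data[0])    # ['year', 'week', 'winner', 'loser']
print(len(csv_dataset.data))  # 3
```

With a real file, the only change would be replacing `sample_file` with a call to `open()`.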

Recall that a method is just another way of saying a function defined in a class, so <span style="background-color: #F9EBEA; color:##C0392B">\_\_init\_\_()</span> can be treated like any function you've seen before. As a result, we can pass any number of arguments when instantiating an object, as long as the initializer declares matching parameters. You can also access the new attribute you set by using the dot notation described previously.

>```python
# prints the first 10 rows of the csv data.
print(csv_dataset.data[:10])
```
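Since the initializer behaves like a regular function, it can also take several parameters and default values. The sketch below adds a hypothetical `name` parameter, which is not part of the mission's class, purely to show the mechanics.

```python
class Dataset:
    def __init__(self, data, name="untitled"):
        # name is a hypothetical extra parameter added for this illustration.
        self.data = data
        self.name = name


games = Dataset([["2009", "1"], ["2009", "2"]], name="nfl")
empty = Dataset([])

# Each instance keeps its own attribute values.
print(games.name, len(games.data))  # nfl 2
print(empty.name, len(empty.data))  # untitled 0
```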

<br>
<div class="alert alert-info">
<b>Guided Exercise.</b>
</div>

**Description**: 

>1. Add a **data** parameter to the **\_\_init\_\_()** method, and set the value to the **self.data** attribute.
2. Read the data from **nfl.csv** and set it to the variable **nfl_data**.
3. Make an instance of the class, passing in **nfl_data** to the **\_\_init\_\_()** method (when you call **Dataset(...)**).
    - Assign the result to the variable **nfl_dataset**.
4. Use the data attribute to access the underlying data for **nfl_dataset** and assign the result to the variable **dataset_data**.

>```python
import csv

class Dataset:
    def __init__(self, data):
        self.data = data
```

>```python
f = open("nfl.csv", 'r')
csvreader = csv.reader(f)
nfl_data = list(csvreader)
```
>```python
nfl_dataset = Dataset(nfl_data)
dataset_data = nfl_dataset.data
```

## 4. Adding Additional Behavior

Instantiating our objects with attributes is great, but we can do even more by adding instance methods. In a previous example, we printed the first 10 rows of the data by calling the <span style="background-color: #F9EBEA; color:##C0392B">print()</span> function outside of the class. Let's make a new method that will always print out the first 10 rows.

>```python
class Dataset:
    def __init__(self, data):
        self.data = data
```
>```python
    def print_data(self):
        # New method: remember to add self.
        print(self.data[:10])
```

Like <span style="background-color: #F9EBEA; color:##C0392B">\_\_init\_\_()</span>, this instance method needs <span style="background-color: #F9EBEA; color:##C0392B">self</span> as its first parameter. As you can see from this example, the benefit of having the <span style="background-color: #F9EBEA; color:##C0392B">self</span> variable is that we can reference the instance's data from any instance method we define. Also, using <span style="background-color: #F9EBEA; color:##C0392B">self</span>, you can call the <span style="background-color: #F9EBEA; color:##C0392B">print_data()</span> method within other instance methods by calling <span style="background-color: #F9EBEA; color:##C0392B">self.print_data()</span>.
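As a rough sketch of one method calling another through `self`, the example below adds a hypothetical `print_summary()` method that is not part of the mission's class. The two-row dataset is made up for the illustration.

```python
class Dataset:
    def __init__(self, data):
        self.data = data

    def print_data(self):
        print(self.data[:10])

    def print_summary(self):
        # Hypothetical method, added only to show one method calling another.
        print("Total rows:", len(self.data))
        self.print_data()  # self is passed along automatically
        return len(self.data)


small_dataset = Dataset([["2009", "1"], ["2009", "2"]])
row_count = small_dataset.print_summary()
```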

Let's create an instance of the dataset again and print the first 10 rows.

>```python
nfl_dataset = Dataset(nfl_data)
nfl_dataset.print_data()  # Prints the first 10 rows.
```

To practice what we've learned, let's create a method on the dataset class that can print any number of rows.

<br>
<div class="alert alert-info">
<b>Guided Exercise.</b>
</div>

**Description**: 

>1. Add an instance method **print_data()** that takes in a **num_rows** argument.
    - This method should print out data up to the given number of rows.
2. Create an instance of the **Dataset** class and initialize with the **nfl_data**. **nfl_data** is already loaded for you.
    - Assign it to the variable **nfl_dataset**.
3. Call the **print_data** method, setting the **num_rows** parameter to **5**.

>```python
class Dataset:
    def __init__(self, data):
        self.data = data
```
>```python
    def print_data(self, num_rows):
        print(self.data[:num_rows])
```
>```python
nfl_dataset = Dataset(nfl_data)
nfl_dataset.print_data(5)
```

## 5. Enhancing the Initializer

You may have noticed when printing the data that the first element in the list of rows contains the header. Reading the file the way we did with the <span style="background-color: #F9EBEA; color:##C0392B">csv</span> module, we don't have a way of extracting this header unless we grab the first element ourselves. However, with our dataset class, we could add an instance method that grabs the first element of <span style="background-color: #F9EBEA; color:##C0392B">self.data</span>, sets it as a <span style="background-color: #F9EBEA; color:##C0392B">header</span> attribute, and then removes it from the <span style="background-color: #F9EBEA; color:##C0392B">data</span> attribute. Let's try that now:

>```python
...
def extract_header(self):
    self.header = self.data[0]
    self.data = self.data[1:]  # keep only the rows after the header
...
```

This works well, but there is a problem. Suppose the user calls the <span style="background-color: #F9EBEA; color:##C0392B">extract_header()</span> method more than once. The second call overwrites the correct header with the first data row and drops that row from the data. What we want is to extract the header once and only once in our class.
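The failure is easy to reproduce. The sketch below uses a made-up three-row dataset and calls the method twice to show the header being replaced by a data row.

```python
class Dataset:
    def __init__(self, data):
        self.data = data

    def extract_header(self):
        self.header = self.data[0]
        self.data = self.data[1:]


# Sample rows invented for this illustration.
rows = [
    ["year", "week"],
    ["2009", "1"],
    ["2009", "2"],
]
dataset = Dataset(rows)

dataset.extract_header()
first_header = dataset.header
print(first_header)   # ['year', 'week']

dataset.extract_header()  # called a second time by mistake
second_header = dataset.header
print(second_header)  # ['2009', '1']
print(dataset.data)   # [['2009', '2']]
```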

The best place to extract the header is the initializer! Because this method is only called once, on instantiation, we know that the header will also only be set once. The user also no longer has to remember to call a separate method after creating the object, which makes for a better user experience.

Let's add the header extraction to the initializer now.

<br>
<div class="alert alert-info">
<b>Guided Exercise.</b>
</div>

**Description**: 

>1. Add the **extract_header()** code to the initializer and set the header data to **self.header**.
2. Create a variable called **nfl_header** and set it to the header attribute

>```python
class Dataset:
    def __init__(self, data):
        self.header = data[0]
        self.data = data[1:]
```
>```python
nfl_dataset = Dataset(nfl_data)
nfl_header = nfl_dataset.header
```


## 6. Grabbing Column Data

In the previous screen we were able to parse the headers from a csv file. With these headers, a helpful function for analyzing a dataset is to grab all the column data for a given header label. This is helpful since you might want to extract data from a specific column of a dataset and then process it.

Looking at <span style="background-color: #F9EBEA; color:##C0392B">nfl_data</span>, you may notice that each header label's position lines up with the corresponding value in every row. To grab the column data, all we need to do is search through the header, find the index of the given label, and then loop through the rest of the data, collecting the value at that index from each row.

A great function for searching the header is <span style="background-color: #F9EBEA; color:##C0392B">enumerate()</span>, which gives us both the index and the label at each step of the loop. Here's an example of how it works:

>```python
for idx, value in enumerate(['foo', 'bar']):
    print(idx, value)
```

This will print **0** foo and **1** bar, which are the index and value of each element, respectively.
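Applied to a header row, the same loop can locate the position of a label. The sketch below uses a made-up header list and also shows the built-in `list.index()` method for comparison.

```python
# Sample header invented for this illustration.
header = ["year", "week", "winner", "loser"]
label = "winner"

found_index = None
for idx, element in enumerate(header):
    if element == label:
        found_index = idx
        break  # stop at the first match

print(found_index)  # 2

# list.index() performs the same search, but raises ValueError when the label is missing.
print(header.index(label))  # 2
```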


<br>
<div class="alert alert-info">
<b>Guided Exercise.</b>
</div>

**Description**: 

>1. Add a method named **column** that takes in a **label** argument, finds the index of the header, and returns a list of the column data.
    - If the **label** is not in the header, you should return **None**.
2. Create a variable called **year_column** and set it to the return value of **column('year')**.
3. Create a variable called **player_column** and set it to the return value of **column('player')**.

>```python
class Dataset:
    def __init__(self, data):
        self.header = data[0]
        self.data = data[1:]
```
>```python
    def column(self, label):
        if label not in self.header:
            return None
```
>```python
        index = 0
        for idx, element in enumerate(self.header):
            if label == element:
                index = idx
```
>```python
        column = []
        for row in self.data:
            column.append(row[index])
        return column
```
>```python
nfl_dataset = Dataset(nfl_data)
year_column = nfl_dataset.column('year')
player_column = nfl_dataset.column('player')
```

## 7. Count Unique Method

Let's add a <span style="background-color: #F9EBEA; color:##C0392B">count_unique()</span> method to our class so that a user can choose a label and get the number of unique values in that column. This is not too tricky since we have already done the heavy lifting in the <span style="background-color: #F9EBEA; color:##C0392B">column</span> method. Keep in mind, though, that it returns every element in the column, duplicates included, when writing the <span style="background-color: #F9EBEA; color:##C0392B">count_unique()</span> method.

Recall that we can use the <span style="background-color: #F9EBEA; color:##C0392B">self</span> parameter to access instance methods in the same way that we accessed an attribute. Here's an example of how you can access a method within the declaration of another instance method.

>```python
...
def other_instance_method(self):
    results = self.get_results()
    ...
...
```

<br>
<div class="alert alert-info">
<b>Guided Exercise.</b>
</div>

**Description**: 

> 1. Add a method to the **Dataset** class called **count_unique()** that takes in a **label** argument.
2. Get the unique set of items from the **column()** method and return the total count.
3. Use the instance method to assign the number of unique values of **year** to **total_years**.

>```python
class Dataset:
    def __init__(self, data):
        self.header = data[0]
        self.data = data[1:]
```
>```python
    def column(self, label):
        if label not in self.header:
            return None
```
>```python
        index = 0
        for idx, element in enumerate(self.header):
            if label == element:
                index = idx
```
>```python
        column = []
        for row in self.data:
            column.append(row[index])
        return column
```
>```python
    def count_unique(self, label):
        count = 0
        for item in set(self.column(label)):
            count += 1
        return count
```
>```python
nfl_dataset = Dataset(nfl_data)
total_years = nfl_dataset.count_unique('year')
```
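The counting loop can also be replaced with `len()`, since a set already knows its own size. The sketch below is one possible variation rather than the mission's reference solution. It also returns `None` for an unknown label, which is an assumption made here to mirror `column()`, and it uses a small made-up dataset.

```python
class Dataset:
    def __init__(self, data):
        self.header = data[0]
        self.data = data[1:]

    def column(self, label):
        if label not in self.header:
            return None
        index = self.header.index(label)
        return [row[index] for row in self.data]

    def count_unique(self, label):
        values = self.column(label)
        if values is None:
            # Assumption: mirror column() and return None for an unknown label.
            return None
        return len(set(values))


# Sample rows invented for this illustration.
sample = Dataset([
    ["year", "week"],
    ["2009", "1"],
    ["2009", "2"],
    ["2010", "1"],
])
print(sample.count_unique("year"))    # 2
print(sample.count_unique("player"))  # None
```

Without the `None` check, calling `set()` on the result of `column()` for a missing label would raise a `TypeError`.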

## 8. Make Objects Human Readable

With all the features of the Dataset class in place, we want to make it easier for the user to take a quick look at the data. A few screens ago, we wrote a print_data method that took in a number of rows and printed them. Now, we are going to write another method that does something similar, this time using one of Python's special methods.

All Python classes can implement a set of [special methods](https://docs.python.org/3/reference/datamodel.html#basic-customization). Each one provides additional customization that the Python interpreter uses to enhance your object. When we implemented <span style="background-color: #F9EBEA; color:##C0392B">\_\_init\_\_()</span>, we told the Python interpreter to run that method whenever we create an object.

One special method is <span style="background-color: #F9EBEA; color:##C0392B">\_\_str\_\_()</span>, which tells the Python interpreter how to represent your object as a string. Whenever we convert the object to a string or print it with the <span style="background-color: #F9EBEA; color:##C0392B">print()</span> function, Python calls the <span style="background-color: #F9EBEA; color:##C0392B">\_\_str\_\_()</span> method, so we can use it to customize how the object is displayed. Without it, Python falls back to a generic representation. Use the console and check what the **nfl_dataset** object looks like by doing the following:

>```python
>>> nfl_dataset = Dataset(nfl_data)
>>> nfl_dataset
<__main__.Dataset object at 0x10abc23b0>
```
```

This doesn't really tell the user what the dataset looks like internally so let's fix that.
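Before applying this to the dataset, here is a minimal sketch of the mechanism on a small hypothetical `Game` class, which is not part of the mission. The team names are taken from the sample rows shown earlier.

```python
class Game:
    # Hypothetical class, used only to illustrate __str__().
    def __init__(self, winner, loser):
        self.winner = winner
        self.loser = loser

    def __str__(self):
        return self.winner + " beat " + self.loser


game = Game("Pittsburgh Steelers", "Tennessee Titans")

print(game)       # Pittsburgh Steelers beat Tennessee Titans
print(str(game))  # same text, since str() also calls __str__()

# Typing just `game` in the console uses __repr__() instead, which is not
# defined here, so the console would still show the generic object form.
```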


<br>
<div class="alert alert-info">
<b>Guided Exercise.</b>
</div>

**Description**: 

> 1. Add a method to the **Dataset** class called <span style="background-color: #F9EBEA; color:##C0392B">\_\_str\_\_()</span>
     - Convert the first 10 rows of **self.data** to a string and set it as the return value.
2. Create an instance of the class called **nfl_dataset** and call print on it.

>```python
class Dataset:
    def __init__(self, data):
        self.header = data[0]
        self.data = data[1:]
```
>```python
    def __str__(self):
        data_string = self.data[:10]
        return str(data_string)
```
>```python
    def column(self, label):
        if label not in self.header:
            return None
```
>```python
        index = 0
        for idx, element in enumerate(self.header):
            if label == element:
                index = idx
```
>```python
        column = []
        for row in self.data:
            column.append(row[index])
        return column
```
>```python
    def count_unique(self, label):
        count = 0
        for item in set(self.column(label)):
            count += 1
        return count
```
>```python
nfl_dataset = Dataset(nfl_data)
print(nfl_dataset)
```