# Tutorial 1 Python Data Types

## A substantial part of the challenge in programming for data analysis is organizing the data in useful ways that are easy to keep track of and help avoid errors
<br>

## In this tutorial, I introduce native data types in python - lists, tuples, and dictionaries.  
## We learn how to move between these data types and numpy arrays.
## We learn how to change integers to floating point numbers and floating point numbers to integers 
<br>

##
## Please read through the tutorial, execute each box of code, and make sure you understand the outputs. 
## It's very helpful to examine the contents of the variables, using the Jupyter:Variables table, which you can find using View -> Terminal
## Feel free to add print commands to print output to the notebook.   
## Feel free to add your own notes
<br>

# To confirm that you have completed the tutorial, please upload this file with all the code blocks executed.  

## 1.1 Native Python Data Types 
<br>

### In this class we introduced variable types
<br> 

* integers - `int`,`int16`,`int64`
###
* floating point numbers - `float`,`float64`
###
* string, i.e., text - `str`
<br>
<br>

### Then we considered the numpy **array**, which is a data type made available by loading the numpy module.
<br>

### Native python data types
<br>

### We will work with 3 native python data types in this tutorial.  By native, we mean part of the original python and not coming from numpy.  
<br>

* list - exactly what it says, a list 
###
* tuple - a special kind of list which is immutable (values cannot be changed)
###
* dict - a structure known as a dictionary that is invaluable for organizing data including numpy arrays
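
### As a quick preview, here is a minimal sketch of what the three types look like in code. 
### The variable names and the dictionary keys (`first_name`, `last_name`, `height`) are made up for illustration. 

```python
player_list = ['Kawhi', 'Leonard', 79.0]     # list: square brackets
player_tuple = ('Kawhi', 'Leonard', 79.0)    # tuple: parentheses
player_dict = {'first_name': 'Kawhi',        # dict: curly braces with key: value pairs
               'last_name': 'Leonard',
               'height': 79.0}

print(type(player_list).__name__)    # list
print(type(player_tuple).__name__)   # tuple
print(type(player_dict).__name__)    # dict

# a dictionary value is looked up by its key rather than by its position
print(player_dict['height'])         # 79.0
```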

## We are going to follow a good coding practice of importing the modules we need for a lesson at the start of a lesson.  

In [None]:
import numpy as np

## 1.2 Lists
<br>


### A list is a collection of values in a single variable. 

In [None]:
my_list = [1,9,6,7] # I made a list here with 4 numeric entries.  Note the use of square brackets [] and commas , to separate the elements of the list. 
print(my_list)

## Examine the list in your Jupyter:Variables table. 

## The list is of size 4, reflecting the number of elements, and of type list. 

## The list is a flexible data type designed as a **container** for holding items together

## But it's not a particularly useful data type for mathematical operations.  

In [None]:
list2 = my_list + 2  # this raises a TypeError on purpose: a number cannot be added to a list

## So that went badly, because a `list` doesn't support mathematical operations.  
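
### Here is a small sketch of the difference, assuming the same four numbers as above. 
### The names `number_list`, `repeated`, and `shifted_array` are made up for this example. 

```python
import numpy as np

number_list = [1, 9, 6, 7]

# adding a number to a list raises a TypeError
try:
    bad_result = number_list + 2
except TypeError as err:
    print("TypeError:", err)

# multiplying a list by an integer repeats the list; it does not scale the values
repeated = number_list * 2
print(repeated)                 # [1, 9, 6, 7, 1, 9, 6, 7]

# a numpy array does the math element by element
number_array = np.array(number_list)
shifted_array = number_array + 2
print(shifted_array.tolist())   # [3, 11, 8, 9]
```

### If I want to do math on the values in a list, one option is to convert the list to a numpy array first. 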

## The strength of lists is the flexibility to hold together different data types that are associated with each other.  

In [None]:

#I can mix together all the data types I know about in a list 

my_crazy_list = ['Kawhi','Leonard','June',29,1991,79.0]

### I created a list of values all associated with an NBA player, including 
<br> 

###
* *strings* first name, last name, and month of birth, 
###
* *integers* day and year of birth 
###
* *floating point* height

## A list can contain many different data types, unlike a *numpy* **array** which can only contain one data type. 
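
### A short sketch of this difference, using the same values as the list above. 
### The names `mixed_list`, `element_types`, and `mixed_as_array` are made up for this example. 

```python
import numpy as np

mixed_list = ['Kawhi', 'Leonard', 'June', 29, 1991, 79.0]

# each element of the list keeps its own type
element_types = [type(item).__name__ for item in mixed_list]
print(element_types)   # ['str', 'str', 'str', 'int', 'int', 'float']

# an array needs one common type, so here numpy turns every value into text
mixed_as_array = np.array(mixed_list)
print(mixed_as_array.dtype.kind)   # U, which stands for unicode text
print(mixed_as_array)
```

### Notice that the numbers in the array are now text, so they can no longer be used for math. 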

## 1.3 List methods
<br>

## The flexibility of lists is their strength.  It is also their weakness as the flexibility also gives greater scope for making mistakes in data handling. 
<br>

## Here I show how we **append** elements to a list 

### The syntax here is different from our usual syntax 

### `list_name.append(variable)`

In [None]:
my_kawhi_list = ['Kawhi','Leonard','June',29,1991,79.0]
points = 26.0
rebounds = 6.6
assists = 5.0
steals = 1.7
blocks = 0.4
# let's add these numbers to the list 
my_kawhi_list.append(points)
my_kawhi_list.append(rebounds)
my_kawhi_list.append(assists)
my_kawhi_list.append(steals)
my_kawhi_list.append(blocks)
#
print(my_kawhi_list)

### Take a look at the list in the JUPYTER:VARIABLES panel. 
<br>

### To the left of the list you should see a square symbol with an arrow.  Click on it.  

### It should open a data viewer window which shows the content of the list. 

### This also works with *numpy* arrays.  

### This is a useful tool to look at all the values of a list or array
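
### Appending one value at a time works, but lists also have an `extend` method that adds every element of another list in one step. 
### This is a minimal sketch; the names `extended_list`, `nested_list`, and `season_stats` are made up for this example. 

```python
extended_list = ['Kawhi', 'Leonard', 'June', 29, 1991, 79.0]
season_stats = [26.0, 6.6, 5.0, 1.7, 0.4]   # points, rebounds, assists, steals, blocks

# extend adds each element of season_stats to the end of the list
extended_list.extend(season_stats)
print(extended_list)
print(len(extended_list))   # 11

# append adds its argument as ONE element, even when that argument is a list
nested_list = ['Kawhi', 'Leonard']
nested_list.append(season_stats)
print(nested_list)
print(len(nested_list))     # 3
```

### Mixing up `append` and `extend` is an easy mistake to make, so it is worth checking the size of the list afterwards. 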

## Here I show how we **remove** an element from a list 

### `list_name.remove(variable)`

In [None]:
my_kawhi_list.remove(1.7)
print(my_kawhi_list)

### The item 1.7 (steals) is removed.   Notice I did not use the original variable name *steals*.  
### That variable name is not associated with the list. Only the value is in the list.  
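
### Because `remove` works on values, two details are worth knowing. Here is a small sketch. 
### The list `stat_values` and the value 99.9 are made up for this example. 

```python
stat_values = [26.0, 6.6, 5.0, 1.7, 0.4, 1.7]   # 1.7 appears twice on purpose

# remove deletes only the FIRST element equal to the value
stat_values.remove(1.7)
print(stat_values)   # [26.0, 6.6, 5.0, 0.4, 1.7]

# asking to remove a value that is not in the list raises a ValueError
try:
    stat_values.remove(99.9)
except ValueError as err:
    print("ValueError:", err)
```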

## 1.3 Tuples <br>

###  Tuples are a special kind of list that cannot be changed (they are immutable).
###
###  Many Python functions expect tuples rather than lists as arguments. 
###
### From a syntax point of view, the difference between a tuple and a list is the use of parentheses () rather than square brackets [].
###
### We've actually used a tuple before in defining numpy arrays.  We will make use of them, and encounter them in numpy outputs.

In [None]:
array_size = (2,3)  # This is a tuple, defined by () rather than []
# pass the tuple as a variable into zeros
my_array = np.zeros(array_size)
# pass a tuple defined inside zeros
my_other_array = np.zeros((3,4))
# numpy returns things that should be immutable, like the shape of an array, as tuples 
my_array_shape = np.shape(my_other_array)
print(my_array_shape)
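
### To see what immutable means in practice, here is a small sketch that tries to change the first value of a list and of a tuple. 
### The names `size_list` and `size_tuple` are made up for this example, and `[0]` refers to the first item (indexing is covered in the next section). 

```python
size_list = [2, 3]     # a list: values can be changed
size_tuple = (2, 3)    # a tuple: values cannot be changed

size_list[0] = 5
print(size_list)       # [5, 3]

# trying the same thing with a tuple raises a TypeError
try:
    size_tuple[0] = 5
except TypeError as err:
    print("TypeError:", err)

# list() and tuple() convert between the two types
tuple_as_list = list(size_tuple)
list_as_tuple = tuple(size_list)
print(tuple_as_list)   # [2, 3]
print(list_as_tuple)   # (5, 3)
```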

## 1.4 Indexing into lists  

### First let's remake the list of Kawhi Leonard stats

In [None]:
my_kawhi_list = ['Kawhi','Leonard','June',29,1991,79.0]
points = 26.0
rebounds = 6.6
assists = 5.0
steals = 1.7
blocks = 0.4
# let's add these numbers to the list 
my_kawhi_list.append(points)
my_kawhi_list.append(rebounds)
my_kawhi_list.append(assists)
my_kawhi_list.append(steals)
my_kawhi_list.append(blocks)
#

### Now suppose I wanted to get to his points per game for some data analysis (26.0). 
### I want to figure out a way to do it so that for every player whose data is organized in the same order, 
### I can extract the points per game from the list. <br>
### We can see that points is the 7th entry in the list.  


In [None]:
ppg = my_kawhi_list[7]
print(ppg)

### That didn't work.  In fact, it returned the 8th element of the list, which is rebounds.  <br> 
## SADLY, python counts from zero, while human beings count from 1.  
### If we start the count with 'Kawhi' as item 0, we realize the **index** for 26.0 is 6 and not 7 

In [None]:
ppg = my_kawhi_list[6]
print(ppg)

### Take a look at the JUPYTER:VARIABLES pane.  
### Notice that the variable ppg is of type float.  
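### Each item inside a list keeps its own type.  The list itself is of type list, but when I copy one item out into its own variable, that variable shows the item's type.  Indexing does not remove the item from the list.  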
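
### A quick way to check types without the variables pane is the built-in `type` function.  This is a small sketch; the names `short_list` and `ppg_value` are made up for this example. 

```python
short_list = ['Kawhi', 'Leonard', 'June', 29, 1991, 79.0, 26.0]

ppg_value = short_list[6]               # copy item 6 out of the list

print(type(short_list).__name__)        # list
print(type(short_list[6]).__name__)     # float
print(type(ppg_value).__name__)         # float

# indexing copies the value out; the list still has all 7 items
print(len(short_list))                  # 7
```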

### I can also index to a range of values.  
### Suppose I want to recover Kawhi's points, rebounds, assists - the classic box score stats.  

### Since points is item 6 counting from 0, item 7 is rebounds and item 8 is assists (see above to verify)

In [None]:
boxscore = my_kawhi_list[6:8]
print(boxscore)

### That didn't work either.  It only returned items 6 and 7.  
### Since we need the next item, let's add one to the end of the range and go from 6:9 

In [None]:
boxscore = my_kawhi_list[6:9]
print(boxscore)

## Indexing a range of lists and arrays in python is always **inclusive** of the first index and **exclusive** of the last index, so `[6:9]` returns items 6, 7, and 8.  
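
### One handy consequence of this rule is that the number of items returned is the last index minus the first index.  Here is a small sketch. 
### The names `full_list`, `start`, `stop`, and `numbers` are made up for this example. 

```python
import numpy as np

full_list = ['Kawhi', 'Leonard', 'June', 29, 1991, 79.0, 26.0, 6.6, 5.0, 1.7, 0.4]

start = 6
stop = 9
box_stats = full_list[start:stop]        # items 6, 7, and 8; item 9 is excluded
print(box_stats)                         # [26.0, 6.6, 5.0]
print(len(box_stats) == stop - start)    # True

# the same rule applies to numpy arrays
numbers = np.array([10, 20, 30, 40, 50])
print(numbers[1:3].tolist())             # [20, 30]
```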

## We will discuss indexing methods in depth using arrays next week

## 1.5 Merging Lists and Strings 

## We can merge two lists, or two strings, with a simple +
## For lists and strings, + does not do math addition; instead + means concatenation. 

In [None]:
my_kawhi_list = ['Kawhi','Leonard']
points = 26.0
rebounds = 6.6
assists = 5.0
steals = 1.7
blocks = 0.4
kawhi_season_stats = [points,rebounds,assists,steals,blocks]
merged_kawhi_list = my_kawhi_list + kawhi_season_stats   # + for lists does not do ADDITION. 
print(merged_kawhi_list)

## Now let's merge text strings  

In [None]:
Full_Name = merged_kawhi_list[0]+merged_kawhi_list[1] # Again, note that the first entry in the list is index 0
print(Full_Name)

### Let's make it pretty by adding a space between first and last name,  

In [None]:
Pretty_Name = merged_kawhi_list[0]+' ' + merged_kawhi_list[1]
print(Pretty_Name)
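
### Two related details, shown as a small sketch: the string method `join` is another way to merge strings, and a number has to be converted with `str` before it can be merged with a string. 
### The names `first_name`, `last_name`, `joined_name`, and `stat_label` are made up for this example. 

```python
first_name = 'Kawhi'
last_name = 'Leonard'
points_per_game = 26.0

# join glues a list of strings together, putting the separator between them
joined_name = ' '.join([first_name, last_name])
print(joined_name)     # Kawhi Leonard

# a string and a number cannot be concatenated directly
try:
    stat_label = joined_name + points_per_game
except TypeError as err:
    print("TypeError:", err)

# convert the number to a string first
stat_label = joined_name + ': ' + str(points_per_game) + ' ppg'
print(stat_label)      # Kawhi Leonard: 26.0 ppg
```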


## 1.6 Changing Data Types 

## Being able to change data types is essential for managing data in python.  

## A powerful command to change lists into arrays and to change the data type of arrays is the `array` command from `numpy`. 

### There are two useful things we will do with the `array` command.  First, we will convert a numeric list into a numpy array.  

### Second, we will learn how to control the data type of the numpy array using the `dtype` argument

In [None]:
my_integer_list = [6,7,8]
my_int_array = np.array(my_integer_list)  # here I call array with its default behavior to convert the list into an array
print(my_int_array)
my_int_array.dtype  # the data type is displayed, and it's 64 bit integers on most systems  



In [None]:
# now lets use some floats 
my_float_list = [6.0,7.0,8.0]
my_float_array = np.array(my_float_list)
print(my_float_array)
my_float_array.dtype  # the data type is displayed, and it's 64 bit floats

In [None]:
#what if we have a mixed list 

my_mixed_list = [6,7.7,9.2,10]

my_mixed_array = np.array(my_mixed_list)
print(my_mixed_array)
my_mixed_array.dtype

### When confronted with a mixed list of integers and floats, numpy defaults to floating point numbers because they can accommodate both data types.  
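
### Here is a small sketch of that behavior, and of using `dtype` to turn integers into floating point numbers on purpose. 
### The names `mixed_values` and `ints_as_floats` are made up for this example. 

```python
import numpy as np

# the integers 6 and 10 are stored as the floats 6.0 and 10.0
mixed_values = np.array([6, 7.7, 9.2, 10])
print(mixed_values.dtype)        # float64
print(mixed_values.tolist())     # [6.0, 7.7, 9.2, 10.0]

# dtype can also convert a list of integers to floats explicitly
ints_as_floats = np.array([6, 7, 8], dtype='float')
print(ints_as_floats.dtype)      # float64
print(ints_as_floats.tolist())   # [6.0, 7.0, 8.0]
```

### Going from integers to floats loses nothing here, which is why it is the safe default. 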

In [None]:
# Let's convert both the float list and the mixed array to integers and see what happens. 

my_float_2_int_array = np.array(my_float_list,dtype = 'int')
print(my_float_2_int_array)

In [None]:
my_mixed_2_int_array = np.array(my_mixed_array, dtype='int') # notice I passed an array into array here.  works the same as a list. 
print(my_mixed_2_int_array)

### Notice that when converting floats to integers, the behavior of `array` is to **crop** (truncate) any decimal values and not round.  
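
### If rounding is what I actually want, one option is to round first and convert afterwards.  This is a minimal sketch using numpy's `round` function and the array method `astype`. 
### The names `decimal_values`, `cropped`, and `rounded` are made up for this example. 

```python
import numpy as np

decimal_values = np.array([7.7, 9.2, -7.7, 2.5])

# converting directly drops the decimal part (crops toward zero)
cropped = np.array(decimal_values, dtype='int')
print(cropped.tolist())    # [7, 9, -7, 2]

# round first, then convert to integers
# note that numpy rounds a value exactly halfway, like 2.5, to the nearest even number
rounded = np.round(decimal_values).astype('int')
print(rounded.tolist())    # [8, 9, -8, 2]

# single numbers can be converted with the built-in int and float
print(int(7.7))            # 7
print(float(7))            # 7.0
```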

