<a id="syntax"></a>

## Python Programming Language

Why Python?

1. It’s easy to learn
    * Now the language of choice for 8 of 10 top US computer science programs (Philip Guo, CACM)
2. Full-featured
    * Not just a statistics language: it has full capabilities for data acquisition, cleaning, databases, high-performance computing, and more
3. Strong data science libraries
    * The SciPy ecosystem

Topics covered:

1. Functions
2. Data types
3. Loops and control structures
4. Reading and Writing CSV files

### Syntax

In [1]:
2 + 2

4

In [2]:
x = 5
y = 2
x * y

10

In [3]:
x ** y

25

In [4]:
print('Hello, world!')

Hello, world!


In [5]:
print('%s raised to power of %s equals %s' % (x, y, x ** y))

5 raised to power of 2 equals 25
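
<br>
The `%` formatting above can also be written with `str.format` or, in Python 3.6 and later, an f-string. The following is a minimal sketch that reuses the values of `x` and `y` from the cells above; the variable names `message_format` and `message_fstring` are only illustrative.

```python
x = 5
y = 2

# str.format fills the {} placeholders in order
message_format = '{} raised to power of {} equals {}'.format(x, y, x ** y)

# An f-string evaluates the expressions inside the braces directly
message_fstring = f'{x} raised to power of {y} equals {x ** y}'

print(message_format)
print(message_fstring)
```

Both lines print the same text as the `%` version.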


<a id="functions"></a>

### Functions

Functions allow you to carry out the same task multiple times. This reduces the amount of code you write, reduces mistakes, and makes your code easier to read.

In [6]:
def say_hello():
    print('Hello, world!')

In [7]:
say_hello()

Hello, world!


In [8]:
def print_a_string(foo):
    print('%s' % foo)

In [9]:
print_a_string('Here is a string.')

Here is a string.


<br>
`add_numbers` is a function that takes two numbers and adds them together.

In [10]:
def add_numbers(x,y):
    return x + y

add_numbers(2.5,3)

5.5

<br>
`add_numbers` updated to take an optional 3rd parameter. Using `print` allows printing of multiple expressions within a single cell.

In [11]:
def add_numbers(x, y, z=None):
    if z is None:
        return x + y
    else:
        return x + y + z
    
print(add_numbers(12,3))
print(add_numbers(12,3,4))

15
19


<br>
`add_numbers` updated to take an optional flag parameter.

In [12]:
def add_numbers(x, y, z=None, flag=False):
    if flag:
        print('Flag is true!')
    if z is None:
        return x + y
    else:
        return x + y + z

# Pass flag by keyword; a third positional argument would bind to z instead
print(add_numbers(1, 2, flag=True))

Flag is true!
3


<br>
Assign function `add_numbers` to variable `a`.

In [13]:
def add_numbers(x,y):
    return x+y

a = add_numbers
a(1,2)

3
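
<br>
Because a function is an ordinary object, it can also be passed to another function as an argument. The sketch below illustrates this with a hypothetical helper, `apply_twice`, which is not part of the original notebook.

```python
def add_numbers(x, y):
    return x + y

def apply_twice(func, x, y):
    # Hypothetical helper: call func on (x, y), then on that result and y
    return func(func(x, y), y)

result = apply_twice(add_numbers, 1, 2)
print(result)  # add_numbers(add_numbers(1, 2), 2) -> 5
```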

<a id="types"></a>

### Data Types

#### Booleans

`True` and `False` are Python's built-in Boolean constants.

In [14]:
a = True
b = False

In [15]:
a == True

True

In [16]:
b == True

False

In [17]:
a or b

True

In [18]:
a and b

False

#### Numbers: integers and floats

Numbers are straightforward in Python 3: the `/` operator always performs true division and returns a float, even when both operands are integers.

In [19]:
1 + 2

3

In [20]:
1.0 + 2.0

3.0

In [21]:
1 / 2

0.5

In [22]:
1.0 / 2.0

0.5

In [23]:
type(1)

int

In [24]:
type(1/2)

float
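
<br>
When an integer result is wanted instead, Python also provides floor division (`//`), the remainder operator (`%`), and the built-in `divmod`. A short sketch with illustrative variable names:

```python
true_quotient = 7 / 2       # true division always returns a float
floor_quotient = 7 // 2     # floor division rounds down to an integer
remainder = 7 % 2           # remainder left over after floor division
both = divmod(7, 2)         # (floor quotient, remainder) as a tuple

print(true_quotient, floor_quotient, remainder, both)
```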

#### Strings

The next four data types -- strings, lists, tuples, arrays -- are all sequences.

Strings are sequences of characters.

In [25]:
s = 'Hello, world'

In [26]:
type(s)

str

In [27]:
s[0:4]

'Hell'

In [28]:
s + '!'

'Hello, world!'

In [29]:
s

'Hello, world'

In [30]:
s = s + '!'

In [31]:
s

'Hello, world!'

#### Lists

Lists are _mutable_ sequences of anything.

In [32]:
l = [0, 1, 1, 2, 3, 5, 8]

In [33]:
m = [5, 2, 'a', 'xxx', True, [0, 1]]

In [34]:
l[0:3]

[0, 1, 1]

In [35]:
m[4]

True

In [36]:
m[4] = False

In [37]:
m[4:]

[False, [0, 1]]

#### Tuples

Tuples are immutable sequences of anything (similar to lists except you can't change them).

In [38]:
n = (3, 5, 6)

In [39]:
n[0]

3

In [42]:
# n[0] = 2  # uncommenting this raises a TypeError because tuples are immutable
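
<br>
The commented-out assignment above fails if it is run. The sketch below catches the resulting `TypeError` so the cell keeps running, and then builds a new tuple instead; the names `outcome` and `n_changed` are only illustrative.

```python
n = (3, 5, 6)

try:
    n[0] = 2  # item assignment is not supported on tuples
except TypeError as error:
    outcome = 'TypeError'
    print('Could not modify the tuple:', error)
else:
    outcome = 'modified'

# A new tuple can still be built from pieces of the old one
n_changed = (2,) + n[1:]
print(n_changed)
```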

#### Arrays (numpy)

In [43]:
# Import modules to use
import math
import numpy as np

In [44]:
mylist = [0, 2, 4]
np.array(mylist)

array([0, 2, 4])

In [45]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [46]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [47]:
np.arange(4, 10)

array([4, 5, 6, 7, 8, 9])

In [48]:
np.arange(0, 10, 2)

array([0, 2, 4, 6, 8])

In [49]:
np.linspace(0, 10, 5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

In [50]:
np.linspace(0, 10, 11)

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

In [51]:
np.random.rand()

0.4573153460368742

In [52]:
np.random.rand(5)

array([0.76097913, 0.59076078, 0.80987156, 0.69132595, 0.54482689])

#### Sets

Sets are unordered collections of unique objects.

In [53]:
s1 = {'a', 'b', 'c'}
s2 = {'a', 'd', 'e'}

In [54]:
s1 & s2

{'a'}

In [55]:
s1 | s2

{'a', 'b', 'c', 'd', 'e'}

In [56]:
s3 = set(l)
s4 = set(m[0:2])

In [57]:
s3 & s4

{2, 5}

In [58]:
s3 | s4

{0, 1, 2, 3, 5, 8}

In [59]:
s3 - s4

{0, 1, 3, 8}

#### Dictionaries

Dictionaries or 'dicts' are hash tables, where a key points to a value.

In [60]:
d = {'name': 'John Doe', 'age': 27, 'dob': '7/20/1989'}

In [61]:
d

{'name': 'John Doe', 'age': 27, 'dob': '7/20/1989'}

In [62]:
d['name']

'John Doe'

In [63]:
d['zip'] = 92039
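
<br>
A few other common dictionary operations, sketched on the same `d` as above: `get` with a default value, iteration over key/value pairs with `items`, and a membership test with `in`. The `'phone'` key is a hypothetical missing key used only for illustration.

```python
d = {'name': 'John Doe', 'age': 27, 'dob': '7/20/1989'}
d['zip'] = 92039  # assigning to a new key adds an entry

# get returns a default instead of raising KeyError for a missing key
phone = d.get('phone', 'unknown')

# items yields (key, value) pairs
for key, value in d.items():
    print(key, '->', value)

has_zip = 'zip' in d  # membership tests look at the keys
```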

<a id="control"></a>

### Loops and Control Structures

#### Boolean and comparison operations

In [64]:
x = 5
(x < 6) and (x > 4)

True

In [65]:
x != 4

True

In [66]:
5 in [3, 4, 5]

True

In [67]:
'ell' in 'Hello'

True

In [68]:
len('Hello') >= 5

True

#### if tests

In [69]:
if 'd' in 'abc':
    print('Learn your alphabet.')
elif (2 + 2 == 5):
    print('Sometimes yes.')
else: 
    print('Nothing is true.')    

Nothing is true.


#### while loops

In [70]:
i = 0
while (i < 5):
    print(i)
    i += 1

0
1
2
3
4


In [71]:
i

5

#### for loops

In [72]:
for x in [0, 1, 2, 3, 4]:
    print(x**2)

0
1
4
9
16
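
<br>
The literal list `[0, 1, 2, 3, 4]` in the loop above can be replaced by `range(5)`, which yields the same values without writing them out. A minimal sketch that collects the squares in a list (the name `squares` is illustrative):

```python
squares = []
for x in range(5):  # range(5) yields 0, 1, 2, 3, 4
    squares.append(x ** 2)

print(squares)
```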


#### Lambda and List Comprehensions
<br>
A lambda function is a small anonymous function. It can take any number of arguments, but its body must be a single expression.

Here's an example of a lambda that takes three parameters and adds the first two; the third is ignored.

In [None]:
my_function = lambda a, b, c : a + b

In [None]:
my_function(1, 2, 3)
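
<br>
Lambdas are most often used as short throwaway arguments, for example as the `key` of a sort. The sketch below first calls the lambda defined above and then sorts some made-up `(name, score)` pairs by score; the `pairs` data is hypothetical.

```python
my_function = lambda a, b, c: a + b  # c is accepted but ignored
total = my_function(1, 2, 3)

# Sort made-up (name, score) pairs by score, using a lambda as the key
pairs = [('a', 3), ('b', 1), ('c', 2)]
pairs_by_score = sorted(pairs, key=lambda pair: pair[1])

print(total)
print(pairs_by_score)
```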

<br>
Let's iterate from 0 to 999, collect the even numbers, and count them.

In [75]:
my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
len(my_list)

500

<br>
Now the same thing, but with a list comprehension.

In [76]:
my_list = [number for number in range(0, 1000) if number % 2 == 0]
len(my_list)

500

### Reading and Writing CSV files

<br>
Let's import our datafile mpg.csv, which contains fuel economy data for 234 cars.

* mpg : miles per gallon
* class : car classification
* cty : city mpg
* cyl : # of cylinders
* displ : engine displacement in liters
* drv : f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
* fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
* hwy : highway mpg
* manufacturer : automobile manufacturer
* model : model of car
* trans : type of transmission
* year : model year

<br>
1. Find the average cty fuel economy across all cars
<br>
2. Group the cars by number of cylinders, and find the average cty mpg for each group
<br>
3. Find the average hwy mpg for each class of vehicle

In [78]:
import csv

%precision 2 
input_file_path = r"C:\Users\Asus\Documents\GitHub\DA-in-Python\data\mpg.csv"

with open(input_file_path) as csvfile:
    mpg = list(csv.DictReader(csvfile))
    
type(mpg)

list

<br>
`csv.DictReader` has read in each row of our csv file as a dictionary. `len` shows that our list is made up of 234 dictionaries.

In [79]:
len(mpg)

234
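
<br>
To experiment with `csv.DictReader` without the data file, the same pattern can be run on an in-memory string. This is a sketch only: the two sample rows and the reduced set of columns are made up, and `io.StringIO` stands in for the opened file.

```python
import csv
import io

# Made-up sample rows; the real mpg.csv has more columns and 234 rows
sample_text = (
    'manufacturer,model,cyl,cty,hwy\n'
    'audi,a4,4,18,29\n'
    'ford,f150,8,13,17\n'
)

# Each row becomes a dictionary keyed by the header line
rows = list(csv.DictReader(io.StringIO(sample_text)))

print(len(rows))
print(rows[0]['model'], rows[0]['cty'])

# Values are read as strings, so convert before doing arithmetic
avg_cty = sum(float(row['cty']) for row in rows) / len(rows)
print(avg_cty)
```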

<br>
`keys` gives us the column names of our csv.

In [80]:
mpg[0].keys()

odict_keys(['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class'])

<br>
This is how to find the average cty fuel economy across all cars. All values in the dictionaries are strings, so we need to convert to float.

In [81]:
sum(float(d['cty']) for d in mpg)/len(mpg)

16.86

<br>
Similarly, this is how to find the average hwy fuel economy across all cars.

In [82]:
sum(float(d['hwy']) for d in mpg) / len(mpg)

23.44

<br>
Use `set` to return the unique values for the number of cylinders the cars in our dataset have.

In [83]:
cylinders = set(d['cyl'] for d in mpg)
cylinders

{'4', '5', '6', '8'}

<br>
Here's a more complex example where we group the cars by number of cylinders and find the average cty mpg for each group.

In [87]:
CtyMpgByCyl = []

for c in cylinders: # iterate over all the cylinder levels
    summpg = 0
    cyltypecount = 0
    for d in mpg: # iterate over all dictionaries
        if d['cyl'] == c: # if the cylinder level type matches,
            summpg += float(d['cty']) # add the cty mpg
            cyltypecount += 1 # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple ('cylinder', 'avg mpg')

CtyMpgByCyl.sort(key=lambda x1: x1[0])
CtyMpgByCyl

[('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]
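
<br>
The nested loop above scans the whole list once per cylinder value. An alternative, shown here as a sketch rather than as the notebook's method, accumulates the sums and counts in a single pass with `collections.defaultdict`. The `sample_rows` data is made up and only mimics the shape of the dictionaries in `mpg`.

```python
from collections import defaultdict

# Made-up sample rows shaped like the dictionaries in mpg
sample_rows = [
    {'cyl': '4', 'cty': '18'},
    {'cyl': '4', 'cty': '22'},
    {'cyl': '8', 'cty': '13'},
]

totals = defaultdict(float)  # running sum of cty mpg per cylinder value
counts = defaultdict(int)    # number of cars per cylinder value
for row in sample_rows:
    totals[row['cyl']] += float(row['cty'])
    counts[row['cyl']] += 1

# Build sorted (cylinders, average) tuples, as in the cell above
avg_by_cyl = sorted((cyl, totals[cyl] / counts[cyl]) for cyl in totals)
print(avg_by_cyl)
```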

<br>
Use `set` to return the unique values for the class types in our dataset.

In [85]:
vehicleclass = set(d['class'] for d in mpg) # what are the class types
vehicleclass

{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}

<br>
And here's an example of how to find the average hwy mpg for each class of vehicle in our dataset.

In [86]:
HwyMpgByClass = []

for t in vehicleclass: # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg: # iterate over all dictionaries
        if d['class'] == t: # if the vehicle class matches,
            summpg += float(d['hwy']) # add the hwy mpg
            vclasscount += 1 # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount)) # append the tuple ('class', 'avg mpg')

HwyMpgByClass.sort(key=lambda x: x[1])
HwyMpgByClass

[('pickup', 16.88),
 ('suv', 18.13),
 ('minivan', 22.36),
 ('2seater', 24.80),
 ('midsize', 27.29),
 ('subcompact', 28.14),
 ('compact', 28.30)]
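
<br>
Results like these can be written back out with `csv.writer`. The sketch below writes to an in-memory buffer so it runs without touching the disk; to write a real file, the buffer would be replaced by something like `open('some_file.csv', 'w', newline='')`, where the file name is hypothetical. The two rows are rounded values taken from the output above.

```python
import csv
import io

# Two (class, average hwy mpg) pairs in the same shape as HwyMpgByClass
averages = [('pickup', 16.88), ('compact', 28.30)]

buffer = io.StringIO()  # stands in for an opened output file
writer = csv.writer(buffer)
writer.writerow(['class', 'avg_hwy'])  # header row
writer.writerows(averages)             # one line per tuple

csv_text = buffer.getvalue()
print(csv_text)
```

Floats are written using their default string form, so `28.30` appears as `28.3` in the output.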