<a id="syntax"></a>

## Python Programming Language

Why Python?

1. It’s easy to learn
    * Now the language of choice for 8 of the top 10 US computer science programs (Philip Guo, CACM)
2. Full-featured
    * Not just a statistics language: it has full capabilities for data acquisition, cleaning, databases, high-performance computing, and more
3. Strong data science libraries
    * The SciPy ecosystem

This section covers:

1. Functions
2. Data types
3. Loops and control structures
4. Reading and writing CSV files

### Variables

In [131]:
2 + 2  # click Run or press Ctrl + Enter to run the cell

4

In [132]:
x = 5
y = 2
x * y

10

In [133]:
x ** y

25

In [134]:
print("Hello, world!")

Hello, world!


In [135]:
print('%s raised to power of %s equals %s' % (x, y, x ** y))

5 raised to power of 2 equals 25
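The `%` operator used above is one of several ways to format a string. As a minimal sketch, assuming Python 3.6 or later, the same message can be built with `str.format` or an f-string. The names `msg_format` and `msg_fstring` are introduced here only for illustration.

```python
x = 5
y = 2

# str.format fills the {} placeholders in order
msg_format = '{} raised to power of {} equals {}'.format(x, y, x ** y)

# f-strings (Python 3.6+) evaluate the expressions inside the braces
msg_fstring = f'{x} raised to power of {y} equals {x ** y}'

print(msg_format)
print(msg_fstring)
```

Both lines print the same text as the `%` version.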


<a id="functions"></a>

### Functions

Functions allow you to carry out the same task multiple times. This reduces the amount of code you write, reduces mistakes, and makes your code easier to read.

In [136]:
def say_hello():
    print('Hello, world!')

In [137]:
say_hello()

Hello, world!


In [138]:
def print_a_string(foo):
    print('%s' % foo)

In [139]:
print_a_string('Here is a string.')

Here is a string.


<br>
`add_numbers` is a function that takes two numbers and adds them together.

In [140]:
def add_numbers(x,y):
    return x + y

add_numbers(2.5,3)

5.50

<br>
`add_numbers` updated to take an optional third parameter. A notebook cell displays only the value of its last expression, so `print` is used here to show several results from a single cell.

In [141]:
def add_numbers(x, y, z=None):
    if z is None:
        return x + y
    else:
        return x + y + z
    
print(add_numbers(12,3))
print(add_numbers(12,3,4))

15
19


<br>
`add_numbers` updated to take an optional flag parameter.

In [142]:
def add_numbers(x, y, z=None, flag=False):
    if flag:
        print('Flag is true!')
    if z is None:
        return x + y
    else:
        return x + y + z

# Pass flag by keyword; a third positional argument would bind to z instead
print(add_numbers(1, 2, flag=False))

3
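Arguments are matched to parameters by position unless they are named. The sketch below redefines `add_numbers` so that it stands alone, and shows that a third positional value binds to `z`, while naming `flag` skips `z`. The variable names `positional` and `keyword` are illustrative.

```python
def add_numbers(x, y, z=None, flag=False):
    if flag:
        print('Flag is true!')
    if z is None:
        return x + y
    return x + y + z

# The third positional argument binds to z, so this computes 1 + 2 + 10
positional = add_numbers(1, 2, 10)

# Naming the parameter leaves z at its default and sets flag instead
keyword = add_numbers(1, 2, flag=True)  # also prints 'Flag is true!'

print(positional, keyword)
```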


<br>
Assign function `add_numbers` to variable `a`.

In [143]:
def add_numbers(x,y):
    return x+y

a = add_numbers
a(1,2)

3
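Because a function is an ordinary object, it can also be passed to another function. The following is a small sketch of that idea; `multiply_numbers` and `combine` are hypothetical helpers that do not appear elsewhere in this notebook.

```python
def add_numbers(x, y):
    return x + y

def multiply_numbers(x, y):
    return x * y

def combine(func, x, y):
    # Call whichever function was passed in
    return func(x, y)

total = combine(add_numbers, 3, 4)
product = combine(multiply_numbers, 3, 4)
print(total, product)
```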

<a id="types"></a>

### Data Types

#### Booleans

`True` and `False` are Python's built-in Boolean constants.

In [144]:
a = True
b = False

In [145]:
a == True

True

In [146]:
b == True

False

In [147]:
a or b

True

In [148]:
a and b

False

#### Numbers: integers and floats

Numbers are straightforward, especially in Python 3, where `/` always performs true division (in Python 2, `1 / 2` evaluated to `0`).

In [149]:
1 + 2

3

In [150]:
1.0 + 2.0

3.00

In [151]:
1 / 2

0.50

In [152]:
1.0 / 2.0

0.50

In [153]:
type(1), type(1.0)

(int, float)

In [154]:
type(1/2)

float
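In Python 3, `/` returns a float even when both operands are integers. A short sketch of the related operators `//` (floor division) and `%` (remainder) follows; the variable names are illustrative.

```python
true_div = 7 / 2     # always a float
floor_div = 7 // 2   # rounds down to an integer
remainder = 7 % 2    # what is left over after floor division
neg_floor = -7 // 2  # floor division rounds toward negative infinity

print(true_div, floor_div, remainder, neg_floor)
```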

#### Strings

The next four data types -- strings, lists, tuples, arrays -- are all sequences.

Strings are sequences of characters.

In [155]:
s = 'Hello, world'

In [156]:
type(s)

str

In [157]:
s[0:4], s[4:7]

('Hell', 'o, ')

In [158]:
s + '!'

'Hello, world!'

In [159]:
s

'Hello, world'

In [160]:
s = s + '!'

In [161]:
s

'Hello, world!'

In [162]:
s = s + str(2)
s

'Hello, world!2'
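Strings are immutable: `s + '!'` builds a new string rather than changing `s` in place, which is why `s` had to be reassigned above. A minimal sketch of a few sequence operations on strings, and of what happens when item assignment is attempted (the name `error_message` is illustrative):

```python
s = 'Hello, world'

print(len(s))      # number of characters
print(s[-1])       # negative indices count from the end
print(s.upper())   # methods return a new string; s itself is unchanged

try:
    s[0] = 'J'     # item assignment is not supported for str
except TypeError as err:
    error_message = str(err)
    print(error_message)
```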

#### Lists

Lists are _mutable_ sequences of anything.

In [163]:
l = [0, 1, 1, 2, 3, 5, 8]

In [164]:
m = [5, 2, 'a', 'xxx', True, [0, 1]]

In [165]:
l[0:3]

[0, 1, 1]

In [166]:
m[4]  # index 4 is the fifth element

True

In [167]:
m[4] = False

In [168]:
m[4:]  # from the fifth element to the end

[False, [0, 1]]
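Mutability matters when two names refer to the same list. The sketch below, with illustrative names `alias` and `snapshot`, contrasts a second reference with a slice copy.

```python
l = [0, 1, 1, 2, 3, 5, 8]

alias = l          # a second name for the same list object
snapshot = l[:]    # slicing makes a new (shallow) copy

l.append(13)       # mutate the list in place

print(alias)       # the alias sees the change
print(snapshot)    # the copy does not
```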

#### Tuples

Tuples are immutable sequences of anything (similar to lists except you can't change them).

In [169]:
n = (3, 5, 6)

In [170]:
n[0]

3

In [171]:
n[0] = 2

TypeError: 'tuple' object does not support item assignment
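Since a tuple cannot be changed in place, the usual approach is to unpack it or build a new one. A small sketch, with illustrative names:

```python
n = (3, 5, 6)

# Unpack the tuple into three names
first, second, third = n

# Build a new tuple instead of modifying the old one
n_changed = (2,) + n[1:]

print(first, second, third)
print(n_changed)
```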

#### Arrays (numpy)

In [None]:
# Import modules to use
import math
import numpy as np

#?np.linspace

In [None]:
mylist = [0, 2, 4]
np.array(mylist)

In [None]:
np.zeros(5)

In [None]:
np.arange(5)

In [None]:
np.arange(4, 10)

In [None]:
np.arange(0, 10, 2)

In [None]:
? np.arange

In [None]:
np.random.rand()

In [None]:
np.random.rand(5)

#### Sets

Sets are unordered collections of unique objects.

In [None]:
s1 = {'a', 'b', 'c'}
s2 = {'a', 'd', 'e'}

In [None]:
s1 & s2

In [None]:
s1 | s2

In [None]:
s3 = set(l)
s4 = set(m[0:2])

In [None]:
s3, s4

In [None]:
s3 & s4

In [None]:
s3 | s4

In [172]:
s3 - s4

{0, 1, 3, 8}
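Because a set keeps only unique values, converting a list to a set is a quick way to drop duplicates. The sketch below also shows a membership test and the symmetric difference operator `^`; the names `unique_values` and `sym_diff` are illustrative.

```python
l = [0, 1, 1, 2, 3, 5, 8]

unique_values = set(l)            # the duplicate 1 is dropped
print(len(l), len(unique_values))

s1 = {'a', 'b', 'c'}
s2 = {'a', 'd', 'e'}

print('a' in s1)                  # membership test
sym_diff = s1 ^ s2                # items in exactly one of the two sets
print(sorted(sym_diff))           # sorted() gives a stable order for display
```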

#### Dictionaries

Dictionaries or 'dicts' are hash tables, where a key points to a value.

In [173]:
d = {'name': 'John Doe', 'age': 27, 'dob': '7/20/1989'}

In [174]:
d, type(d)

({'name': 'John Doe', 'age': 27, 'dob': '7/20/1989'}, dict)

In [175]:
d['name']

'John Doe'

In [176]:
d['zip'] = 92039
d

{'name': 'John Doe', 'age': 27, 'dob': '7/20/1989', 'zip': 92039}
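Two other common dictionary operations are looking up a key that may be missing and looping over all key-value pairs. A minimal sketch, using the same example record (the names `zip_code` and `has_age` are illustrative):

```python
d = {'name': 'John Doe', 'age': 27, 'dob': '7/20/1989'}

# get() returns a default instead of raising KeyError for a missing key
zip_code = d.get('zip', 'unknown')
print(zip_code)

# items() yields (key, value) pairs
for key, value in d.items():
    print(key, '->', value)

has_age = 'age' in d   # membership tests look at keys, not values
```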

<a id="control"></a>

### Loops and Control Structures

#### Boolean and comparison operations

In [177]:
x = 5
(x < 6) and (x > 4)

True

In [178]:
x != 4

True

In [179]:
5 in [3, 4, 5]

True

In [180]:
'ell' in 'Hello'

True

In [181]:
len('Hello') >= 5

True
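Python also allows comparisons to be chained, which reads closer to mathematical notation. A short sketch with illustrative variable names:

```python
x = 5

explicit = (x < 6) and (x > 4)
chained = 4 < x < 6        # equivalent to (4 < x) and (x < 6)
negated = not (x == 4)     # same result as x != 4

print(explicit, chained, negated)
```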

#### if tests

In [182]:
if 'd' in 'abc':
    print('Learn your alphabet.')
elif (2 + 2 == 5):
    print('Sometimes yes.')
else:
    print('Nothing is true.')    

Nothing is true.


#### while loops

In [183]:
i = 0
while (i < 5):
    print(i)
    i += 1 # i = i + 1

0
1
2
3
4


In [184]:
i  # the loop ended because i reached 5

5
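A loop can also be ended explicitly with `break`. The sketch below is one possible rewrite of the loop above that reaches the same final value of `i`.

```python
i = 0
while True:
    if i >= 5:
        break          # leave the loop as soon as the condition is met
    i += 1

print(i)
```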

#### for loops

In [185]:
for x in [0, 1, 2, 3, 4]:
    print(x**2)

0
1
4
9
16
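Instead of writing out the list by hand, `range` can generate the numbers, and `enumerate` pairs each item with its index. A minimal sketch; the list name `squares` is illustrative.

```python
squares = []
for x in range(5):              # range(5) yields 0, 1, 2, 3, 4
    squares.append(x ** 2)
print(squares)

# enumerate() pairs each item with its index
for index, value in enumerate(squares):
    print(index, value)
```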


#### Lambda and List Comprehensions
<br>
A lambda function is a small anonymous function.
A lambda function can take any number of arguments, but can only have one expression.

Here's an example of a lambda that takes three parameters and adds the first two.

In [186]:
my_function = lambda a, b, c : a + b

In [187]:
my_function(1, 2, 3)

3
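Lambdas are most often used as short throwaway arguments, for example as the `key` of `sorted`. The sketch below uses a made-up list of words for illustration.

```python
words = ['banana', 'fig', 'cherry']   # made-up sample data

# Sort by length; ties keep their original order because sorted() is stable
by_length = sorted(words, key=lambda w: len(w))
print(by_length)

# The same lambda written as a named function
def word_length(w):
    return len(w)

by_length_named = sorted(words, key=word_length)
```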

<br>
Let's iterate from 0 to 999 and return the even numbers.

In [188]:
my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list[:10]  # show the first ten even numbers

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

<br>
Now the same thing but with list comprehension.

In [189]:
my_list = [number for number in range(0,1000) if number % 2 == 0]
my_list[:10]  # show the first ten even numbers

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
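A comprehension can filter, transform, or both. The sketch below checks the filtered list against `range` with a step of 2, and then squares the filtered values; the names `evens`, `evens_step` and `squares_of_evens` are illustrative.

```python
evens = [number for number in range(0, 1000) if number % 2 == 0]

# range() with a step of 2 yields the same values without a filter
evens_step = list(range(0, 1000, 2))

# A comprehension can also transform each item it keeps
squares_of_evens = [n ** 2 for n in range(10) if n % 2 == 0]

print(len(evens), evens == evens_step)
print(squares_of_evens)
```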

### Reading and Writing CSV files

<br>
Let's import our datafile mpg.csv, which contains fuel economy data for 234 cars.

* mpg : miles per gallon
* class : car classification
* cty : city mpg
* cyl : # of cylinders
* displ : engine displacement in liters
* drv : f = front-wheel drive, r = rear-wheel drive, 4 = four-wheel drive
* fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
* hwy : highway mpg
* manufacturer : automobile manufacturer
* model : model of car
* trans : type of transmission
* year : model year

<br>
1. Group the cars by number of cylinders and find the average cty mpg for each group.
2. Find the average hwy mpg for each class of vehicle.

In [130]:
import csv

# IPython magic: display floats with two decimal places
%precision 2
input_file_path = 'C:/Users/Asus/Documents/GitHub/DA-in-Python/data/mpg.csv'

with open(input_file_path) as csvfile:
    mpg = list(csv.DictReader(csvfile))

mpg[:]

[OrderedDict([('', '1'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '1.8'),
              ('year', '1999'),
              ('cyl', '4'),
              ('trans', 'auto(l5)'),
              ('drv', 'f'),
              ('cty', '18'),
              ('hwy', '29'),
              ('fl', 'p'),
              ('class', 'compact')]),
 OrderedDict([('', '2'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '1.8'),
              ('year', '1999'),
              ('cyl', '4'),
              ('trans', 'manual(m5)'),
              ('drv', 'f'),
              ('cty', '21'),
              ('hwy', '29'),
              ('fl', 'p'),
              ('class', 'compact')]),
 OrderedDict([('', '3'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '2'),
              ('year', '2008'),
              ('cyl', '4'),
              ('trans', 'manual(m6)'),
              ('drv',

<br>
`csv.DictReader` has read in each row of our CSV file as a dictionary. `len` shows that our list consists of 234 dictionaries.

In [113]:
len(mpg)

234
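To try `csv.DictReader` without the data file, the sketch below reads from an in-memory string instead. The two data rows are copied from the output above, with the unnamed index column left out; `sample_csv`, `rows` and `city_mpg` are illustrative names.

```python
import csv
import io

# Two rows copied from the output above; the header row supplies the keys.
sample_csv = """manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class
audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact
audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact
"""

# io.StringIO makes the string behave like an open text file
rows = list(csv.DictReader(io.StringIO(sample_csv)))

print(len(rows))
print(rows[0]['cty'], type(rows[0]['cty']))   # every value is read as a string

# Convert to numbers before doing arithmetic
city_mpg = [float(row['cty']) for row in rows]
print(city_mpg)
```

Every value comes back as a string, which is why the calculations in this section wrap fields in `float`.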

<br>
`keys` gives us the column names of our csv.

In [114]:
mpg[0].keys()

odict_keys(['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class'])

In [118]:
# return the unique values for the number of cylinders 
cylinders = set(d['cyl'] for d in mpg)
cylinders

{'4', '5', '6', '8'}

In [126]:
# average of cty mpg
sum(float(d['cty']) for d in mpg)/len(mpg)

16.858974358974358

In [129]:
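AverCtyMpgByCyl = []
for c in cylinders:  # iterate over the distinct cylinder counts
    summpg = 0
    cntcyltype = 0
    for d in mpg:  # scan every car
        if d['cyl'] == c:  # keep only cars in the current group
            summpg += float(d['cty'])
            cntcyltype += 1
    AverCtyMpgByCyl.append((c, summpg/cntcyltype))  # (cylinders, mean cty mpg)

# Sort once, after all groups have been computed
AverCtyMpgByCyl.sort(key=lambda x: x[0])
AverCtyMpgByCyl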

[('4', 21.012345679012345),
 ('5', 20.5),
 ('6', 16.21518987341772),
 ('8', 12.571428571428571)]
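The nested loops above scan the whole dataset once per group. One alternative, sketched here for the second task (average hwy mpg by class), collects the values in a single pass with `collections.defaultdict` and then writes the result with `csv.writer`. The rows in `sample` are made up for illustration and are not taken from mpg.csv; an in-memory buffer stands in for an output file.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; the real values come from mpg.csv
sample = [
    {'class': 'compact', 'hwy': '29'},
    {'class': 'compact', 'hwy': '31'},
    {'class': 'suv', 'hwy': '20'},
    {'class': 'suv', 'hwy': '18'},
]

# Collect the hwy values for each class in a single pass
hwy_by_class = defaultdict(list)
for row in sample:
    hwy_by_class[row['class']].append(float(row['hwy']))

# One (class, mean hwy) tuple per group, sorted by class name
avg_hwy_by_class = sorted(
    (cls, sum(values) / len(values)) for cls, values in hwy_by_class.items()
)
print(avg_hwy_by_class)

# Write the result as CSV; a StringIO buffer stands in for a file on disk
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['class', 'avg_hwy'])
writer.writerows(avg_hwy_by_class)
print(buffer.getvalue())
```

To write to disk instead, the buffer could be replaced with a file opened via `open(path, 'w', newline='')`, where `path` is whatever output location you choose.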