### Chapter 1 - Fundamentals of Data Manipulation with Python

# Table of Contents

1.1 Introduction to the Course

1.2 Fundamentals of Data Manipulation

Quiz 1

Assignment 1


# 1.1 Introduction to the Course

## 1. Functions

In [3]:
for i in range (0, 1000):
    m = i  * i - 1
    
print(m)

998000


In [4]:
def add(x, y) :
    return(x + y)

add(1, 1)

2

In [5]:
[1, 2] + [3, 4]

[1, 2, 3, 4]

In [6]:
1 in [1, 2, 3]

True

In [8]:
range(5)

range(0, 5)

In [9]:
x = {'Christopher': 123, 'Bill Gates': 456}
x['Christopher']

123

In [10]:
x['Kevyn Collins-Thompson'] = None
x

{'Christopher': 123, 'Bill Gates': 456, 'Kevyn Collins-Thompson': None}

In [11]:
for name in x:
    print(x[name])

123
456
None


In [12]:
for email in x.values():
    print(email)

123
456
None


In [13]:
for name, email in x.items():
    print(name)
    print(email)

Christopher
123
Bill Gates
456
Kevyn Collins-Thompson
None


## 2. Types and Sequences

In [15]:
x = ('Christopher', 'Brooks', 'brooksch@umich.edu')
fname, lname, email = x
x

('Christopher', 'Brooks', 'brooksch@umich.edu')

In [17]:
fname

'Christopher'

In [18]:
x = ('Christopher', 'Brooks', 'brooksch@umich.edu', 'Ann Arbor')
fname, lname, email = x

ValueError: too many values to unpack (expected 3)

## 3. More on Strings

In [19]:
print('Chris' + 2)

TypeError: can only concatenate str (not "int") to str

In [20]:
print('Chris' + str(2))

Chris2


In [21]:
sales_record = {'price': 3.24,
                'num_items': 4,
                'person': 'Chris'}

sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'

print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items'] * sales_record['price']
                            ))

Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96


## 4. Python Demonstration: Reading and Writing CSV files

#### Data Files and Summary Statistics

In [25]:
import csv

%precision 2                             # floating point precision for printing to 2

with open('mpg.csv') as csvfile :        # read in 'mpg.csv' using csv.DicReader
    mpg = list(csv.DictReader(csvfile))  # convert it to a list of dictionaries
    
mpg[ :3]

[OrderedDict([('mpg', '18'),
              ('cylinders', '8'),
              ('displacement', '307'),
              ('horsepower', '130'),
              ('weight', '3504'),
              ('acceleration', '12'),
              ('model_year', '70'),
              ('origin', '1'),
              ('name', 'chevrolet chevelle malibu')]),
 OrderedDict([('mpg', '15'),
              ('cylinders', '8'),
              ('displacement', '350'),
              ('horsepower', '165'),
              ('weight', '3693'),
              ('acceleration', '11.5'),
              ('model_year', '70'),
              ('origin', '1'),
              ('name', 'buick skylark 320')]),
 OrderedDict([('mpg', '18'),
              ('cylinders', '8'),
              ('displacement', '318'),
              ('horsepower', '150'),
              ('weight', '3436'),
              ('acceleration', '11'),
              ('model_year', '70'),
              ('origin', '1'),
              ('name', 'plymouth satellite')])]

In [26]:
len(mpg)   # 398 dictionaries in a list

398

In [27]:
mpg[0].keys()

odict_keys(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year', 'origin', 'name'])

In [30]:
sum(float(d['cylinders']) for d in mpg) / len(mpg) # average number of cylinders

5.454773869346734

In [34]:
cyl = set(d['cylinders'] for d in mpg)  # set of the unique values in the cylinder entry of dictionaries
cyl

{'3', '4', '5', '6', '8'}

## 5. Python Dates and Times

In [35]:
# Get the current time
import datetime as dt
import time as tm

tm.time()

1614346517.652697

In [37]:
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow

datetime.datetime(2021, 2, 26, 22, 36, 26, 490147)

In [38]:
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second

(2021, 2, 26, 22, 36, 26)

In [41]:
dt.date.today() - dt.timedelta(days = 100)

datetime.date(2020, 11, 18)

## 6. Python Objects, map()

In [None]:
class Person :
    department = 'School of Information'
    
    def set_name(self, new_name) :
        self.name = new_name
    
    def selt_location(self, new_location) :
        self.location = new_location

- **map(function, iterable, ...)** function
  - Return an iterator that applies function to every item of iterable, yielding the results

In [44]:
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)
cheapest # lazy evaluation: python returns a map object

<map at 0x7fe2b61389d0>

In [45]:
list(cheapest)

[9.0, 11.0, 12.34, 2.01]

Example:

In [47]:
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson']

def split_title_lname(person) :
    title = person.split()[0]
    lastname = person.split()[-1]
    
    return '{}{}'.format(title, lastname)

list(map(split_title_lname, people))

['Dr.Brooks', 'Dr.Collins-Thompson']

## 7. Advanced Python Lamnda and List Comprehension

#### Lambda

- Lamda's are Python's way of creating anonymous functions
- Same as other functions, but they have no name
- Much more limited than full function definitions
- but they are very useful for simple little data cleaning tasks

In [49]:
my_function = lambda a, b, c : a + b

my_function(1, 2, 3)

3

Example:

In [56]:
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson']

def split_title_lname(person) :
    return person.split()[0] + ' ' + person.split()[-1]

# option 1
for person in people :
    print(split_title_lname(person) == (lambda x: x.split()[0] + ' ' + x.split()[-1])(person))
    
# option 2
list(map(split_title_lname, people)) == list(map(lambda person: person.split()[0] + ' ' + person.split()[-1], people))

True
True


True

#### List Comprehensions

In [62]:
# Create a list of numbers from 0 to 999 that is divisible by 2
my_list = []

for number in range(0, 1000) :
    
    if number % 2 == 0 :
        my_list.append(number)

my_list[ :5]

[0, 2, 4, 6, 8]

In [63]:
my_list = [number for number in range(0, 1000) if number % 2 == 0]
my_list[ :5]

[0, 2, 4, 6, 8]

# 1. 2 Fundamentals of Data Manipulation

## 1. Numerical Python Libarary (NumPy)