Let's import our data file, mpg.csv, which contains fuel economy data for 234 cars.
Column definitions:
mpg : miles per gallon
class : car classification
cty : city mpg
cyl : # of cylinders
displ : engine displacement in liters
drv : f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
hwy : highway mpg
manufacturer : automobile manufacturer
model : model of car
trans : type of transmission
year : model year

In [19]:
import csv

# number of decimal places used when displaying floats
%precision 2

with open('mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))
mpg[:3]

[OrderedDict([('', '1'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '1.8'),
              ('year', '1999'),
              ('cyl', '4'),
              ('trans', 'auto(l5)'),
              ('drv', 'f'),
              ('cty', '18'),
              ('hwy', '29'),
              ('fl', 'p'),
              ('class', 'compact')]),
 OrderedDict([('', '2'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '1.8'),
              ('year', '1999'),
              ('cyl', '4'),
              ('trans', 'manual(m5)'),
              ('drv', 'f'),
              ('cty', '21'),
              ('hwy', '29'),
              ('fl', 'p'),
              ('class', 'compact')]),
 OrderedDict([('', '3'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '2'),
              ('year', '2008'),
              ('cyl', '4'),
              ('trans', 'manual(m6)'),
              ('drv',

csv.DictReader has read each row of our CSV file in as a dictionary, keyed by the column names. len shows that our list consists of 234 dictionaries, one per car.

In [20]:
len(mpg)

234
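To see what csv.DictReader does without needing mpg.csv on disk, here is a minimal sketch that reads two rows copied from the mpg[:3] output above out of an in-memory string. io.StringIO stands in for the open file, and the exact text layout of the header and rows is an assumption. Each row comes back as a dictionary keyed by the header, and every value is a string.

```python
import csv
import io

# Assumed layout: header line plus the first two rows shown in the output above.
# io.StringIO stands in for open('mpg.csv') so the sketch runs anywhere.
sample = io.StringIO(
    ",manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class\n"
    "1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact\n"
    "2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact\n"
)

rows = list(csv.DictReader(sample))  # one dictionary per data row

print(len(rows))             # 2
print(rows[1]['trans'])      # manual(m5)
print(type(rows[0]['cty']))  # <class 'str'>: values are not converted to numbers
```

On Python 3.8 and later, DictReader returns plain dict objects rather than the OrderedDict shown above; the keys keep the column order either way.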

In [21]:
mpg[0].keys()

odict_keys(['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class'])

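keys() gives us the column names of our CSV file. The first key is an empty string because the file's first column, a row number, has no header.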
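The column names can also be read straight from the reader's fieldnames attribute, which only needs the header line. A minimal sketch, again using a hypothetical in-memory stand-in for mpg.csv with the same header:

```python
import csv
import io

# Hypothetical stand-in for mpg.csv: the header plus one data row.
sample = io.StringIO(
    ",manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class\n"
    "1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact\n"
)

reader = csv.DictReader(sample)
# fieldnames is filled from the header line; no data row is needed for this.
print(reader.fieldnames)
# The first column has no header, so its name is the empty string.
print(repr(reader.fieldnames[0]))  # ''
```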

This is how to find the average city fuel economy across all cars. All values in the dictionaries are strings, so we need to convert them to float before doing arithmetic. In the code below, d stands for each row dictionary in mpg, and d['cty'] looks up that row's city mpg.

In [23]:
sum(float(d['cty']) for d in mpg) / len(mpg)

16.858974358974358
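Why the float() conversion matters: with strings, + concatenates rather than adds. The sketch below uses three hypothetical rows to show the difference, and also shows statistics.mean from the standard library as an equivalent way to write the average.

```python
import statistics

# Hypothetical mini-dataset: cty values are strings, as csv.DictReader returns them.
rows = [{'cty': '18'}, {'cty': '21'}, {'cty': '24'}]

# Adding the raw strings joins them instead of summing the numbers.
print(rows[0]['cty'] + rows[1]['cty'])  # 1821

# Converting each value first gives the arithmetic we want.
avg = sum(float(d['cty']) for d in rows) / len(rows)
print(avg)  # 21.0

# statistics.mean accepts any iterable of numbers, including a generator.
avg_mean = statistics.mean(float(d['cty']) for d in rows)
print(avg_mean)  # 21.0
```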

Similarly, this is how to find the average highway fuel economy across all cars.

In [25]:
sum(float(d['hwy']) for d in mpg) / len(mpg)

23.44017094017094
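Since every calculation repeats the float() conversion, one option is to convert the numeric columns once, right after loading. A minimal sketch; convert_numeric and NUMERIC_COLUMNS are hypothetical names chosen for this example, and the two rows are trimmed copies of the first rows in the output above.

```python
# Hypothetical choice of columns to convert for this sketch.
NUMERIC_COLUMNS = ('displ', 'cty', 'hwy')

def convert_numeric(rows, columns=NUMERIC_COLUMNS):
    """Hypothetical helper: return copies of rows with the given columns as floats."""
    converted = []
    for d in rows:
        new_d = dict(d)  # copy so the original string rows stay untouched
        for col in columns:
            new_d[col] = float(new_d[col])
        converted.append(new_d)
    return converted

rows = [
    {'model': 'a4', 'displ': '1.8', 'cty': '18', 'hwy': '29'},
    {'model': 'a4', 'displ': '1.8', 'cty': '21', 'hwy': '29'},
]
numeric_rows = convert_numeric(rows)

# No float() needed here any more.
avg_hwy = sum(d['hwy'] for d in numeric_rows) / len(numeric_rows)
print(avg_hwy)  # 29.0
```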

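Use set to return the unique values for the number of cylinders the cars in our dataset have. The set() function creates a set object, which keeps only one copy of each value. Sets are unordered, so their iteration order is arbitrary and should not be relied on. Note that the values are still strings, such as '4'.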

In [27]:
cylinders = set(d['cyl'] for d in mpg)
cylinders

{'4', '5', '6', '8'}
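When a fixed order is needed, sorted() turns a set into an ordered list. Because the cylinder values are strings, they sort character by character; the '10' below is a hypothetical value (not in our dataset) that shows where this goes wrong and how key=int fixes it.

```python
cylinders = {'8', '4', '6', '5'}  # a set of strings, as built from the CSV

# sorted() returns a new list in ascending order.
print(sorted(cylinders))  # ['4', '5', '6', '8']

# String comparison looks at the first character, so '10' sorts before '4'.
print(sorted({'4', '10', '6'}))  # ['10', '4', '6']

# key=int compares the values as numbers instead.
print(sorted({'4', '10', '6'}, key=int))  # ['4', '6', '10']
```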

Here's a more complex example, where we group the cars by number of cylinders and find the average city mpg for each group.

In [28]:
CtyMpgByCyl = [] # start with an empty list to hold the results

for c in cylinders: # iterate over all the cylinder levels
    summpg = 0 # reset the running total and count for each level
    cyltypecount = 0
    for d in mpg: # iterate over all dictionaries
        if d['cyl'] == c: # if the cylinder level type matches,
            summpg += float(d['cty']) # add the cty mpg
            cyltypecount += 1 # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple ('cylinder', 'avg mpg')

CtyMpgByCyl.sort(key=lambda x: x[0])
CtyMpgByCyl

[('4', 21.012345679012345),
 ('5', 20.5),
 ('6', 16.21518987341772),
 ('8', 12.571428571428571)]
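The nested loop above scans every row once per cylinder level. The same averages can be computed in a single pass by accumulating sums and counts in dictionaries. A minimal sketch using collections.defaultdict on a few hypothetical rows (this is an alternative technique, not the approach used above):

```python
from collections import defaultdict

# Hypothetical mini-dataset with string values, as csv.DictReader returns them.
mpg_sample = [
    {'cyl': '4', 'cty': '18'},
    {'cyl': '4', 'cty': '21'},
    {'cyl': '6', 'cty': '16'},
    {'cyl': '8', 'cty': '12'},
    {'cyl': '8', 'cty': '14'},
]

totals = defaultdict(float)  # cylinder level -> sum of cty mpg
counts = defaultdict(int)    # cylinder level -> number of cars

for d in mpg_sample:  # one pass over the rows
    totals[d['cyl']] += float(d['cty'])
    counts[d['cyl']] += 1

# Build (cylinder, average) tuples; sorted() orders them by cylinder string.
cty_by_cyl = sorted((c, totals[c] / counts[c]) for c in totals)
print(cty_by_cyl)  # [('4', 19.5), ('6', 16.0), ('8', 13.0)]
```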

The for statement iterates over the members of a sequence in order, executing its block once for each member. Contrast it with the while loop, which is used when a condition needs to be checked on every iteration, or to repeat a block of code indefinitely.
Lambda: in Python, lambda is a keyword used to define anonymous functions (functions with no name), which is why they are known as lambda functions.

A lambda can take zero or more arguments, but its body is a single expression whose value it returns. In the sort above, key=lambda x: x[0] tells sort to order the (cylinder, average mpg) tuples by their first element.
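A small comparison of the ideas above, using hypothetical (cylinder, mpg) pairs: a lambda behaves like an equivalent def function when passed as a sort key, and a for loop can be rewritten as a while loop that manages its own index.

```python
pairs = [('6', 16.2), ('4', 21.0), ('8', 12.6)]

# A lambda bound to a name only to compare it with def
# (PEP 8 recommends def for functions that need a name).
first = lambda x: x[0]

def first_def(x):
    """Named function that does the same thing as the lambda above."""
    return x[0]

print(sorted(pairs, key=first))           # [('4', 21.0), ('6', 16.2), ('8', 12.6)]
print(sorted(pairs, key=first_def))       # same result
print(sorted(pairs, key=lambda x: x[1]))  # [('8', 12.6), ('6', 16.2), ('4', 21.0)]

# for walks the sequence in order...
labels_for = []
for c, _ in pairs:
    labels_for.append(c)

# ...while needs an explicit index and checks its condition before each pass.
labels_while = []
i = 0
while i < len(pairs):
    labels_while.append(pairs[i][0])
    i += 1
```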

Use set to return the unique vehicle classes in our dataset.

In [30]:
vehicleclass = set(d['class'] for d in mpg) # what are the class types
vehicleclass

{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}
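A set tells us which classes exist but not how many cars fall in each. collections.Counter counts occurrences in one step; the sketch below uses hypothetical rows and also shows the set-comprehension spelling, which is equivalent to set(...) with a generator.

```python
from collections import Counter

# Hypothetical rows; only the 'class' column matters here.
mpg_sample = [
    {'class': 'compact'}, {'class': 'suv'}, {'class': 'compact'},
    {'class': 'pickup'}, {'class': 'suv'}, {'class': 'compact'},
]

unique_classes = {d['class'] for d in mpg_sample}       # same as set(d['class'] for d in ...)
class_counts = Counter(d['class'] for d in mpg_sample)  # class -> number of cars

print(sorted(unique_classes))       # ['compact', 'pickup', 'suv']
print(class_counts.most_common(1))  # [('compact', 3)]
```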

In [31]:
manufacturerclass = set(d['manufacturer'] for d in mpg) # what are the manufacturers
manufacturerclass

{'audi',
 'chevrolet',
 'dodge',
 'ford',
 'honda',
 'hyundai',
 'jeep',
 'land rover',
 'lincoln',
 'mercury',
 'nissan',
 'pontiac',
 'subaru',
 'toyota',
 'volkswagen'}

And here's an example of how to find the average highway mpg for each class of vehicle in our dataset.

In [1]:
HwyMpgByClass = []

for t in vehicleclass: # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg: # iterate over all dictionaries
        if d['class'] == t: # if the vehicle class matches,
            summpg += float(d['hwy']) # add the hwy mpg
            vclasscount += 1 # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount)) # append the tuple ('class', 'avg mpg')

HwyMpgByClass.sort(key=lambda x: x[1]) #sorts lowest to highest
HwyMpgByClass

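NameError: name 'vehicleclass' is not defined

This error appears because the cell ran in a fresh kernel session (note the execution counter restarting at In [1]), so vehicleclass and mpg, which were defined in earlier cells, no longer existed. Re-running the earlier cells first restores those variables, after which this cell runs as intended.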
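Because the NameError above comes from variables missing in a new session, one way to make this step robust is to define everything it needs in the same cell. A minimal sketch with a few hypothetical rows standing in for mpg; it keeps the original names and the sort by average (x[1]), and collects each class's values with a list comprehension instead of a running total.

```python
# Hypothetical rows; in the notebook this would be the mpg list loaded from mpg.csv.
mpg_sample = [
    {'class': 'compact', 'hwy': '29'},
    {'class': 'compact', 'hwy': '31'},
    {'class': 'suv', 'hwy': '17'},
    {'class': 'suv', 'hwy': '19'},
    {'class': 'midsize', 'hwy': '26'},
]

vehicleclass = set(d['class'] for d in mpg_sample)  # defined in the same cell

HwyMpgByClass = []
for t in vehicleclass:  # iterate over all the vehicle classes
    hwy_values = [float(d['hwy']) for d in mpg_sample if d['class'] == t]
    HwyMpgByClass.append((t, sum(hwy_values) / len(hwy_values)))

HwyMpgByClass.sort(key=lambda x: x[1])  # lowest to highest average hwy mpg
print(HwyMpgByClass)  # [('suv', 18.0), ('midsize', 26.0), ('compact', 30.0)]
```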