### Pandas:  creating dataframes

One common way is to read a CSV file. Here, we create the file first.

In [147]:
import pandas as pd

In [148]:
s = '''
A,B,C
a,1,1.4142
b,4,1.6180
c,9,2.7183
d,16,3.1415
'''

fn = 'tmp'
with open(fn,'w') as fh:  fh.write(s)
df = pd.read_csv(fn)
df

|   | A | B | C |
|---|---|---|---|
| 0 | a | 1 | 1.4142 |
| 1 | b | 4 | 1.618 |
| 2 | c | 9 | 2.7183 |
| 3 | d | 16 | 3.1415 |


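One can also build a dataframe from a dictionary of lists, where each key becomes a column name and each list becomes that column's values.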

In [149]:
D = {'fips':['16','41','53'],'abbrev':['ID','OR','WA']}
df2 = pd.DataFrame(D)
df2

|   | fips | abbrev |
|---|---|---|
| 0 | 16 | ID |
| 1 | 41 | OR |
| 2 | 53 | WA |
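The dictionary above is column-oriented. A minimal sketch of the row-oriented alternative follows, where each row is given as its own dictionary (a "list of records"); the names `by_column`, `by_row` and `records` are illustrative only. It also shows that the `fips` codes, entered as strings, stay strings.

```python
import pandas as pd

# Column-oriented: each key is a column name, each list a column.
D = {'fips': ['16', '41', '53'], 'abbrev': ['ID', 'OR', 'WA']}
by_column = pd.DataFrame(D)

# Row-oriented (illustrative): one dict per row.
records = [
    {'fips': '16', 'abbrev': 'ID'},
    {'fips': '41', 'abbrev': 'OR'},
    {'fips': '53', 'abbrev': 'WA'},
]
by_row = pd.DataFrame(records)

print(by_column.equals(by_row))    # True
print(by_column['fips'].tolist())  # ['16', '41', '53']  (still strings)
```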


Note that in the first example we could have skipped the file entirely by using `io.StringIO` from the `io` module to treat the string as a file-like object:

In [150]:
import io
df3 = pd.read_csv(io.StringIO(s), sep=',')
df3

|   | A | B | C |
|---|---|---|---|
| 0 | a | 1 | 1.4142 |
| 1 | b | 4 | 1.618 |
| 2 | c | 9 | 2.7183 |
| 3 | d | 16 | 3.1415 |
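As a quick check that the two routes agree, here is a sketch that reads the same text once from a file and once from `io.StringIO`, then compares the results with `pd.testing.assert_frame_equal`. Unlike the cell above, it writes into a `tempfile.TemporaryDirectory` so no stray `tmp` file is left behind; the file name `data.csv` is an arbitrary choice.

```python
import io
import os
import tempfile

import pandas as pd

s = '''
A,B,C
a,1,1.4142
b,4,1.6180
c,9,2.7183
d,16,3.1415
'''

# Route 1: write to a temporary file (removed when the block exits), then read it.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data.csv')  # arbitrary file name
    with open(path, 'w') as fh:
        fh.write(s)
    from_file = pd.read_csv(path)

# Route 2: wrap the string in a file-like object; no file is created.
from_string = pd.read_csv(io.StringIO(s))

# Raises AssertionError if values, dtypes, or labels differ.
pd.testing.assert_frame_equal(from_file, from_string)
print(from_string.shape)  # (4, 3)
```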


#### Copies:  assignment does **not** make a copy!

In [152]:
df4 = df
print(id(df),id(df4))

4944950816 4944950816
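The identical `id`s mean `df4 = df` only binds a second name to the same object. A small sketch of the consequence, on a rebuilt two-column frame: a change made through one name is visible through the other, while `df.copy()` produces an independent object. The names `alias` and `copy` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcd'), 'B': [1, 4, 9, 16]})

alias = df        # a second name for the same object
copy = df.copy()  # an independent copy (deep by default)

# Modify through the alias: df sees the change, the copy does not.
alias.loc[0, 'B'] = 100
print(df.loc[0, 'B'])          # 100
print(copy.loc[0, 'B'])        # 1
print(alias is df, copy is df) # True False
```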


In [153]:
df4 = df.drop('B',axis=1)

In [154]:
print(id(df),id(df4))

4944950816 4945150704


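`drop` does not modify `df`: by default it returns a new dataframe, so rebinding `df4` to that result leaves `df` untouched, as the different `id`s show. To get an independent copy without dropping anything, use `df.copy()`.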

In [155]:
df4

|   | A | C |
|---|---|---|
| 0 | a | 1.4142 |
| 1 | b | 1.618 |
| 2 | c | 2.7183 |
| 3 | d | 3.1415 |
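A short sketch confirming that `drop` leaves the original alone, and showing the keyword form `drop(columns=...)`, which is equivalent to `drop(..., axis=1)`. The frame is rebuilt from the CSV values above so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcd'), 'B': [1, 4, 9, 16],
                   'C': [1.4142, 1.6180, 2.7183, 3.1415]})

df4 = df.drop('B', axis=1)  # positional form used above
df5 = df.drop(columns='B')  # equivalent keyword form

print(list(df.columns))   # ['A', 'B', 'C']  (original untouched)
print(list(df4.columns))  # ['A', 'C']
print(df4.equals(df5))    # True
```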


### Pandas:  working with columns

The fundamental idea is to select rows from a dataframe with a list (or Series) of booleans, one per row: rows marked `True` are kept and rows marked `False` are dropped.

In [156]:
L = [True,False,True,False]
df[L]

|   | A | B | C |
|---|---|---|---|
| 0 | a | 1 | 1.4142 |
| 2 | c | 9 | 2.7183 |
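A sketch of the rule "one boolean per row": a NumPy boolean array works just like the list, and a list of the wrong length raises a `ValueError` rather than being silently truncated. The flag `wrong_length_raised` exists only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('abcd'), 'B': [1, 4, 9, 16]})

mask = [True, False, True, False]
print(df[mask]['A'].tolist())            # ['a', 'c']
print(df[np.array(mask)]['A'].tolist())  # ['a', 'c']

# Two booleans for four rows is rejected.
try:
    df[[True, False]]
    wrong_length_raised = False
except ValueError as err:
    wrong_length_raised = True
    print('ValueError:', err)
```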


Normally the list of booleans comes from testing the values in one or more columns against some condition. Here we break this into two steps by first making a selector, although it is usually done in one step.

In [157]:
sel = df['A'].isin(list('abd'))
sel

0     True
1     True
2    False
3     True
Name: A, dtype: bool

In [158]:
print(type(sel))

<class 'pandas.core.series.Series'>


In [159]:
df[sel]

|   | A | B | C |
|---|---|---|---|
| 0 | a | 1 | 1.4142 |
| 1 | b | 4 | 1.618 |
| 3 | d | 16 | 3.1415 |
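The usual one-step form builds the selector inline. Prefixing it with `~` negates the boolean Series and selects the complementary rows. A minimal sketch on a rebuilt frame:

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcd'), 'B': [1, 4, 9, 16]})

kept = df[df['A'].isin(list('abd'))]      # selector built inline
dropped = df[~df['A'].isin(list('abd'))]  # ~ flips every boolean

print(kept['A'].tolist())     # ['a', 'b', 'd']
print(dropped['A'].tolist())  # ['c']
```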


In [160]:
df[(df['B'] == 4) | (df['C'] > 2)]

|   | A | B | C |
|---|---|---|---|
| 1 | b | 4 | 1.618 |
| 2 | c | 9 | 2.7183 |
| 3 | d | 16 | 3.1415 |


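The operators `&` (and), `|` (or), and `~` (not) are used rather than `and`, `or`, and `not`, because the Python keywords need a single `True` or `False` and raise an error when given a Series. Each comparison must be wrapped in parentheses, since `&` and `|` bind more tightly than `==` and `>`.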
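A sketch contrasting the element-wise operators with the keyword `or`, which fails on a Series. It also shows `DataFrame.query`, whose string expressions do accept the words `and`/`or`. The flag `or_raised` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcd'), 'B': [1, 4, 9, 16],
                   'C': [1.4142, 1.6180, 2.7183, 3.1415]})

either = df[(df['B'] == 4) | (df['C'] > 2)]  # element-wise OR
both = df[(df['B'] > 1) & (df['C'] < 3)]     # element-wise AND

# `or` calls bool() on a Series, which pandas refuses as ambiguous.
try:
    df[(df['B'] == 4) or (df['C'] > 2)]
    or_raised = False
except ValueError as err:
    or_raised = True
    print('ValueError:', err)

# query() parses the words `and`/`or` itself.
same = df.query('B == 4 or C > 2')

print(either['A'].tolist())  # ['b', 'c', 'd']
print(both['A'].tolist())    # ['b', 'c']
print(same.equals(either))   # True
```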

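For more complex searches one can use `apply` with `axis=1`, which calls a function on each row and collects the results into a boolean Series. The function is often a lambda expression, but it can also be a named function, as here.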

In [161]:
def f(row):
    v = row['B']
    return v in range(5)

sel = df.apply(f,axis=1)

In [162]:
df[sel]

|   | A | B | C |
|---|---|---|---|
| 0 | a | 1 | 1.4142 |
| 1 | b | 4 | 1.618 |
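The same filter can be written as a lambda. When the test only involves one column, a vectorized method is usually simpler and faster than calling a Python function per row. This sketch assumes `B` holds integers, so `between(0, 4)` (inclusive at both ends by default) matches `range(5)`.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcd'), 'B': [1, 4, 9, 16],
                   'C': [1.4142, 1.6180, 2.7183, 3.1415]})

# Same test as the named function f, written as a lambda.
sel = df.apply(lambda row: row['B'] in range(5), axis=1)

# Vectorized equivalent for integer values of B.
sel_vec = df['B'].between(0, 4)

print(df[sel]['A'].tolist())            # ['a', 'b']
print(sel.tolist() == sel_vec.tolist()) # True
```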


In [163]:
import numpy as np
np.random.seed(153)

a = np.random.randint(0,10,20).reshape(4,5)  # randint already returns an array
a

array([[0, 8, 7, 1, 7],
       [9, 6, 3, 6, 6],
       [0, 2, 4, 2, 6],
       [9, 4, 8, 2, 1]])

In [164]:
df = pd.DataFrame(a,columns = list('ABCDE'))
df1 = df[['D','B']]
df1

|   | D | B |
|---|---|---|
| 0 | 1 | 8 |
| 1 | 6 | 6 |
| 2 | 2 | 2 |
| 3 | 2 | 4 |


Indexing with a list of column names (hence the double brackets) returns a dataframe with just those columns, in the order given.
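The bracket count matters: a single label gives a Series, while a list of labels (even a list of one) gives a DataFrame. The sketch uses `np.arange` instead of the random array so the values are predictable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(4, 5), columns=list('ABCDE'))

col = df['B']               # one label -> Series
sub = df[['B']]             # list with one label -> DataFrame
reordered = df[['D', 'B']]  # list of labels -> columns in that order

print(type(col).__name__, type(sub).__name__)  # Series DataFrame
print(list(reordered.columns))                 # ['D', 'B']
print(reordered['D'].tolist())                 # [3, 8, 13, 18]
```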

In [165]:
df['F'] = [1,2,3,4]
df

|   | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 0 | 0 | 8 | 7 | 1 | 7 | 1 |
| 1 | 9 | 6 | 3 | 6 | 6 | 2 |
| 2 | 0 | 2 | 4 | 2 | 6 | 3 |
| 3 | 9 | 4 | 8 | 2 | 1 | 4 |
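The cell above adds column `F` by assigning a list, which must have one entry per row. A sketch of the other common ways to add a column: a scalar is broadcast to every row, and arithmetic on existing columns works element-wise. The columns `G` and `H` and the flag `length_raised` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 9, 0, 9], 'B': [8, 6, 2, 4]})

df['F'] = [1, 2, 3, 4]       # list: one value per row
df['G'] = 0                  # scalar: same value in every row
df['H'] = df['A'] + df['B']  # computed element-wise

print(list(df.columns))  # ['A', 'B', 'F', 'G', 'H']
print(df['H'].tolist())  # [8, 15, 2, 13]

# A list of the wrong length is rejected.
try:
    df['bad'] = [1, 2, 3]
    length_raised = False
except ValueError as err:
    length_raised = True
    print('ValueError:', err)
```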


### Pandas:  working with rows

In [166]:
df

|   | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 0 | 0 | 8 | 7 | 1 | 7 | 1 |
| 1 | 9 | 6 | 3 | 6 | 6 | 2 |
| 2 | 0 | 2 | 4 | 2 | 6 | 3 |
| 3 | 9 | 4 | 8 | 2 | 1 | 4 |


In [167]:
df.iloc[0,3]

np.int64(1)

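This does *not* select rows 0 and 3, as we might have wanted. Given two scalars, `iloc` treats them as (row, column) positions and returns the single value at row 0, column 3 (column `D`). To select several rows, pass a list of positions: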

In [168]:
df.iloc[[0,3]]

|   | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 0 | 0 | 8 | 7 | 1 | 7 | 1 |
| 3 | 9 | 4 | 8 | 2 | 1 | 4 |


In [169]:
df.iloc[[0,3]][['E','A']]

|   | E | A |
|---|---|---|
| 0 | 7 | 0 |
| 3 | 1 | 9 |
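The chained form `df.iloc[[0,3]][['E','A']]` works, but rows and columns can be selected in a single step. `iloc` takes positions for both; `loc` takes labels. With the default integer index the row labels coincide with positions, so both calls below give the same result. The frame is rebuilt from the table above.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 9, 0, 9], 'B': [8, 6, 2, 4], 'C': [7, 3, 4, 8],
                   'D': [1, 6, 2, 2], 'E': [7, 6, 6, 1], 'F': [1, 2, 3, 4]})

value = df.iloc[0, 3]                  # two scalars -> one value (column D)
by_position = df.iloc[[0, 3], [4, 0]]  # E and A by position
by_label = df.loc[[0, 3], ['E', 'A']]  # E and A by label

print(value)                         # 1
print(by_label.equals(by_position))  # True
print(by_label.values.tolist())      # [[7, 0], [1, 9]]
```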


Another way to access a single value is `at`, which takes a row label and a column label (its position-based counterpart is `iat`):

In [170]:
df.at[0,'B']

np.int64(8)
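With the default index, labels and positions look the same, which hides the difference between `at` and `iat`. This sketch gives the rows string labels (`'w'` to `'z'`, chosen arbitrarily) so the two are clearly distinct. It also shows that `at` can set a value as well as read one.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 9, 0, 9], 'B': [8, 6, 2, 4]},
                  index=['w', 'x', 'y', 'z'])

print(df.at['w', 'B'])  # 8  (row label, column label)
print(df.iat[0, 1])     # 8  (row position, column position)

df.at['y', 'A'] = 5     # set a single value by label
print(df['A'].tolist()) # [0, 9, 5, 9]
```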

#### Iteration

`iterrows` yields `(index, row)` pairs, where each row is a Series indexed by column name:

In [171]:
for i,r in df.iterrows():
    print(r['B'])

8
6
2
4
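Because `iterrows` builds a Series for every row, it does not preserve dtypes across a row and is relatively slow. A sketch of two alternatives on the same `B` values: `itertuples`, which yields namedtuples whose first field is the index, and simply working with the whole column, with no loop at all.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 9, 0, 9], 'B': [8, 6, 2, 4]})

from_rows = [r['B'] for _, r in df.iterrows()]    # as above
from_tuples = [row.B for row in df.itertuples()]  # namedtuple attributes
from_column = df['B'].tolist()                    # no loop needed

print(from_tuples == from_rows == from_column)  # True
```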
