# Basic manipulations
 - adding to a DataFrame
 - value counts
 - assigning values to a column
 - creating a new column from old columns
 - filtering
 - `loc`/`iloc`

```python
import pandas as pd

# Load the dataset (path is relative to the notebook's folder)
heart = pd.read_csv('../heart.csv')
```

## Adding to a DataFrame

Here are two rows that our engineer accidentally left out of the .csv file, expressed as a Python dictionary:

```python
extra_rows = {'age': [40, 30], 'sex': [1, 0], 'cp': [0, 0], 'trestbps': [120, 130],
              'chol': [240, 200], 'fbs': [0, 0], 'restecg': [1, 0],
              'thalach': [120, 122], 'exang': [0, 1], 'oldpeak': [0.1, 1.0],
              'slope': [1, 1], 'ca': [0, 1], 'thal': [2, 3], 'target': [0, 0]}
extra_rows
```

```
{'age': [40, 30],
 'sex': [1, 0],
 'cp': [0, 0],
 'trestbps': [120, 130],
 'chol': [240, 200],
 'fbs': [0, 0],
 'restecg': [1, 0],
 'thalach': [120, 122],
 'exang': [0, 1],
 'oldpeak': [0.1, 1.0],
 'slope': [1, 1],
 'ca': [0, 1],
 'thal': [2, 3],
 'target': [0, 0]}
```

How can we add this to the bottom of our dataset?

```python
# Let's first turn this into a DataFrame.
# DataFrame.from_dict() is a class method: each key becomes a column,
# and each list holds that column's values.
extras = pd.DataFrame.from_dict(extra_rows)
```
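For a dictionary of equal-length lists like `extra_rows`, the plain `pd.DataFrame(...)` constructor builds the same frame as `from_dict`. The sketch below checks that on toy values, and also shows `orient='index'`, which treats each key as a *row* instead. The `cols`/`rows` dicts and the `patient_a`/`patient_b` labels are made up for illustration.

```python
import pandas as pd

# Column-oriented toy dict, shaped like extra_rows but with only two columns
cols = {'age': [40, 30], 'chol': [240, 200]}

# For a dict of lists, the constructor and from_dict give identical frames
via_ctor = pd.DataFrame(cols)
via_from_dict = pd.DataFrame.from_dict(cols)
print(via_ctor.equals(via_from_dict))  # True

# Row-oriented toy dict: each key is a row label, each list is that row's values
rows = {'patient_a': [40, 240], 'patient_b': [30, 200]}
by_row = pd.DataFrame.from_dict(rows, orient='index', columns=['age', 'chol'])
print(by_row)
```

`from_dict` is mainly worth reaching for when the dictionary is keyed by row; otherwise the constructor reads more directly.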

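```python
# Now we just need to concatenate the two DataFrames together.
# Note the `ignore_index` parameter! Setting it to True renumbers the rows of the
# result, so the new rows get labels 303 and 304 instead of repeating 0 and 1.
heart_augmented = pd.concat([heart, extras], ignore_index=True)
```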

```python
# Let's check the end to make sure we were successful!
heart_augmented.tail()
```

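| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 300 | 68 | 1 | 0 | 144 | 193 | 1 | 1 | 141 | 0 | 3.4 | 1 | 2 | 3 | 0 |
| 301 | 57 | 1 | 0 | 130 | 131 | 0 | 1 | 115 | 1 | 1.2 | 1 | 1 | 3 | 0 |
| 302 | 57 | 0 | 1 | 130 | 236 | 0 | 0 | 174 | 0 | 0.0 | 1 | 1 | 2 | 0 |
| 303 | 40 | 1 | 0 | 120 | 240 | 0 | 1 | 120 | 0 | 0.1 | 1 | 0 | 2 | 0 |
| 304 | 30 | 0 | 0 | 130 | 200 | 0 | 0 | 122 | 1 | 1.0 | 1 | 1 | 3 | 0 |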
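To see why `ignore_index=True` matters, here is a minimal sketch on two toy DataFrames. The made-up `base` and `new` frames stand in for `heart` and `extras`, and both start with a default 0-based index.

```python
import pandas as pd

# Toy stand-ins for heart and extras, each with its own 0-based index
base = pd.DataFrame({'age': [63, 37, 41]})
new = pd.DataFrame({'age': [40, 30]})

# Without ignore_index the original labels are kept, so 0 and 1 appear twice
kept = pd.concat([base, new])
print(kept.index.tolist())        # [0, 1, 2, 0, 1]

# With ignore_index=True the result is relabelled 0..n-1
renumbered = pd.concat([base, new], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]

# Duplicate labels make label lookups ambiguous: .loc[0] now matches two rows
print(len(kept.loc[0]))           # 2
```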


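Let's add a new column called "test" to `heart` (the original DataFrame, not `heart_augmented`) and set all of its values to 0. Assigning a single value to a column fills every row with it: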

```python
heart['test'] = 0
```
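A scalar is broadcast to every row, while a list must supply exactly one value per row. The sketch below, on a made-up three-row frame, shows both cases and also `DataFrame.assign`, which returns a new DataFrame instead of modifying the original in place.

```python
import pandas as pd

df = pd.DataFrame({'age': [63, 37, 41]})

# A scalar is broadcast to every row
df['test'] = 0
print(df['test'].tolist())  # [0, 0, 0]

# A list needs exactly one value per row
df['flag'] = [1, 0, 1]

# A list of the wrong length raises ValueError rather than padding
try:
    df['bad'] = [1, 2]
except ValueError as err:
    print('ValueError:', err)

# .assign() returns a new DataFrame and leaves df unchanged
df2 = df.assign(test2=0)
print('test2' in df.columns, 'test2' in df2.columns)  # False True
```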

We can also add columns whose values are computed from existing columns.

How could we add a column called 'twice_age' that is double the age column?

```python
heart['twice_age'] = 2 * heart['age']
```
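Arithmetic and comparisons on a column run element-wise over every row at once, so no loop is needed. Here is a small sketch on made-up rows that reuse two `heart.csv` column names. `chol_per_age` is a hypothetical derived column, included only to show two columns combined row by row.

```python
import pandas as pd

# Toy rows using two heart.csv column names; the values are made up
df = pd.DataFrame({'age': [63, 37, 41], 'chol': [233, 250, 204]})

# Arithmetic on a column is applied element-wise, one result per row
df['twice_age'] = 2 * df['age']

# Two columns can be combined row by row (hypothetical measure)
df['chol_per_age'] = df['chol'] / df['age']

# A comparison produces a boolean column
df['over_50'] = df['age'] > 50

print(df)
```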

## Filtering

We can use filtering to see only certain rows of our data. To see only the rows for patients 70 years of age or older, we can simply type:

```python
heart[heart['age'] >= 70]
```

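| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target | test | twice_age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 71 | 0 | 1 | 160 | 302 | 0 | 1 | 162 | 0 | 0.4 | 2 | 2 | 2 | 1 | 0 | 142 |
| 60 | 71 | 0 | 2 | 110 | 265 | 1 | 0 | 130 | 0 | 0.0 | 2 | 1 | 2 | 1 | 0 | 142 |
| 129 | 74 | 0 | 1 | 120 | 269 | 0 | 0 | 121 | 1 | 0.2 | 2 | 1 | 2 | 1 | 0 | 148 |
| 144 | 76 | 0 | 2 | 140 | 197 | 0 | 2 | 116 | 0 | 1.1 | 1 | 0 | 2 | 1 | 0 | 152 |
| 145 | 70 | 1 | 1 | 156 | 245 | 0 | 0 | 143 | 0 | 0.0 | 2 | 0 | 2 | 1 | 0 | 140 |
| 151 | 71 | 0 | 0 | 112 | 149 | 0 | 1 | 125 | 0 | 1.6 | 1 | 0 | 2 | 1 | 0 | 142 |
| 225 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 | 0 | 0 | 140 |
| 234 | 70 | 1 | 0 | 130 | 322 | 0 | 0 | 109 | 0 | 2.4 | 1 | 3 | 2 | 0 | 0 | 140 |
| 238 | 77 | 1 | 0 | 125 | 304 | 0 | 0 | 162 | 1 | 0.0 | 2 | 3 | 2 | 0 | 0 | 154 |
| 240 | 70 | 1 | 2 | 160 | 269 | 0 | 1 | 112 | 1 | 2.9 | 1 | 1 | 3 | 0 | 0 | 140 |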
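Under the hood, `heart['age'] >= 70` is a boolean Series with one True/False per row, and indexing with it keeps the True rows along with their original labels. A minimal sketch on a made-up four-row frame (index labels chosen arbitrarily); `DataFrame.query` is shown as an alternative way to write the same filter as a string.

```python
import pandas as pd

df = pd.DataFrame({'age': [71, 52, 74, 45]}, index=[25, 8, 129, 7])

# The comparison produces a boolean Series aligned on df's index
mask = df['age'] >= 70
print(mask.tolist())          # [True, False, True, False]

# Indexing with the mask keeps only the True rows, with their original labels
older = df[mask]
print(older.index.tolist())   # [25, 129]

# DataFrame.query expresses the same filter as a string
print(df.query('age >= 70').equals(older))  # True
```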


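Use `&` for "and" and `|` for "or". Wrap each condition in parentheses, because `&` and `|` bind more tightly than comparison operators such as `>=` and `>`.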
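The sketch below combines conditions on a made-up three-row frame. It shows `&`, `|`, and `~` (element-wise "not"), and why Python's own `or` cannot be used: it tries to reduce a whole Series to a single True/False and raises `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({'age': [71, 52, 45], 'trestbps': [160, 172, 120]})

# Parenthesize each condition before combining with | or &
either = df[(df['age'] >= 70) | (df['trestbps'] > 170)]
both = df[(df['age'] >= 70) & (df['trestbps'] > 170)]

# ~ negates a condition element-wise
neither = df[~((df['age'] >= 70) | (df['trestbps'] > 170))]

print(len(either), len(both), len(neither))  # 2 0 1

# Python's `or` needs a single bool, which a Series cannot provide
try:
    df[(df['age'] >= 70) or (df['trestbps'] > 170)]
except ValueError as err:
    print('ValueError:', err)
```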

```python
# Display the patients who are 70 or over as well as the patients whose
# resting blood pressure (trestbps) is greater than 170.
heart[(heart['age'] >= 70) | (heart['trestbps'] > 170)]
```

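| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target | test | twice_age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 52 | 1 | 2 | 172 | 199 | 1 | 1 | 162 | 0 | 0.5 | 2 | 0 | 3 | 1 | 0 | 104 |
| 25 | 71 | 0 | 1 | 160 | 302 | 0 | 1 | 162 | 0 | 0.4 | 2 | 2 | 2 | 1 | 0 | 142 |
| 60 | 71 | 0 | 2 | 110 | 265 | 1 | 0 | 130 | 0 | 0.0 | 2 | 1 | 2 | 1 | 0 | 142 |
| 101 | 59 | 1 | 3 | 178 | 270 | 0 | 0 | 145 | 0 | 4.2 | 0 | 0 | 3 | 1 | 0 | 118 |
| 110 | 64 | 0 | 0 | 180 | 325 | 0 | 1 | 154 | 1 | 0.0 | 2 | 0 | 2 | 1 | 0 | 128 |
| 129 | 74 | 0 | 1 | 120 | 269 | 0 | 0 | 121 | 1 | 0.2 | 2 | 1 | 2 | 1 | 0 | 148 |
| 144 | 76 | 0 | 2 | 140 | 197 | 0 | 2 | 116 | 0 | 1.1 | 1 | 0 | 2 | 1 | 0 | 152 |
| 145 | 70 | 1 | 1 | 156 | 245 | 0 | 0 | 143 | 0 | 0.0 | 2 | 0 | 2 | 1 | 0 | 140 |
| 151 | 71 | 0 | 0 | 112 | 149 | 0 | 1 | 125 | 0 | 1.6 | 1 | 0 | 2 | 1 | 0 | 142 |
| 203 | 68 | 1 | 2 | 180 | 274 | 1 | 0 | 150 | 1 | 1.6 | 1 | 0 | 3 | 0 | 0 | 136 |

Only part of the result is reproduced in this table: rows 225, 234, 238 and 240 (patients aged 70 or over) also satisfy the condition and appear in the full output.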


## `.loc[]` and `.iloc[]`

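`.loc[]` selects rows and columns **by label**. We can use it to get, say, the first ten values of the age and trestbps columns. Note that a label slice includes its endpoint, so `:9` returns the rows labelled 0 through 9: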

```python
heart.loc[:9, ['age', 'trestbps']]
```

| | age | trestbps |
|---|---|---|
| 0 | 63 | 145 |
| 1 | 37 | 130 |
| 2 | 41 | 130 |
| 3 | 56 | 120 |
| 4 | 57 | 120 |
| 5 | 57 | 140 |
| 6 | 56 | 140 |
| 7 | 44 | 120 |
| 8 | 52 | 172 |
| 9 | 57 | 150 |
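`.loc[]` also accepts a boolean mask for the rows, which combines filtering and column selection in one step, and it can be used on the left of `=` to assign values to only the selected cells. A minimal sketch on made-up rows; zeroing `chol` for older patients is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({'age': [63, 37, 71, 56],
                   'trestbps': [145, 130, 160, 120],
                   'chol': [233, 250, 302, 236]})

# Label slices include the stop label: 0:2 returns labels 0, 1 and 2
first_three = df.loc[0:2, ['age', 'trestbps']]
print(first_three.shape)     # (3, 2)

# A boolean mask for rows plus a list of column labels
older = df.loc[df['age'] >= 70, ['age', 'trestbps']]
print(older)

# Assigning through .loc changes only the selected cells
df.loc[df['age'] >= 70, 'chol'] = 0
print(df['chol'].tolist())   # [233, 250, 0, 236]
```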


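`.iloc[]` selects locations in the DataFrame **by integer position**. For example, here is the value in the fourth row (position 3) of the first column (position 0, `age`):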

```python
heart.iloc[3, 0]
```

```
56
```
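For a single cell, `.iat[]` is the scalar-only positional accessor and returns the same value as `.iloc[]`. Negative positions count from the end, as with Python lists. A short sketch on a made-up four-row frame:

```python
import pandas as pd

df = pd.DataFrame({'age': [63, 37, 41, 56], 'sex': [1, 1, 0, 1]})

# Row position 3, column position 0 -> one scalar
print(df.iloc[3, 0])   # 56

# .iat accepts only a single row/column position pair
print(df.iat[3, 0])    # 56

# Negative positions count from the end
print(df.iloc[-1, 1])  # 1
```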

```python
# How would we get the same slice as just above by using .iloc[] instead of .loc[]?
# Position slices exclude their endpoint, so we stop at 10; trestbps is column 3.
heart.iloc[:10, [0, 3]]
```

| | age | trestbps |
|---|---|---|
| 0 | 63 | 145 |
| 1 | 37 | 130 |
| 2 | 41 | 130 |
| 3 | 56 | 120 |
| 4 | 57 | 120 |
| 5 | 57 | 140 |
| 6 | 56 | 140 |
| 7 | 44 | 120 |
| 8 | 52 | 172 |
| 9 | 57 | 150 |
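Two practical notes, shown in the sketch below on made-up rows. First, column positions such as `[0, 3]` can be looked up from names with `Index.get_indexer` rather than counted by hand. Second, the `.loc`/`.iloc` equivalence above relies on the default 0..n-1 index. After filtering, labels and positions no longer line up, so the same number can mean different rows.

```python
import pandas as pd

df = pd.DataFrame({'age': [63, 37, 41, 56, 57],
                   'sex': [1, 1, 0, 1, 0],
                   'cp': [3, 2, 1, 1, 0],
                   'trestbps': [145, 130, 130, 120, 120]})

# Translate column names into positions
cols = df.columns.get_indexer(['age', 'trestbps'])
print(cols.tolist())  # [0, 3]

# With a default index, loc[:k] matches iloc[:k + 1]
print(df.loc[:2, ['age', 'trestbps']].equals(df.iloc[:3, cols]))  # True

# After filtering, the kept rows retain labels 2 and 4
women = df[df['sex'] == 0]
print(women.loc[4, 'age'])  # 57, found by label 4
print(women.iloc[1, 0])     # 57, found by position 1
```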
