These notes revisit the ideas from the first meeting to make them clearer:

In [1]:
names=["Tomás", "Pauline", "Pablo", "Bjork","Alan","Juana"]
woman=[False,True,False,False,False,True]
ages=[32,33,28,30,32,27]
country=["Chile", "Senegal", "Spain", "Norway","Peru","Peru"]
education=["Bach", "Bach", "Master", "PhD","Bach","Master"]

# now in a dict:
data={'name':names, 'age':ages, 'girl':woman,'born In':country, 'degree':education}

#now into a DF
import pandas as pd

friends=pd.DataFrame.from_dict(data)
# seeing it:
friends

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
1,Pauline,33,True,Senegal,Bach
2,Pablo,28,False,Spain,Master
3,Bjork,30,False,Norway,PhD
4,Alan,32,False,Peru,Bach
5,Juana,27,True,Peru,Master


The result is what you expected, but you need to be sure of what data structure you have:

In [2]:
# It is good to know what you have

#what is it?
type(friends)

pandas.core.frame.DataFrame

In [3]:
#this is good
friends.age

0    32
1    33
2    28
3    30
4    32
5    27
Name: age, dtype: int64

In [9]:
#what is it?
type(friends.age)

# a Series is a single column in pandas

pandas.core.series.Series

In [5]:
#this is good
friends['age']

# You can access a column (a Series in pandas) with a dot or with [''].
# Using [''] is often safer because a column name can contain a special character
# (a dot, for example) or a space, which dot notation cannot handle.

0    32
1    33
2    28
3    30
4    32
5    27
Name: age, dtype: int64

In [10]:
#what is it?
type(friends['age'])

pandas.core.series.Series

In [7]:
# this is bad: iloc expects integer positions, not names
friends.iloc[['age']]

ValueError: invalid literal for int() with base 10: 'age'

In [8]:
# this is bad: a single list in loc selects rows, and 'age' is not a row label
friends.loc[['age']]

KeyError: "None of [['age']] are in the [index]"
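
Both calls fail for the same reason: a single list inside `.iloc[]` or `.loc[]` selects rows, and 'age' is neither a row position nor a row label. A minimal sketch of what each indexer accepts, using a small hypothetical frame `df` whose row labels are letters (not part of the data above), so labels and positions are easy to tell apart:

```python
import pandas as pd

# Hypothetical frame with string row labels, so labels and positions differ.
df = pd.DataFrame({"age": [32, 33, 28]}, index=["a", "b", "c"])

by_label = df.loc[["a", "c"]]        # rows selected by index label
by_position = df.iloc[[0, 2]]        # rows selected by integer position
column_by_label = df.loc[:, "age"]   # all rows, one column by name (a Series)

print(by_label.equals(by_position))  # both select the same two rows
```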

In [11]:
# this is bad: 'age','born In' without inner brackets is read as one tuple key
friends['age','born In']

KeyError: ('age', 'born In')

In [12]:
#this is good
friends[['age','born In']]

Unnamed: 0,age,born In
0,32,Chile
1,33,Senegal
2,28,Spain
3,30,Norway
4,32,Peru
5,27,Peru


In [13]:
# what is it?
type(friends[['age','born In']])

pandas.core.frame.DataFrame
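
The number of brackets decides what you get back. A short sketch on a reduced, hypothetical two-row version of the data: a string key returns a Series, a list key returns a DataFrame, and a bare `'age','born In'` is a tuple that pandas looks up as one single column name.

```python
import pandas as pd

# Reduced two-row frame, for illustration only.
df = pd.DataFrame({"age": [32, 33], "born In": ["Chile", "Senegal"]})

one_col_series = df["age"]    # string key: Series
one_col_frame = df[["age"]]   # list key: DataFrame, even with a single column

# A tuple key is treated as ONE column name, which does not exist here.
try:
    df["age", "born In"]
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

print(type(one_col_series).__name__, one_col_series.shape)
print(type(one_col_frame).__name__, one_col_frame.shape)
```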

In [14]:
#this is bad
friends.'born In'

SyntaxError: invalid syntax (<ipython-input-14-a1aa66ff4520>, line 2)
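
Dot access only works when the column name is a valid Python identifier, so a name with a space needs brackets. If you want dot access anyway, one option is to rename the column first. A sketch on a reduced frame; the new name `born_in` is only an illustration:

```python
import pandas as pd

# Reduced two-row frame, for illustration only.
df = pd.DataFrame({"age": [32, 33], "born In": ["Chile", "Senegal"]})

countries = df["born In"]  # brackets work with any column name

# Hypothetical new name without a space; rename returns a new DataFrame.
renamed = df.rename(columns={"born In": "born_in"})
countries_by_dot = renamed.born_in  # dot access now works

print(countries.tolist() == countries_by_dot.tolist())
```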

In [15]:
#this is good
friends.loc[:,['age','born In']]

# with loc we select [rows, columns]
# : means every row

Unnamed: 0,age,born In
0,32,Chile
1,33,Senegal
2,28,Spain
3,30,Norway
4,32,Peru
5,27,Peru


In [16]:
type(friends.loc[:,['age','born In']])

pandas.core.frame.DataFrame

In [17]:
#this is bad
friends.loc[:,['age':'born In']]

SyntaxError: invalid syntax (<ipython-input-17-da35a4e2500b>, line 2)
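
The slice fails only because it sits inside a list; a slice goes directly inside the indexer. A sketch of the working form on a reduced frame with the same column order as `friends`. Note that a label slice in `.loc` includes both endpoints, while a positional slice in `.iloc` excludes the end:

```python
import pandas as pd

# Reduced frame with the same column order as the original data.
df = pd.DataFrame({
    "name": ["Tomás", "Pauline"],
    "age": [32, 33],
    "girl": [False, True],
    "born In": ["Chile", "Senegal"],
    "degree": ["Bach", "Bach"],
})

by_label = df.loc[:, "age":"born In"]  # label slice: both ends included
by_position = df.iloc[:, 1:4]          # positional slice: end excluded

print(list(by_label.columns))
```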

In [18]:
#this is bad
friends.iloc[:,['age','born In']]

# loc uses labels (names); iloc uses integer positions

TypeError: cannot perform reduce with flexible type

In [20]:
# this is good (but different)
friends.iloc[:,1:4]

Unnamed: 0,age,girl,born In
0,32,False,Chile
1,33,True,Senegal
2,28,False,Spain
3,30,False,Norway
4,32,False,Peru
5,27,True,Peru


In [21]:
# what is it?
type(friends.iloc[:,1:4])

pandas.core.frame.DataFrame

In [22]:
# this is good
friends.iloc[:,[1,3]]

Unnamed: 0,age,born In
0,32,Chile
1,33,Senegal
2,28,Spain
3,30,Norway
4,32,Peru
5,27,Peru


In [23]:
#what is it?
type(friends.iloc[:,[1,3]])

pandas.core.frame.DataFrame
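
If you know the column names but need `.iloc`, you can look up the positions instead of counting them by hand. A sketch, assuming the same column order as above and using `Index.get_indexer`:

```python
import pandas as pd

# Reduced frame with the same column order as the original data.
df = pd.DataFrame({
    "name": ["Tomás", "Pauline"],
    "age": [32, 33],
    "girl": [False, True],
    "born In": ["Chile", "Senegal"],
    "degree": ["Bach", "Bach"],
})

# Translate column names into integer positions.
positions = df.columns.get_indexer(["age", "born In"])
by_position = df.iloc[:, positions]

print(list(positions))
```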

Most of our operations are done on DataFrames because they hold several columns, and we can use those columns for subsetting:

In [None]:
friends[friends.age>30] # the rows of friends that meet the condition

Some people prefer to store the condition in a filter first:

In [24]:
# save the condition as a Boolean filter, then apply it (the classic approach)
filter1=friends.age>30
friends[filter1]

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
1,Pauline,33,True,Senegal,Bach
4,Alan,32,False,Peru,Bach


In [25]:
friends.where(filter1)

# where keeps every row and fills the rows that fail the condition with missing values

# it takes the same filter as [], but it reads more like English

# [] and where use the same filter, so the condition means the same in both

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32.0,0.0,Chile,Bach
1,Pauline,33.0,1.0,Senegal,Bach
2,,,,,
3,,,,,
4,Alan,32.0,0.0,Peru,Bach
5,,,,,
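
The `32.0` and `0.0` in the output above are a side effect of `where`: missing values force the numeric and Boolean columns into a type that can hold them. A small sketch on a hypothetical two-row frame comparing the two approaches:

```python
import pandas as pd

# Hypothetical two-row frame, for illustration only.
df = pd.DataFrame({"name": ["Tomás", "Pablo"], "age": [32, 28]})
older = df.age > 30

masked = df.where(older)  # same shape; rows that fail become missing
subset = df[older]        # rows that fail are removed

print(masked.shape, subset.shape)
print(masked.age.dtype, subset.age.dtype)
```

Boolean indexing keeps the original integer type because no missing values are introduced.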


In [26]:
filter1a='age>30' # the same condition, written as text
friends.query(filter1a)

# query takes the condition as text; columns are named directly, without the DataFrame name
# query evaluates the condition on the DataFrame it is called on, so in a chain it sees the previous result

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
1,Pauline,33,True,Senegal,Bach
4,Alan,32,False,Peru,Bach
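
Two details of the query text are worth a sketch: a column name with a space can be wrapped in backticks, and a Python variable can be referenced with `@`. The variable `min_age` below is hypothetical, and the frame is a reduced version of the data:

```python
import pandas as pd

# Reduced frame, for illustration only.
df = pd.DataFrame({
    "name": ["Tomás", "Pauline", "Alan"],
    "age": [32, 33, 32],
    "born In": ["Chile", "Senegal", "Peru"],
})

min_age = 30  # hypothetical threshold kept in a Python variable

older = df.query("age > @min_age")           # @ refers to a local variable
peruvians = df.query("`born In` == 'Peru'")  # backticks quote a name with a space

print(peruvians.name.tolist())
```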


In [27]:
isinstance(friends[filter1], pd.DataFrame), \
isinstance(friends.where(filter1), pd.DataFrame), \
isinstance(friends.query(filter1a), pd.DataFrame)

# \ continues the statement on the next line

(True, True, True)

When you have Boolean values (True/False) you can simplify:

In [28]:
#from:
friends[friends.girl==False]

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
2,Pablo,28,False,Spain,Master
3,Bjork,30,False,Norway,PhD
4,Alan,32,False,Peru,Bach


In [29]:
# to...
friends[~friends.girl]

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
2,Pablo,28,False,Spain,Master
3,Bjork,30,False,Norway,PhD
4,Alan,32,False,Peru,Bach


You can have two filters:

In [30]:
# sometimes you need to subset first and then query; other times you can apply two conditions at once

# this will not work: & is evaluated before ==
friends[~friends.girl & friends.degree=='Bach']

  result = method(y)


TypeError: invalid type comparison

In [31]:
# this will (with parentheses)
friends[(~friends.girl) & (friends.degree=='Bach')]

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
4,Alan,32,False,Peru,Bach
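
The parentheses are needed because `&` binds more tightly than `==` in Python, so the first attempt is read as `(~friends.girl & friends.degree) == 'Bach'`. The same rule can be seen with plain integers, without pandas:

```python
# & is evaluated before ==, so the form without parentheses groups to the left.
without_parens = 6 & 3 == 2    # parsed as (6 & 3) == 2
grouped_left = (6 & 3) == 2    # 6 & 3 is 2, so the comparison is True
grouped_right = 6 & (3 == 2)   # 3 == 2 is False (0), so 6 & 0 is 0

print(without_parens, grouped_left, grouped_right)  # True True 0
```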


Other times you want a value once a filter has been applied:

In [32]:
# youngest male:
friends[(~friends.girl) & (friends.age.min())] # this is wrong! min() returns a single number, not a Boolean filter

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
2,Pablo,28,False,Spain,Master
3,Bjork,30,False,Norway,PhD
4,Alan,32,False,Peru,Bach


In [34]:
friends[(~friends.girl) & (friends.age==friends.age.min())] # this is wrong too! the comparison is a valid filter, but min() is taken over the whole DataFrame; the youngest person overall is a woman, so no row matches

Unnamed: 0,name,age,girl,born In,degree


In [35]:
friends.age.min()

27

You get an empty result because there is no man aged 27.

In [38]:
# this is correct
friends[~friends.girl].age.min() 

# age selects the column from the previous result (the male subset); min() then operates on that column

28

Once you know the right age, you have to put it in the right place:

In [43]:
friends[friends.age==friends[~friends.girl].age.min()]

Unnamed: 0,name,age,girl,born In,degree
2,Pablo,28,False,Spain,Master
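
An alternative that some find easier to read is to name the subset first and then ask for the row label of the minimum with `idxmin`. This is only a sketch: `idxmin` returns the first label when several rows tie for the minimum, so it is not a full replacement for the filter above.

```python
import pandas as pd

friends = pd.DataFrame({
    "name": ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    "age": [32, 33, 28, 30, 32, 27],
    "girl": [False, True, False, False, False, True],
})

males = friends[~friends.girl]             # subset first
youngest_label = males.age.idxmin()        # row label of the minimum age among males
youngest_male = males.loc[[youngest_label]]  # list key keeps the result a DataFrame

print(youngest_male.name.tolist())
```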


In [44]:
# or
friends.where(friends.age==friends[~friends.girl].age.min())

Unnamed: 0,name,age,girl,born In,degree
0,,,,,
1,,,,,
2,Pablo,28.0,0.0,Spain,Master
3,,,,,
4,,,,,
5,,,,,


In [None]:
# or
friends.where(friends.age==friends[~friends.girl].age.min()).dropna()

# dropna() drops every row that contains a missing value

The problem with chaining where is that 'friends' itself is never subset, so the minimum age is still that of the youngest woman:

In [45]:
# bad:
friends.where(~friends.girl).where(friends.age==friends.age.min())

# the 2nd where uses the whole friends DataFrame, not the male subset

Unnamed: 0,name,age,girl,born In,degree
0,,,,,
1,,,,,
2,,,,,
3,,,,,
4,,,,,
5,,,,,


That's the advantage of **query**:

In [46]:
friends.query('~girl').query('age==age.min()')

# the advantage of query is that it works on the result of the previous step in the chain

Unnamed: 0,name,age,girl,born In,degree
2,Pablo,28,False,Spain,Master
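
If you prefer `where`, a callable condition gives a similar effect: pandas passes the frame produced by the previous step to the function, so `min()` is computed on the masked data. A sketch, assuming the same data as `friends`:

```python
import pandas as pd

friends = pd.DataFrame({
    "name": ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    "age": [32, 33, 28, 30, 32, 27],
    "girl": [False, True, False, False, False, True],
})

youngest_male = (
    friends.where(~friends.girl)                     # women's rows become missing
           .where(lambda d: d.age == d.age.min())    # d is the already-masked frame
           .dropna()                                 # drop the rows that are missing
)

print(youngest_male.name.tolist())
```

Here `min()` skips the missing ages, so it returns the minimum among the men only.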


In [50]:
# but where also works if the DataFrame is changed in place first

students=friends.copy()  # students is an independent copy; changing it does not change friends

students.where(~students.girl,inplace=True) # students itself now has the women's rows set to missing
students.where(students.age==students.age.min())

# inplace=True modifies students directly instead of returning a new DataFrame,
# so the second where computes min() over the male rows only


Unnamed: 0,name,age,girl,born In,degree
0,,,,,
1,,,,,
2,Pablo,28.0,0.0,Spain,Master
3,,,,,
4,,,,,
5,,,,,
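
Note that `where(..., inplace=True)` still keeps all six rows and only blanks the women's rows. If you want a subset that really contains only the men, one option is Boolean indexing followed by `copy()`. A sketch, assuming the same data as `friends`:

```python
import pandas as pd

friends = pd.DataFrame({
    "name": ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    "age": [32, 33, 28, 30, 32, 27],
    "girl": [False, True, False, False, False, True],
})

students = friends[~friends.girl].copy()  # only the male rows remain
youngest = students[students.age == students.age.min()]

print(len(students), youngest.name.tolist())
```

Because no missing values are created, the ages stay integers and no `dropna()` is needed.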


Let's vary the data a little:

In [51]:
names=["Tomás", "Pauline", "Pablo", "Bjork","Alan","Juana"]
woman=[False,True,False,False,False,True]
ages=[32,28,28,30,32,27]
country=["Chile", "Senegal", "Spain", "Norway","Peru","Peru"]
education=["Bach", "Bach", "Master", "PhD","Bach","Master"]

# now in a dict:
data={'name':names, 'age':ages, 'girl':woman,'born In':country, 'degree':education}

#now into a DF
import pandas as pd

friends2=pd.DataFrame.from_dict(data)
# seeing it:
friends2

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
1,Pauline,28,True,Senegal,Bach
2,Pablo,28,False,Spain,Master
3,Bjork,30,False,Norway,PhD
4,Alan,32,False,Peru,Bach
5,Juana,27,True,Peru,Master


Now a girl has the same age as the youngest boy, so the previous filter returns both:

In [52]:
friends2.where(friends2.age==friends2[~friends2.girl].age.min()).dropna()

Unnamed: 0,name,age,girl,born In,degree
1,Pauline,28.0,1.0,Senegal,Bach
2,Pablo,28.0,0.0,Spain,Master


We need to add the gender condition to the filter:

In [53]:
# bad implementation: & is evaluated before ==, so this becomes a chained comparison (a == b == False) between Series
friends2.where(friends2.age==friends2[~friends2.girl].age.min() & friends2.girl==False).dropna()

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In [54]:
# bad implementation: & is still evaluated before ==, so age is compared with the wrong value
friends2.where(friends2.age==friends2[~friends2.girl].age.min() & ~friends2.girl).dropna()

Unnamed: 0,name,age,girl,born In,degree


In [55]:
# just parentheses to make it work!
friends2.where((friends2.age==friends2[~friends2.girl].age.min()) & (~friends2.girl)).dropna()

Unnamed: 0,name,age,girl,born In,degree
2,Pablo,28.0,0.0,Spain,Master
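
One way to avoid the precedence problems above is to give each condition a name before combining them. A sketch with the varied data; the names `is_male`, `youngest_male_age` and `is_youngest_male` are only illustrations:

```python
import pandas as pd

friends2 = pd.DataFrame({
    "name": ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    "age": [32, 28, 28, 30, 32, 27],
    "girl": [False, True, False, False, False, True],
})

is_male = ~friends2.girl
youngest_male_age = friends2.loc[is_male, "age"].min()  # minimum over the men only
is_youngest_male = is_male & (friends2.age == youngest_male_age)

youngest_male = friends2[is_youngest_male]
print(youngest_male.name.tolist())
```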


The chained query still works:

In [61]:
friends2.query('~girl').query('age==age.min()')

Unnamed: 0,name,age,girl,born In,degree
2,Pablo,28,False,Spain,Master


In [59]:
students2=friends2.copy()

students2.where(~students2.girl,inplace=True) # students2 itself now has the women's rows set to missing
students2.where(students2.age==students2.age.min()).dropna()


Unnamed: 0,name,age,girl,born In,degree
2,Pablo,28.0,0.0,Spain,Master
