# 100 pandas puzzles

## Importing pandas

### Getting started and checking your pandas setup
Difficulty: easy

**1**. Import pandas under the alias pd.

In [1]:
import pandas as pd

**2**. Print the version of pandas that has been imported.

In [2]:
print(pd.__version__)

2.2.3


**3**. Print out all the version information of the libraries that are required by the pandas library.

In [7]:
!pipdeptree -p pandas

pandas==2.2.3
├── numpy [required: >=1.26.0, installed: 1.26.4]
├── python-dateutil [required: >=2.8.2, installed: 2.9.0.post0]
│   └── six [required: >=1.5, installed: 1.17.0]
├── pytz [required: >=2020.1, installed: 2024.2]
└── tzdata [required: >=2022.7, installed: 2025.1]
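`pipdeptree` is a separate package and has to be installed before the cell above will run. A possible alternative, assuming only pandas itself is available, is the built-in `pd.show_versions()` helper. This is a minimal sketch; the exact list of libraries it prints depends on the installed pandas version.

```python
import pandas as pd

# Prints the Python/OS details plus the versions of pandas' required and
# optional dependencies (packages that are not installed show up as None).
pd.show_versions()
```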


## DataFrame basics

### A few of the fundamental routines for selecting, sorting, adding and aggregating data in DataFrames
Difficulty: easy

Note: remember to import numpy using:

    import numpy as np

Consider the following Python dictionary data and Python list labels:

    data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
            'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
            'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
            'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

    labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

(This is just some meaningless data I made up with the theme of animals and trips to a vet.)

**4**. Create a DataFrame df from this dictionary data which has the index labels.

In [10]:
import numpy as np

In [11]:
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

In [216]:
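# orient='index' turns each dictionary key into a row, so this builds a 4 x 10 frame,
# the transpose of what the puzzle asks for (df2 further down has the intended shape)
df = pd.DataFrame.from_dict(data, orient='index', columns=labels)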
df

Unnamed: 0,a,b,c,d,e,f,g,h,i,j
animal,cat,cat,snake,dog,dog,cat,snake,cat,dog,dog
age,2.5,3,0.5,,5,2,4.5,,7,3
visits,1,3,2,3,2,3,1,1,2,1
priority,yes,yes,no,yes,no,no,no,yes,no,no


**5**. Display a summary of the basic information about this DataFrame and its data (hint: there is a single method that can be called on the DataFrame).

In [31]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, animal to priority
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       4 non-null      object
 1   b       4 non-null      object
 2   c       4 non-null      object
 3   d       3 non-null      object
 4   e       4 non-null      object
 5   f       4 non-null      object
 6   g       4 non-null      object
 7   h       3 non-null      object
 8   i       4 non-null      object
 9   j       4 non-null      object
dtypes: object(10)
memory usage: 352.0+ bytes


In [316]:
# Passing index=labels keeps the dictionary keys as columns and gives one row per label
df2 = pd.DataFrame(data, index=labels)
df2

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,2.0,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no


In [199]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   animal    10 non-null     object 
 1   age       8 non-null      float64
 2   visits    10 non-null     int64  
 3   priority  10 non-null     object 
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes
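`info()` covers dtypes and non-null counts. For the numeric columns, `describe()` is a common companion. The sketch below rebuilds the same data under the hypothetical name `frame` so it runs on its own.

```python
import numpy as np
import pandas as pd

# Same vet-visit data as above, rebuilt so the snippet is self-contained
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
frame = pd.DataFrame(data, index=labels)

# By default describe() summarises only the numeric columns ('age' and 'visits'):
# count of non-missing values, mean, std, min, quartiles and max
summary = frame.describe()
print(summary)
```

The `count` row makes the two missing ages visible, since `age` reports 8 values while `visits` reports 10.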


**6**. Return the first 3 rows of the DataFrame df.

In [200]:
df2.head(3)

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no


In [201]:
df2[0:3]

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no


**7**. Select just the 'animal' and 'age' columns from the DataFrame df.

In [232]:
df2[ ["animal", "age"] ]

Unnamed: 0,animal,age
a,cat,2.5
b,cat,3.0
c,snake,0.5
d,dog,
e,dog,5.0
f,cat,2.0
g,snake,4.5
h,cat,
i,dog,7.0
j,dog,3.0


In [258]:
df2.loc[:,['animal', 'age']]

Unnamed: 0,animal,age
a,cat,2.5
b,cat,3.0
c,snake,0.5
d,dog,
e,dog,5.0
f,cat,2.0
g,snake,4.5
h,cat,
i,dog,7.0
j,dog,3.0


**8**. Select the data in rows [3, 4, 8] and in columns ['animal', 'age'].

In [235]:
# [3, 4, 8] are zero-based integer positions, so select the rows with .iloc
df2.iloc[[3, 4, 8]][['animal', 'age']]

Unnamed: 0,animal,age
d,dog,
e,dog,5.0
i,dog,7.0


**9**. Select only the rows where the number of visits is greater than 3.

In [207]:
df2[ df2.visits > 3]

Unnamed: 0,animal,age,visits,priority


In [208]:
df2[lambda x: x.visits > 3]

Unnamed: 0,animal,age,visits,priority
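The result is empty because no animal has more than 3 visits. As a quick check, and as one more way to write the filter, here is a sketch using `DataFrame.query` on a rebuilt copy of the data (the name `frame` is made up for this snippet).

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
frame = pd.DataFrame(data, index=labels)

# query() takes the condition as a string evaluated against the column names
over_three = frame.query('visits > 3')        # strictly greater: no rows match
at_least_three = frame.query('visits >= 3')   # inclusive: rows b, d and f

print(frame['visits'].max())
print(at_least_three)
```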


**10**. Select the rows where the age is missing, i.e. it is NaN.

In [209]:
df2[ df2.age.isna()]

Unnamed: 0,animal,age,visits,priority
d,dog,,3,yes
h,cat,,1,yes


In [210]:
df2[ lambda x: x.age.isna() ]

Unnamed: 0,animal,age,visits,priority
d,dog,,3,yes
h,cat,,1,yes


**11**. Select the rows where the animal is a cat and the age is less than 3.

In [211]:
df2[ (df2.animal == 'cat') & (df2.age < 3) ]

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
f,cat,2.0,3,no


**12**. Select the rows where the age is between 2 and 4 (inclusive).

In [212]:
df2[ (df2.age >=2) & (df2.age <=4) ]

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
f,cat,2.0,3,no
j,dog,3.0,1,no


In [213]:
df2[ df2.age.between(2,4, inclusive='both') ]

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
f,cat,2.0,3,no
j,dog,3.0,1,no


**13**. Change the age in row 'f' to 1.5.

In [252]:
df2.loc['f', 'age'] = 1.5
df2

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,1.5,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no


**14**. Calculate the sum of all visits in df (i.e. find the total number of visits).

In [251]:
df2.visits.sum()

19

**15**. Calculate the mean age for each different animal in df.

In [253]:
df2.groupby('animal').age.mean()

animal
cat      2.333333
dog      5.000000
snake    2.500000
Name: age, dtype: float64
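The cat mean is 2.33 rather than a missing value because `mean()` skips NaN by default: it averages the three known cat ages (2.5, 3 and 1.5). The sketch below makes that visible by putting the non-missing count next to the group size. It uses a cut-down copy of the data under the hypothetical name `ages`, with row 'f' already set to 1.5.

```python
import numpy as np
import pandas as pd

ages = pd.DataFrame(
    {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
     'age': [2.5, 3, 0.5, np.nan, 5, 1.5, 4.5, np.nan, 7, 3]},
    index=list('abcdefghij'))

# 'count' ignores NaN, 'size' counts every row, 'mean' averages the non-missing ages
per_animal = ages.groupby('animal')['age'].agg(['count', 'size', 'mean'])
print(per_animal)
```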

**16**. Append a new row 'k' to df with your choice of values for each column. Then delete that row to return the original DataFrame.

In [263]:
new_row = pd.DataFrame({'animal':'fig', 'age':10.0, 'visits':1, 'priority':'yes'}, index=['k'])
new_row

Unnamed: 0,animal,age,visits,priority
k,fig,10.0,1,yes


In [268]:
pd.concat([df2, new_row], ignore_index=False)

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,1.5,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no
k,fig,10.0,1,yes


In [269]:
pd.concat([df2, new_row], ignore_index=False).drop('k')

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,1.5,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no
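`pd.concat` returns a new frame and leaves `df2` untouched. Another option, sketched here on a rebuilt copy of the data, is to assign to a new label with `.loc` (setting with enlargement) and then `drop` it again. The values chosen for row 'k' and the names `frame`, `extended` and `restored` are arbitrary.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
frame = pd.DataFrame(data, index=labels)

# Assigning to a label that does not exist yet appends a row in place;
# the list must follow the column order: animal, age, visits, priority
extended = frame.copy()
extended.loc['k'] = ['dog', 5.5, 2, 'no']

# drop() returns a new frame without the row and does not modify 'extended'
restored = extended.drop('k')
print(restored.index.tolist())
```

It may be worth comparing `restored.dtypes` with the original afterwards, since adding a row can change column dtypes.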


**17**. Count the number of each type of animal in df.

In [283]:
df2.groupby('animal').animal.count()

animal
cat      4
dog      4
snake    2
Name: animal, dtype: int64

In [288]:
df2.groupby('animal').agg({'animal':'count'})

animal,animal
cat,4
dog,4
snake,2
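For a plain frequency count, `Series.value_counts()` is a shorter route than going through `groupby`. A minimal sketch on just the animal column (the name `animals` is made up for this snippet):

```python
import pandas as pd

animals = pd.Series(
    ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    index=list('abcdefghij'), name='animal')

# value_counts() returns one row per distinct value, most frequent first
counts = animals.value_counts()
print(counts)
```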


**18**. Sort df first by the values in the 'age' column in descending order, then by the values in the 'visits' column in ascending order (so row i should be first, and row d should be last).

In [292]:
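# Chaining two sorts keeps the 'visits' order among equal ages only if the second sort is stable
df2.sort_values('visits').sort_values(by='age', ascending=False, kind='stable')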

Unnamed: 0,animal,age,visits,priority
i,dog,7.0,2,no
e,dog,5.0,2,no
g,snake,4.5,1,no
j,dog,3.0,1,no
b,cat,3.0,3,yes
a,cat,2.5,1,yes
f,cat,1.5,3,no
c,snake,0.5,2,no
h,cat,,1,yes
d,dog,,3,yes


In [294]:
df2.sort_values(by=['age', 'visits'], ascending=[False,True])

Unnamed: 0,animal,age,visits,priority
i,dog,7.0,2,no
e,dog,5.0,2,no
g,snake,4.5,1,no
j,dog,3.0,1,no
b,cat,3.0,3,yes
a,cat,2.5,1,yes
f,cat,1.5,3,no
c,snake,0.5,2,no
h,cat,,1,yes
d,dog,,3,yes
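Rows h and d come last even though the age sort is descending, because `sort_values` puts missing values at the end by default (`na_position='last'`). The sketch below flips that with `na_position='first'` on a cut-down copy of the data. The name `frame` is hypothetical and row 'f' is already set to 1.5.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
     'age': [2.5, 3, 0.5, np.nan, 5, 1.5, 4.5, np.nan, 7, 3],
     'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1]},
    index=list('abcdefghij'))

# Same two-key sort as above, but with the NaN ages moved to the top
nan_first = frame.sort_values(by=['age', 'visits'],
                              ascending=[False, True],
                              na_position='first')
print(nan_first)
```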


**19**. The 'priority' column contains the values 'yes' and 'no'. Replace this column with a column of boolean values: 'yes' should be True and 'no' should be False.

In [322]:
# Vectorized comparison: True where the value is 'yes', False otherwise
df2['priority'] = df2['priority'] == 'yes'
df2

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,True
b,cat,3.0,3,True
c,snake,0.5,2,False
d,dog,,3,True
e,dog,5.0,2,False
f,cat,1.5,3,False
g,snake,4.5,1,False
h,cat,,1,True
i,dog,7.0,2,False
j,dog,3.0,1,False


**20**. In the 'animal' column, change the 'snake' entries to 'python'.

In [324]:
df2

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,True
b,cat,3.0,3,True
c,snake,0.5,2,False
d,dog,,3,True
e,dog,5.0,2,False
f,cat,1.5,3,False
g,snake,4.5,1,False
h,cat,,1,True
i,dog,7.0,2,False
j,dog,3.0,1,False


In [323]:
# Work on a copy so df2 keeps the original 'snake' labels
df3 = df2.copy()

# Series.replace swaps exact matches and leaves every other value untouched
df3['animal'] = df3['animal'].replace('snake', 'python')
df3

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,True
b,cat,3.0,3,True
c,python,0.5,2,False
d,dog,,3,True
e,dog,5.0,2,False
f,cat,1.5,3,False
g,python,4.5,1,False
h,cat,,1,True
i,dog,7.0,2,False
j,dog,3.0,1,False
