Consider the following Python dictionary `data` and Python list `labels`:

``` python
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```

**1.** Create a DataFrame `df` from this dictionary `data` which has the index `labels`.

In [2]:
import pandas as pd
import numpy as np

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data, index=labels)

**2.** Display a summary of the basic information about this DataFrame and its data (*hint: there is a single method that can be called on the DataFrame*).

In [None]:
df.info()

**3.** Return the first 3 rows of the DataFrame `df`.

In [None]:
first_3_rows = df.head(3)

**4.** Display the 'animal' and 'age' columns from the DataFrame `df`.

In [3]:
selected_columns = df[['animal', 'age']]

**5.** Display the data in rows `[3, 4, 8]` *and* in columns `['animal', 'age']`.

In [4]:
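# The index holds string labels ('a'..'j'), so the integer positions 3, 4, 8
# are first translated into labels before using label-based .loc
selected_data = df.loc[df.index[[3, 4, 8]], ['animal', 'age']]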
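
The difference between position-based and label-based selection is easy to trip over when the index holds strings. The following is a minimal sketch on a small, made-up frame (`demo` and its values are illustrative assumptions, not the exercise data) showing that `.iloc` takes integer positions while `.loc` takes index labels:

```python
import numpy as np
import pandas as pd

# Small illustrative frame with string labels, mirroring the exercise's index style
demo = pd.DataFrame(
    {'animal': ['cat', 'snake', 'dog', 'dog'], 'age': [2.5, 0.5, np.nan, 5.0]},
    index=['a', 'b', 'c', 'd'],
)

# Position-based: rows at integer positions 1 and 3
by_position = demo.iloc[[1, 3]][['animal', 'age']]

# Label-based: the same rows, addressed by their index labels
by_label = demo.loc[['b', 'd'], ['animal', 'age']]

# Both selections pick the same rows and columns
same = by_position.equals(by_label)
print(same)
```

Passing integer positions to `.loc` on a string-labelled index raises a `KeyError`, because `.loc` looks the values up as labels.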

**6.** Select only the rows where the number of visits is greater than 3.

In [None]:
selected_rows = df[df['visits'] > 3]

**7.** Select the rows where the age is missing, i.e. it is `NaN`.

In [None]:
missing_age_rows = df[pd.isna(df['age'])]


**8.** Select the rows where the animal is a cat *and* the age is less than 3.

In [None]:
selected_rows = df[(df['animal'] == 'cat') & (df['age'] < 3)]

**9.** Select the rows where the age is between 2 and 4 (inclusive).

In [5]:
selected_rows = df[(df['age'] >= 2) & (df['age'] <= 4)]
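
As an alternative way to express the same range check, pandas offers `Series.between`, which is inclusive on both ends by default. A short sketch on the original 'age' values from `data` (the standalone Series `ages` is introduced here only for illustration):

```python
import numpy as np
import pandas as pd

# The original 'age' values with the exercise's labels
ages = pd.Series([2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
                 index=list('abcdefghij'))

# Two equivalent boolean masks; NaN entries evaluate to False in both
mask_explicit = (ages >= 2) & (ages <= 4)
mask_between = ages.between(2, 4)

in_range = ages[mask_between]
print(in_range)
```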

**10.** Change the age in row 'f' to 1.5.

In [None]:
df.loc['f', 'age'] = 1.5

**11.** Calculate the sum of all visits in `df` (i.e. the total number of visits).

In [None]:
total_visits = df['visits'].sum()

**12.** Calculate the mean age for each different animal in `df`.

In [None]:
mean_age_by_animal = df.groupby('animal')['age'].mean()
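
Because the 'age' column contains `NaN` values, it can help to see how many observations each group mean is based on. The sketch below uses the original values from `data` (that is, before row 'f' is changed to 1.5) and relies on the default behaviour that `mean` and `count` ignore missing values; the name `summary` is an illustrative choice:

```python
import numpy as np
import pandas as pd

# Only the two columns needed for the grouping, copied from the original `data`
demo = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
})

# 'count' reports how many non-missing ages went into each group's mean
summary = demo.groupby('animal')['age'].agg(['mean', 'count'])
print(summary)
```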

**13.** Append a new row 'k' to `df` with your choice of values for each column. Then delete that row to return the original DataFrame.

In [None]:
# Append a new row 'k' to df
df.loc['k'] = ['parrot', 2.0, 2, 'no']

# Now df contains the appended row

# To delete the added row and return the original DataFrame:
df = df.drop('k')
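
Another way to add a row is to build a one-row DataFrame and combine it with `pd.concat`, which keeps each column's dtype when the new row uses matching types. A minimal sketch on a two-row stand-in frame (`demo`, `new_row`, and their values are assumptions made for illustration):

```python
import numpy as np
import pandas as pd

# Two-row stand-in for the exercise DataFrame
demo = pd.DataFrame(
    {'animal': ['cat', 'dog'], 'age': [2.5, np.nan],
     'visits': [1, 3], 'priority': ['yes', 'no']},
    index=['a', 'b'],
)

# The row to append, as a one-row DataFrame labelled 'k'
new_row = pd.DataFrame(
    {'animal': ['parrot'], 'age': [2.0], 'visits': [2], 'priority': ['no']},
    index=['k'],
)

extended = pd.concat([demo, new_row])   # demo plus row 'k'
restored = extended.drop('k')           # back to the original rows
print(restored.equals(demo))
```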

**14.** Count the number of each type of animal in `df`.

In [None]:
animal_counts = df['animal'].value_counts()

**15.** Sort `df` first by the values in the 'age' column in *descending* order, then by the values in the 'visits' column in *ascending* order (so row `i` should be first, and row `d` should be last).

In [None]:
df_sorted = df.sort_values(by=['age', 'visits'], ascending=[False, True])
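
The expected first and last rows can be checked directly. The sketch below rebuilds just the 'age' and 'visits' columns from the original `data` and relies on the `sort_values` default `na_position='last'`, which places the rows with missing ages at the end, where they are then ordered by 'visits':

```python
import numpy as np
import pandas as pd

# 'age' and 'visits' from the original `data`, with the exercise's labels
demo = pd.DataFrame(
    {'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
     'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1]},
    index=list('abcdefghij'),
)

# Age descending, ties broken by visits ascending; NaN ages go last by default
demo_sorted = demo.sort_values(by=['age', 'visits'], ascending=[False, True])

order = list(demo_sorted.index)
print(order[0], order[-1])
```

Rows `b` and `j` share the age 3, so the 'visits' key decides their relative order, with `j` (1 visit) ahead of `b` (3 visits).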

**16.** The 'priority' column contains the values 'yes' and 'no'. Replace this column with a column of boolean values: 'yes' should be `True` and 'no' should be `False`.

In [None]:
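# map() builds a new column from the lookup table; unlike replace(), it does not
# rely on pandas downcasting an object column to bool after the substitution
df['priority'] = df['priority'].map({'yes': True, 'no': False})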

**17.** In the 'animal' column, change the 'snake' entries to 'python'.

In [None]:
df['animal'] = df['animal'].replace('snake', 'python')

**18.** Load the ny-flights dataset into Python.

In [None]:
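import pandas as pd

# seaborn's built-in 'flights' dataset is a monthly passenger-count table
# (year, month, passengers), not the ny-flights data, so it is not used here.
# Assumption: the ny-flights data is available as a local CSV file;
# 'ny-flights.csv' is a placeholder path, adjust it to your copy.
df = pd.read_csv('ny-flights.csv')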

**19.** Which airline ID appears most often in the dataset?

In [None]:
import pandas as pd

# Assumption: 'ny-flights.csv' is a placeholder path to a local copy of the dataset
# (seaborn's built-in 'flights' dataset has no 'airline_id' column)
df = pd.read_csv('ny-flights.csv')

# Count how often each airline ID occurs
airline_counts = df['airline_id'].value_counts()

# Get the airline ID with the maximum count
max_airline_id = airline_counts.idxmax()

print("Airline ID with maximum occurrences:", max_airline_id)

**20.** Draw a plot of `dep_delay` against `arr_delay`.

In [None]:
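import matplotlib.pyplot as plt
import pandas as pd

# Assumption: 'ny-flights.csv' is a placeholder path to a local copy of the dataset
# (seaborn's built-in 'flights' dataset has no delay columns)
df = pd.read_csv('ny-flights.csv')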

# Create a scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(df['dep_delay'], df['arr_delay'], alpha=0.5)
plt.title('Departure Delay vs Arrival Delay')
plt.xlabel('Departure Delay (minutes)')
plt.ylabel('Arrival Delay (minutes)')
plt.grid(True)
plt.show()