##### Q1. List any five functions of the pandas library with execution.

Here are five commonly used pandas functions and methods.

1. `read_csv()`: A top-level pandas function that reads data from a CSV file and returns a DataFrame.

2. `head()`: Returns the first n rows of a DataFrame. By default, it returns the first 5 rows.

3. `describe()`: Generates descriptive statistics for a DataFrame's numeric columns by default, including count, mean, standard deviation, minimum, maximum, and quartiles.

4. `groupby()`: Splits the data into groups based on one or more keys so that an operation (such as a sum or mean) can be applied to each group.

5. `to_csv()`: Writes the contents of a DataFrame to a CSV file.
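
Since the question asks for execution, here is a minimal sketch that runs all five calls. The inline CSV text and the column names `city` and `sales` are made up for illustration; `io.StringIO` stands in for a file on disk so the example is self-contained.

```python
import io
import pandas as pd

# Hypothetical data: an in-memory CSV instead of a real file.
csv_text = "city,sales\nPune,10\nDelhi,20\nPune,30\nDelhi,40\nPune,50\nDelhi,60\n"

# 1. read_csv(): parse the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# 2. head(): first three rows (the default would be five).
first_rows = df.head(3)

# 3. describe(): summary statistics for the numeric column.
summary = df.describe()

# 4. groupby(): total sales per city.
totals = df.groupby("city")["sales"].sum()

# 5. to_csv(): with no path given, the CSV is returned as a string.
csv_out = df.to_csv(index=False)

print(first_rows)
print(summary)
print(totals)
```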

##### Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [27]:
import pandas as pd

def reIndex(df):
    # Assign the labels 1, 3, 5, ... (one per row). Note: this modifies df in place.
    df.index = range(1, 2 * len(df) + 1, 2)
    return df

data = {
    'A' : [1,2,3,4,5],
    'B': [1, 2, 3, 4, 5],
    'C': ['a', 'b', 'c', 'd', 'e']
}
df = pd.DataFrame(data)

indexed = reIndex(df)

print(indexed)

   A  B  C
1  1  1  a
3  2  2  b
5  3  3  c
7  4  4  d
9  5  5  e
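
The function above changes the index of the DataFrame it receives. If the caller's DataFrame should stay untouched, one possible variant works on a copy. This is a sketch only; the name `reindex_odd` and the sample data are hypothetical.

```python
import pandas as pd

def reindex_odd(df):
    # Work on a copy so the caller's DataFrame keeps its original index.
    out = df.copy()
    # RangeIndex(1, 2n + 1, 2) yields n labels: 1, 3, 5, ...
    out.index = pd.RangeIndex(start=1, stop=2 * len(out) + 1, step=2)
    return out

original = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": ["a", "b", "c"]})
reindexed = reindex_odd(original)

print(reindexed.index.tolist())
print(original.index.tolist())
```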


##### Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.

In [None]:
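def add(df):
    # Positional slice: the first three rows of 'Values', whatever the index labels are.
    s = df['Values'].iloc[:3].sum()
    # The question asks the function itself to print the result.
    print(f"The sum of the first three values is: {s}")
    return s

d = {
    'Values' : [10, 20, 30, 40, 50]
    }

df = pd.DataFrame(d)

calc = add(df)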
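
The question asks for a function that iterates over the DataFrame, whereas the answer above uses a slice. A minimal sketch of an explicit loop follows; the name `sum_first_three` is hypothetical, and the slice-based version is usually preferable in practice.

```python
import pandas as pd

def sum_first_three(df):
    total = 0
    # Walk the column in row order and stop after three values.
    for position, value in enumerate(df["Values"]):
        if position >= 3:
            break
        total += value
    print(f"The sum of the first three values is: {total}")
    return total

result = sum_first_three(pd.DataFrame({"Values": [10, 20, 30, 40, 50]}))
short_result = sum_first_three(pd.DataFrame({"Values": [1, 2]}))
```

If the DataFrame has fewer than three rows, the loop simply adds up whatever is there.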

##### Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

In [61]:
def new_Column(df):
    # Split each string on whitespace and count the pieces.
    # Equivalent vectorized form:
    # df['Word_Count'] = df['Text'].str.split().str.len()
    df['Word_Count'] = df['Text'].apply(lambda x: len(x.split()))
    return df

data = {
    'Text' : ['Hello, how are you?',
              'I am doing great.',
              'Python is awesome!'
             ]
        }

df = pd.DataFrame(data)

cols = new_Column(df)

print(cols)

                  Text  Word_Count
0  Hello, how are you?           4
1    I am doing great.           4
2   Python is awesome!           3
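
The `apply` version raises an error if a row holds a missing value, because `None` and `NaN` have no `split` method. A sketch of the vectorized form that tolerates missing text is shown below. Counting a missing row as 0 words is an assumption made here for illustration, and the name `add_word_count` is hypothetical.

```python
import pandas as pd

def add_word_count(df):
    out = df.copy()
    # .str methods propagate missing values instead of raising.
    counts = out["Text"].str.split().str.len()
    # Assumption: a missing text counts as zero words.
    out["Word_Count"] = counts.fillna(0).astype(int)
    return out

texts = pd.DataFrame({"Text": ["Hello, how are you?", None, "Python is awesome!"]})
counted = add_word_count(texts)
print(counted)
```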


##### Q5. How are DataFrame.size() and DataFrame.shape() different?

`DataFrame.size` and `DataFrame.shape` are both attributes (properties) of a DataFrame, not methods, so they are accessed without parentheses. Writing `df.size()` or `df.shape()` raises a `TypeError`. Both describe the dimensions of a DataFrame, but they report different things.

1. `DataFrame.size`:
   - Returns the total number of elements in the DataFrame.
   - This is the number of rows multiplied by the number of columns.
   - The returned value is a single integer.

2. `DataFrame.shape`:
   - Returns the dimensions of the DataFrame.
   - It gives the number of rows followed by the number of columns.
   - The returned value is a tuple in the format `(rows, columns)`.
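
A short illustration of the difference, using a made-up DataFrame with 5 rows and 3 columns:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, 20, 30, 40, 50],
    "C": ["a", "b", "c", "d", "e"],
})

# shape is a (rows, columns) tuple; size is rows * columns.
n_rows, n_cols = df.shape
print(df.shape)
print(df.size)
```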

##### Q6. Which function of pandas do we use to read an excel file?

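In pandas, we use the `pandas.read_excel()` function to read an Excel file into a DataFrame. Its `sheet_name` parameter selects which sheet to load (the first sheet by default), and reading `.xlsx` files relies on an optional engine package such as `openpyxl` being installed.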

##### Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.

In [90]:
def em_ids(df):
    # Split each address at '@' and keep the first part, which is the username.
    df['Username'] = df['Email'].apply(lambda x: x.split('@')[0])
    return df

data = {
        'Email': ['abdul@example.com',
                  'john.doe@example.com',
                  'ubaid@example.com']
        }

df = pd.DataFrame(data)

ids = em_ids(df)

print(ids)

                  Email  Username
0     abdul@example.com     abdul
1  john.doe@example.com  john.doe
2     ubaid@example.com     ubaid
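
The same result can be obtained with the vectorized `.str` accessor, which also passes missing values through instead of raising. This is a sketch of an alternative rather than a replacement; the name `add_username` and the sample row containing `None` are hypothetical.

```python
import pandas as pd

def add_username(df):
    out = df.copy()
    # Split at '@' and take element 0 of each resulting list.
    out["Username"] = out["Email"].str.split("@").str[0]
    return out

emails = pd.DataFrame({"Email": ["abdul@example.com", None, "john.doe@example.com"]})
result = add_username(emails)
print(result)
```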


##### Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.

For example, if df contains the following values:
```
  A B C
0 3 5 1
1 8 2 7
2 6 9 4
3 2 3 5
4 9 1 2
```

In [114]:
def selected(df):
    selected_df = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_df

data = {
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
}
df = pd.DataFrame(data)
new = selected(df)
print(new)

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
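
The same filter can also be written with `DataFrame.query`, which some readers find easier to scan. In this sketch the function name `select_rows` is hypothetical, and the `.copy()` call is an optional choice that makes the result fully independent of the input.

```python
import pandas as pd

def select_rows(df):
    # query() evaluates the condition against the column names.
    return df.query("A > 5 and B < 10").copy()

data = pd.DataFrame({
    "A": [3, 8, 6, 2, 9],
    "B": [5, 2, 9, 3, 1],
    "C": [1, 7, 4, 5, 2],
})
picked = select_rows(data)
print(picked)
```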


##### Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

In [120]:
def stats(df):
    mean = df['Values'].mean()
    median = df['Values'].median()
    # std() returns the sample standard deviation (ddof=1) by default.
    sd = df['Values'].std()

    return mean, median, sd

val = {
    'Values' : [10, 20, 30, 40, 50]
    }

df = pd.DataFrame(val)

mean, median, sd = stats(df)

print(f"mean: {mean}\nmedian: {median}\nStandard Deviation: {sd}")

mean: 30.0
median: 30.0
Standard Deviation: 15.811388300841896
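
The value 15.81 is the sample standard deviation, because pandas divides by n - 1 (`ddof=1`) by default. NumPy's `np.std` divides by n by default, so the two libraries disagree unless `ddof` is set explicitly. A brief comparison on the same five values:

```python
import numpy as np
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

sample_sd = values.std()                # pandas default: ddof=1
population_sd = values.std(ddof=0)      # divide by n instead of n - 1
numpy_sd = np.std(values.to_numpy())    # NumPy default: ddof=0

print(sample_sd, population_sd, numpy_sd)
```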


##### Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

In [136]:
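def avg_sales(df):
    # rolling(window=7) counts rows, so this assumes one row per day in date order.
    # min_periods=1 averages whatever is available for the first six rows.
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df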

data = {
        'Date': pd.date_range(start='2023-01-01', periods=15),
        'Sales' : [10, 15, 8, 20, 12, 18, 14, 22, 16, 10, 19, 25, 13, 17, 21]
        }

df = pd.DataFrame(data)
sales = avg_sales(df)
print(sales)

         Date  Sales  MovingAverage
0  2023-01-01     10      10.000000
1  2023-01-02     15      12.500000
2  2023-01-03      8      11.000000
3  2023-01-04     20      13.250000
4  2023-01-05     12      13.000000
5  2023-01-06     18      13.833333
6  2023-01-07     14      13.857143
7  2023-01-08     22      15.571429
8  2023-01-09     16      15.714286
9  2023-01-10     10      16.000000
10 2023-01-11     19      15.857143
11 2023-01-12     25      17.714286
12 2023-01-13     13      17.000000
13 2023-01-14     17      17.428571
14 2023-01-15     21      17.285714
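
A row-based window of 7 matches "the past 7 days" only when there is exactly one row per day. If dates can be missing, a time-based window such as `'7D'` on a datetime index may be closer to the intent. The sketch below assumes unique dates; the name `moving_average_by_date` and the sample data with a gap are hypothetical.

```python
import pandas as pd

def moving_average_by_date(df):
    out = df.sort_values("Date").copy()
    # A '7D' window covers the current timestamp and the 7 days before it
    # (open on the left, closed on the right).
    by_date = out.set_index("Date")["Sales"].rolling("7D").mean()
    out["MovingAverage"] = by_date.to_numpy()
    return out

gappy = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-10"]),
    "Sales": [10, 20, 30],
})
averaged = moving_average_by_date(gappy)
print(averaged)
```

Here the last row is averaged on its own, because the two earlier sales fall outside its 7-day window. A row-based window would have averaged all three.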


##### Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column. For example, if df contains the following values:

```
  Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05
```

Your function should create the following DataFrame:

```
  Date       Weekday
0 2023-01-01 Sunday
1 2023-01-02 Monday
2 2023-01-03 Tuesday
3 2023-01-04 Wednesday
4 2023-01-05 Thursday
```

The function should return the modified DataFrame.

In [146]:
def days(df):
    df['Weekday'] = df['Date'].dt.day_name()
    return df

data = {
        'Date' : pd.date_range(start ='2023-01-01', periods= 5)
        }

df = pd.DataFrame(data)
week_days = days(df)
print(week_days)

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
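
The `.dt` accessor only works on datetime columns. If 'Date' was loaded as text (for example from a CSV), it needs converting first. A minimal sketch, with the hypothetical name `add_weekday`:

```python
import pandas as pd

def add_weekday(df):
    out = df.copy()
    # Convert strings to datetime; already-converted columns pass through unchanged.
    out["Date"] = pd.to_datetime(out["Date"])
    out["Weekday"] = out["Date"].dt.day_name()
    return out

text_dates = pd.DataFrame({"Date": ["2023-01-01", "2023-01-02", "2023-01-03"]})
with_days = add_weekday(text_dates)
print(with_days)
```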


##### Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [164]:
def calc(df):
    start_date = '2023-01-01'
    end_date = '2023-01-31'

    # between() includes both endpoints by default; it is equivalent to:
    # date_df = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
    date_df = df[df['Date'].between(start_date, end_date)]
    return date_df

data = {
        'Date' : pd.date_range(start ='2023-01-01', periods= 50) # Timestamps
        }

df = pd.DataFrame(data)
select = calc(df)
print(select)

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
5  2023-01-06
6  2023-01-07
7  2023-01-08
8  2023-01-09
9  2023-01-10
10 2023-01-11
11 2023-01-12
12 2023-01-13
13 2023-01-14
14 2023-01-15
15 2023-01-16
16 2023-01-17
17 2023-01-18
18 2023-01-19
19 2023-01-20
20 2023-01-21
21 2023-01-22
22 2023-01-23
23 2023-01-24
24 2023-01-25
25 2023-01-26
26 2023-01-27
27 2023-01-28
28 2023-01-29
29 2023-01-30
30 2023-01-31
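
The sample data holds midnight timestamps, so an inclusive end date of '2023-01-31' works. When timestamps carry a time of day, '2023-01-31' is read as midnight and anything later on the 31st is dropped. One way around this is a half-open interval that ends just before 1 February, sketched below with the hypothetical name `select_january_2023` and made-up timestamps.

```python
import pandas as pd

def select_january_2023(df):
    start = pd.Timestamp("2023-01-01")
    end_exclusive = pd.Timestamp("2023-02-01")
    # Half-open interval: start <= Date < end_exclusive.
    mask = (df["Date"] >= start) & (df["Date"] < end_exclusive)
    return df[mask]

stamps = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-12-31 23:59",
        "2023-01-01 00:00",
        "2023-01-31 18:30",
        "2023-02-01 00:00",
    ])
})

january = select_january_2023(stamps)
# For comparison: the inclusive end date misses the 18:30 row on the 31st.
inclusive_only = stamps[stamps["Date"].between("2023-01-01", "2023-01-31")]

print(january)
print(inclusive_only)
```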


##### Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

To use the basic functions of pandas, the one library that must be imported is `pandas` itself. It provides the core functionality and data structures for working with structured data, such as DataFrames and Series.

To import the `pandas` library, we can use the following import statement:
```python
import pandas as pd
```
This statement imports the pandas library and assigns it the alias `pd`, which is a common convention for working with pandas.