## Tasks

1. Calculate the share of expenditure on school education incurred by various departments/ministries. 
2. Estimate the share of capital expenditure.
3. Using projected population for each of the years under consideration, estimate the per-capita expenditure on school education in the state, and each district. 
4. Rank the districts based on utilization of allotted funds of revenue expenditure and capital expenditure (separately).

### Imports : Library and data

In [50]:
# imports
import pandas as pd
import re

In [2]:
# data files
data2017 = pd.read_csv('./data/district_level_mapping_2017.csv')
data2018 = pd.read_csv('./data/district_level_mapping_2018.csv')
data2019 = pd.read_csv('./data/district_level_mapping_2019.csv')
data2020 = pd.read_csv('./data/district_level_mapping_2020.csv')
meta = pd.read_excel('./data/Metadata.xlsx')
# for arrey operation on all files
arr = [data2017, data2018, data2019, data2020]
# adding all dataframes into one
all_data = pd.concat(arr)

  data2017 = pd.read_csv('./data/district_level_mapping_2017.csv')


### Initial Processing
(preparing the data)
- Fill NaN with 0.
- generate single column of overall expenditure.

In [3]:
# fill na values
def initial_process(arr):
    """
    This function fills NaN values with 0.0 and adds two new columns at the end of dataframe :
    1. Overall Expenditure : one value instead of three different columns
    2. Excess : Surplus left after expenditure

    It takes in a list of dataframes, single dataframes can be passed as a single item list.

    It'll return a new arrey (list) of all Dataframes now processed.
    """
    new_arr = []
    for df in arr:
        df['overall expenditure'] = 0
        df['excess'] = 0
        df.fillna(0.0)
        for i in range(len(df)):
            if df.iloc[i, 16] != 0:
                df.iloc[i, 20] = df.iloc[i, 16]
                df.iloc[i, 21] = df.iloc[i, 15] - df.iloc[i, 16]

            elif df.iloc[i, 17] != 0:
                df.iloc[i, 20] = df.iloc[i, 17]
                df.iloc[i, 21] = df.iloc[i, 15] - df.iloc[i, 17]

            elif df.iloc[i, 18] != 0:
                df.iloc[i, 20] = df.iloc[i, 18]
                df.iloc[i, 21] = df.iloc[i, 15] - df.iloc[i, 18]   
        new_arr.append(df)

    return new_arr

#create a new instance for all available data
new_arr = initial_process(arr)

### Calculate the share of expenditure on school education incurred by various departments/ministries. 
education departments : शिक्षा विभाग (प्राथमिक शिक्षा), शिक्षा विभाग (माध्यमिक शिक्षा), शिक्षा विभाग (उच्च शिक्षा), शिक्षा विभाग(राज्य शैक्षिक अनुसंधान एवं प्रशिक्षण परिषद्), व्यावसायिक शिक्षा विभाग, प्राविधिक शिक्षा विभाग, चिकित्सा विभाग (चिकित्सा, शिक्षा एवं प्रशिक्षण)

school education departments : शिक्षा विभाग (प्राथमिक शिक्षा), शिक्षा विभाग (माध्यमिक शिक्षा), शिक्षा विभाग (उच्च शिक्षा)

In [25]:
# filter dataframe with selected values that denote school education
def school_edu_Exp(arr):
    """
    This function takes in the list of dataframes and filters and sorts
    values based on classification of school education departments.

    It'll return a new arrey (list) of all Dataframes now processed.
    """

    edu_dept = ['शिक्षा विभाग (प्राथमिक शिक्षा)', 'शिक्षा विभाग (माध्यमिक शिक्षा)', 'शिक्षा विभाग (उच्च शिक्षा)']  #values to filter

    new_arr = []
    for df in arr:
        result_df = df[df['Grant Head Description'].isin(edu_dept)]         #filtered final data
        output = result_df.sort_values(by=['Division Description'])         #sorted final data
        new_arr.append(output)
    
    return new_arr

# Filter and sort the data we have and save it in csv:

school_edu_Exp = school_edu_Exp(new_arr)

4

In [30]:

year = 2017
for df in school_edu_Exp:
    df.to_csv(f'./output/task-1 ({year}-{year + 1}).csv')
    expenditure = (df[df.columns[21]].sum() *100) / (df[df.columns[15]].sum())
    print(f'State saved {round(expenditure, 2)} % of fund in {year}-{year+1}')
    year = year + 1

State saved -1.12 % of fund in 2017-2018
State saved -1.53 % of fund in 2018-2019
State saved -0.32 % of fund in 2019-2020
State saved 5.33 % of fund in 2020-2021


## Estimate the share of capital expenditure.

In [93]:

def CapEx_filter(arr):
    """
    This function takes in the list of dataframes and filters and sorts
    values based on if the scheme code begins with a 4 or above.

    It'll return a new arrey (list) of all Dataframes now processed.
    """
    new_arr = []
    for df in arr:
        newdf = df
        CapEx = newdf.set_index('Scheme Code')    #get data ready to be filtered according to scheme code
        CapEx = CapEx.filter(
            regex='^[4-9]+[0-9]+', axis=0
            ).sort_values(
                by=['Division Description']
                )                      #check  If the first digit of the scheme code begins with a 4 or above
        new_arr.append(CapEx)
    return new_arr

#final filtered and sorted data of capital expenditures for all years
total_capEx = CapEx_filter(school_edu_Exp)
# Save output csv
year = 2017
for df in total_capEx:
    df.to_csv(f'./output/task-2 ({year}-{year + 1}).csv')   
    year = year + 1

In [120]:
year = 2017
for df in total_capEx:
    capex = (df[df.columns[20]].sum() * 100) / df[df.columns[15]].sum()
    print(f'State capital expenditure is {round(capex, 2)} % in ({year}-{year + 1})')   
    year = year + 1

State capital expenditure is 24.32 % in (2017-2018)
State capital expenditure is 2.12 % in (2018-2019)
State capital expenditure is 0.0 % in (2019-2020)
State capital expenditure is -16.34 % in (2020-2021)


### Using projected population for each of the years under consideration, estimate the per-capita expenditure on school education in the state, and each district.

population data source : https://www.census2011.co.in/census/state/uttar+pradesh.html

In [122]:
population = {
    '2017-18' : 224571834,
    '2018-19' : 228054788,
    '2019-20' : 231521022,
    '2020-21' : 234969561,
}
    
# per capita education expenditure
for key,df in zip(population, school_edu_Exp):
    per_cap = df[df.columns[20]].sum() / population[key]
    print(f'per capita expenditure in {key} : {round(per_cap, 2)}')

per capita expenditure in 2017-18 : 5104.66
per capita expenditure in 2018-19 : 4638.78
per capita expenditure in 2019-20 : 4049.9
per capita expenditure in 2020-21 : 3895.74


### Rank the districts based on utilization of allotted funds of revenue expenditure and capital expenditure (separately).

In [127]:
def ranked_districts(arr):
    """
    This function takes arrey of Dataframes and sorts each data frame according 
    to custom column added initially "excess". The logic is to sort districts based
    on which districts still have excess fund left after expenditure.
    """
    new_arr = []
    for df in arr:
        df = df.sort_values(by='excess', ascending=False)
        new_arr.append(df)
    return new_arr

School_Exp = ranked_districts(school_edu_Exp)
Capital_Exp = ranked_districts(total_capEx)

In [129]:
# Save files of ranking for school expenditure
year = 2017
for df in School_Exp:
    df.to_csv(f'./output/task-4.a({year}-{year + 1}).csv')   
    year = year + 1
# Save files of ranking for school expenditure
year = 2017
for df in Capital_Exp:
    df.to_csv(f'./output/task-4.b({year}-{year + 1}).csv')   
    year = year + 1