<a href="https://www.kaggle.com/code/ahmad24kky/university-vocational-school-admission-in-russia?scriptVersionId=248224554" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# University and Vocational School Admission in Russia: Analysis Project

## Business Questions

* Group of professions with smallest acceptance rate every year based on Education level
* Most favorite Group of Professions in every Education level based on Number of Applications
* Percentage Number of Students by Branches of Science
* Most consistent Branches of Science based on Percentage Increase



## Import Libraries


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Preprocessing Data

### Import dataset

In [None]:
df=pd.read_csv('/kaggle/input/university-admission-in-russia-2014-2023/University and vocational schools admission in Russia 2014-2023.csv',sep=';')

df.head(5)

In [None]:
df.info()


In [None]:
df.isna().sum()

In [None]:
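# Drop rows with missing values, then rows containing a zero in any column
# (zero applications would make the acceptance rate undefined).
df = df.dropna()
df = df.loc[~(df == 0).any(axis=1)].reset_index(drop=True)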

## Exploratory Data Analysis (EDA)

### Group of professions with smallest acceptance rate every year based on Education level


In [None]:
# Acceptance rate: admitted students as a percentage of applications.
df['Acceptance Rate'] = df['Number of Students'] / df['Number of Applications'] * 100

higher_education = df.loc[df['Education level'] == "Higher Education"]

# Keep, for each year, the row with the lowest acceptance rate.
higher_education = higher_education.loc[higher_education.groupby(
    'Year')['Acceptance Rate'].idxmin()]

higher_education = higher_education[
    ['Group of Professions', 'Year', 'Acceptance Rate']
].rename(columns={'Acceptance Rate': 'Acceptance Rate (In Percent)'})
higher_education.reset_index(drop=True)

In [None]:
# df['Acceptance Rate'] was already computed in the previous cell.
vocational_education = df.loc[df['Education level'] == "Vocational Education"]

# Keep, for each year, the row with the lowest acceptance rate.
vocational_education = vocational_education.loc[vocational_education.groupby(
    'Year')['Acceptance Rate'].idxmin()]

vocational_education = vocational_education[
    ['Group of Professions', 'Year', 'Acceptance Rate']
].rename(columns={'Acceptance Rate': 'Acceptance Rate (In Percent)'})
vocational_education.reset_index(drop=True)

### Most favorite Group of Professions in every Education level based on Number of Applications


In [None]:

question_2 = df.groupby(by=["Education level", "Group of Professions"]).agg({
    "Number of Applications": "sum"}).reset_index()

pd.DataFrame(question_2).nlargest(
    10, "Number of Applications").reset_index(drop=True)

### Percentage Number of Students by Branches of Science


In [None]:
question_3 = df.groupby(by=["Branches of Science"]).agg({
    "Number of Students": "sum"}).reset_index()

question_3["Percentage"] = question_3['Number of Students'] * 100 / \
    question_3['Number of Students'].sum()

question_3['Percentage'] = question_3['Percentage'].round(2)

pd.DataFrame(question_3).sort_values(
    by="Number of Students", ascending=False).reset_index(drop=True)

### Most consistent Branches of Science based on Percentage Increase


In [None]:
question_4 = df.groupby(by=["Year", "Branches of Science",]).agg({
    "Number of Applications": "sum",
}).reset_index()
question_4["Percentage Increase"] = question_4.groupby(
    'Branches of Science')['Number of Applications'].pct_change().fillna(0) * 100

question_4["Percentage Increase"] = question_4["Percentage Increase"].round(2)

pd.DataFrame(question_4)

## Data Visualization

### Group of professions with smallest acceptance rate every year based on Education level


In [None]:
fig, ax = plt.subplots(figsize=(20, 10), nrows=1, ncols=2)
sns.barplot(data=higher_education, y="Acceptance Rate (In Percent)", x="Year",
            hue="Group of Professions", errorbar=None, ax=ax[0])
ax[0].set_xlabel("Year")
ax[0].set_ylabel("Acceptance Rate (In Percent)")
ax[0].yaxis.set_label_position("right")
ax[0].yaxis.tick_right()
ax[0].set_title("Acceptance Rate of Higher education",
                loc="center", fontsize=20)
ax[0].tick_params(axis='x', labelsize=15)
ax[0].legend(loc='upper right', bbox_to_anchor=(0, 1))

sns.barplot(data=vocational_education, y="Acceptance Rate (In Percent)", x="Year",
            hue="Group of Professions", errorbar=None, ax=ax[1])
ax[1].set_xlabel("Year")
ax[1].set_ylabel("Acceptance Rate (In Percent)")
ax[1].yaxis.set_label_position("right")
ax[1].yaxis.tick_right()
ax[1].set_title("Acceptance Rate of Vocational education",
                loc="center", fontsize=20)
ax[1].tick_params(axis='x', labelsize=15)
ax[1].legend(loc='upper left', bbox_to_anchor=(1.05, 1))

plt.suptitle(
    "Group of professions with smallest acceptance rate every year based on Education level", fontsize=25)
plt.show()

### Most favorite Group of Professions in every Education level based on Number of Applications


In [None]:
fig, ax = plt.subplots(figsize=(24, 6))
sns.barplot(data=question_2.sort_values(by="Number of Applications", ascending=False).head(10),
            x="Group of Professions", y="Number of Applications", errorbar=None, hue="Education level")
ax.set_ylabel("Number of Applications", fontsize=12)
ax.set_xlabel("Group of Professions", fontsize=12)
ax.set_title("Most favorite Group of Professions based on total Number of Applications in every Education level",
             loc="center", fontsize=15)
ax.tick_params(axis='x', rotation=60, labelsize=12)

plt.show()

### Percentage Number of Students by Branches of Science


In [None]:
plt.pie(question_3['Number of Students'],
        labels=None,
        )
labels_legend = [f"{j} ({p}%)" for j, p in zip(
    question_3['Branches of Science'], question_3['Percentage'])]
plt.legend(title="Branches of Science",
           labels=labels_legend,
           loc="best",
           bbox_to_anchor=(0, 0, 0, 1))
plt.title("Percentage Number of Students by Branches of Science")
plt.show()

### Most consistent Branches of Science based on Percentage Increase


In [None]:

sns.pointplot(data=question_4, x="Year",
              y="Number of Applications", hue="Branches of Science")
plt.grid(True)
plt.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
plt.show()

## Conclusion

### Group of professions with smallest acceptance rate every year based on Education level
#### Higher Education

* Economics and Management in 2014 and 2015
* Photonics, Instrument Engineering, Optical and Biotechnical Technologies in 2016
* Physics and Technical Sciences and Technologies in 2017, 2019 and 2020
* Nanotechnology in 2018 and 2021
* Chemical Technologies in 2022 and 2023

#### Vocational Education


* Pharmacy in 2014, 2015, 2017, 2019, 2020, 2021 and 2022
* Art History in 2016
* Health Sciences and Preventive Healthcare in 2018
* Mass Media and Information in 2023

### Most favorite Group of Professions in every Education level based on Number of Applications
* Higher Education:
    - Economics and Management with 10,318,652 applications
    - Education Sciences and Pedagogy with 5,889,276 applications
    - Information Technology with 4,613,268 applications
    - Jurisprudence with 4,163,021 applications
    - Clinical Medicine with 3,804,193 applications
    - Linguistics and Philology with 1,911,232 applications
    - Mechanical Engineering with 1,698,315 applications
* Vocational Education:
    - Information Technology with 2,026,742 applications
    - Economics and Management with 1,858,111 applications
    - Transport Equipment and Technologies with 1,793,869 applications
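
The list above comes from a single top-10 ranking across both education levels, so the level with more applications takes most of the slots (seven of ten here). A minimal sketch of ranking within each level instead, using a toy frame with hypothetical values in place of `question_2`:

```python
import pandas as pd

# Toy stand-in for question_2 (hypothetical values, not from the dataset).
toy = pd.DataFrame({
    "Education level": ["Higher Education"] * 3 + ["Vocational Education"] * 3,
    "Group of Professions": ["P1", "P2", "P3", "P1", "P2", "P3"],
    "Number of Applications": [900, 500, 700, 30, 50, 10],
})

# Sort once by applications, then keep the first 2 rows of each education level.
top_per_level = (
    toy.sort_values("Number of Applications", ascending=False)
    .groupby("Education level")
    .head(2)
    .sort_values(["Education level", "Number of Applications"],
                 ascending=[True, False])
    .reset_index(drop=True)
)
print(top_per_level)
```

Replacing `toy` with `question_2` and `2` with the desired cutoff should give an equal-length ranking for each level, assuming the column names used in this notebook.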

### Percentage Number of Students by Branches of Science
* Engineering and Technology 39.52%
* Social Sciences 29.71%
* Education Sciences and Pedagogy 8.21%
* Healthcare and Medicine 7.81%
* Agriculture 4.96%
* Humanities 3.52%
* Arts and Culture 3.51%
* Mathematical and Natural Sciences 2.70%
* Oriental and African Studies 0.06%
* Military Science 0.00%

    Note: Military Science does have students, but its share of all students is below 0.01%, so it rounds to 0.00%.
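
The rounding effect behind this note can be reproduced on a small example. The counts below are hypothetical and chosen only to show how a non-zero share disappears at two decimals:

```python
import pandas as pd

# Hypothetical counts, chosen only to illustrate the rounding effect.
counts = pd.Series({"Large branch": 999_970, "Tiny branch": 30})

# Same formula as the Percentage column: share of the total, in percent.
share = counts * 100 / counts.sum()

print(share.round(2))  # two decimals hide the tiny branch
print(share.round(4))  # four decimals show a non-zero share
```

Rounding to more decimals, or reporting the raw count next to the percentage, keeps very small branches visible.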

### Most consistent Branches of Science based on Percentage Increase

* Arts and Culture, Education Sciences and Pedagogy, Engineering and Technology, and Humanities are the Branches of Science with the most consistent increase
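
The notebook reads consistency off the table and the plot. One possible way to quantify it, offered as a sketch rather than as the method used above, is to count the years with a positive change per branch and compare the spread of the yearly changes. The toy data and the column names `years_with_growth` and `std` are assumptions for illustration:

```python
import pandas as pd

# Toy data standing in for question_4 (hypothetical numbers, not from the dataset).
toy = pd.DataFrame({
    "Year": [2014, 2015, 2016, 2017] * 2,
    "Branches of Science": ["A"] * 4 + ["B"] * 4,
    "Number of Applications": [100, 110, 121, 133, 100, 150, 90, 180],
})
toy = toy.sort_values(["Branches of Science", "Year"])

# Year-over-year change in percent, computed within each branch.
toy["Percentage Increase"] = (
    toy.groupby("Branches of Science")["Number of Applications"].pct_change() * 100
)

# Per branch: number of years with growth, and spread of the yearly changes.
# The first year of each branch has no previous value, so it is dropped.
consistency = (
    toy.dropna(subset=["Percentage Increase"])
    .groupby("Branches of Science")["Percentage Increase"]
    .agg(years_with_growth=lambda s: int((s > 0).sum()), std="std")
    .sort_values(["years_with_growth", "std"], ascending=[False, True])
)
print(consistency)
```

In this toy case branch A grows every year with little variation, while branch B swings between large gains and a loss, so A ranks first. Applying the same steps to `question_4` would put a number behind the branches named above.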