# Growth and Challenges in Dutch Higher Education
## Enrollment Trends and Future Projections

- Jens Groen (15853411)
- Thijs van der Meer (15831086)
- Sarah Kruse (15396142)



## Introduction

In this data story project, we investigate the growth and challenges in Dutch higher education using two DUO datasets:
1. **Inschrijvingen WO (2019–2023)**  (University Enrollments (2019–2023)) 
2. **Aantallen en prognoses hogere opleidingen (2017–2038)**  (Numbers and Projections of Higher Education (2017–2038))

We highlight two perspectives:
- **Perspective 1:** The Benefits of Enrollment Growth
- **Perspective 2:** Projections and future bottlenecks due to demographic trends.

Each section is supported with visualizations created in Python.



## Dataset and Preprocessing

We use two datasets, both sourced from DUO Open Education Data:
- `04.-inschrijvingen-wo-2024.csv` (University Enrollments 2019–2023): contains the number of enrollments per academic year, broken down by gender.
- `studentenprognoses-2025.xlsx` (Higher Education Projections 2017–2038): contains historical and predicted student numbers for both universities (WO) and higher professional education (HBO).

## Data Preprocessing Summary

The raw datasets consist of projected student numbers (prognoses) and actual enrolment figures (inschrijvingen) across various Dutch higher education institutions. These two data sources were cleaned, transformed, and merged to form a single structured dataset suitable for analysis and visualization. The preprocessing involved the following steps:

### 1. Loading the Data
We loaded the student prognosis data from an Excel file (`studentenprognoses-2025.xlsx`, sheet `i v o s t h`) and the enrolment data from a semicolon-delimited CSV file (`04.-inschrijvingen-wo-2024.csv`). Both datasets contained columns with inconsistent formatting and redundant information.

### 2. Cleaning Column Names
To standardize the data and facilitate further processing, all column names were:
- converted to lowercase,
- stripped of leading/trailing spaces,
- and had spaces replaced with underscores (`_`).

This harmonization ensures consistent reference across operations.
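
A minimal sketch of this normalization, assuming pandas and a made-up set of column headers (the real headers in the DUO files may differ):

```python
import pandas as pd

# Hypothetical headers with stray spaces and mixed case
df = pd.DataFrame(columns=[" Instellingscode Actueel", "Instellingsnaam Actueel ", "Jaar"])

# Strip whitespace, lowercase, and replace inner spaces with underscores
df.columns = (
    df.columns
    .str.strip()
    .str.lower()
    .str.replace(" ", "_", regex=False)
)

print(list(df.columns))
```

Stripping before replacing matters: otherwise leading or trailing spaces would turn into leading or trailing underscores.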

### 3. Filtering Relevant Columns
From the prognosis dataset, only the relevant columns were retained:
- `instellingscode`, `instellingsnaam`, `jaar` (institution code, name, and year),
- `instroom_ho`, `tweedejaars_ho`, and `hogerejaars_ho` (prognoses for first-year, second-year, and higher-year students).

These variables reflect forecasted student flows per institution and year.

### 4. Reshaping Enrolment Data
The enrolment data initially had one column per year. Using a transformation to "long format" (via a melt operation), we converted these year-columns into rows, yielding a structure with:
- institution metadata,
- `jaar` (year),
- and `inschrijvingen` (number of enrolments).
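
The reshaping step can be sketched with `DataFrame.melt`. The rows and institution names below are invented for illustration; only the target column names `jaar` and `inschrijvingen` come from the description above.

```python
import pandas as pd

# Made-up wide table: one column per year
wide = pd.DataFrame({
    "instellingscode": ["A1", "B2"],
    "instellingsnaam": ["Uni A", "Uni B"],
    "2022": [100, 250],
    "2023": [110, 240],
})

# Turn the year columns into rows
long = wide.melt(
    id_vars=["instellingscode", "instellingsnaam"],
    var_name="jaar",
    value_name="inschrijvingen",
)

# Year labels start out as strings (they were column names); cast for later joins
long["jaar"] = long["jaar"].astype(int)

print(long)
```

Casting `jaar` to an integer is an assumption about the join key: it only helps if the prognosis table also stores years as integers.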

### 5. Cleaning Enrolment Values
Some enrolment values were non-numeric or contained special symbols (e.g., `<5`). We extracted the numeric portion using a regular expression and replaced missing or invalid values with `0`.
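
A minimal sketch of this cleaning step, assuming a simple digit-matching pattern (the exact expression used is not stated in the text). Note that a digit-extracting pattern maps `<5` to 5, whereas the notebook's preprocessing cell replaces `<5` with 0, so the two choices give slightly different totals.

```python
import pandas as pd

# Made-up raw values: a clean number, a suppressed count, junk, and a missing value
raw = pd.Series(["120", "<5", "n/a", None])

# Pull out the first run of digits; rows without digits become NaN
digits = raw.str.extract(r"(\d+)", expand=False)

# Convert to numbers and replace missing or invalid entries with 0
cleaned = pd.to_numeric(digits, errors="coerce").fillna(0).astype(int)

print(cleaned.tolist())
```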

### 6. Harmonizing Institution Codes
If present, the `instellingscode_actueel` column (current institution code) was renamed to `instellingscode`, to match the naming in the prognosis dataset. Likewise, `instellingsnaam_actueel` was renamed to `instellingsnaam`.
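
A small sketch of the conditional rename on a made-up frame. `DataFrame.rename` skips mapping keys that are not present (its default is `errors="ignore"`), so no explicit presence check is needed.

```python
import pandas as pd

# Made-up frame; only the column names come from the step described above
inschrijvingen = pd.DataFrame({
    "instellingscode_actueel": ["A1"],
    "instellingsnaam_actueel": ["Uni A"],
})

rename_map = {
    "instellingscode_actueel": "instellingscode",
    "instellingsnaam_actueel": "instellingsnaam",
}

# Columns absent from the frame are silently skipped
inschrijvingen = inschrijvingen.rename(columns=rename_map)

# Applying the same mapping again changes nothing, since the old names are gone
unchanged = inschrijvingen.rename(columns=rename_map)

print(list(inschrijvingen.columns))
```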

### 7. Merging Prognoses and Enrolments
We merged the datasets on `instellingscode` and `jaar`, using a **left join** to preserve all forecasted years—even those for which enrolment data is not yet available (i.e., future years).
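
The effect of the left join can be illustrated with two tiny, invented tables. The prognosis row for a future year survives the merge, with a missing enrolment value.

```python
import pandas as pd

# Made-up prognosis rows, including one future year (2030)
prognoses = pd.DataFrame({
    "instellingscode": ["A1", "A1", "A1"],
    "jaar": [2023, 2024, 2030],
    "instroom_ho": [900, 950, 870],
})

# Made-up enrolment rows: nothing recorded for 2030 yet
inschrijvingen = pd.DataFrame({
    "instellingscode": ["A1", "A1"],
    "jaar": [2023, 2024],
    "inschrijvingen": [4000, 4100],
})

# how="left" keeps every prognosis row, matched or not
merged = prognoses.merge(inschrijvingen, on=["instellingscode", "jaar"], how="left")

print(merged)
```

Both tables need the same dtype for `jaar`; pandas will not match an integer year against a string year.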

### 8. Handling Missing Data
Missing enrolment values (especially for future years) were filled with `0`, ensuring numerical consistency across rows.

### 9. Renaming Prognosis Columns
To make the columns more descriptive, the following renamings were applied:
- `instroom_ho` → `prognose_instroom`
- `tweedejaars_ho` → `prognose_tweedejaars`
- `hogerejaars_ho` → `prognose_hogerejaars`

### 10. Saving the Final Dataset
The final merged and cleaned dataset was sorted by institution and year, and saved as a CSV file.
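
Steps 9 and 10 can be sketched together on invented rows. The output filename is not stated in the text, so this sketch writes to an in-memory buffer instead of naming a file.

```python
import io

import pandas as pd

# Made-up merged rows, deliberately out of order
merged = pd.DataFrame({
    "instellingscode": ["B2", "A1", "A1"],
    "jaar": [2024, 2025, 2024],
    "instroom_ho": [300, 910, 900],
    "tweedejaars_ho": [280, 850, 840],
    "hogerejaars_ho": [700, 2100, 2050],
})

final = (
    merged
    .rename(columns={
        "instroom_ho": "prognose_instroom",
        "tweedejaars_ho": "prognose_tweedejaars",
        "hogerejaars_ho": "prognose_hogerejaars",
    })
    .sort_values(["instellingscode", "jaar"])
    .reset_index(drop=True)
)

# Write CSV text to a buffer; pass a file path instead to save to disk
buffer = io.StringIO()
final.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```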



In [1]:
# openpyxl is needed by pandas.read_excel to open .xlsx files
!pip install openpyxl

import pandas as pd
import plotly.express as px
import plotly.graph_objs as go

# Load data
registraties_wo = pd.read_csv('04.-inschrijvingen-wo-2024.csv', sep=';')
# Select the correct sheet from the prognosis workbook
pre_student_prognoses = pd.read_excel('studentenprognoses-2025.xlsx', sheet_name="i v o s t h")





In [2]:
# Preprocessing dataset 1 (WO enrolments)
jaren = ['2020', '2021', '2022', '2023', '2024']  # year columns, listed once to simplify later steps

# Suppressed counts are published as '<5'; treat them as 0
registraties_wo = registraties_wo.replace('<5', value='0')

for jaar in jaren:
    registraties_wo[jaar] = pd.to_numeric(registraties_wo[jaar], errors='coerce')

# Preprocessing dataset 2 (prognoses)
pre_student_prognoses = pre_student_prognoses.replace('<5', value='0')

numeric_columns = ['ingeschrevenen_HO', 'instroom_RR', 'instroom_HO', 'tweedejaars_HO', 'hogerejaars_HO', 'instroom_THO_NW', 'tweedejaars_THO_NW', 'hogerejaars_THO_NW']
for column in numeric_columns:
    pre_student_prognoses[column] = pd.to_numeric(pre_student_prognoses[column], errors='coerce')

# Split into realised figures and projections
student_realisatie = pre_student_prognoses[pre_student_prognoses['status'] == 'realisatie']
student_prognoses = pre_student_prognoses[pre_student_prognoses['status'] == 'prognose']



## Perspective 1: The Benefits of Enrollment Growth
This perspective focuses on the positive impact of enrollment growth in Dutch higher education: improved access to higher education, increased diversity, and a more educated workforce.

### Key Argument:
The growth in the number of students enrolling in Dutch higher education institutions indicates improved accessibility and reflects rising societal demand for higher qualifications. This growth not only expands opportunities for education but also leads to a more educated labor force, increased cultural diversity, and broader global knowledge exchange.

### Sub-arguments:
#### Improved Access to Higher Education:
The increase in student enrollments is likely, at least in part, a result of policies that make higher education accessible to more students. Scholarships, financial aid, and outreach initiatives may have helped attract students from a wider range of backgrounds.

In [3]:


# Total WO enrolments per year, summed over all rows
plt1 = go.Figure([go.Bar(
    x=jaren,
    y=[registraties_wo[jaar].sum() for jaar in jaren],
)])

plt1.update_layout(title="Number of Students Enrolled in Dutch Universities (WO) (2020 - 2024)", height=600)
plt1.show()

#### Increased Cultural Diversity:
The rise in international students (particularly those from non-EU countries) contributes to a more diverse academic environment. This diversity enriches the educational experience by fostering cultural exchange and offering global perspectives within local contexts.


In [4]:
jaren2 = [2017, 2018, 2019, 2020, 2021,2022, 2023,2024]

totaal_NL = []
totaal_EER = []
totaal_niet_EER = []

for jaar in jaren2:
    totaal = student_realisatie[
        (student_realisatie['jaar'] == jaar) & 
        (student_realisatie['herkomst'] == 'NL')
    ]['ingeschrevenen_HO'].sum()
    
    totaal_NL.append(totaal)

for jaar in jaren2:
    totaal = student_realisatie[
        (student_realisatie['jaar'] == jaar) & 
        (student_realisatie['herkomst'] == 'EER')
    ]['ingeschrevenen_HO'].sum()
    
    totaal_EER.append(totaal)

for jaar in jaren2:
    totaal = student_realisatie[
        (student_realisatie['jaar'] == jaar) & 
        (student_realisatie['herkomst'] == 'niet-EER')
    ]['ingeschrevenen_HO'].sum()
    
    totaal_niet_EER.append(totaal)

fig2 = go.Figure(data=[
    go.Bar(name='Dutch students', x=jaren2, y=totaal_NL),
    go.Bar(name='EEA students', x=jaren2, y=totaal_EER),
    go.Bar(name='Non-EEA students', x=jaren2, y=totaal_niet_EER),
])

fig2.update_layout(barmode='stack', title='Place of Origin of Students in Dutch Higher Education (2017 - 2024)', height = 600)
fig2.show()
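
The three per-origin loops above repeat the same filter-and-sum pattern. As an alternative sketch (not the approach used in this notebook), a single `pivot_table` call yields the same per-year totals. The rows below are made up; only the column names `jaar`, `herkomst`, and `ingeschrevenen_HO` are taken from the cell above.

```python
import pandas as pd

# Made-up rows mimicking the columns used in the notebook
toy = pd.DataFrame({
    "jaar": [2023, 2023, 2023, 2024, 2024, 2024, 2024],
    "herkomst": ["NL", "EER", "niet-EER", "NL", "NL", "EER", "niet-EER"],
    "ingeschrevenen_HO": [100, 20, 10, 60, 50, 25, 12],
})

# One row per year, one column per origin, summed enrolments in the cells
totals = toy.pivot_table(
    index="jaar",
    columns="herkomst",
    values="ingeschrevenen_HO",
    aggfunc="sum",
    fill_value=0,
)

print(totals)
```

Each column of `totals` can then be passed as the `y` values of one bar trace, for example `totals["NL"].tolist()`.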

#### Growth of a Highly Educated Workforce:
With more students graduating from universities and applied sciences institutions, the Dutch labor market is increasingly populated by individuals with advanced qualifications. This trend is essential for addressing skills gaps in key sectors such as healthcare, technology, and education.


In [5]:
# Unique values of the ONDERDEEL column, one stacked bar segment per value
afdelingen = registraties_wo['ONDERDEEL'].unique()

# Per ONDERDEEL value: total enrolments for each year
data_fig3 = {
    afdeling: [registraties_wo[registraties_wo['ONDERDEEL'] == afdeling][jaar].sum() for jaar in jaren]
    for afdeling in afdelingen
}

fig3 = go.Figure(
    data=[go.Bar(name=afdeling, x=jaren, y=data_fig3[afdeling]) for afdeling in afdelingen]
)
fig3.update_layout(barmode='stack', title='University Enrollments per ONDERDEEL (2020 - 2024)', height=600)
fig3.show()



### Distribution by Gender
The figure below shows the distribution by gender per year.

In [6]:
# Yearly totals per gender ('vrouw' = female, 'man' = male)
totaal_vrouw = [registraties_wo[registraties_wo['GESLACHT'] == 'vrouw'][jaar].sum() for jaar in jaren]
totaal_man = [registraties_wo[registraties_wo['GESLACHT'] == 'man'][jaar].sum() for jaar in jaren]

fig4 = go.Figure(data=[
    go.Bar(name='Male students', x=jaren, y=totaal_man),
    go.Bar(name='Female students', x=jaren, y=totaal_vrouw)
])

fig4.update_layout(height=600, title='Number of Students in Dutch Higher Education per Gender (2020 - 2024)')
fig4.show()


## Perspective 2: Projections and Future Bottlenecks

**Argument:**  
DUO projections show that student numbers will decline after a peak in 2024 due to demographic shrinkage.
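
One way to check such a claim against the data is to sum enrollments per year, locate the maximum, and inspect the year-over-year change. The figures below are invented placeholders, not DUO values; with the real data, the per-year sums of `ingeschrevenen_HO` would take their place.

```python
import pandas as pd

# Invented yearly totals, indexed by year (placeholders only)
totals = pd.Series({2023: 1000, 2024: 1020, 2025: 1010, 2026: 990})

# Year with the highest total
peak_year = totals.idxmax()

# Percentage change relative to the previous year
yoy = totals.pct_change() * 100

print(peak_year)
print(yoy.round(2))
```

A peak followed by decline shows up as negative year-over-year values for every year after `peak_year`.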

In [7]:
jaren3 = [2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035, 2036, 2037, 2038]

fig5 = go.Figure([go.Bar(
    y = [student_prognoses[student_prognoses['jaar'] == jaar]['ingeschrevenen_HO'].sum() for jaar in jaren3],
    x = jaren3
)])
fig5.update_layout(title = 'Projected Number of Students in Dutch Higher Education (2025 - 2038)', height = 600)

fig5.show()

In [8]:
totaal_NL = []
totaal_EER = []
totaal_niet_EER = []

for jaar in jaren3:
    totaal = student_prognoses[
        (student_prognoses['jaar'] == jaar) & 
        (student_prognoses['herkomst'] == 'NL')
    ]['ingeschrevenen_HO'].sum()
    
    totaal_NL.append(totaal)

for jaar in jaren3:
    totaal = student_prognoses[
        (student_prognoses['jaar'] == jaar) & 
        (student_prognoses['herkomst'] == 'EER')
    ]['ingeschrevenen_HO'].sum()
    
    totaal_EER.append(totaal)

for jaar in jaren3:
    totaal = student_prognoses[
        (student_prognoses['jaar'] == jaar) & 
        (student_prognoses['herkomst'] == 'niet-EER')
    ]['ingeschrevenen_HO'].sum()
    
    totaal_niet_EER.append(totaal)

fig6 = go.Figure(data=[
    go.Bar(name='Dutch students', x=jaren3, y=totaal_NL),
    go.Bar(name='EEA students', x=jaren3, y=totaal_EER),
    go.Bar(name='Non-EEA students', x=jaren3, y=totaal_niet_EER),
])

fig6.update_layout(barmode='stack', title='Projected Number of Students in Dutch Higher Education by Origin (2025 - 2038)', height=600)
fig6.show()