# Growth and Challenges in Dutch Higher Education
## Enrollment Trends and Future Projections

- Jens Groen (15853411)
- Thijs van der Meer (15831086)
- Sarah Kruse (15396142)



## Introduction

In this data story project, we investigate the growth and challenges in Dutch higher education using two DUO datasets:
1. **Inschrijvingen WO (2019–2023)**: University Enrollments, 2019–2023
2. **Aantallen en prognoses hogere opleidingen (2017–2038)**: Numbers and Projections of Higher Education, 2017–2038

We highlight two perspectives:
- **Perspective 1:** The Benefits of Enrollment Growth
- **Perspective 2:** The Challenges of Enrollment Growth, including projected bottlenecks driven by demographic trends

Each section is supported with visualizations created in Python.



## Dataset and Preprocessing

We use two datasets, both sourced from DUO Open Education Data:
- `04.-inschrijvingen-wo-2024.csv` (University Enrollments 2019–2023): contains the number of enrollments per academic year, broken down by gender.
- `studentenprognoses-2025.xlsx` (Higher Education Projections 2017–2038): contains historical and predicted student numbers for both universities (WO) and higher professional education (HBO).

**Preprocessing:**
1. Load the data files (one CSV file and one Excel workbook).
2. Check for missing values.
3. Verify that totals match the sums of the components (gender).
4. Convert columns to the appropriate data types.
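
The preprocessing steps above can be sketched on a small stand-in table. This is only an illustration under stated assumptions: the frame `raw` and its values are made up, and it mimics the wide layout (one column per year, gender in a `GESLACHT` column) that the enrollment file appears to have. Treating missing counts as 0 is also an assumption, not something the source prescribes.

```python
import pandas as pd

# Toy stand-in for the DUO enrollment file (wide format: one column per year).
# All values are invented for illustration only.
raw = pd.DataFrame({
    "INSTELLINGSNAAM ACTUEEL": ["Univ A", "Univ A", "Univ B", "Univ B"],
    "GESLACHT": ["man", "vrouw", "man", "vrouw"],
    "2023": ["10", "12", "7", None],
    "2024": ["11", "15", "8", "9"],
})

# Year columns are the ones whose header consists of digits only.
year_cols = [c for c in raw.columns if c.isdigit()]

# Step 2: count missing values per column.
missing = raw.isna().sum()

# Step 4: convert the year columns to integers (missing counts become 0 here).
clean = raw.copy()
clean[year_cols] = (
    clean[year_cols].apply(pd.to_numeric, errors="coerce").fillna(0).astype(int)
)

# Step 3: the total per year should equal the sum of the per-gender totals.
total_per_year = clean[year_cols].sum()
per_gender = clean.groupby("GESLACHT")[year_cols].sum()
totals_match = bool((per_gender.sum() == total_per_year).all())

print(missing)
print("Totals match:", totals_match)
```

On the real file, the same checks would run on the frame returned by `pd.read_csv` instead of `raw`.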


In [30]:

import pandas as pd

# Load data
registrations_wo = pd.read_csv('04.-inschrijvingen-wo-2024.csv', sep=';')
student_predictions = pd.read_excel('studentenprognoses-2025.xlsx', header=8)

# Show the first rows
registrations_wo.head(), student_predictions.head()


(   PROVINCIE  GEMEENTENUMMER GEMEENTENAAM SOORT INSTELLING  \
 0  Friesland              80   Leeuwarden  reguliere inst.   
 1  Friesland              80   Leeuwarden  reguliere inst.   
 2  Friesland              80   Leeuwarden  reguliere inst.   
 3  Friesland              80   Leeuwarden  reguliere inst.   
 4  Friesland              80   Leeuwarden  reguliere inst.   
 
   TYPE HOGER ONDERWIJS INSTELLINGSCODE ACTUEEL      INSTELLINGSNAAM ACTUEEL  \
 0             bachelor                    21PC  Rijksuniversiteit Groningen   
 1             bachelor                    21PC  Rijksuniversiteit Groningen   
 2             bachelor                    21PC  Rijksuniversiteit Groningen   
 3             bachelor                    21PC  Rijksuniversiteit Groningen   
 4               master                    21PC  Rijksuniversiteit Groningen   
 
             ONDERDEEL        SUBONDERDEEL  OPLEIDINGSCODE ACTUEEL  \
 0               recht      n.v.t. (recht)                   50700  

In [31]:
# Preprocessing WO registrations
print('Check for NA in WO:', registrations_wo.isna().sum())
# Note: registrations_wo has no 'Men' or 'Women' columns (gender is stored in GESLACHT),
# so the totals check has been removed.

# Preprocessing predictions
print('Check for NA in predictions:', student_predictions.isna().sum())
# Derive a Total column as WO + HBO (requires 'WO' and 'HBO' columns in the loaded sheet)
student_predictions['Total'] = student_predictions['WO'] + student_predictions['HBO']

# Data types
registrations_wo.dtypes, student_predictions.dtypes


Check for NA in WO: PROVINCIE                  0
GEMEENTENUMMER             0
GEMEENTENAAM               0
SOORT INSTELLING           0
TYPE HOGER ONDERWIJS       0
INSTELLINGSCODE ACTUEEL    0
INSTELLINGSNAAM ACTUEEL    0
ONDERDEEL                  0
SUBONDERDEEL               0
OPLEIDINGSCODE ACTUEEL     0
OPLEIDINGSNAAM ACTUEEL     0
OPLEIDINGSVORM             0
GESLACHT                   0
2020                       0
2021                       0
2022                       0
2023                       0
2024                       0
dtype: int64
Check for NA in predictions: Unnamed: 0          36
Gebruikte Primos     4
2024                22
dtype: int64


KeyError: 'WO'
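
The `KeyError` above, together with the column names `Unnamed: 0`, `Gebruikte Primos` and `2024` in the NA check, suggests that `header=8` does not point at the row holding the actual column labels. One possible way to locate that row is sketched below. The helper `find_header_row`, the sheet layout, and the labels `Jaar`, `WO`, `HBO` are assumptions for illustration; the real workbook may be organised differently.

```python
import pandas as pd

def find_header_row(raw: pd.DataFrame, expected: set) -> int:
    """Return the index of the first row that contains all expected labels."""
    for idx, row in raw.iterrows():
        labels = {str(value).strip() for value in row.dropna()}
        if expected <= labels:
            return idx
    raise ValueError(f"No row contains all of {sorted(expected)}")

# Toy stand-in for a sheet read without a header row; layout and values are invented.
raw = pd.DataFrame([
    ["Studentenprognoses", None, None],
    [None, None, None],
    ["Jaar", "WO", "HBO"],
    [2017, 100, 200],
    [2018, 110, 190],
])

header_idx = find_header_row(raw, {"Jaar", "WO", "HBO"})

# Use the detected row as column labels and keep only the rows below it.
table = raw.iloc[header_idx + 1:].reset_index(drop=True)
table.columns = raw.iloc[header_idx].tolist()
table = table.apply(pd.to_numeric)

# With proper labels in place, the Total column can be derived.
table["Total"] = table["WO"] + table["HBO"]
print(header_idx)
print(table)
```

For the real file, `raw` would come from `pd.read_excel(..., header=None)`, and the expected labels would have to be adapted to whatever the sheet actually contains.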


## Perspective 1: The Benefits of Enrollment Growth
This perspective focuses on the positive impact of growing student enrollment in Dutch higher education: improved access to higher education, increased diversity, and a more educated workforce.

### Key Argument:
The growth in the number of students enrolling in Dutch higher education institutions indicates improved accessibility and reflects rising societal demand for higher qualifications. This growth not only expands opportunities for education but also leads to a more educated labor force, increased cultural diversity, and broader global knowledge exchange.

### Sub-arguments:
#### Improved Access to Higher Education:
The substantial increase in student enrollments is likely a result of policies that make higher education accessible to more students. Measures such as scholarships, financial aid, and outreach initiatives have likely helped attract students from varied backgrounds.

In [70]:
# Visualization:
# A line plot showing the growth in enrollment numbers over the years (from the dataset Inschrijvingen WO 2019-2023),
# possibly segmented by province or university, to illustrate the year-over-year increase in enrollments.
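
As a minimal sketch of the aggregation behind such a plot, the example below sums the year columns per province and derives year-over-year growth. The frame `registrations` and its values are invented; only the column names `PROVINCIE` and `GESLACHT` and the one-column-per-year layout are taken from the file preview earlier in this notebook.

```python
import pandas as pd

# Toy stand-in for the enrollment file; all counts are invented.
registrations = pd.DataFrame({
    "PROVINCIE": ["Friesland", "Friesland", "Groningen", "Groningen"],
    "GESLACHT": ["man", "vrouw", "man", "vrouw"],
    "2022": [100, 120, 400, 380],
    "2023": [110, 130, 420, 400],
    "2024": [120, 150, 410, 410],
})

# Year columns are the ones whose header consists of digits only.
year_cols = [c for c in registrations.columns if c.isdigit()]

# Rows: years, columns: provinces.
per_province = registrations.groupby("PROVINCIE")[year_cols].sum().T
per_province.index.name = "Jaar"

# Year-over-year growth in percent; the first year has no predecessor, so it is NaN.
growth_pct = per_province.pct_change().mul(100).round(1)

print(per_province)
print(growth_pct)
```

Calling `per_province.plot(marker="o")` on the result would draw one line per province; grouping by `INSTELLINGSNAAM ACTUEEL` instead would segment by university.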

#### Increased Cultural Diversity:
The rise in international students (particularly those from non-EU countries) contributes to a more diverse academic environment. This diversity enriches the educational experience by fostering cultural exchange and offering global perspectives within local contexts.


In [73]:
# Visualization:
# A stacked bar chart showing the breakdown of students by nationality (Dutch, EU, non-EU)
# over time (from the dataset Herkomst), to reveal the increasing share of international students.


#### Growth of a Highly Educated Workforce:
With more students graduating from universities and applied sciences institutions, the Dutch labor market is increasingly populated by individuals with advanced qualifications. This trend is essential for addressing skills gaps in key sectors such as healthcare, technology, and education.


In [76]:
# Visualization:
# A bar chart showing the number of graduates per study program over the years. 
# For instance, you could compare STEM fields and social sciences to demonstrate where the largest growth in graduates is occurring.


#### Relevant Data Variables:
- Inschrijvingen WO 2019-2023 dataset for tracking total student enrollment across years.
- Herkomst dataset to explore the national origin of students, especially international enrollments.

In [81]:
import matplotlib.pyplot as plt

# The DUO file is in wide format (one column per year), so sum each year column
year_cols = [c for c in registrations_wo.columns if str(c).isdigit()]
totals_per_year = registrations_wo[year_cols].sum()

plt.figure(figsize=(8,5))
plt.plot([str(c) for c in year_cols], totals_per_year.values, marker='o', color='teal')
plt.title(f"Total WO enrollments per academic year ({year_cols[0]}–{year_cols[-1]})")
plt.xlabel("Academic year")
plt.ylabel("Number of enrollments")
plt.grid(True)
plt.show()


### Distribution by Gender
The figures below show both the overall distribution and the distribution per year.

In [None]:

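# Gender is stored in the GESLACHT column; sum all year columns per gender
year_cols = [c for c in registrations_wo.columns if str(c).isdigit()]
gender_totals = registrations_wo.groupby('GESLACHT')[year_cols].sum().sum(axis=1)

plt.figure(figsize=(6,4))
plt.bar(gender_totals.index, gender_totals.values, color=['steelblue', 'salmon'])
plt.title(f"Total WO enrollments {year_cols[0]}–{year_cols[-1]} by gender")
plt.ylabel("Number of enrollments")
plt.show()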


In [None]:

# Rows: years, columns: genders (from the GESLACHT column)
year_cols = [c for c in registrations_wo.columns if str(c).isdigit()]
by_gender = registrations_wo.groupby('GESLACHT')[year_cols].sum().T

by_gender.plot(kind='bar', stacked=True, figsize=(8,5), color=['steelblue', 'salmon'])
plt.title(f"WO enrollments {year_cols[0]}–{year_cols[-1]} by gender per year")
plt.xlabel("Academic year")
plt.ylabel("Number of enrollments")
plt.legend(title="Gender")
plt.show()


## Perspective 2: The Challenges of Enrollment Growth

### Key Argument:
Although the growth in student enrollments signifies broader educational opportunities, it also places significant pressure on the education system. Increased enrollment is straining infrastructure, elevating student-teacher ratios, and exacerbating housing shortages, all of which threaten the quality of education and student well-being.

### Sub-arguments:

#### Overburdened Infrastructure and Resources:
As the number of students grows, the physical and human resources available to universities are being stretched thin. Universities face challenges in maintaining adequate classroom space, teaching staff, and facilities. This results in larger classes and fewer resources per student.


In [99]:
# Visualization:
# A line plot showing the relationship between student enrollments and the number of teaching staff or classrooms (if available).
# Projected enrollments from the "Aantallen en prognoses ho" dataset could be compared with current infrastructure data.

#### Decline in Educational Quality:
The rapid increase in student numbers may compromise educational quality. The student-to-teacher ratio has likely increased, reducing the level of individualized attention students receive, which could affect learning outcomes and overall satisfaction.

In [102]:
# Visualization:
# A stacked bar chart showing the trend in the student-to-teacher ratio, ideally for specific study programs or universities. 
# This would highlight how increasing enrollments are correlating with a higher number of students per instructor.

#### Student Housing Shortage:
The growing number of students, particularly in larger cities like Amsterdam and Rotterdam, has intensified the demand for student housing. The housing market has been unable to keep pace with this demand, leading to rising rents and a lack of available accommodation for students.

In [105]:
# Visualization:
# An area chart or line plot comparing the number of enrollments to the number of available student accommodations (if such data is available),
# to show whether student housing availability is keeping up with enrollment growth.

### Relevant Data Variables:
- **Instroom_RR** (Expected Enrollment) from the **Aantallen en prognoses ho** dataset can be used to project future enrollment and compare it with the availability of resources.
- **Ingeschrevenen_HO** shows the actual number of students enrolled per program and can be used to identify the specific programs that are experiencing the highest enrollment growth.

In [None]:

# Assumes student_predictions has been cleaned so that it exposes 'Jaar', 'WO' and 'HBO' columns
plt.figure(figsize=(8,5))
plt.plot(student_predictions['Jaar'], student_predictions['WO'], label='WO', color='teal')
plt.plot(student_predictions['Jaar'], student_predictions['HBO'], label='HBO', color='orange')
plt.title("Higher education student numbers with projections (2017–2038)")
plt.xlabel("Year")
plt.ylabel("Number of students")
plt.legend()
plt.grid(True)
plt.show()


In [None]:

import plotly.graph_objects as go

# Assumes student_predictions has been cleaned so that it exposes 'Jaar', 'WO' and 'HBO' columns
fig = go.Figure()
fig.add_trace(go.Scatter(x=student_predictions['Jaar'], y=student_predictions['WO'], name="WO", line_color='teal'))
fig.add_trace(go.Scatter(x=student_predictions['Jaar'], y=student_predictions['HBO'], name="HBO", line_color='orange'))
fig.update_layout(
    title="Interactive projection of higher education students (2017–2038)",
    xaxis_title="Year", yaxis_title="Number of students",
    xaxis=dict(rangeslider=dict(visible=True)),
    legend=dict(yanchor="bottom", y=0.01)
)
fig.show()


## Reflection
Feedback from the seminar led us to write clearer captions, balance static and interactive figures, and choose more readable colors.

## Task distribution
- **Jens Groen:** Data cleaning and figures perspective 1
- **Thijs van der Meer:** Forecasts and interactive visuals
- **Sarah Kruse:** Justifying perspectives, storyline, and structure
