
# Exercise 4 - Data Science Project Preparations

---

### Overview

In this assignment, you will choose a dataset, explore it using Python and pandas, and then develop your own research question(s) based on what you discover. The focus is not only on running code, but on *thinking like a data scientist*: observing patterns, noticing anomalies, and letting the data guide you toward meaningful questions.

You will submit:

1. **A Jupyter Notebook**
   containing your code, analysis steps, comments, and results.

2. **A short presentation**
   summarizing your exploration, insights, and proposed research question(s).

---

### Learning Goals

By completing this assignment you will learn to:

* Obtain and load real-world data
* Perform initial data inspection and cleaning
* Use descriptive & inferential statistics to understand the dataset
* Visualize patterns using basic plots
* Formulate data-driven questions for deeper analysis
* Communicate findings clearly and concisely

---

### Step 1 — Choose and Download Data

Pick a dataset that interests you from the [International Social Survey Programme (ISSP) site](https://issp.org/) and download it.

You can search [by year](https://issp.org/data-download/by-year/) or [by topic](https://issp.org/data-download/by-topic/).

Remember to also download the survey questionnaire; you will need it to interpret the variables and their response codes.

> Choose something that genuinely sparks curiosity — this will make the question-forming stage meaningful.
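A minimal loading sketch, assuming your download is a Stata (`.dta`) or SPSS (`.sav`) file, both common formats for survey archives (check what your download actually contains). The helper `load_issp` is hypothetical. It keeps the raw numeric response codes so you can match them against the questionnaire; pass `convert_categoricals=True` instead if you prefer value labels. Reading `.sav` files with pandas requires the optional `pyreadstat` package.

```python
from pathlib import Path

import pandas as pd


def load_issp(path):
    """Load a survey data file, keeping raw numeric response codes (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".dta":
        # Stata file; convert_categoricals=False keeps codes instead of value labels
        return pd.read_stata(path, convert_categoricals=False)
    if suffix == ".sav":
        # SPSS file; pandas needs the optional pyreadstat package for this
        return pd.read_spss(path, convert_categoricals=False)
    if suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

Usage: `df = load_issp("issp_family.dta")` (a hypothetical file name), then `df.head()` to look at the first rows.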

---

### Step 2 — Explore the Data

Use pandas and visualization tools to learn about the dataset.

Example exploration steps:

* How many rows and columns? What are the column types?
* Missing values? Outliers?
* Summary statistics & distributions
* Basic inferential statistics
* Correlations 
* Visualizations (at least 3)
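The first few steps in this list can be sketched in pandas as below. This assumes the data is already in a DataFrame. The codes in `MISSING_CODES` and the columns `age` and `v5` are hypothetical placeholders: take the real non-response codes (for example "Can't choose" or "No answer") from your questionnaire, and clean per column if some variables legitimately contain negative values.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ASSUMPTION: hypothetical non-response codes; take the real ones from the questionnaire
MISSING_CODES = [-9, -8, -4, -1]


def clean_missing(df, codes=MISSING_CODES):
    """Replace survey non-response codes with NaN in every column."""
    return df.replace(codes, np.nan)


def overview(df):
    """Print shape, column types, missing counts, and summary statistics."""
    print("Rows, columns:", df.shape)
    print(df.dtypes)
    missing = df.isna().sum().sort_values(ascending=False)
    print("Missing values per column:\n", missing[missing > 0])
    print(df.describe())
    return missing


def iqr_outliers(series, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series[(series < q1 - k * iqr) | (series > q3 + k * iqr)]


def plot_value_counts(series):
    """Bar chart of response frequencies for one item, with missing values as their own bar."""
    counts = series.value_counts(dropna=False).sort_index()
    fig, ax = plt.subplots()
    counts.plot(kind="bar", ax=ax)
    ax.set_xlabel(series.name)
    ax.set_ylabel("Respondents")
    return ax


# Toy data with hypothetical columns: age and one survey item (v5)
raw = pd.DataFrame({"age": [25, 34, 41, 29, 120], "v5": [1, 2, -8, 3, -9]})
df = clean_missing(raw)
missing = overview(df)
print(iqr_outliers(df["age"]))  # flags the implausible age of 120
ax = plot_value_counts(df["v5"])
```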

If you wish, you can also begin using scikit-learn and statsmodels for more in-depth analysis (e.g., linear and/or logistic regression).
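If you try statsmodels, its formula interface lets you write a regression directly in terms of your DataFrame's column names. The sketch below fits an ordinary least squares model on synthetic data; `attitude`, `age`, and `female` are hypothetical columns standing in for your own variables, and the true effects are built into the data so the estimates can be checked.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for a cleaned survey DataFrame; all columns are hypothetical
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "female": rng.integers(0, 2, n),
})
# Outcome built with known effects: 0.03 per year of age, 0.5 for female
df["attitude"] = 2.0 + 0.03 * df["age"] + 0.5 * df["female"] + rng.normal(0, 0.5, n)

# Ordinary least squares via the formula interface; an intercept is added automatically
model = smf.ols("attitude ~ age + female", data=df).fit()
print(model.params)   # estimated coefficients
print(model.pvalues)  # p-value for each coefficient
```

`model.summary()` prints a full regression table. For a binary outcome, `smf.logit` accepts the same formula syntax, and wrapping a column in `C(...)` inside the formula treats it as categorical, which is useful for country codes.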

---

### Step 3 — Define Your Research Question(s)

Define **one main research question or research direction** that will guide your analysis.

Examples of possible research directions:

* *How do views of family differ across countries?*
* *How do attitudes toward raising children differ between genders?*
* *How does financial situation influence views on parental roles?*
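For a direction like the gender comparison above, one simple starting point is a cross-tabulation followed by a chi-square test of independence. The responses, the `sex` and `agree` columns, and the 1 to 3 coding below are invented for illustration; `chi2_contingency` comes from SciPy.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Invented responses: 1 = agree, 2 = neither, 3 = disagree (hypothetical coding)
df = pd.DataFrame({
    "sex": ["female"] * 6 + ["male"] * 6,
    "agree": [1, 1, 1, 2, 2, 3, 1, 2, 3, 3, 3, 3],
})

# Raw counts per group and answer, plus row shares so groups of different size compare fairly
counts = pd.crosstab(df["sex"], df["agree"])
shares = pd.crosstab(df["sex"], df["agree"], normalize="index")
print(shares.round(2))

# Chi-square test of independence between group and answer
chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
```

With real survey data, check that the expected counts are not too small (a common rule of thumb is at least 5 per cell). The toy table here is far too small for the test to be reliable.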

---

Using your exploratory analysis, examine the data **in light of your chosen research direction**. Reflect on what the data reveals and consider the following:

* What patterns or trends stand out?
* Which relationships appear interesting or potentially meaningful?
* What new hypotheses emerged during exploration?
* If you continued this project, what would you investigate next?
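To see which relationships stand out, one option is to rank variable pairs by correlation and draw a heatmap. This sketch uses Spearman rank correlation, a reasonable choice for ordinal Likert-type items. The function names and the toy columns `x`, `y`, and `z` are hypothetical.

```python
from itertools import combinations

import matplotlib.pyplot as plt
import pandas as pd


def top_correlations(df, method="spearman", n=5):
    """Return the n variable pairs with the largest absolute correlation."""
    corr = df.select_dtypes("number").corr(method=method)
    # Each unordered pair once, keyed by (column_a, column_b)
    pairs = pd.Series({(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)})
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


def plot_corr(corr):
    """Heatmap of a correlation matrix on a fixed -1 to 1 color scale."""
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(im, ax=ax, label="correlation")
    fig.tight_layout()
    return fig


# Toy data; in practice pass numeric survey items after cleaning missing codes
df = pd.DataFrame({
    "x": range(1, 11),
    "y": [2 * i for i in range(1, 11)],
    "z": [1, 3, 2, 5, 4, 7, 6, 9, 8, 10],
})
top = top_correlations(df, n=2)
print(top)
fig = plot_corr(df.corr(method="spearman"))
```

A strong correlation is a prompt for a hypothesis, not a conclusion; a third variable such as age or country may drive both items.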

---

Based on this exploration, define **specific, data-driven questions** that move you closer to understanding your research goal.

Examples of research question styles:

* *Does X relate to Y?*
* *Can we predict Z based on features A, B, and C?*
* *How does category X differ from category Y?*
* *Which variables seem to influence outcome W?*
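A question of the form *Can we predict Z based on features A, B, and C?* can be prototyped with scikit-learn. The sketch below uses logistic regression with 5-fold cross-validation on synthetic data (`feature_a`, `feature_b`, `feature_c`, and the outcome are made up) and compares the result with a majority-class baseline, since accuracy alone can look good when one class dominates.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features A, B, C and a binary outcome Z (all hypothetical)
rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame({
    "feature_a": rng.normal(size=n),
    "feature_b": rng.normal(size=n),
    "feature_c": rng.normal(size=n),
})
# Z depends strongly on feature_a, weakly on feature_b, and not at all on feature_c
logits = 2.0 * X["feature_a"] - 0.5 * X["feature_b"]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Standardize features, then fit logistic regression; score with 5-fold cross-validation
model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
baseline = max(y.mean(), 1 - y.mean())  # accuracy of always guessing the majority class
print(f"CV accuracy {scores.mean():.2f} vs. baseline {baseline:.2f}")

# Refit on all data to inspect the standardized coefficients
model.fit(X, y)
coefs = pd.Series(model[-1].coef_[0], index=X.columns)
print(coefs.sort_values(key=lambda s: s.abs(), ascending=False))
```

Because the features are standardized, the coefficient magnitudes give a rough first answer to *Which variables seem to influence outcome W?*, though they describe association, not causation.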

---

You do **not** need to fully answer these questions at this stage.
The goal is to **identify promising directions for deeper analysis**, not to produce final results.

---

### What to Submit

| Deliverable                   | Requirements                                                        |
| ----------------------------- | ------------------------------------------------------------------- |
| **Jupyter Notebook (.ipynb)** | code, analysis, plots, commentary in markdown                       |
| **Presentation (PDF/PPT)**    | describe dataset, process, insights, conclusions, research question(s) |
| **Optional extras**           | cleaned dataset, visualizations, deeper analysis, modeling attempts |

Your notebook should tell the story of your exploration.
Your presentation should communicate the highlights.

---

### Suggested Notebook Structure

You may follow this template:

1. **Title + team/author**
2. **Dataset source & description**
3. **Data loading**
4. **Initial exploration**
5. **Visualizations**
6. **Insights + interpretations**
7. **Further questions & proposed research direction**

---

### Evaluation Emphasis

* Curiosity and initiative
* Quality and thoroughness of exploration
* Clarity of explanations
* Thoughtfulness of research question(s)
* Clarity and design of the code

It is not about getting a “right” answer — discovery is the goal.

In the final project, we will also consider:

* Using a wide set of tools: exploratory analysis, regression, classification, and unsupervised learning.

---